MyLife with HBase or HBase three flavors


HBase: In brief
I could talk about…
Operational HBase

HBase: In brief
ZooKeeper quorums

Source: aazk.org

HBase: In brief
Compaction

Source: www.wasteprousa.com

HBase: In brief
How HBase is Implemented
HDFS
Blocks
Regions
META table
Etc…

HBase: In brief
HBase VS
Cassandra
Redis
MySQL
Etc…

HBase: In brief
However, none of those is my primary view as a developer.
As a developer, I want to talk about what HBase can do for me: how it can make MyLife (pun intended) easier.

HBase: In brief
“I choose a lazy person to do a hard
job. Because a lazy person will find
an easy way to do it.”
–Bill Gates

HBase: In brief
So what does HBase do for me, the developer?
TL;DR
IT STORES DATA!

HBase: In brief
How does HBase store data?

HBase: In brief
As a Map
Of Maps
Of Maps
Of Maps

A Data Structures Interlude
A map stores values by key, like a phone directory:
Key == Last Name, First Name, Middle Initial
Value == Extension
E.g.
Example,Dude,X  x555

So now that we know what a map is, what would a map of maps look like? An HBase-like analogy.

An analogy to HBase (a dated one; if someone can think of a current one, please let me know) is a library index file keyed by ISBN. You look up a book by ISBN. The ISBN is your key. The value in this case is a book that contains a list of books!
Key == ISBN
Value == Book that lists other books!
0786704810 Author, Title, Publisher, Year

HBase: In brief
SortedMap[RowKey,
SortedMap[ColumnFamilyName,
SortedMap[Qualifier,
SortedMap[Timestamp,Value]]]]
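
To make the four levels concrete, here is a minimal sketch that models them with plain Python dicts. It is only an illustration of the shape of the data, not how HBase is implemented: the function names, the row keys and the "info" column family are all made up for the example, and sorting happens on read because Python dicts are not sorted maps.

```python
from collections import defaultdict


def new_table():
    # row key -> column family -> qualifier -> timestamp -> value
    return defaultdict(lambda: defaultdict(lambda: defaultdict(dict)))


def put(table, row, family, qualifier, timestamp, value):
    # Each put adds one version of one cell.
    table[row][family][qualifier][timestamp] = value


def get_latest(table, row, family, qualifier):
    # Use .get so that a read never creates empty rows as a side effect.
    versions = table.get(row, {}).get(family, {}).get(qualifier, {})
    # The newest version is the one with the highest timestamp.
    return versions[max(versions)] if versions else None


def scan_rows(table):
    # Mimic the sorted outer map by sorting the row keys on the way out.
    return sorted(table)


table = new_table()
put(table, "row-b", "info", "title", 1, "Old title")
put(table, "row-b", "info", "title", 2, "New title")
put(table, "row-a", "info", "author", 1, "Someone")

latest_title = get_latest(table, "row-b", "info", "title")
row_order = scan_rows(table)
```

Note that the second put did not overwrite the first: both versions sit side by side under their timestamps, and a read simply picks the newest.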

HBase: In brief
Some quick facts:
Column families are defined ahead of time and require the table to be disabled to be altered.
Only column families are fixed. Everything under that level of maps is flexible.
Qualifiers can be added or removed on the fly, along with their versions.

“The Map” itself is also defined ahead of time

HBase: In brief
What does this look like?
DEMO TIME!

HBase: Implementations
The Test Case
The Ideal Case
The Awesome Case

HBase: The Test Case
One of the services we provide to our users is a message stream. This stream can include email, and it works like an email client (e.g. Outlook, Mail.app, or the one on your phone), storing your email messages so you can get them quickly.
We found ourselves storing hundreds of gigabytes of email contents in our Oracle RAC database.

Since this data is only accessed by key, it made sense to move it out of Oracle and into HBase.

Key ==
accountId_providerAccountId_messageId_bodyId
This is a nice key because all the messages for a particular user sit together under the same prefix.
Since HBase keeps the keys sorted, we can use a Scan to grab them all quickly in one go.
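
Why sorted keys make the prefix trick cheap can be sketched without HBase at all. The snippet below is an illustration only: the key values are invented, and a sorted Python list plus `bisect` stands in for the sorted row keys that a real Scan would walk.

```python
from bisect import bisect_left


def prefix_scan(sorted_keys, prefix):
    # Jump straight to the first key that could match the prefix...
    start = bisect_left(sorted_keys, prefix)
    matches = []
    for key in sorted_keys[start:]:
        # ...and stop at the first key that no longer matches,
        # because in sorted order no later key can match either.
        if not key.startswith(prefix):
            break
        matches.append(key)
    return matches


# Hypothetical keys in the accountId_providerAccountId_messageId_bodyId layout.
keys = sorted([
    "1001_gmail7_m1_b1",
    "1001_gmail7_m2_b1",
    "1001_yahoo3_m9_b1",
    "1002_gmail4_m1_b1",
])

one_account = prefix_scan(keys, "1001_")
one_mailbox = prefix_scan(keys, "1001_gmail7_")
```

Keeping the trailing underscore in the prefix matters: scanning for "1001" alone would also pick up an account such as "10012".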

That’s it!

Advantages vs Previous solution:
Faster
Cheaper
Less DB load

HBase: The ideal case
Another service we offer our users is the ability to import their social and email connections so they can have one unified view of all their connections across providers. This lets users manage data by person rather than by account.

This has two main pieces of data:
1. The social profile information
2. The relationship between that profile and an Identity

What makes this ideal for HBase?
1. The profile is sparse data that is only accessed by key!
2. The relationship between a profile and its identity is only a key-value pair and its reverse!

Key == Extension
Value == Last Name, First Name, Middle Initial
E.g.
x555 Example,Dude,X


Dataflow
1. Get profile from provider
2. Check if the profile maps to an existing Identity in HBase
   a. If it doesn't exist, store a version of the profile in HBase with providerId as key and profile information as values
3. Associate profile with identity
   a. Create a row in HBase with identityId_providerId as key
4. Update profile with the identity it is associated with
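
The steps above can be sketched with two dicts standing in for the HBase tables. This is a rough illustration under several assumptions: the table names, the `identityId` column and the sample ids are invented, and how an identity gets chosen for a new profile is not covered in these slides, so the sketch simply takes it as a parameter.

```python
profiles = {}  # providerId -> profile columns (stand-in for the profile table)
links = {}     # "identityId_providerId" -> row (stand-in for the relationship table)


def import_profile(provider_id, profile, identity_id):
    # Step 1 (fetching the profile from the provider) is done by the caller.
    # Step 2: store the profile if this providerId has not been seen before.
    if provider_id not in profiles:
        profiles[provider_id] = dict(profile)
    # Step 3: associate the profile with the identity via a composite row key.
    links[f"{identity_id}_{provider_id}"] = {}
    # Step 4: record on the profile which identity it belongs to.
    profiles[provider_id]["identityId"] = identity_id
    return profiles[provider_id]


stored = import_profile("fb:42", {"name": "Dude Example"}, "id-7")
```

Every step is a get or a put by key, with no joins or secondary indexes involved.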

Coprocessors!
What are Coprocessors?
Another feature of HBase, which works like triggers.
A coprocessor is a piece of logic attached to an HBase put that is executed on the HBase cluster.
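
The trigger idea can be sketched in a few lines. This is a toy model and not the HBase API: real coprocessors are Java classes loaded by the region servers, while here `TinyTable` and its hook list are invented for the example. What the hook does (keeping a reversed copy of each relationship row) is also an assumption chosen to match the key-value-and-its-reverse idea, not something these slides spell out.

```python
class TinyTable:
    """Toy table that runs registered hooks after every put."""

    def __init__(self):
        self.rows = {}
        self.post_put_hooks = []

    def put(self, row_key, columns):
        self.rows.setdefault(row_key, {}).update(columns)
        # The "coprocessor" part: the logic runs where the data lives,
        # and the client doing the put does not have to know about it.
        for hook in self.post_put_hooks:
            hook(row_key, columns)


forward = TinyTable()   # identityId_providerId rows
reverse = TinyTable()   # providerId_identityId rows


def maintain_reverse(row_key, columns):
    # Split only on the first underscore: identityId, then the rest.
    identity_id, provider_id = row_key.split("_", 1)
    reverse.put(f"{provider_id}_{identity_id}", columns)


forward.post_put_hooks.append(maintain_reverse)
forward.put("id-7_fb:42", {"linked": "true"})
```

The client made one put, and the reversed row showed up on its own.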

HBase: The Awesome Case
User stream availability

Originally this system used local caching to store user stream data, but as the streams grew this became impractical.
The solution here was a distributed cache. Great!

A distributed cache allows us to scale, but unless we have a huge grid some user streams will still get evicted from the cache. That means when the user visits again we have to fetch their streams from the source, which is slow…

Enter HBase: from great to awesome!
To fix the latency associated with eviction, we added HBase as a backing store to our distributed cache. This means that records in our cache are periodically written to HBase, and are written to HBase before being evicted from the cache.
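
Here is a minimal sketch of that arrangement, assuming a simple least-recently-used eviction policy. The `BackedCache` class is invented for illustration, a plain dict stands in for HBase, and nothing here reflects the actual cache product or its configuration.

```python
from collections import OrderedDict


class BackedCache:
    """Toy LRU cache that writes to a backing store before evicting."""

    def __init__(self, capacity, store):
        self.capacity = capacity
        self.store = store            # stand-in for HBase
        self.entries = OrderedDict()  # least recently used entry first

    def put(self, key, value):
        self.entries[key] = value
        self.entries.move_to_end(key)
        while len(self.entries) > self.capacity:
            old_key, old_value = self.entries.popitem(last=False)
            # Write to the backing store before the entry is dropped.
            self.store[old_key] = old_value

    def flush(self):
        # The periodic write: copy everything currently cached to the store.
        self.store.update(self.entries)

    def get(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)
            return self.entries[key]
        if key in self.store:
            # Read through: reload from the store instead of the slow source.
            value = self.store[key]
            self.put(key, value)
            return value
        return None


hbase = {}
cache = BackedCache(capacity=2, store=hbase)
cache.put("user1", ["post a"])
cache.put("user2", ["post b"])
cache.put("user3", ["post c"])  # cache is full, so user1 is evicted to the store
evicted_then_read = cache.get("user1")
```

The caller only ever talks to the cache; whether a stream came from memory or from the backing store is invisible to it.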

Distributed cache + HBase == Awesome!
Why?
Persistence – user streams now live in HBase for as long as we want them to.
Speed – read-throughs from HBase are fast.
Transparency – as far as the application is concerned, everything is just in the cache.
Reliability – HBase has been solid and all the data is stored redundantly.
