MyLife with HBase
OR
HBase three flavors
HBase: In brief
I could talk about…
Operational HBase
ZooKeeper quorums

Source: aazk.org
Compaction

Source: www.wasteprousa.com
How HBase is Implemented
HDFS
Blocks
Regions
META table
Etc…
HBase VS
Cassandra
Redis
MySQL
Etc…
HBase: In brief
However, none of those is my primary view as a developer.
As a developer, I want to talk about what HBase can do for me: how it can make MyLife (pun intended) easier.
HBase: In brief
“I choose a lazy person to do a hard
job. Because a lazy person will find
an easy way to do it.”
–Commonly attributed to Bill Gates
HBase: In brief
So what does HBase do for me, the developer?
TL;DR
IT STORES DATA!
HBase: In brief
How does HBase store data?
HBase: In brief
As a Map
Of Maps
Of Maps
Of Maps
A Data Structures Interlude
Key == Last Name, First Name,
Middle Initial
Value == Extension
E.g.
Example,Dude,X  x555
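In code, the directory is just a sorted map from a composite key to a value. A minimal Java sketch; the class name CompanyDirectory and the comma-joined key format are illustrative choices, not anything from the talk:

```java
import java.util.TreeMap;

// Illustrative company directory: "Last,First,MiddleInitial" -> extension.
class CompanyDirectory {
    // A sorted map: the same basic structure HBase builds on.
    private final TreeMap<String, String> extensions = new TreeMap<>();

    // The composite key is three fields joined by commas.
    static String key(String last, String first, String middleInitial) {
        return last + "," + first + "," + middleInitial;
    }

    void add(String last, String first, String middleInitial, String extension) {
        extensions.put(key(last, first, middleInitial), extension);
    }

    // Returns the extension, or null if nobody has that key.
    String lookup(String last, String first, String middleInitial) {
        return extensions.get(key(last, first, middleInitial));
    }
}
```

Adding ("Example", "Dude", "X", "x555") and then looking up the same three fields returns "x555".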
A Data Structures Interlude
So now that we know what a map is, what would a map of maps look like? Here is an HBase-like analogy.
A Data Structures Interlude
An analogy to HBase (a dated one; if someone can think of a current one, please, please let me know) is a library index file organized by ISBN. You look up a book by its ISBN. The ISBN is your key. The value in this case is a book that contains a list of books!
Key == ISBN
Value == Book that lists other books!
0786704810 Author, Title, Publisher, Year
HBase: In brief
SortedMap[RowKey,
SortedMap[ColumnFamilyName,
SortedMap[Qualifier,
SortedMap[Timestamp,Value]]]]
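The nested SortedMap above can be modeled directly with Java's TreeMap (NavigableMap is a SortedMap with extra lookup methods). This is a sketch of the logical model only, assuming String keys and values where HBase really uses byte arrays; it says nothing about how HBase lays data out on disk. Versions are kept newest-first, the order in which HBase returns them. The class name LogicalTable is hypothetical.

```java
import java.util.Collections;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Logical HBase data model: row key -> family -> qualifier -> timestamp -> value.
class LogicalTable {
    private final NavigableMap<String,
            NavigableMap<String,
                    NavigableMap<String,
                            NavigableMap<Long, String>>>> rows = new TreeMap<>();

    void put(String row, String family, String qualifier, long timestamp, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(family, f -> new TreeMap<>())
            // Reverse order on timestamps keeps the newest version first.
            .computeIfAbsent(qualifier, q -> new TreeMap<Long, String>(Collections.reverseOrder()))
            .put(timestamp, value);
    }

    // Latest version of one cell, or null if any level of the maps is missing.
    String get(String row, String family, String qualifier) {
        NavigableMap<Long, String> cellVersions = versions(row, family, qualifier);
        if (cellVersions.isEmpty()) {
            return null;
        }
        return cellVersions.firstEntry().getValue();
    }

    // All versions of one cell, newest first; empty if the cell does not exist.
    NavigableMap<Long, String> versions(String row, String family, String qualifier) {
        Map<String, NavigableMap<String, NavigableMap<Long, String>>> families = rows.get(row);
        if (families == null) {
            return new TreeMap<>();
        }
        Map<String, NavigableMap<Long, String>> qualifiers = families.get(family);
        if (qualifiers == null) {
            return new TreeMap<>();
        }
        NavigableMap<Long, String> cell = qualifiers.get(qualifier);
        if (cell == null) {
            return new TreeMap<>();
        }
        return Collections.unmodifiableNavigableMap(cell);
    }
}
```

Writing "mike" at timestamp 100 and "michael" at timestamp 200 to the same cell leaves two versions, and get returns "michael".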
HBase: In brief
Some quick facts:
Column families are defined ahead of time, and the table must be disabled before they can be altered.
Only column families are fixed. Everything below that level of the maps is flexible.
 Qualifiers can be added or removed on the fly,
 along with their versions.

“The Map” itself (the table) is also defined ahead of time.
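A sketch of the "fixed families, flexible qualifiers" rule, using an invented SchemaSketch class: the set of families is fixed at construction, any qualifier is accepted at write time, and a write to an unknown family is rejected, roughly as HBase rejects a put to a column family the table does not define. Versions are omitted to keep it short.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical table with fixed column families and free-form qualifiers.
class SchemaSketch {
    private final Set<String> families;
    // row -> "family:qualifier" -> value (versions omitted for brevity)
    private final Map<String, Map<String, String>> rows = new TreeMap<>();

    SchemaSketch(Set<String> families) {
        // Families are decided once, when the table is created.
        this.families = new HashSet<>(families);
    }

    void put(String row, String family, String qualifier, String value) {
        if (!families.contains(family)) {
            throw new IllegalArgumentException("Unknown column family: " + family);
        }
        // Any qualifier is fine: no schema change is needed to add a new one.
        rows.computeIfAbsent(row, r -> new HashMap<>()).put(family + ":" + qualifier, value);
    }

    // Number of cells actually stored for a row.
    int cellCount(String row) {
        Map<String, String> cells = rows.get(row);
        return cells == null ? 0 : cells.size();
    }
}
```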
HBase: In brief
What does this look like?
DEMO TIME!
HBase: Implementations
The Test Case
The Ideal Case
The Awesome Case
HBase: The Test Case
One of the services we provide to our users is a message stream. This stream can include email, and it works like an email client (e.g. Outlook, Mail.app, or the mail app on your phone), storing your email messages so you can get to them quickly.
We found ourselves storing hundreds of gigabytes of email content in our Oracle RAC database.
HBase: The Test Case
Since this data is only ever accessed by key, it made sense to move it out of Oracle and into HBase.
HBase: The Test Case
Key ==
accountId_providerAccountId_messageId_bodyId
This is a nice key because all the messages for a particular user share a common prefix.
Since HBase keeps row keys sorted, we can use a Scan to grab them all quickly in one pass.
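A sketch of building that key and scanning one account's prefix, with a TreeMap standing in for the sorted table. The key layout comes from the slide and the reverse-timestamp bodyId from the speaker notes; the zero-padding, the trailing separator in the scan prefix, and the class name MessageKeys are assumptions added for illustration. Real HBase keys are byte arrays, and a Scan bounded by start and stop rows would play the role of subMap here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;

// Hypothetical helpers for the accountId_providerAccountId_messageId_bodyId key.
class MessageKeys {
    static final String SEP = "_";

    // bodyId is a reverse timestamp, so newer bodies sort before older ones.
    // Assumes IDs never contain the "_" separator.
    static String rowKey(String accountId, String providerAccountId,
                         String messageId, long bodyTimestampMillis) {
        long reverseTs = Long.MAX_VALUE - bodyTimestampMillis;
        // Zero-pad to 19 digits so string order matches numeric order.
        String bodyId = String.format("%019d", reverseTs);
        return accountId + SEP + providerAccountId + SEP + messageId + SEP + bodyId;
    }

    // Prefix scan for one account. The trailing separator keeps account "42"
    // from also matching account "421".
    static List<String> scanAccount(SortedMap<String, String> table, String accountId) {
        String start = accountId + SEP;
        String stop = start + Character.MAX_VALUE;
        return new ArrayList<>(table.subMap(start, stop).keySet());
    }
}
```

Because keys are sorted, the scan touches one contiguous range instead of filtering the whole table.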
HBase: The Test Case
That’s it!
HBase: The Test Case
Advantages vs. the previous solution:
Faster
Cheaper
Less DB load
HBase: The ideal case
Another service we offer our users is the ability to import their social and email connections, so they can have one unified view of all their connections across providers. This allows users to manage data by person rather than by account.
HBase: The ideal case
This has two main pieces of data:
1. The social profile information
2. The relationship between that profile and an Identity
HBase: The ideal case
What makes this ideal for HBase?
1. The profile is sparse data that is only
accessed by key!
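A sketch of what "sparse" means for storage, assuming a provider profile arrives as a map of field name to value (the field names in the usage are made up). Only non-empty fields become cells; in HBase a cell that is never written simply does not exist. The filtering step and the class name SparseProfile are illustrative assumptions about client code, not the talk's implementation.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical conversion from a provider profile to the cells worth writing.
class SparseProfile {
    static Map<String, String> toCells(Map<String, String> rawProfile) {
        Map<String, String> cells = new TreeMap<>();
        for (Map.Entry<String, String> field : rawProfile.entrySet()) {
            String value = field.getValue();
            // Skip missing or empty fields: they cost nothing if never written.
            if (value != null && !value.isEmpty()) {
                cells.put(field.getKey(), value);   // qualifier -> value
            }
        }
        return cells;
    }
}
```

A profile with firstName and headline set but middleName and birthday empty yields just two cells.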
HBase: The ideal case
What makes this ideal for HBase?
2. The relationship between a profile and its identity is just a key-value pair, and its reverse!
A Data Structures Interlude
Key == Last Name, First Name,
Middle Initial
Value == Extension
E.g.
Example,Dude,X  x555
A Data Structures Interlude
Key == Extension
Value == Last Name, First Name,
Middle Initial
E.g.
x555 Example,Dude,X
HBase: The ideal case

Dataflow
1. Get the profile from the provider
2. Check whether the profile maps to an existing Identity in HBase
   1. If it doesn’t, store a version of the profile in HBase, with providerId as the key and the profile information as values
3. Associate the profile with an identity
   1. Create a row in HBase with identityId_providerId as the key
4. Update the profile with the identity it is associated with
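The steps can be sketched in memory as follows. This is a hypothetical reconstruction, not the team's code: plain Java maps stand in for the two HBase tables, a list of maps stands in for HBase cell versions, and the names ConnectionFlow, identityLinks, and the identityId field are invented for illustration. Step 1 (fetching from the provider) is left out, and identity IDs are assumed not to contain "_".

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of the social-connection dataflow.
class ConnectionFlow {
    // providerId -> profile versions, oldest first (stand-in for cell versions)
    final Map<String, List<Map<String, String>>> profiles = new HashMap<>();
    // identityId_providerId -> empty value; the row key carries the information
    final TreeMap<String, String> identityLinks = new TreeMap<>();

    // Step 2: does this profile already map to an identity?
    String identityOf(String providerId) {
        List<Map<String, String>> versions = profiles.get(providerId);
        if (versions == null || versions.isEmpty()) {
            return null;
        }
        return versions.get(versions.size() - 1).get("identityId");
    }

    // Step 2.1: store a new version of the profile rather than overwriting it.
    void storeProfileVersion(String providerId, Map<String, String> profile) {
        profiles.computeIfAbsent(providerId, k -> new ArrayList<>())
                .add(new HashMap<>(profile));
    }

    // Steps 3 and 4: link the profile to an identity, then record the identity on the profile.
    void associate(String identityId, String providerId) {
        List<Map<String, String>> versions = profiles.get(providerId);
        if (versions == null || versions.isEmpty()) {
            throw new IllegalStateException("No profile stored for " + providerId);
        }
        identityLinks.put(identityId + "_" + providerId, "");                 // step 3
        Map<String, String> latest = new HashMap<>(versions.get(versions.size() - 1));
        latest.put("identityId", identityId);                                 // step 4
        versions.add(latest);
    }

    // All providers linked to one identity: a prefix scan over "identityId_".
    List<String> providersOf(String identityId) {
        String start = identityId + "_";
        List<String> result = new ArrayList<>();
        for (String key : identityLinks.subMap(start, start + Character.MAX_VALUE).keySet()) {
            result.add(key.substring(start.length()));
        }
        return result;
    }
}
```

The identityId_providerId key means every profile for one person sits under a single prefix, while the identity written back onto the profile answers the reverse question.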
HBase: The ideal case
Coprocessors!
What are Coprocessors?
Another HBase feature, which works much like database triggers.
A coprocessor is a piece of logic attached to HBase operations (a put, in our case) that executes on the HBase cluster itself.
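In real HBase this logic would live in a coprocessor deployed to the cluster (an observer such as a RegionObserver hooking postPut). Rather than reproduce version-specific HBase signatures, the sketch below only simulates the idea in plain Java: every put on the link table runs a post-put hook that writes the identity back onto the profile row. ObservedTable, wireReverseIndex, and the identityId qualifier are hypothetical, and the hook assumes identity IDs contain no underscore.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

// Plain-Java simulation of an observer that runs after every put on a table.
class ObservedTable {
    final Map<String, Map<String, String>> rows = new HashMap<>();
    private final List<BiConsumer<String, Map<String, String>>> postPutHooks = new ArrayList<>();

    void addPostPutHook(BiConsumer<String, Map<String, String>> hook) {
        postPutHooks.add(hook);
    }

    void put(String rowKey, Map<String, String> cells) {
        rows.computeIfAbsent(rowKey, k -> new HashMap<>()).putAll(cells);
        // Every writer goes through put(), so every writer triggers the hooks.
        for (BiConsumer<String, Map<String, String>> hook : postPutHooks) {
            hook.accept(rowKey, cells);
        }
    }

    // Hypothetical wiring: a put of identityId_providerId into the link table
    // also writes identityId onto the profile row keyed by providerId.
    static void wireReverseIndex(ObservedTable links, ObservedTable profiles) {
        links.addPostPutHook((rowKey, cells) -> {
            int sep = rowKey.indexOf('_');
            if (sep < 0) {
                return;   // not a link row; nothing to index
            }
            Map<String, String> update = new HashMap<>();
            update.put("identityId", rowKey.substring(0, sep));
            profiles.put(rowKey.substring(sep + 1), update);
        });
    }
}
```

Because the hook runs inside the write path rather than in each application, the reverse entry appears no matter which client made the put.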
HBase: The Awesome Case
User stream availability
HBase: The Awesome Case
Originally this system used local caching to store user stream data, but as the stream grew, this became impractical.
The solution here was a distributed cache. Great!
HBase: The Awesome Case
A distributed cache allows us to scale, but unless we have a huge grid, some user streams will still get evicted from the cache. This means that when the user visits again, we have to fetch their streams from the source, which is slow…
HBase: The Awesome Case
Enter HBase: from great to awesome!
To fix the latency associated with eviction, we added HBase as a backing store to our distributed cache. Records in our cache are periodically written to HBase, and they are also written to HBase before being evicted from the cache.
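The eviction-plus-read-through pattern can be sketched with a LinkedHashMap in access order, whose removeEldestEntry hook gives a place to write an entry to the backing store before it is evicted. Here a HashMap stands in for the HBase table and a single-node LRU cache stands in for the distributed cache; BackedCache and its methods are hypothetical.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of an LRU cache backed by a persistent store (stand-in for HBase).
class BackedCache {
    final Map<String, String> store = new HashMap<>();
    private final LinkedHashMap<String, String> cache;

    BackedCache(int capacity) {
        // accessOrder = true: iteration runs from least to most recently used.
        this.cache = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                if (size() > capacity) {
                    // Write the entry to the store before it is evicted.
                    store.put(eldest.getKey(), eldest.getValue());
                    return true;
                }
                return false;
            }
        };
    }

    void put(String key, String value) {
        cache.put(key, value);
    }

    // Read-through: on a cache miss, fall back to the store and re-cache.
    String get(String key) {
        String value = cache.get(key);
        if (value == null) {
            value = store.get(key);
            if (value != null) {
                cache.put(key, value);
            }
        }
        return value;
    }

    // Periodic write-behind: copy everything currently cached to the store.
    void flush() {
        store.putAll(cache);
    }

    boolean isCached(String key) {
        return cache.containsKey(key);
    }
}
```

With capacity 2, putting "a", "b", then "c" evicts "a" into the store; a later get("a") reads it back through and re-caches it.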
HBase: The Awesome Case
Distributed cache + HBase == Awesome!
Why?
Persistence – user streams now live in HBase for
as long as we want them to.
Speed – read-throughs from HBase are fast
Transparency – as far as the application is concerned, everything is just in the cache
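Per the speaker notes, persistence is bounded by an HBase TTL: streams not updated in 4 weeks are removed automatically. HBase TTLs are set per column family, in seconds (in the shell, as a family attribute such as TTL => 2419200 for 28 days). The sketch below only illustrates the arithmetic and the expiry check; StreamTtl is a hypothetical name, and the boundary behavior at exactly 28 days is an assumption.

```java
import java.time.Duration;
import java.time.Instant;

// Sketch of TTL semantics: data older than the TTL is treated as gone.
class StreamTtl {
    // 4 weeks expressed in seconds, the unit HBase uses for column-family TTLs.
    static final long FOUR_WEEKS_SECONDS = Duration.ofDays(28).getSeconds();

    // True when more than ttlSeconds have passed since the last write.
    static boolean isExpired(Instant lastWritten, Instant now, long ttlSeconds) {
        return Duration.between(lastWritten, now).getSeconds() > ttlSeconds;
    }
}
```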
HBase: The Awesome Case
Distributed cache + HBase == Awesome!
Why?
Reliability – HBase has been solid, and all the data is stored redundantly
That’s all, folks!
Questions?

 


Speaker notes

  1. I could talk about HBase operationally.
  2. HBase vs other data stores
  3. My personal mantra
  4. Said by someone far more quotable
  5. Like an RDBMS or a file, it stores data, just in a different way.
  6. Again from a functional POV
  7. That’s it. Remember that. The rest of the terminology just tells you where you are in that nest of maps.
  8. Before we get too far, since HBase stores data in maps, let’s take a brief step back and let me describe a map as quickly as I can, since it is fundamental to HBase. A map is a way of storing data so it can be retrieved by a key. This is a concept most people are familiar with, like finding a co-worker’s extension in the company directory. Here we have a key (last name, first name, middle initial) which MAPS to the extension. BTW, we are going to talk about keys A LOT!
  9. To be precise, it only works if that other book also listed other books, but you get the idea. BTW, that is a real example; you can look it up. Also, I think you all just passed CS 201.
  10. SOOO… back to HBase. This is HBase in a nutshell. To quote from the HBase documentation: “All other map returning methods make use of this map internally.” So even they say this is pretty much all there is to it, functionally. But this structure allows for some very cool things, and I get some freebies here: quick lookups by key or key prefix (more on that later), flexibility, and versioning. These are things we have used and will be looking at in our implementations later.
  12. From this map structure we get flexibility, since we can add or remove items from the map, with one caveat: column families are fixed. “The Map” in HBase terms is the table.
  13. Demo: /hbase-identity-secondary-index-migrator/src/test/java/PresentationUnitTest.java. HBase shell: create 'PRESENTATION_TABLE', {NAME => 'CONTENT', REPLICATION_SCOPE => '0', VERSIONS => '1'}; put 'PRESENTATION_TABLE', 1, 'CONTENT:firstname', 'mike'; scan 'PRESENTATION_TABLE'
  15. We currently have three production solutions implemented using HBase: the test case, our first production use of HBase; the ideal case, which is an almost perfect match for HBase; and finally the awesome case, where we added HBase to something great to make it awesome.
  16. This was (to say the least) not ideal, for several reasons including cost and scalability.
  17. accountId is the mylife.com account ID. providerAccountId is the ID we give to the relation between a MyLife account and a provider account, i.e. this user’s Gmail account. messageId is the unique ID each email message is given. bodyId is a reverse timestamp given to each body (HTML or text).
  18. Like our example of a company directory, you can easily find everyone with the same last name.
  19. Our first use of HBase was very straightforward, but it works! And it works well.
  20. This implementation is faster and cheaper, and it saves precious DB resources for where they are needed most: things that need query and transaction capabilities.
  21. What we call an Identity here is really a person. One person probably has many social profiles, like a LinkedIn and a Facebook profile.
  22. What is sparse data? That is when the records you store are mostly empty fields. Like the contact page in your phone: it has a name and phone number but probably not much else, even though there is a place for home address, company name, birthday, anniversary, and a bunch of other stuff. That is also sparse data. Remember I said HBase is flexible? Well, this is how you use that flexibility. Social profiles are similarly sparse. There is a lot of potential data in social profiles, but usually only a few items will be there, and the potential fields vary from provider to provider. For example, first name is almost always in a social profile, but middle name probably not. HBase is great for this since it only stores the data that is there: no wasting space storing empty cells or time transferring them over the network. It also allows us to store fields for different social providers together, or add new fields as we add providers, without having to update the storage, just the code that needs the data. The “only accessed by key” bit is also important, but we have already covered that.
  23. Exciting, no? It’s all fitting together. So we know about key-value pairs, but what is the reverse part about?
  24. Time for another data structures interlude! Last time we had this.
  25. The reverse index is simply the same data REVERSED! So when you get a call from an extension you don’t know, you look up the name it belongs to. This has been another data structures interlude!
  26. A simplified data flow for social connections. In step 2 we use HBase’s versioning to keep versions of the social profile, so we get a history of changes. Step 4 is where we build our reverse index, so we can find the identity of a social profile. So how did we implement step 4 and make this part of the ideal HBase use case?
  27. A coprocessor is an HBase feature we have not touched on till now (have to save a few surprises). In our case we built a coprocessor to update the profile record when we associate it with its identity. This has several advantages: the reverse index is built at the same time as the primary index; the reverse index gets created no matter the source of the put; and any application can rely on the primary and reverse indexes always existing together.
  28. I briefly mentioned message streams in our first case; here is another part of that same system that uses HBase. Once we have a user’s provider streams and have homogenized them, we need them available to build the user’s personal aggregated stream.
  29. Persistence – in this case we use another HBase feature, TTLs, so that streams that have not been updated in 4 weeks get removed automatically. Speed – read-throughs (times when a stream is gone from the in-memory cache and has to be fetched from HBase) are basically as fast as the network, since we are getting by key.