SlideShare una empresa de Scribd logo
1 de 23
1
HBase: Just the Basics
Jesse Anderson – Curriculum Developer and Instructor
v2
2
What Is HBase?
©2014 Cloudera, Inc. All rights reserved.2
• NoSQL datastore built on top of HDFS (Hadoop)
• An Apache Top Level Project
• Handles the various manifestations of Big Data
• Based on Google’s BigTable paper
3
Why Use HBase?
©2014 Cloudera, Inc. All rights reserved.3
• Storing large amounts of data (TB/PB)
• High throughput for a large number of requests
• Storing unstructured or variable column data
• Big Data with random read and writes
4
When to Consider Not Using HBase?
©2014 Cloudera, Inc. All rights reserved.4
• Only use with Big Data problems
• Read straight through files
• Write all at once or append new files
• Not random reads or writes
• Access patterns of the data are ill-defined
5
HBase Architecture
How it works
6
Meet the Daemons
©2014 Cloudera, Inc. All rights reserved.6
• HBase Master
• RegionServer
• ZooKeeper
• HDFS
• NameNode/Standby NameNode
• DataNode
7
Daemon Locations
©2014 Cloudera, Inc. All rights reserved.7
NameNode
DataNodeDataNode
Standby
NameNode
DataNode
RegionServer
Master
RegionServerRegionServer
ZooKeeper ZooKeeper ZooKeeper
Master Master
DataNodeDataNode DataNode
RegionServerRegionServerRegionServer
Master Nodes
Slave Nodes
8
Tables and Column Families
©2014 Cloudera, Inc. All rights reserved.8
Column Family “contactinfo” Column Family “profilephoto”
Tables are broken into groupings called Column Families.
Group data frequently
accessed together and
compress it Group photos with different settings
9
Rows and Columns
©2014 Cloudera, Inc. All rights reserved.9
Row key Column Family “contactinfo” Column Family “profilephoto”
adupont fname: Andre lname: Dupont
jsmith fname: John lname: Smith image: <smith.jpg>
mrossi fname: Mario lname: Rossi image: <mario.jpg>
Row keys identify a row
No storage penalty for unused columns
Each Column Family can have many columns
10
Regions
©2014 Cloudera, Inc. All rights reserved.10
Row key Column Family “contactinfo”
adupont fname: Andre lname: Dupont
jsmith fname: John lname: Smith
A table is broken into regions
NameNode
DataNodeDataNode
Standby
NameNode
DataNode
RegionServer
Master
RegionServerRegionServer
ZooKeeper ZooKeeper ZooKeeper
Master Master
DataNodeDataNode DataNode
RegionServerRegionServerRegionServer
Row key Column Family “contactinfo”
mrossi fname: Mario lname: Rossi
zstevens fname: Zack lname: Stevens
Regions are served by
RegionServers
11
Client
Write Path
©2014 Cloudera, Inc. All rights reserved.11
NameNode
DataNodeDataNode
Standby
NameNode
DataNode
RegionServer
Master
RegionServerRegionServer
ZooKeeper ZooKeeper ZooKeeper
Master Master
DataNodeDataNode DataNode
RegionServerRegionServerRegionServer
1. Which
RegionServer is
serving the Region?
2. Write to
RegionServer
12
Client
Read Path
©2014 Cloudera, Inc. All rights reserved.12
NameNode
DataNodeDataNode
Standby
NameNode
DataNode
RegionServer
Master
RegionServerRegionServer
ZooKeeper ZooKeeper ZooKeeper
Master Master
DataNodeDataNode DataNode
RegionServerRegionServerRegionServer
1. Which
RegionServer is
serving the Region?
2. Read from
RegionServer
13
HBase API
How to access the data
14
No SQL Means No SQL
©2014 Cloudera, Inc. All rights reserved.14
• Data is not accessed over SQL
• You must:
• Create your own connections
• Keep track of the type of data in a column
• Give each row a key
• Access a row by its key
15
Types of Access
©2014 Cloudera, Inc. All rights reserved.15
• Gets
• Gets a row’s data based on the row key
• Puts
• Upserts a row with data based on the row key
• Scans
• Finds all matching rows based on the row key
• Scan logic can be increased by using filters
16
Gets
©2014 Cloudera, Inc. All rights reserved.16
1
2
3
4
Get g = new Get(ROW_KEY_BYTES);
Result r= table.get(g);
byte[] byteArray =
r.getValue(COLFAM_BYTS,COLDESC_BYTS);
String columnValue =
Bytes.toString(byteArray);
17
Puts
©2014 Cloudera, Inc. All rights reserved.17
1
2
3
4
Put p = new Put(ROW_KEY_BYTES);
p.add(COLFAM_BYTES, COLDESC_BYTES,
Bytes.toBytes("value"));
table.put(p);
18
HBase Schema Design
How to design
19
No SQL Means No SQL
©2014 Cloudera, Inc. All rights reserved.19
• Designing schemas for HBase requires an in-depth
knowledge
• Schema Design is ‘data-centric’ not ‘relationship-
centric’
• You design around how data is accessed
• Row keys are engineered
20
Treating HBase like a traditional RDBMS will lead
to abject failure!
Captain Picard
21
Row Keys
©2014 Cloudera, Inc. All rights reserved.21
• A row key is more than the glue between two tables
• Engineering time is spent just on constructing a row
key
• Contents of a row key vary by access pattern
• Often made up of several pieces of data:
<group_id><email>
22
Schema Design
©2014 Cloudera, Inc. All rights reserved.22
• Schema design does not start in an ERD
• Access pattern must be known and ascertained
• Denormalize to improve performance
• Fewer, bigger tables
23 ©2014 Cloudera, Inc. All rights reserved.
Jesse Anderson
@jessetanderson

Más contenido relacionado

Similar a HBaseCon 2014-Just the Basics

HBase Data Modeling and Access Patterns with Kite SDK
HBase Data Modeling and Access Patterns with Kite SDKHBase Data Modeling and Access Patterns with Kite SDK
HBase Data Modeling and Access Patterns with Kite SDKHBaseCon
 
Kite SDK: Working with Datasets
Kite SDK: Working with DatasetsKite SDK: Working with Datasets
Kite SDK: Working with DatasetsCloudera, Inc.
 
Cloudera Operational DB (Apache HBase & Apache Phoenix)
Cloudera Operational DB (Apache HBase & Apache Phoenix)Cloudera Operational DB (Apache HBase & Apache Phoenix)
Cloudera Operational DB (Apache HBase & Apache Phoenix)Timothy Spann
 
Hive 3 - a new horizon
Hive 3 - a new horizonHive 3 - a new horizon
Hive 3 - a new horizonThejas Nair
 
Big Data Conference April 2015
Big Data Conference April 2015Big Data Conference April 2015
Big Data Conference April 2015Aaron Benz
 
Kite SDK introduction for Portland Big Data
Kite SDK introduction for Portland Big DataKite SDK introduction for Portland Big Data
Kite SDK introduction for Portland Big Data_blue
 
Sql on everything with drill
Sql on everything with drillSql on everything with drill
Sql on everything with drillJulien Le Dem
 
SQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for ImpalaSQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for Impalamarkgrover
 
A Graph Service for Global Web Entities Traversal and Reputation Evaluation B...
A Graph Service for Global Web Entities Traversal and Reputation Evaluation B...A Graph Service for Global Web Entities Traversal and Reputation Evaluation B...
A Graph Service for Global Web Entities Traversal and Reputation Evaluation B...HBaseCon
 
A Graph Service for Global Web Entities Traversal and Reputation Evaluation B...
A Graph Service for Global Web Entities Traversal and Reputation Evaluation B...A Graph Service for Global Web Entities Traversal and Reputation Evaluation B...
A Graph Service for Global Web Entities Traversal and Reputation Evaluation B...Chris Huang
 
Webinar: Selecting the Right SQL-on-Hadoop Solution
Webinar: Selecting the Right SQL-on-Hadoop SolutionWebinar: Selecting the Right SQL-on-Hadoop Solution
Webinar: Selecting the Right SQL-on-Hadoop SolutionMapR Technologies
 
Introduction to HBase - Phoenix HUG 5/14
Introduction to HBase - Phoenix HUG 5/14Introduction to HBase - Phoenix HUG 5/14
Introduction to HBase - Phoenix HUG 5/14Jeremy Walsh
 
Sql saturday pig session (wes floyd) v2
Sql saturday   pig session (wes floyd) v2Sql saturday   pig session (wes floyd) v2
Sql saturday pig session (wes floyd) v2Wes Floyd
 
Analyzing twitter data with hadoop
Analyzing twitter data with hadoopAnalyzing twitter data with hadoop
Analyzing twitter data with hadoopJoey Echeverria
 
Real Time and Big Data – It’s About Time
Real Time and Big Data – It’s About TimeReal Time and Big Data – It’s About Time
Real Time and Big Data – It’s About TimeMapR Technologies
 
Real Time and Big Data – It’s About Time
Real Time and Big Data – It’s About TimeReal Time and Big Data – It’s About Time
Real Time and Big Data – It’s About TimeDataWorks Summit
 

Similar a HBaseCon 2014-Just the Basics (20)

HBase Data Modeling and Access Patterns with Kite SDK
HBase Data Modeling and Access Patterns with Kite SDKHBase Data Modeling and Access Patterns with Kite SDK
HBase Data Modeling and Access Patterns with Kite SDK
 
Introduction to HBase
Introduction to HBaseIntroduction to HBase
Introduction to HBase
 
Kite SDK: Working with Datasets
Kite SDK: Working with DatasetsKite SDK: Working with Datasets
Kite SDK: Working with Datasets
 
Cloudera Operational DB (Apache HBase & Apache Phoenix)
Cloudera Operational DB (Apache HBase & Apache Phoenix)Cloudera Operational DB (Apache HBase & Apache Phoenix)
Cloudera Operational DB (Apache HBase & Apache Phoenix)
 
Hive 3 a new horizon
Hive 3  a new horizonHive 3  a new horizon
Hive 3 a new horizon
 
Hive 3 - a new horizon
Hive 3 - a new horizonHive 3 - a new horizon
Hive 3 - a new horizon
 
Big Data Conference April 2015
Big Data Conference April 2015Big Data Conference April 2015
Big Data Conference April 2015
 
Kite SDK introduction for Portland Big Data
Kite SDK introduction for Portland Big DataKite SDK introduction for Portland Big Data
Kite SDK introduction for Portland Big Data
 
Sql on everything with drill
Sql on everything with drillSql on everything with drill
Sql on everything with drill
 
Twitter with hadoop for oow
Twitter with hadoop for oowTwitter with hadoop for oow
Twitter with hadoop for oow
 
SQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for ImpalaSQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for Impala
 
A Graph Service for Global Web Entities Traversal and Reputation Evaluation B...
A Graph Service for Global Web Entities Traversal and Reputation Evaluation B...A Graph Service for Global Web Entities Traversal and Reputation Evaluation B...
A Graph Service for Global Web Entities Traversal and Reputation Evaluation B...
 
A Graph Service for Global Web Entities Traversal and Reputation Evaluation B...
A Graph Service for Global Web Entities Traversal and Reputation Evaluation B...A Graph Service for Global Web Entities Traversal and Reputation Evaluation B...
A Graph Service for Global Web Entities Traversal and Reputation Evaluation B...
 
Webinar: Selecting the Right SQL-on-Hadoop Solution
Webinar: Selecting the Right SQL-on-Hadoop SolutionWebinar: Selecting the Right SQL-on-Hadoop Solution
Webinar: Selecting the Right SQL-on-Hadoop Solution
 
Introduction to HBase - Phoenix HUG 5/14
Introduction to HBase - Phoenix HUG 5/14Introduction to HBase - Phoenix HUG 5/14
Introduction to HBase - Phoenix HUG 5/14
 
Sql saturday pig session (wes floyd) v2
Sql saturday   pig session (wes floyd) v2Sql saturday   pig session (wes floyd) v2
Sql saturday pig session (wes floyd) v2
 
Analyzing twitter data with hadoop
Analyzing twitter data with hadoopAnalyzing twitter data with hadoop
Analyzing twitter data with hadoop
 
Couchbase 101
Couchbase 101 Couchbase 101
Couchbase 101
 
Real Time and Big Data – It’s About Time
Real Time and Big Data – It’s About TimeReal Time and Big Data – It’s About Time
Real Time and Big Data – It’s About Time
 
Real Time and Big Data – It’s About Time
Real Time and Big Data – It’s About TimeReal Time and Big Data – It’s About Time
Real Time and Big Data – It’s About Time
 

Más de Jesse Anderson

Managing Real-Time Data Teams
Managing Real-Time Data TeamsManaging Real-Time Data Teams
Managing Real-Time Data TeamsJesse Anderson
 
Pulsar for Kafka People
Pulsar for Kafka PeoplePulsar for Kafka People
Pulsar for Kafka PeopleJesse Anderson
 
Big Data and Analytics in the COVID-19 Era
Big Data and Analytics in the COVID-19 EraBig Data and Analytics in the COVID-19 Era
Big Data and Analytics in the COVID-19 EraJesse Anderson
 
Working Together As Data Teams V1
Working Together As Data Teams V1Working Together As Data Teams V1
Working Together As Data Teams V1Jesse Anderson
 
What Does an Exec Need to About Architecture and Why
What Does an Exec Need to About Architecture and WhyWhat Does an Exec Need to About Architecture and Why
What Does an Exec Need to About Architecture and WhyJesse Anderson
 
The Five Dysfunctions of a Data Engineering Team
The Five Dysfunctions of a Data Engineering TeamThe Five Dysfunctions of a Data Engineering Team
The Five Dysfunctions of a Data Engineering TeamJesse Anderson
 
Million Monkeys User Group
Million Monkeys User GroupMillion Monkeys User Group
Million Monkeys User GroupJesse Anderson
 
Strata 2012 Million Monkeys
Strata 2012 Million MonkeysStrata 2012 Million Monkeys
Strata 2012 Million MonkeysJesse Anderson
 
EC2 Performance, Spot Instance ROI and EMR Scalability
EC2 Performance, Spot Instance ROI and EMR ScalabilityEC2 Performance, Spot Instance ROI and EMR Scalability
EC2 Performance, Spot Instance ROI and EMR ScalabilityJesse Anderson
 
Introduction to Regular Expressions
Introduction to Regular ExpressionsIntroduction to Regular Expressions
Introduction to Regular ExpressionsJesse Anderson
 
Introduction to Android
Introduction to AndroidIntroduction to Android
Introduction to AndroidJesse Anderson
 

Más de Jesse Anderson (13)

Managing Real-Time Data Teams
Managing Real-Time Data TeamsManaging Real-Time Data Teams
Managing Real-Time Data Teams
 
Pulsar for Kafka People
Pulsar for Kafka PeoplePulsar for Kafka People
Pulsar for Kafka People
 
Big Data and Analytics in the COVID-19 Era
Big Data and Analytics in the COVID-19 EraBig Data and Analytics in the COVID-19 Era
Big Data and Analytics in the COVID-19 Era
 
Working Together As Data Teams V1
Working Together As Data Teams V1Working Together As Data Teams V1
Working Together As Data Teams V1
 
What Does an Exec Need to About Architecture and Why
What Does an Exec Need to About Architecture and WhyWhat Does an Exec Need to About Architecture and Why
What Does an Exec Need to About Architecture and Why
 
The Five Dysfunctions of a Data Engineering Team
The Five Dysfunctions of a Data Engineering TeamThe Five Dysfunctions of a Data Engineering Team
The Five Dysfunctions of a Data Engineering Team
 
Million Monkeys User Group
Million Monkeys User GroupMillion Monkeys User Group
Million Monkeys User Group
 
Strata 2012 Million Monkeys
Strata 2012 Million MonkeysStrata 2012 Million Monkeys
Strata 2012 Million Monkeys
 
EC2 Performance, Spot Instance ROI and EMR Scalability
EC2 Performance, Spot Instance ROI and EMR ScalabilityEC2 Performance, Spot Instance ROI and EMR Scalability
EC2 Performance, Spot Instance ROI and EMR Scalability
 
Introduction to Regular Expressions
Introduction to Regular ExpressionsIntroduction to Regular Expressions
Introduction to Regular Expressions
 
Why Use MVC?
Why Use MVC?Why Use MVC?
Why Use MVC?
 
How to Use MVC
How to Use MVCHow to Use MVC
How to Use MVC
 
Introduction to Android
Introduction to AndroidIntroduction to Android
Introduction to Android
 

Último

APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDGMarianaLemus7
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxnull - The Open Security Community
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 

Último (20)

APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
The transition to renewables in India.pdf
The transition to renewables in India.pdfThe transition to renewables in India.pdf
The transition to renewables in India.pdf
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 

HBaseCon 2014-Just the Basics

  • 1. 1 HBase: Just the Basics Jesse Anderson – Curriculum Developer and Instructor v2
  • 2. 2 What Is HBase? ©2014 Cloudera, Inc. All rights reserved.2 • NoSQL datastore built on top of HDFS (Hadoop) • An Apache Top Level Project • Handles the various manifestations of Big Data • Based on Google’s BigTable paper
  • 3. 3 Why Use HBase? ©2014 Cloudera, Inc. All rights reserved.3 • Storing large amounts of data (TB/PB) • High throughput for a large number of requests • Storing unstructured or variable column data • Big Data with random read and writes
  • 4. 4 When to Consider Not Using HBase? ©2014 Cloudera, Inc. All rights reserved.4 • Only use with Big Data problems • Read straight through files • Write all at once or append new files • Not random reads or writes • Access patterns of the data are ill-defined
  • 6. 6 Meet the Daemons ©2014 Cloudera, Inc. All rights reserved.6 • HBase Master • RegionServer • ZooKeeper • HDFS • NameNode/Standby NameNode • DataNode
  • 7. 7 Daemon Locations ©2014 Cloudera, Inc. All rights reserved.7 NameNode DataNodeDataNode Standby NameNode DataNode RegionServer Master RegionServerRegionServer ZooKeeper ZooKeeper ZooKeeper Master Master DataNodeDataNode DataNode RegionServerRegionServerRegionServer Master Nodes Slave Nodes
  • 8. 8 Tables and Column Families ©2014 Cloudera, Inc. All rights reserved.8 Column Family “contactinfo” Column Family “profilephoto” Tables are broken into groupings called Column Families. Group data frequently accessed together and compress it Group photos with different settings
  • 9. 9 Rows and Columns ©2014 Cloudera, Inc. All rights reserved.9 Row key Column Family “contactinfo” Column Family “profilephoto” adupont fname: Andre lname: Dupont jsmith fname: John lname: Smith image: <smith.jpg> mrossi fname: Mario lname: Rossi image: <mario.jpg> Row keys identify a row No storage penalty for unused columns Each Column Family can have many columns
  • 10. 10 Regions ©2014 Cloudera, Inc. All rights reserved.10 Row key Column Family “contactinfo” adupont fname: Andre lname: Dupont jsmith fname: John lname: Smith A table is broken into regions NameNode DataNodeDataNode Standby NameNode DataNode RegionServer Master RegionServerRegionServer ZooKeeper ZooKeeper ZooKeeper Master Master DataNodeDataNode DataNode RegionServerRegionServerRegionServer Row key Column Family “contactinfo” mrossi fname: Mario lname: Rossi zstevens fname: Zack lname: Stevens Regions are served by RegionServers
  • 11. 11 Client Write Path ©2014 Cloudera, Inc. All rights reserved.11 NameNode DataNodeDataNode Standby NameNode DataNode RegionServer Master RegionServerRegionServer ZooKeeper ZooKeeper ZooKeeper Master Master DataNodeDataNode DataNode RegionServerRegionServerRegionServer 1. Which RegionServer is serving the Region? 2. Write to RegionServer
  • 12. 12 Client Read Path ©2014 Cloudera, Inc. All rights reserved.12 NameNode DataNodeDataNode Standby NameNode DataNode RegionServer Master RegionServerRegionServer ZooKeeper ZooKeeper ZooKeeper Master Master DataNodeDataNode DataNode RegionServerRegionServerRegionServer 1. Which RegionServer is serving the Region? 2. Read from RegionServer
  • 13. 13 HBase API How to access the data
  • 14. 14 No SQL Means No SQL ©2014 Cloudera, Inc. All rights reserved.14 • Data is not accessed over SQL • You must: • Create your own connections • Keep track of the type of data in a column • Give each row a key • Access a row by its key
  • 15. 15 Types of Access ©2014 Cloudera, Inc. All rights reserved.15 • Gets • Gets a row’s data based on the row key • Puts • Upserts a row with data based on the row key • Scans • Finds all matching rows based on the row key • Scan logic can be increased by using filters
  • 16. 16 Gets ©2014 Cloudera, Inc. All rights reserved.16 1 2 3 4 Get g = new Get(ROW_KEY_BYTES); Result r= table.get(g); byte[] byteArray = r.getValue(COLFAM_BYTS,COLDESC_BYTS); String columnValue = Bytes.toString(byteArray);
  • 17. 17 Puts ©2014 Cloudera, Inc. All rights reserved.17 1 2 3 4 Put p = new Put(ROW_KEY_BYTES); p.add(COLFAM_BYTES, COLDESC_BYTES, Bytes.toBytes("value")); table.put(p);
  • 19. 19 No SQL Means No SQL ©2014 Cloudera, Inc. All rights reserved.19 • Designing schemas for HBase requires an in-depth knowledge • Schema Design is ‘data-centric’ not ‘relationship- centric’ • You design around how data is accessed • Row keys are engineered
  • 20. 20 Treating HBase like a traditional RDBMS will lead to abject failure! Captain Picard
  • 21. 21 Row Keys ©2014 Cloudera, Inc. All rights reserved.21 • A row key is more than the glue between two tables • Engineering time is spent just on constructing a row key • Contents of a row key vary by access pattern • Often made up of several pieces of data: <group_id><email>
  • 22. 22 Schema Design ©2014 Cloudera, Inc. All rights reserved.22 • Schema design does not start in an ERD • Access pattern must be known and ascertained • Denormalize to improve performance • Fewer, bigger tables
  • 23. 23 ©2014 Cloudera, Inc. All rights reserved. Jesse Anderson @jessetanderson