Introduction to HBase
Anil Gupta
@bigdatanoob
Outline
What is NoSQL?
RDBMS vs NoSQL
HBase
HBase Components
Architecture
HBase Cluster
HBase Data Model
Key -> Value
Region
NoSQL is commonly read as “Not Only SQL”. These databases are non-relational. The term was coined in 1998.
They do not use SQL as their primary query language.
NoSQL is not a replacement for relational databases.
NoSQL is designed for distributed data stores.
NoSQL is designed to store semi-structured and sparse data.
Feature | NoSQL | RDBMS
Hardware | Farm of commodity hardware (up to several thousand) | 1-3 high-end or proprietary (costly)
Data Type | Semi-structured and sparse | Structured and dense
Data Size | Petabytes (10^15 bytes) | Terabytes (10^12 bytes)
Auto-Sharding | Yes | No
Flexible Schema | Yes | No
Referential Integrity | No | Yes
Support for Joins | No | Yes
Support for Aggregations | Basic | Advanced
HBase is an open-source, distributed, versioned,
key-value database modeled after Google's
Bigtable.
is optional for
HBase provides real-time reads and writes (in milliseconds)
HBase is highly fault-tolerant (HA) and scalable
+ Random Read/Write
access= + Apache
Zookeeper
Selling Points of HBase
Highly Scalable
Auto-sharding
Strongly Consistent
Out of the box support for Historical Data
Very high read throughput
Readily compatible with Hadoop
Highly Fault-tolerant(HA)
HBase Components
1. HBase Master (HMaster): HMaster is the master server.
• HMaster is responsible for monitoring all RegionServers
• Performs load balancing by moving regions between RegionServers
• Assigns regions to RegionServers
• All metadata changes go through the Master
• Periodically checks and cleans up the .META. table
• Multiple HMasters can run in a cluster, but only one HMaster is active at any time.
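As a rough illustration of what assigning regions to RegionServers means, the sketch below spreads a list of regions over servers in round-robin order. It is a toy model only: the function name `assign_regions` and the region and server names are hypothetical, and HBase's real balancer considers far more than a simple count.

```python
def assign_regions(regions, servers):
    """Spread regions over servers in round-robin order (toy model only)."""
    assignment = {server: [] for server in servers}
    for i, region in enumerate(regions):
        # Region i goes to server i modulo the number of servers.
        assignment[servers[i % len(servers)]].append(region)
    return assignment


# Hypothetical region and server names, for illustration only.
plan = assign_regions(["r1", "r2", "r3", "r4", "r5"], ["rs1", "rs2"])
print(plan)
```

With five regions and two servers, no server ends up with more than one region above the other, which is the basic goal of balancing.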
HBase Components(cont.)
2. RegionServer (HRegionServer): HRegionServer is the implementation of the worker role.
• Runs as a Java service on worker nodes.
• A machine running a RegionServer is considered a worker node.
• Serves get/put/scan requests
• Responsible for splitting and compacting regions
• Runs on the same machine as a DataNode
• Multiple RegionServers run in a cluster
ZooKeeper in HBase
ZooKeeper allows distributed processes to coordinate with each other through a shared hierarchical namespace. It is a distributed and highly reliable service.
In HBase it is responsible for the following:
• Providing the availability status of RegionServers
• Ensuring a single active HMaster in the cluster
• Providing the location of the “-ROOT-” table
• Selecting a new HMaster if the active HMaster fails
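The "single active HMaster" guarantee can be pictured with a small in-memory stand-in for the coordination service. This is only a sketch under stated assumptions: `ToyCoordinator` and its methods are hypothetical names, and it does not use or imitate the real ZooKeeper client API.

```python
class ToyCoordinator:
    """In-memory stand-in for a coordination service with one 'active master' slot."""

    def __init__(self):
        self.active = None  # name of the current active master, if any

    def try_become_active(self, master):
        # The first master to claim the empty slot wins; later callers stay on standby.
        if self.active is None:
            self.active = master
        return self.active == master

    def report_failure(self, master):
        # When the active master fails, the slot is freed for a standby master.
        if self.active == master:
            self.active = None


coordinator = ToyCoordinator()
first = coordinator.try_become_active("master-1")    # claims the slot
second = coordinator.try_become_active("master-2")   # stays on standby
coordinator.report_failure("master-1")               # active master fails
takeover = coordinator.try_become_active("master-2") # standby takes over
print(first, second, takeover)
```

At any moment the slot holds at most one name, so at most one master considers itself active.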
HBase Architecture
HBase Cluster
[Figure: HBase cluster diagram. Labels: NameNode, HMaster, ZooKeeper; worker nodes, each with a DataNode and a RegionServer (HRegionServer), some with a TaskTracker.]
Column Family and Column Qualifier
Column Family: Column qualifiers in HBase are grouped into column families.
The colon character (:) delimits the column family from the column qualifier.
The combination <Column Family>:<Column Qualifier> is equivalent to a column name.
Physically, all column qualifiers of a column family are stored together on the file system.
• Column qualifiers within a family are sorted lexicographically and stored together
Example: txn:amt. Here “txn” is the column family and “amt” is the column qualifier.
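The naming rule above can be sketched in a few lines of Python. The helper `split_column` is a hypothetical name used only for illustration; it is not part of any HBase client library.

```python
from __future__ import annotations


def split_column(name: bytes) -> tuple[bytes, bytes]:
    """Split b'family:qualifier' at the first colon."""
    family, sep, qualifier = name.partition(b":")
    if not sep:
        raise ValueError("expected <family>:<qualifier>")
    return family, qualifier


family, qualifier = split_column(b"txn:amt")
print(family, qualifier)

# Qualifiers inside one family sort lexicographically as raw bytes.
ordered = sorted([b"date", b"amt", b"currency"])
print(ordered)
```

Splitting at the first colon only means a qualifier may itself contain colons, while a family name may not.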
HBase Data Model
• A table maintains its data in lexicographic order by RowKey.
• Everything except table names is stored as byte arrays
• Only column families are defined when the table is created
  • Each family can have any number of columns (up to a few million)
  • Each row can have different columns in a column family
  • Each column can hold any number of versions
  • Columns exist only when inserted, because HBase does not store NULL values
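A toy in-memory model may help make these properties concrete: rows come back in byte-wise lexicographic order, and a column exists only on the rows where it was written. The names `table`, `put`, and `scan` are illustrative assumptions, not the HBase API.

```python
from collections import defaultdict

# row key -> column family -> qualifier -> value (everything is bytes)
table = defaultdict(lambda: defaultdict(dict))


def put(row, family, qualifier, value):
    table[row][family][qualifier] = value


def scan():
    # Iterate rows in lexicographic (byte-wise) order of the row key.
    for row in sorted(table):
        yield row, {fam: dict(cols) for fam, cols in table[row].items()}


put(b"row-10", b"txn", b"amt", b"25.00")
put(b"row-2", b"txn", b"amt", b"9.99")
put(b"row-2", b"txn", b"currency", b"USD")  # this column exists only on row-2

rows = [row for row, _ in scan()]
print(rows)  # b"row-10" sorts before b"row-2" because b"1" < b"2"
```

The ordering of `row-10` before `row-2` is a common surprise with numeric row keys; zero-padding the numbers is one way to make byte order match numeric order.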
Key -> Value in HBase
(RowKey, Column Family:Column Qualifier, Timestamp) is a “Key” in HBase.
A “Value” is stored for each “Key”.
The timestamp supports storing historical data (multiple versions of a value).
A table is always indexed on RowKey.
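The key structure can be sketched as a plain dictionary keyed by (row key, column, timestamp). This is a minimal illustration with hypothetical helper names (`put_cell`, `get_versions`) and made-up sample data; it says nothing about how HBase stores cells physically.

```python
cells = {}  # (row key, b"family:qualifier", timestamp) -> value


def put_cell(row, column, timestamp, value):
    cells[(row, column, timestamp)] = value


def get_versions(row, column, max_versions=1):
    """Return up to max_versions (timestamp, value) pairs, newest first."""
    versions = sorted(
        ((ts, v) for (r, c, ts), v in cells.items() if r == row and c == column),
        reverse=True,
    )
    return versions[:max_versions]


put_cell(b"cust1", b"txn:amt", 1000, b"10.00")
put_cell(b"cust1", b"txn:amt", 2000, b"12.50")

latest = get_versions(b"cust1", b"txn:amt")
history = get_versions(b"cust1", b"txn:amt", max_versions=2)
print(latest)
print(history)
```

Because the timestamp is part of the key, writing a new value does not overwrite the old one; older versions stay readable as history.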
Region
Tables in HBase are divided into multiple Regions.
1 Region = 1 partition of a table
Regions are hosted by RegionServers.
1 RegionServer can host hundreds of Regions.
A RegionServer can host Regions from multiple tables.
After a major compaction, every Region has 1 HFile for each column family.
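Since rows are sorted by RowKey, one way to picture a Region is as a contiguous range of row keys. The sketch below assumes each region covers the keys from its start key up to the next region's start key, and finds the owning region with a binary search. The start keys, server names, and `locate_region` are hypothetical.

```python
import bisect

# Hypothetical regions of one table; region i covers
# [region_start_keys[i], region_start_keys[i + 1]).
region_start_keys = [b"", b"g", b"n", b"t"]
region_hosts = ["rs1", "rs2", "rs1", "rs3"]  # one RegionServer hosts several regions


def locate_region(row_key: bytes) -> int:
    """Return the index of the region whose key range contains row_key."""
    return bisect.bisect_right(region_start_keys, row_key) - 1


index = locate_region(b"hbase")
print(index, region_hosts[index])
```

Here `rs1` hosts two regions, matching the point that a single RegionServer can serve many regions.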
Random Facts About HBase
Data in HBase is stored in the HFile format.
Values are stored as byte arrays in HFiles.
HLog is the file format used for the write-ahead log (WAL) in HBase.
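The idea behind write-ahead logging can be sketched as follows: every edit is appended to a log before it is applied in memory, so the in-memory state can be rebuilt by replaying the log. `TinyStore` is a hypothetical toy class, the Python list only stands in for a durable log file, and none of this reflects the actual HLog format.

```python
class TinyStore:
    """Toy store that logs every edit before applying it in memory."""

    def __init__(self, log=None):
        self.log = [] if log is None else log  # stands in for a durable log file
        self.memstore = {}                     # in-memory state, lost on a crash

    def put(self, key, value):
        self.log.append((key, value))  # 1. record the edit in the log first
        self.memstore[key] = value     # 2. then apply it in memory

    @classmethod
    def recover(cls, log):
        # Rebuild the in-memory state by replaying the log in order.
        store = cls(log=list(log))
        for key, value in store.log:
            store.memstore[key] = value
        return store


store = TinyStore()
store.put(b"row1", b"a")
store.put(b"row1", b"b")  # a later edit to the same key wins on replay
store.put(b"row2", b"c")

recovered = TinyStore.recover(store.log)
print(recovered.memstore)
```

Replaying the edits in their original order is what makes the recovered state match the state before the crash.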
References
http://hbase.apache.org/
https://hadoop.apache.org/
http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html
Questions?
