Dancing With The Elephant
We will discuss
• Introduction to Hadoop
• HBase: Definition, Storage Model, Use Cases
• Basic Data Access from shell
• Hands-on with HBase API
What is Hadoop
• Framework for distributed processing of large datasets (Big Data)
• HDFS + MapReduce
• HDFS (data):
  - Distributed filesystem responsible for storing data across the cluster
  - Provides replication on cheap commodity hardware
  - NameNode and DataNode processes
• MapReduce (processing):
  - May be covered in a future session
HBase: What
• A sparse, distributed, persistent, multidimensional, sorted map (as defined in Google’s Bigtable paper)
• Distributed NoSQL database built on top of HDFS
RDBMS Woes (with massive data)
• Scaling is Hard and Expensive
• Scaling often means turning off relational features and secondary indexes
• Quick reads are hard at larger table sizes (e.g., 500 GB)
• Single points of failure
• Schema changes
HBase: Why
• Scalable: just add nodes as your data grows
• Distributed: leverages the advantages of Hadoop’s HDFS
• Built on top of Hadoop: being part of the ecosystem, it can be integrated with multiple tools
• High performance for reads and writes
  - Short-circuit reads
  - Single reads: 1 to 10 ms; scans: 100s of rows in 10 ms
• Schema-less
• Production-ready where data is on the order of petabytes
HBase: Storage Model 1
HTable
• Tables are split into regions
• Region: data with a contiguous range of RowKeys, [Start, End), in sorted order
• Regions split as the table grows (region size can be configured)
• Table schema defines column families
• (Table, RowKey, ColumnFamily, ColumnName, Timestamp) → Value
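The region idea above can be illustrated with a small plain-Java sketch. It is not HBase code: the class `RegionLookup`, its methods, and the region names are hypothetical, and it assumes each region is identified by its start key and ends where the next region begins. Java `String` ordering stands in for HBase's byte-wise row-key ordering, which matches only for plain ASCII keys.

```java
import java.util.Map;
import java.util.TreeMap;

/** Toy model of a sorted table cut into regions by row-key range (not HBase code). */
final class RegionLookup {
    // Maps each region's start key to a hypothetical region name.
    // A region covers [its start key, the next region's start key).
    private final TreeMap<String, String> regionsByStartKey = new TreeMap<>();

    void addRegion(String startKey, String regionName) {
        regionsByStartKey.put(startKey, regionName);
    }

    /** Returns the region whose [start, end) range contains rowKey. */
    String regionFor(String rowKey) {
        // floorEntry finds the greatest start key that is <= rowKey.
        Map.Entry<String, String> entry = regionsByStartKey.floorEntry(rowKey);
        if (entry == null) {
            throw new IllegalStateException("no region covers " + rowKey);
        }
        return entry.getValue();
    }

    public static void main(String[] args) {
        RegionLookup table = new RegionLookup();
        table.addRegion("", "region-1");   // first region starts at the empty key
        table.addRegion("g", "region-2");
        table.addRegion("p", "region-3");
        System.out.println(table.regionFor("apple"));  // region-1
        System.out.println(table.regionFor("g"));      // region-2 (start key is inclusive)
        System.out.println(table.regionFor("zebra"));  // region-3
    }
}
```

Because the keys are sorted, finding the region for a row is a single ordered lookup, and splitting a region amounts to adding one more start key.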
HTable (Data Structure)
• SortedMap(
    RowKey, List(
      SortedMap(
        Column, List(
          Value, Timestamp
        )
      )
    )
  )
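One way to picture this nested structure is with ordinary Java sorted maps. The sketch below is an in-memory illustration only, not how HBase stores data; the class `SparseSortedTable` and its methods are hypothetical, columns are written as "family:qualifier" strings, and the innermost list of (Value, Timestamp) pairs is modeled as a map from timestamp to value.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

/** In-memory sketch of rowKey -> column -> timestamp -> value (not HBase code). */
final class SparseSortedTable {
    private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> rows =
            new TreeMap<>();

    /** Stores one version of a cell; older versions of the same cell are kept. */
    void put(String rowKey, String column, long timestamp, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>())
            .computeIfAbsent(column, k -> new TreeMap<>())
            .put(timestamp, value);
    }

    /** Returns the value with the highest timestamp, or null if the cell is absent. */
    String getLatest(String rowKey, String column) {
        NavigableMap<String, NavigableMap<Long, String>> columns = rows.get(rowKey);
        if (columns == null) {
            return null;
        }
        NavigableMap<Long, String> versions = columns.get(column);
        return versions == null ? null : versions.lastEntry().getValue();
    }

    public static void main(String[] args) {
        SparseSortedTable table = new SparseSortedTable();
        table.put("row1", "colfam1:qual1", 1L, "val1");
        table.put("row1", "colfam1:qual1", 2L, "val1-updated");
        System.out.println(table.getLatest("row1", "colfam1:qual1")); // val1-updated
        // Sparse: a column that was never written simply has no entry.
        System.out.println(table.getLatest("row1", "colfam1:qual2")); // null
    }
}
```

The map is "sparse" in the sense that a row only holds entries for the columns actually written, and "multidimensional" in that a value is addressed by row key, column, and timestamp together.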
HBase: Data Read/Write
• Get: Random read
• Scan: Sequential read
• Put: Write/Update
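The difference between the three operations can be sketched on a plain sorted map. This is a simplified stand-in rather than the HBase client API: the class `ReadWriteSketch` and its methods are hypothetical, values are single strings, and the scan assumes an inclusive start row and an exclusive stop row.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Toy illustration of Put, Get and Scan over sorted row keys (not HBase code). */
final class ReadWriteSketch {
    private final TreeMap<String, String> rows = new TreeMap<>();

    /** Put: writes a new row or overwrites an existing one. */
    void put(String rowKey, String value) {
        rows.put(rowKey, value);
    }

    /** Get: random read of a single row by its key. */
    String get(String rowKey) {
        return rows.get(rowKey);
    }

    /** Scan: sequential read of row keys in [startRow, stopRow), in sorted order. */
    List<String> scan(String startRow, String stopRow) {
        return new ArrayList<>(rows.subMap(startRow, true, stopRow, false).keySet());
    }

    public static void main(String[] args) {
        ReadWriteSketch table = new ReadWriteSketch();
        table.put("row1", "a");
        table.put("row2", "b");
        table.put("row10", "c");
        table.put("row3", "d");
        table.put("row1", "a2");                        // same key: update
        System.out.println(table.get("row1"));          // a2
        // Keys sort lexicographically, so "row10" comes before "row2".
        System.out.println(table.scan("row1", "row3")); // [row1, row10, row2]
    }
}
```

The scan result shows why row-key design matters: numeric suffixes are compared character by character, so keys are often zero-padded when numeric order is wanted.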
HBase: Data Access Clients
• Demo of HBase shell
• Java API
HBase: API
• Connection
• DDL
• DML
• Filters
• Hands-On
HBase: API
• Configuration: holds details on where to find the cluster, plus tunable settings.
• HConnection: represents a connection to the cluster.
• HBaseAdmin: handles DDL operations (create, list, drop, alter).
• HTable (HTableInterface): a handle on a single HBase table; sends commands to the table (Put, Get, Scan, Delete, Increment).
HBase: API:DDL
Group name: ddl (Data Definition Language)
Commands:
alter, create, describe, disable, drop, enable, exists, is_disabled, is_enabled, list
HBase: API:DDL
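import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

// create() loads hbase-site.xml from the classpath; it replaces the deprecated constructor.
Configuration conf = HBaseConfiguration.create();
conf.set("hbase.master", "localhost:60010");
HBaseAdmin hbase = new HBaseAdmin(conf);
// Describe a table named "testtable" with two column families.
HTableDescriptor desc = new HTableDescriptor("testtable");
HColumnDescriptor meta = new HColumnDescriptor("colfam1");
HColumnDescriptor prefix = new HColumnDescriptor("colfam2");
desc.addFamily(meta);
desc.addFamily(prefix);
hbase.createTable(desc);
hbase.close();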
HBase: API:DML
Group name: dml (Data Manipulation Language)
Commands:
count, delete, deleteall, get, get_counter, incr, put, scan, truncate
HBase: API:DML PUT
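import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

Configuration conf = HBaseConfiguration.create();
HTable table = new HTable(conf, "testtable");
Put put = new Put(Bytes.toBytes("row1"));  // the row key
put.add(Bytes.toBytes("colfam1"), Bytes.toBytes("qual1"), Bytes.toBytes("val1"));
put.add(Bytes.toBytes("colfam1"), Bytes.toBytes("qual2"), Bytes.toBytes("val2"));
table.put(put);  // writes both cells of row1 in one call
table.close();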
HBase: API:DML GET
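import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

Configuration conf = HBaseConfiguration.create();
HTable table = new HTable(conf, "testtable");
Get get = new Get(Bytes.toBytes("row1"));
// Restrict the read to a single column: colfam1:qual1.
get.addColumn(Bytes.toBytes("colfam1"), Bytes.toBytes("qual1"));
Result result = table.get(get);
byte[] val = result.getValue(Bytes.toBytes("colfam1"), Bytes.toBytes("qual1"));
System.out.println("Value: " + Bytes.toString(val));
table.close();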
HBase: API:DML SCAN
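import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

// `table` is an open HTable, as in the GET example.
Scan scan1 = new Scan();  // no start or stop row: scans the whole table
ResultScanner scanner1 = table.getScanner(scan1);
try {
    for (Result res : scanner1) {
        System.out.println(res);
    }
} finally {
    scanner1.close();  // always release the scanner
}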
Other Projects around HBase
• SQL Layer: Phoenix, Hive, Impala
• Object Persistence: Lily, Kundera
FollowUp
• Part 2:
  - Building a key-value data store in HBase
  - Challenges we faced in SMART
• {Rahul, vinay}@briotribes.com
Shoutout To
HBase: Use Case (Facebook)
• Facebook Messaging:
  - Titan
  - 1.5 M ops per second at peak
  - 6B+ messages per day
  - 16 columns per operation across different families
• Facebook Insights:
  - Puma
  - Provides developers and Page owners with metrics about their content
  - > 1 M counter increments per second