SlideShare una empresa de Scribd logo
1 de 30
© Hortonworks Inc. 2011
Integration of Apache Hive
and HBase
Enis Soztutar
enis [at] apache [dot] org
Page 1
Ashutosh Chauhan
hashutosh [at] apache [dot] org
© Hortonworks Inc. 2011
About Us
Page 2
Architecting the Future of Big Data
Enis Soztutar
•  In the Hadoop space since 2007
•  Committer and PMC Member in Apache HBase and Hadoop
•  Twitter: @enissoz
Ashutosh Chauhan
•  In the Hadoop space since 2009
•  Committer and PMC Member in Apache Hive and Pig
© Hortonworks Inc. 2011
Agenda
Page 3
Architecting the Future of Big Data
•  Overview of Hive
•  Hive + HBase Features and Improvements
•  Future of Hive and HBase
•  Q&A
© Hortonworks Inc. 2011
Apache Hive Overview
• Apache Hive is a data warehouse system for Hadoop
• SQL-like query language called HiveQL
• Built for PB scale data
• Main purpose is analysis and ad hoc querying
• Database / table / partition / bucket – DDL Operations
• SQL Types + Complex Types (ARRAY, MAP, etc)
• Very extensible
• Not for : small data sets, OLTP
Page 4
Architecting the Future of Big Data
© Hortonworks Inc. 2011
Apache Hive Architecture
Page 5
Architecting the Future of Big Data
Metastore
RDBMS
Hive Thrift
Server
Driver
CLI
JDBC/ODBC
Hive Web
Interface
HDFS
MapReduce
Execution
Parser Planner
Optimizer
M
S
C
l
i
e
n
t
© Hortonworks Inc. 2011
Hive + HBase Features and
Improvements
Architecting the Future of Big Data
Page 6
© Hortonworks Inc. 2011
Hive + HBase Motivation
• Hive over HDFS and HBase has different characteristics
–  Batch Online
–  Structured vs Unstructured
– Analysts Programmers
• Hive datawarehouses on HDFS are
– Long ETL times
– Access to real time data
• Analyzing HBase data with MapReduce requires
custom coding
• Hive and SQL are already known by many analysts
Page 7
Architecting the Future of Big Data
© Hortonworks Inc. 2011
Use Case 1: HBase as ETL Data Sink
Page 8
Architecting the Future of Big Data
From HUG - Hive/HBase Integration or, MaybeSQL? April 2010 John Sichi Facebook
http://www.slideshare.net/hadoopusergroup/hive-h-basehadoopapr2010
HDFS
Tables
INSERT …
SELECT …!
FROM … !
HBase
Online Queries
© Hortonworks Inc. 2011
Use Case 2: HBase as Data Source
Page 9
Architecting the Future of Big Data
From HUG - Hive/HBase Integration or, MaybeSQL? April 2010 John Sichi Facebook
http://www.slideshare.net/hadoopusergroup/hive-h-basehadoopapr2010
HDFS
Tables
SELECT …
JOIN …!
GROUP BY … !
HBase
Query
Result
© Hortonworks Inc. 2011
Use Case 3: Low Latency Warehouse
Page 10
Architecting the Future of Big Data
From HUG - Hive/HBase Integration or, MaybeSQL? April 2010 John Sichi Facebook
http://www.slideshare.net/hadoopusergroup/hive-h-basehadoopapr2010
HDFS
Tables
HBase
Continuous
Updates
HIVE QUERIES!
Periodic
Dump
© Hortonworks Inc. 2011
Hive + HBase Example (HBase table)
hbase(main):001:0> create 'short_urls', {NAME => 'u'},
{NAME=>'s'}
hbase(main):014:0> scan 'short_urls'
ROW COLUMN+CELL
bit.ly/aaaa column=s:hits, value=100
bit.ly/aaaa column=u:url, value=hbase.apache.org/
bit.ly/abcd column=s:hits, value=123
bit.ly/abcd column=u:url, value=example.com/foo
Page 11
Architecting the Future of Big Data
© Hortonworks Inc. 2011
Hive + HBase Example (Hive table)
CREATE TABLE short_urls(
short_url string,
url string,
hit_count int
)
STORED BY
'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES
("hbase.columns.mapping" = ":key, u:url, s:hits")
TBLPROPERTIES
("hbase.table.name" = ”short_urls");
Page 12
Architecting the Future of Big Data
© Hortonworks Inc. 2011
Storage Handler
• Hive defines HiveStorageHandler class for different storage
backends: HBase/ Cassandra / MongoDB/ etc
• Storage Handler has hooks for
–  Getting input / output formats
–  Meta data operations hook: CREATE TABLE, DROP TABLE,
etc
• Storage Handler is a table level concept
–  Does not support Hive partitions, and buckets
Page 13
Architecting the Future of Big Data
© Hortonworks Inc. 2011
Apache Hive + HBase Architecture
Page 14
Architecting the Future of Big Data
Metastore
RDBMS
Hive Thrift
Server
Driver
CLI
Hive Web
Interface
HDFS
MapReduce
Execution
Parser Planner
Optimizer
M
S
C
l
i
e
n
t
HBase
StorageHandler
© Hortonworks Inc. 2011
Hive + HBase Integration
• For Input/OutputFormat, getSplits(), etc underlying HBase
classes are used
• Column selection and certain filters can be pushed down
• HBase tables can be used with other(Hadoop native) tables
and SQL constructs
• Hive DDL operations are converted to HBase DDL
operations via the client hook.
– All operations are performed by the client
– No two phase commit
Page 15
Architecting the Future of Big Data
© Hortonworks Inc. 2011
Schema / Type Mapping
Architecting the Future of Big Data
Page 16
© Hortonworks Inc. 2011
Schema Mapping
• Hive table + columns + column types <=> HBase table + column
families (+ column qualifiers)
• Every field in Hive table is mapped to either
– The table key (using :key as selector)
– A column family (cf:) -> MAP fields in Hive
– A column (cf:cq)
•  Hive table does not need to include all columns in Hbase
Page 17
Architecting the Future of Big Data
© Hortonworks Inc. 2011
Schema Mapping - Example
Page 18
Architecting the Future of Big Data
CREATE TABLE short_urls(
short_url string,
url string,
hit_count int,
props, map<string,string>
)
WITH SERDEPROPERTIES
("hbase.columns.mapping" = ":key, u:url, s:hits, p:")
© Hortonworks Inc. 2011
Schema Mapping - Example
Page 19
Architecting the Future of Big Data
CREATE TABLE short_urls(
short_url string,
url string,
hit_count int,
props map<string,string>
)
WITH SERDEPROPERTIES
("hbase.columns.mapping" = ":key, u:url, s:hits, p:")
© Hortonworks Inc. 2011
Type Mapping
• Added in Hive (0.9.0)
• Previously all types were being converted to strings in HBase
• Hive has:
– Primitive types: INT, STRING, BINARY, DOUBLE etc
– ARRAY<Type>
– MAP<PrimitiveType, Type>
– STRUCT<a:INT, b:STRING, c:STRING>
• HBase does not have types
– Bytes.toBytes()
Page 20
Architecting the Future of Big Data
© Hortonworks Inc. 2011
Type Mapping
• Table level property
"hbase.table.default.storage.type” = “binary”
• Type mapping can be given per column after #
– Any prefix of “binary” , eg u:url#b
– Any prefix of “string” , eg u:url#s
– The dash char “-” , eg u:url#-
Page 21
© Hortonworks Inc. 2011
Type Mapping - Example
Page 22
Architecting the Future of Big Data
CREATE TABLE short_urls(
short_url string,
url string,
hit_count int,
props, map<string,string>
)
WITH SERDEPROPERTIES
("hbase.columns.mapping" = ":key#b,u:url#b,s:hits#b,p:#s")
© Hortonworks Inc. 2011
Type Mapping
• If the type is not a primitive or Map, it is converted to a JSON
string and serialized
• Still a few rough edges for schema and type mapping:
– No support for DECIMAL, BINARY Hive types
– No mapping of HBase timestamp (can only provide put
timestamp)
– No arbitrary mapping of Structs / Arrays into HBase schema
Page 23
Architecting the Future of Big Data
© Hortonworks Inc. 2011
Bulk Load
• Steps to bulk load:
– Sample source data for range partitioning
– Save sampling results to a file
– Run CLUSTER BY query using HiveHFileOutputFormat and
TotalOrderPartitioner
– Import Hfiles into HBase table
• Ideal setup should be
SET hive.hbase.bulk=true
INSERT OVERWRITE TABLE web_table SELECT ….
Page 24
Architecting the Future of Big Data
© Hortonworks Inc. 2011
Filter Pushdown
Architecting the Future of Big Data
Page 25
© Hortonworks Inc. 2011
Filter Pushdown
• Idea is to pass down filter expressions to the storage layer to
minimize scanned data
• To access indexes at hdfs or hbase
• Example:
CREATE EXTERNAL TABLE users (userid LONG, email STRING, … )
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler’
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,…")
SELECT ... FROM users WHERE userid > 1000000 and email LIKE
‘%@gmail.com’;
-> scan.setStartRow(Bytes.toBytes(1000000))
Page 26
Architecting the Future of Big Data
© Hortonworks Inc. 2011
Filter Decomposition
• Optimizer pushes down the predicates to the query plan
• Storage handlers can negotiate with the Hive optimizer to
decompose the filter
x > 3 AND upper(y) = 'XYZ’
• Handle x > 3, send upper(y) = ’XYZ’ as residual for Hive
• Works with:
key = 3, key > 3, etc
key > 3 AND key < 100
• Only works against constant expressions
Page 27
Architecting the Future of Big Data
© Hortonworks Inc. 2011
Future of Hive + HBase
• Improve on schema / type mapping
• Fully secure Hive deployment options
• HBase bulk import improvements
• Filter pushdown: non key column filters
• Sortable signed numeric types in HBase
• Use HBase’s new typing API’s (upcoming in HBase)
• Integration with Phoenix / extract common modules, hbase-
sql ?
Page 28
Architecting the Future of Big Data
© Hortonworks Inc. 2011
References
• Type mapping / Filter Pushdown
– https://issues.apache.org/jira/browse/HIVE-1634
– https://issues.apache.org/jira/browse/HIVE-1226
– https://issues.apache.org/jira/browse/HIVE-1643
– https://issues.apache.org/jira/browse/HIVE-2815
Page 29
Architecting the Future of Big Data
© Hortonworks Inc. 2011
Thanks
Questions?
Architecting the Future of Big Data
Page 30

Más contenido relacionado

La actualidad más candente

HBaseCon 2012 | You’ve got HBase! How AOL Mail Handles Big Data
HBaseCon 2012 | You’ve got HBase! How AOL Mail Handles Big DataHBaseCon 2012 | You’ve got HBase! How AOL Mail Handles Big Data
HBaseCon 2012 | You’ve got HBase! How AOL Mail Handles Big DataCloudera, Inc.
 
HBaseCon 2015: HBase and Spark
HBaseCon 2015: HBase and SparkHBaseCon 2015: HBase and Spark
HBaseCon 2015: HBase and SparkHBaseCon
 
HBase at Bloomberg: High Availability Needs for the Financial Industry
HBase at Bloomberg: High Availability Needs for the Financial IndustryHBase at Bloomberg: High Availability Needs for the Financial Industry
HBase at Bloomberg: High Availability Needs for the Financial IndustryHBaseCon
 
Meet hbase 2.0
Meet hbase 2.0Meet hbase 2.0
Meet hbase 2.0enissoz
 
HBaseCon 2013: Honeycomb - MySQL Backed by Apache HBase
HBaseCon 2013: Honeycomb - MySQL Backed by Apache HBase HBaseCon 2013: Honeycomb - MySQL Backed by Apache HBase
HBaseCon 2013: Honeycomb - MySQL Backed by Apache HBase Cloudera, Inc.
 
Large-scale Web Apps @ Pinterest
Large-scale Web Apps @ PinterestLarge-scale Web Apps @ Pinterest
Large-scale Web Apps @ PinterestHBaseCon
 
HBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBase
HBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBaseHBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBase
HBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBaseCloudera, Inc.
 
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...HBaseCon
 
Hadoop Demystified + MapReduce (Java and C#), Pig, and Hive Demos
Hadoop Demystified + MapReduce (Java and C#), Pig, and Hive DemosHadoop Demystified + MapReduce (Java and C#), Pig, and Hive Demos
Hadoop Demystified + MapReduce (Java and C#), Pig, and Hive DemosLester Martin
 
Apache Spark on Apache HBase: Current and Future
Apache Spark on Apache HBase: Current and Future Apache Spark on Apache HBase: Current and Future
Apache Spark on Apache HBase: Current and Future HBaseCon
 
HBaseCon 2015: Trafodion - Integrating Operational SQL into HBase
HBaseCon 2015: Trafodion - Integrating Operational SQL into HBaseHBaseCon 2015: Trafodion - Integrating Operational SQL into HBase
HBaseCon 2015: Trafodion - Integrating Operational SQL into HBaseHBaseCon
 
HBase Backups
HBase BackupsHBase Backups
HBase BackupsHBaseCon
 
HBaseCon 2015: Just the Basics
HBaseCon 2015: Just the BasicsHBaseCon 2015: Just the Basics
HBaseCon 2015: Just the BasicsHBaseCon
 
HBase: Just the Basics
HBase: Just the BasicsHBase: Just the Basics
HBase: Just the BasicsHBaseCon
 
Batch is Back: Critical for Agile Application Adoption
Batch is Back: Critical for Agile Application AdoptionBatch is Back: Critical for Agile Application Adoption
Batch is Back: Critical for Agile Application AdoptionDataWorks Summit/Hadoop Summit
 
Realtime Analytics with Hadoop and HBase
Realtime Analytics with Hadoop and HBaseRealtime Analytics with Hadoop and HBase
Realtime Analytics with Hadoop and HBaselarsgeorge
 
HBaseCon 2013: Using Coprocessors to Index Columns in an Elasticsearch Cluster
HBaseCon 2013: Using Coprocessors to Index Columns in an Elasticsearch Cluster HBaseCon 2013: Using Coprocessors to Index Columns in an Elasticsearch Cluster
HBaseCon 2013: Using Coprocessors to Index Columns in an Elasticsearch Cluster Cloudera, Inc.
 
Real-Time Video Analytics Using Hadoop and HBase (HBaseCon 2013)
Real-Time Video Analytics Using Hadoop and HBase (HBaseCon 2013)Real-Time Video Analytics Using Hadoop and HBase (HBaseCon 2013)
Real-Time Video Analytics Using Hadoop and HBase (HBaseCon 2013)Suman Srinivasan
 

La actualidad más candente (20)

HBaseCon 2012 | You’ve got HBase! How AOL Mail Handles Big Data
HBaseCon 2012 | You’ve got HBase! How AOL Mail Handles Big DataHBaseCon 2012 | You’ve got HBase! How AOL Mail Handles Big Data
HBaseCon 2012 | You’ve got HBase! How AOL Mail Handles Big Data
 
HBaseCon 2015: HBase and Spark
HBaseCon 2015: HBase and SparkHBaseCon 2015: HBase and Spark
HBaseCon 2015: HBase and Spark
 
HBase at Bloomberg: High Availability Needs for the Financial Industry
HBase at Bloomberg: High Availability Needs for the Financial IndustryHBase at Bloomberg: High Availability Needs for the Financial Industry
HBase at Bloomberg: High Availability Needs for the Financial Industry
 
Meet hbase 2.0
Meet hbase 2.0Meet hbase 2.0
Meet hbase 2.0
 
HBaseCon 2013: Honeycomb - MySQL Backed by Apache HBase
HBaseCon 2013: Honeycomb - MySQL Backed by Apache HBase HBaseCon 2013: Honeycomb - MySQL Backed by Apache HBase
HBaseCon 2013: Honeycomb - MySQL Backed by Apache HBase
 
Large-scale Web Apps @ Pinterest
Large-scale Web Apps @ PinterestLarge-scale Web Apps @ Pinterest
Large-scale Web Apps @ Pinterest
 
HBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBase
HBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBaseHBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBase
HBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBase
 
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...
 
Hadoop Demystified + MapReduce (Java and C#), Pig, and Hive Demos
Hadoop Demystified + MapReduce (Java and C#), Pig, and Hive DemosHadoop Demystified + MapReduce (Java and C#), Pig, and Hive Demos
Hadoop Demystified + MapReduce (Java and C#), Pig, and Hive Demos
 
Apache Spark on Apache HBase: Current and Future
Apache Spark on Apache HBase: Current and Future Apache Spark on Apache HBase: Current and Future
Apache Spark on Apache HBase: Current and Future
 
HBaseCon 2015: Trafodion - Integrating Operational SQL into HBase
HBaseCon 2015: Trafodion - Integrating Operational SQL into HBaseHBaseCon 2015: Trafodion - Integrating Operational SQL into HBase
HBaseCon 2015: Trafodion - Integrating Operational SQL into HBase
 
HBase Backups
HBase BackupsHBase Backups
HBase Backups
 
HBaseCon 2015: Just the Basics
HBaseCon 2015: Just the BasicsHBaseCon 2015: Just the Basics
HBaseCon 2015: Just the Basics
 
HBase: Just the Basics
HBase: Just the BasicsHBase: Just the Basics
HBase: Just the Basics
 
Apache HBase™
Apache HBase™Apache HBase™
Apache HBase™
 
Batch is Back: Critical for Agile Application Adoption
Batch is Back: Critical for Agile Application AdoptionBatch is Back: Critical for Agile Application Adoption
Batch is Back: Critical for Agile Application Adoption
 
Introduction to HBase
Introduction to HBaseIntroduction to HBase
Introduction to HBase
 
Realtime Analytics with Hadoop and HBase
Realtime Analytics with Hadoop and HBaseRealtime Analytics with Hadoop and HBase
Realtime Analytics with Hadoop and HBase
 
HBaseCon 2013: Using Coprocessors to Index Columns in an Elasticsearch Cluster
HBaseCon 2013: Using Coprocessors to Index Columns in an Elasticsearch Cluster HBaseCon 2013: Using Coprocessors to Index Columns in an Elasticsearch Cluster
HBaseCon 2013: Using Coprocessors to Index Columns in an Elasticsearch Cluster
 
Real-Time Video Analytics Using Hadoop and HBase (HBaseCon 2013)
Real-Time Video Analytics Using Hadoop and HBase (HBaseCon 2013)Real-Time Video Analytics Using Hadoop and HBase (HBaseCon 2013)
Real-Time Video Analytics Using Hadoop and HBase (HBaseCon 2013)
 

Destacado

HBaseCon 2013: Apache HBase Table Snapshots
HBaseCon 2013: Apache HBase Table SnapshotsHBaseCon 2013: Apache HBase Table Snapshots
HBaseCon 2013: Apache HBase Table SnapshotsCloudera, Inc.
 
Apache HBase Low Latency
Apache HBase Low LatencyApache HBase Low Latency
Apache HBase Low LatencyNick Dimiduk
 
Integration of Hive and HBase
Integration of Hive and HBaseIntegration of Hive and HBase
Integration of Hive and HBaseHortonworks
 
HBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseHBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseenissoz
 
Hadoop Summit 2012 | Improving HBase Availability and Repair
Hadoop Summit 2012 | Improving HBase Availability and RepairHadoop Summit 2012 | Improving HBase Availability and Repair
Hadoop Summit 2012 | Improving HBase Availability and RepairCloudera, Inc.
 
Agile project management with green hopper 6 blueprints
Agile project management with green hopper 6 blueprintsAgile project management with green hopper 6 blueprints
Agile project management with green hopper 6 blueprintsJaibeer Malik
 
Mapreduce over snapshots
Mapreduce over snapshotsMapreduce over snapshots
Mapreduce over snapshotsenissoz
 
Sentiment Analysis Using Solr
Sentiment Analysis Using SolrSentiment Analysis Using Solr
Sentiment Analysis Using SolrPradeep Pujari
 
NoSQL HBase schema design and SQL with Apache Drill
NoSQL HBase schema design and SQL with Apache Drill NoSQL HBase schema design and SQL with Apache Drill
NoSQL HBase schema design and SQL with Apache Drill Carol McDonald
 
hive HBase Metastore - Improving Hive with a Big Data Metadata Storage
hive HBase Metastore - Improving Hive with a Big Data Metadata Storagehive HBase Metastore - Improving Hive with a Big Data Metadata Storage
hive HBase Metastore - Improving Hive with a Big Data Metadata StorageDataWorks Summit/Hadoop Summit
 
Apache Sqoop: A Data Transfer Tool for Hadoop
Apache Sqoop: A Data Transfer Tool for HadoopApache Sqoop: A Data Transfer Tool for Hadoop
Apache Sqoop: A Data Transfer Tool for HadoopCloudera, Inc.
 
Introduction to Apache Sqoop
Introduction to Apache SqoopIntroduction to Apache Sqoop
Introduction to Apache SqoopAvkash Chauhan
 
Introduction to Apache HBase, MapR Tables and Security
Introduction to Apache HBase, MapR Tables and SecurityIntroduction to Apache HBase, MapR Tables and Security
Introduction to Apache HBase, MapR Tables and SecurityMapR Technologies
 
Apache Kylin – Cubes on Hadoop
Apache Kylin – Cubes on HadoopApache Kylin – Cubes on Hadoop
Apache Kylin – Cubes on HadoopDataWorks Summit
 
How To Analyze Geolocation Data with Hive and Hadoop
How To Analyze Geolocation Data with Hive and HadoopHow To Analyze Geolocation Data with Hive and Hadoop
How To Analyze Geolocation Data with Hive and HadoopHortonworks
 
Using Cassandra with your Web Application
Using Cassandra with your Web ApplicationUsing Cassandra with your Web Application
Using Cassandra with your Web Applicationsupertom
 
Optimizing Hive Queries
Optimizing Hive QueriesOptimizing Hive Queries
Optimizing Hive QueriesOwen O'Malley
 
Hadoop, Hbase and Hive- Bay area Hadoop User Group
Hadoop, Hbase and Hive- Bay area Hadoop User GroupHadoop, Hbase and Hive- Bay area Hadoop User Group
Hadoop, Hbase and Hive- Bay area Hadoop User GroupHadoop User Group
 
What Comes After The Star Schema? Dimensional Modeling For Enterprise Data Hubs
What Comes After The Star Schema? Dimensional Modeling For Enterprise Data HubsWhat Comes After The Star Schema? Dimensional Modeling For Enterprise Data Hubs
What Comes After The Star Schema? Dimensional Modeling For Enterprise Data HubsCloudera, Inc.
 

Destacado (20)

HBaseCon 2013: Apache HBase Table Snapshots
HBaseCon 2013: Apache HBase Table SnapshotsHBaseCon 2013: Apache HBase Table Snapshots
HBaseCon 2013: Apache HBase Table Snapshots
 
Apache HBase Low Latency
Apache HBase Low LatencyApache HBase Low Latency
Apache HBase Low Latency
 
Integration of Hive and HBase
Integration of Hive and HBaseIntegration of Hive and HBase
Integration of Hive and HBase
 
HBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseHBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBase
 
Hadoop Summit 2012 | Improving HBase Availability and Repair
Hadoop Summit 2012 | Improving HBase Availability and RepairHadoop Summit 2012 | Improving HBase Availability and Repair
Hadoop Summit 2012 | Improving HBase Availability and Repair
 
Agile project management with green hopper 6 blueprints
Agile project management with green hopper 6 blueprintsAgile project management with green hopper 6 blueprints
Agile project management with green hopper 6 blueprints
 
Mapreduce over snapshots
Mapreduce over snapshotsMapreduce over snapshots
Mapreduce over snapshots
 
Sentiment Analysis Using Solr
Sentiment Analysis Using SolrSentiment Analysis Using Solr
Sentiment Analysis Using Solr
 
NoSQL HBase schema design and SQL with Apache Drill
NoSQL HBase schema design and SQL with Apache Drill NoSQL HBase schema design and SQL with Apache Drill
NoSQL HBase schema design and SQL with Apache Drill
 
Introduction to sqoop
Introduction to sqoopIntroduction to sqoop
Introduction to sqoop
 
hive HBase Metastore - Improving Hive with a Big Data Metadata Storage
hive HBase Metastore - Improving Hive with a Big Data Metadata Storagehive HBase Metastore - Improving Hive with a Big Data Metadata Storage
hive HBase Metastore - Improving Hive with a Big Data Metadata Storage
 
Apache Sqoop: A Data Transfer Tool for Hadoop
Apache Sqoop: A Data Transfer Tool for HadoopApache Sqoop: A Data Transfer Tool for Hadoop
Apache Sqoop: A Data Transfer Tool for Hadoop
 
Introduction to Apache Sqoop
Introduction to Apache SqoopIntroduction to Apache Sqoop
Introduction to Apache Sqoop
 
Introduction to Apache HBase, MapR Tables and Security
Introduction to Apache HBase, MapR Tables and SecurityIntroduction to Apache HBase, MapR Tables and Security
Introduction to Apache HBase, MapR Tables and Security
 
Apache Kylin – Cubes on Hadoop
Apache Kylin – Cubes on HadoopApache Kylin – Cubes on Hadoop
Apache Kylin – Cubes on Hadoop
 
How To Analyze Geolocation Data with Hive and Hadoop
How To Analyze Geolocation Data with Hive and HadoopHow To Analyze Geolocation Data with Hive and Hadoop
How To Analyze Geolocation Data with Hive and Hadoop
 
Using Cassandra with your Web Application
Using Cassandra with your Web ApplicationUsing Cassandra with your Web Application
Using Cassandra with your Web Application
 
Optimizing Hive Queries
Optimizing Hive QueriesOptimizing Hive Queries
Optimizing Hive Queries
 
Hadoop, Hbase and Hive- Bay area Hadoop User Group
Hadoop, Hbase and Hive- Bay area Hadoop User GroupHadoop, Hbase and Hive- Bay area Hadoop User Group
Hadoop, Hbase and Hive- Bay area Hadoop User Group
 
What Comes After The Star Schema? Dimensional Modeling For Enterprise Data Hubs
What Comes After The Star Schema? Dimensional Modeling For Enterprise Data HubsWhat Comes After The Star Schema? Dimensional Modeling For Enterprise Data Hubs
What Comes After The Star Schema? Dimensional Modeling For Enterprise Data Hubs
 

Similar a HBaseCon 2013: Integration of Apache Hive and HBase

HBase for Architects
HBase for ArchitectsHBase for Architects
HBase for ArchitectsNick Dimiduk
 
Hive and Hbase inegration
Hive and Hbase inegrationHive and Hbase inegration
Hive and Hbase inegrationHumoyun Ahmedov
 
Apache HBase + Spark: Leveraging your Non-Relational Datastore in Batch and S...
Apache HBase + Spark: Leveraging your Non-Relational Datastore in Batch and S...Apache HBase + Spark: Leveraging your Non-Relational Datastore in Batch and S...
Apache HBase + Spark: Leveraging your Non-Relational Datastore in Batch and S...DataWorks Summit/Hadoop Summit
 
Yahoo! Hack Europe Workshop
Yahoo! Hack Europe WorkshopYahoo! Hack Europe Workshop
Yahoo! Hack Europe WorkshopHortonworks
 
Hortonworks Technical Workshop: HBase and Apache Phoenix
Hortonworks Technical Workshop: HBase and Apache Phoenix Hortonworks Technical Workshop: HBase and Apache Phoenix
Hortonworks Technical Workshop: HBase and Apache Phoenix Hortonworks
 
Apache Phoenix and Apache HBase: An Enterprise Grade Data Warehouse
Apache Phoenix and Apache HBase: An Enterprise Grade Data WarehouseApache Phoenix and Apache HBase: An Enterprise Grade Data Warehouse
Apache Phoenix and Apache HBase: An Enterprise Grade Data WarehouseJosh Elser
 
Whats newinhive090hadoopsummit2012bof
Whats newinhive090hadoopsummit2012bofWhats newinhive090hadoopsummit2012bof
Whats newinhive090hadoopsummit2012bofGopi Krishna
 
Meet HBase 2.0 and Phoenix 5.0
Meet HBase 2.0 and Phoenix 5.0Meet HBase 2.0 and Phoenix 5.0
Meet HBase 2.0 and Phoenix 5.0DataWorks Summit
 
Hive 3 a new horizon
Hive 3  a new horizonHive 3  a new horizon
Hive 3 a new horizonArtem Ervits
 
Meet HBase 2.0 and Phoenix 5.0
Meet HBase 2.0 and Phoenix 5.0Meet HBase 2.0 and Phoenix 5.0
Meet HBase 2.0 and Phoenix 5.0Ankit Singhal
 
Techincal Talk Hbase-Ditributed,no-sql database
Techincal Talk Hbase-Ditributed,no-sql databaseTechincal Talk Hbase-Ditributed,no-sql database
Techincal Talk Hbase-Ditributed,no-sql databaseRishabh Dugar
 
La big datacamp2014_vikram_dixit
La big datacamp2014_vikram_dixitLa big datacamp2014_vikram_dixit
La big datacamp2014_vikram_dixitData Con LA
 
Apache HBase Internals you hoped you Never Needed to Understand
Apache HBase Internals you hoped you Never Needed to UnderstandApache HBase Internals you hoped you Never Needed to Understand
Apache HBase Internals you hoped you Never Needed to UnderstandJosh Elser
 
Hive 3.0 - HDPの最新バージョンで実現する新機能とパフォーマンス改善
Hive 3.0 - HDPの最新バージョンで実現する新機能とパフォーマンス改善Hive 3.0 - HDPの最新バージョンで実現する新機能とパフォーマンス改善
Hive 3.0 - HDPの最新バージョンで実現する新機能とパフォーマンス改善HortonworksJapan
 
Hive Quick Start Tutorial
Hive Quick Start TutorialHive Quick Start Tutorial
Hive Quick Start TutorialCarl Steinbach
 
Stinger Initiative - Deep Dive
Stinger Initiative - Deep DiveStinger Initiative - Deep Dive
Stinger Initiative - Deep DiveHortonworks
 
Hive 3 - a new horizon
Hive 3 - a new horizonHive 3 - a new horizon
Hive 3 - a new horizonThejas Nair
 

Similar a HBaseCon 2013: Integration of Apache Hive and HBase (20)

HBase for Architects
HBase for ArchitectsHBase for Architects
HBase for Architects
 
Hive and Hbase inegration
Hive and Hbase inegrationHive and Hbase inegration
Hive and Hbase inegration
 
Apache HBase + Spark: Leveraging your Non-Relational Datastore in Batch and S...
Apache HBase + Spark: Leveraging your Non-Relational Datastore in Batch and S...Apache HBase + Spark: Leveraging your Non-Relational Datastore in Batch and S...
Apache HBase + Spark: Leveraging your Non-Relational Datastore in Batch and S...
 
Yahoo! Hack Europe Workshop
Yahoo! Hack Europe WorkshopYahoo! Hack Europe Workshop
Yahoo! Hack Europe Workshop
 
Hortonworks Technical Workshop: HBase and Apache Phoenix
Hortonworks Technical Workshop: HBase and Apache Phoenix Hortonworks Technical Workshop: HBase and Apache Phoenix
Hortonworks Technical Workshop: HBase and Apache Phoenix
 
Apache Phoenix + Apache HBase
Apache Phoenix + Apache HBaseApache Phoenix + Apache HBase
Apache Phoenix + Apache HBase
 
Apache Phoenix and Apache HBase: An Enterprise Grade Data Warehouse
Apache Phoenix and Apache HBase: An Enterprise Grade Data WarehouseApache Phoenix and Apache HBase: An Enterprise Grade Data Warehouse
Apache Phoenix and Apache HBase: An Enterprise Grade Data Warehouse
 
Hbase mhug 2015
Hbase mhug 2015Hbase mhug 2015
Hbase mhug 2015
 
Whats newinhive090hadoopsummit2012bof
Whats newinhive090hadoopsummit2012bofWhats newinhive090hadoopsummit2012bof
Whats newinhive090hadoopsummit2012bof
 
Meet HBase 2.0 and Phoenix 5.0
Meet HBase 2.0 and Phoenix 5.0Meet HBase 2.0 and Phoenix 5.0
Meet HBase 2.0 and Phoenix 5.0
 
Hive 3 a new horizon
Hive 3  a new horizonHive 3  a new horizon
Hive 3 a new horizon
 
Meet HBase 2.0 and Phoenix 5.0
Meet HBase 2.0 and Phoenix 5.0Meet HBase 2.0 and Phoenix 5.0
Meet HBase 2.0 and Phoenix 5.0
 
Techincal Talk Hbase-Ditributed,no-sql database
Techincal Talk Hbase-Ditributed,no-sql databaseTechincal Talk Hbase-Ditributed,no-sql database
Techincal Talk Hbase-Ditributed,no-sql database
 
La big datacamp2014_vikram_dixit
La big datacamp2014_vikram_dixitLa big datacamp2014_vikram_dixit
La big datacamp2014_vikram_dixit
 
Apache HBase Internals you hoped you Never Needed to Understand
Apache HBase Internals you hoped you Never Needed to UnderstandApache HBase Internals you hoped you Never Needed to Understand
Apache HBase Internals you hoped you Never Needed to Understand
 
Hive 3.0 - HDPの最新バージョンで実現する新機能とパフォーマンス改善
Hive 3.0 - HDPの最新バージョンで実現する新機能とパフォーマンス改善Hive 3.0 - HDPの最新バージョンで実現する新機能とパフォーマンス改善
Hive 3.0 - HDPの最新バージョンで実現する新機能とパフォーマンス改善
 
Hive Quick Start Tutorial
Hive Quick Start TutorialHive Quick Start Tutorial
Hive Quick Start Tutorial
 
Stinger Initiative - Deep Dive
Stinger Initiative - Deep DiveStinger Initiative - Deep Dive
Stinger Initiative - Deep Dive
 
Hive 3 - a new horizon
Hive 3 - a new horizonHive 3 - a new horizon
Hive 3 - a new horizon
 
מיכאל
מיכאלמיכאל
מיכאל
 

Más de Cloudera, Inc.

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxCloudera, Inc.
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera, Inc.
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards FinalistsCloudera, Inc.
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Cloudera, Inc.
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Cloudera, Inc.
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Cloudera, Inc.
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Cloudera, Inc.
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Cloudera, Inc.
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Cloudera, Inc.
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Cloudera, Inc.
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Cloudera, Inc.
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Cloudera, Inc.
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformCloudera, Inc.
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Cloudera, Inc.
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Cloudera, Inc.
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Cloudera, Inc.
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Cloudera, Inc.
 

Más de Cloudera, Inc. (20)

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptx
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the Platform
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18
 

Último

Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 

Último (20)

Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 

HBaseCon 2013: Integration of Apache Hive and HBase

  • 1. © Hortonworks Inc. 2011 Integration of Apache Hive and HBase Enis Soztutar enis [at] apache [dot] org Page 1 Ashutosh Chauhan hashutosh [at] apache [dot] org
  • 2. © Hortonworks Inc. 2011 About Us Page 2 Architecting the Future of Big Data Enis Soztutar •  In the Hadoop space since 2007 •  Committer and PMC Member in Apache HBase and Hadoop •  Twitter: @enissoz Ashutosh Chauhan •  In the Hadoop space since 2009 •  Committer and PMC Member in Apache Hive and Pig
  • 3. © Hortonworks Inc. 2011 Agenda Page 3 Architecting the Future of Big Data •  Overview of Hive •  Hive + HBase Features and Improvements •  Future of Hive and HBase •  Q&A
  • 4. © Hortonworks Inc. 2011 Apache Hive Overview • Apache Hive is a data warehouse system for Hadoop • SQL-like query language called HiveQL • Built for PB scale data • Main purpose is analysis and ad hoc querying • Database / table / partition / bucket – DDL Operations • SQL Types + Complex Types (ARRAY, MAP, etc) • Very extensible • Not for : small data sets, OLTP Page 4 Architecting the Future of Big Data
  • 5. © Hortonworks Inc. 2011 Apache Hive Architecture Page 5 Architecting the Future of Big Data Metastore RDBMS Hive Thrift Server Driver CLI JDBC/ODBC Hive Web Interface HDFS MapReduce Execution Parser Planner Optimizer M S C l i e n t
  • 6. © Hortonworks Inc. 2011 Hive + HBase Features and Improvements Architecting the Future of Big Data Page 6
  • 7. © Hortonworks Inc. 2011 Hive + HBase Motivation • Hive over HDFS and HBase has different characteristics –  Batch Online –  Structured vs Unstructured – Analysts Programmers • Hive datawarehouses on HDFS are – Long ETL times – Access to real time data • Analyzing HBase data with MapReduce requires custom coding • Hive and SQL are already known by many analysts Page 7 Architecting the Future of Big Data
  • 8. © Hortonworks Inc. 2011 Use Case 1: HBase as ETL Data Sink Page 8 Architecting the Future of Big Data From HUG - Hive/HBase Integration or, MaybeSQL? April 2010 John Sichi Facebook http://www.slideshare.net/hadoopusergroup/hive-h-basehadoopapr2010 HDFS Tables INSERT … SELECT …! FROM … ! HBase Online Queries
  • 9. © Hortonworks Inc. 2011 Use Case 2: HBase as Data Source Page 9 Architecting the Future of Big Data From HUG - Hive/HBase Integration or, MaybeSQL? April 2010 John Sichi Facebook http://www.slideshare.net/hadoopusergroup/hive-h-basehadoopapr2010 HDFS Tables SELECT … JOIN …! GROUP BY … ! HBase Query Result
  • 10. © Hortonworks Inc. 2011 Use Case 3: Low Latency Warehouse Page 10 Architecting the Future of Big Data From HUG - Hive/HBase Integration or, MaybeSQL? April 2010 John Sichi Facebook http://www.slideshare.net/hadoopusergroup/hive-h-basehadoopapr2010 HDFS Tables HBase Continuous Updates HIVE QUERIES! Periodic Dump
  • 11. © Hortonworks Inc. 2011 Hive + HBase Example (HBase table) hbase(main):001:0> create 'short_urls', {NAME => 'u'}, {NAME=>'s'} hbase(main):014:0> scan 'short_urls' ROW COLUMN+CELL bit.ly/aaaa column=s:hits, value=100 bit.ly/aaaa column=u:url, value=hbase.apache.org/ bit.ly/abcd column=s:hits, value=123 bit.ly/abcd column=u:url, value=example.com/foo Page 11 Architecting the Future of Big Data
  • 12. © Hortonworks Inc. 2011 Hive + HBase Example (Hive table) CREATE TABLE short_urls( short_url string, url string, hit_count int ) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key, u:url, s:hits") TBLPROPERTIES ("hbase.table.name" = ”short_urls"); Page 12 Architecting the Future of Big Data
  • 13. © Hortonworks Inc. 2011 Storage Handler • Hive defines HiveStorageHandler class for different storage backends: HBase/ Cassandra / MongoDB/ etc • Storage Handler has hooks for –  Getting input / output formats –  Meta data operations hook: CREATE TABLE, DROP TABLE, etc • Storage Handler is a table level concept –  Does not support Hive partitions, and buckets Page 13 Architecting the Future of Big Data
  • 14. © Hortonworks Inc. 2011 Apache Hive + HBase Architecture Page 14 Architecting the Future of Big Data Metastore RDBMS Hive Thrift Server Driver CLI Hive Web Interface HDFS MapReduce Execution Parser Planner Optimizer M S C l i e n t HBase StorageHandler
  • 15. © Hortonworks Inc. 2011 Hive + HBase Integration • For Input/OutputFormat, getSplits(), etc underlying HBase classes are used • Column selection and certain filters can be pushed down • HBase tables can be used with other(Hadoop native) tables and SQL constructs • Hive DDL operations are converted to HBase DDL operations via the client hook. – All operations are performed by the client – No two phase commit Page 15 Architecting the Future of Big Data
  • 16. © Hortonworks Inc. 2011 Schema / Type Mapping Architecting the Future of Big Data Page 16
  • 17. © Hortonworks Inc. 2011 Schema Mapping • Hive table + columns + column types <=> HBase table + column families (+ column qualifiers) • Every field in Hive table is mapped to either – The table key (using :key as selector) – A column family (cf:) -> MAP fields in Hive – A column (cf:cq) •  Hive table does not need to include all columns in Hbase Page 17 Architecting the Future of Big Data
  • 18. © Hortonworks Inc. 2011 Schema Mapping - Example Page 18 Architecting the Future of Big Data CREATE TABLE short_urls( short_url string, url string, hit_count int, props, map<string,string> ) WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key, u:url, s:hits, p:")
  • 19. © Hortonworks Inc. 2011 Schema Mapping - Example Page 19 Architecting the Future of Big Data CREATE TABLE short_urls( short_url string, url string, hit_count int, props map<string,string> ) WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key, u:url, s:hits, p:")
  • 20. © Hortonworks Inc. 2011 Type Mapping • Added in Hive (0.9.0) • Previously all types were being converted to strings in HBase • Hive has: – Primitive types: INT, STRING, BINARY, DOUBLE etc – ARRAY<Type> – MAP<PrimitiveType, Type> – STRUCT<a:INT, b:STRING, c:STRING> • HBase does not have types – Bytes.toBytes() Page 20 Architecting the Future of Big Data
  • 21. © Hortonworks Inc. 2011 Type Mapping • Table level property "hbase.table.default.storage.type” = “binary” • Type mapping can be given per column after # – Any prefix of “binary” , eg u:url#b – Any prefix of “string” , eg u:url#s – The dash char “-” , eg u:url#- Page 21
  • 22. © Hortonworks Inc. 2011 Type Mapping - Example Page 22 Architecting the Future of Big Data CREATE TABLE short_urls( short_url string, url string, hit_count int, props, map<string,string> ) WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key#b,u:url#b,s:hits#b,p:#s")
  • 23. © Hortonworks Inc. 2011 Type Mapping • If the type is not a primitive or Map, it is converted to a JSON string and serialized • Still a few rough edges for schema and type mapping: – No support for DECIMAL, BINARY Hive types – No mapping of HBase timestamp (can only provide put timestamp) – No arbitrary mapping of Structs / Arrays into HBase schema Page 23 Architecting the Future of Big Data
  • 24. © Hortonworks Inc. 2011 Bulk Load • Steps to bulk load: – Sample source data for range partitioning – Save sampling results to a file – Run CLUSTER BY query using HiveHFileOutputFormat and TotalOrderPartitioner – Import Hfiles into HBase table • Ideal setup should be SET hive.hbase.bulk=true INSERT OVERWRITE TABLE web_table SELECT …. Page 24 Architecting the Future of Big Data
  • 25. © Hortonworks Inc. 2011 Filter Pushdown Architecting the Future of Big Data Page 25
  • 26. © Hortonworks Inc. 2011 Filter Pushdown • Idea is to pass down filter expressions to the storage layer to minimize scanned data • To access indexes at hdfs or hbase • Example: CREATE EXTERNAL TABLE users (userid LONG, email STRING, … ) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler’ WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,…") SELECT ... FROM users WHERE userid > 1000000 and email LIKE ‘%@gmail.com’; -> scan.setStartRow(Bytes.toBytes(1000000)) Page 26 Architecting the Future of Big Data
  • 27. © Hortonworks Inc. 2011 Filter Decomposition • Optimizer pushes down the predicates to the query plan • Storage handlers can negotiate with the Hive optimizer to decompose the filter x > 3 AND upper(y) = 'XYZ’ • Handle x > 3, send upper(y) = ’XYZ’ as residual for Hive • Works with: key = 3, key > 3, etc key > 3 AND key < 100 • Only works against constant expressions Page 27 Architecting the Future of Big Data
  • 28. © Hortonworks Inc. 2011 Future of Hive + HBase • Improve on schema / type mapping • Fully secure Hive deployment options • HBase bulk import improvements • Filter pushdown: non key column filters • Sortable signed numeric types in HBase • Use HBase’s new typing API’s (upcoming in HBase) • Integration with Phoenix / extract common modules, hbase- sql ? Page 28 Architecting the Future of Big Data
  • 29. © Hortonworks Inc. 2011 References • Type mapping / Filter Pushdown – https://issues.apache.org/jira/browse/HIVE-1634 – https://issues.apache.org/jira/browse/HIVE-1226 – https://issues.apache.org/jira/browse/HIVE-1643 – https://issues.apache.org/jira/browse/HIVE-2815 Page 29 Architecting the Future of Big Data
  • 30. © Hortonworks Inc. 2011 Thanks Questions? Architecting the Future of Big Data Page 30