SlideShare una empresa de Scribd logo
1 de 21
HBase @Twitter
@gario @ctrezzo
HBase Meetup 7/16
Agenda
● Infrastructure overview
● Example use cases
● hRaven
Infrastructure Overview
● HBase/Hadoop versions
○ HBase 0.94.x
○ Hadoop 2.0
● PROC, DW, TST, EXP
● Puppet
○ Config management
○ Packaging/Deployment (RPMs)
○ Rolling Upgrades
● Using replication for data movement between PROC
and DW
Infrastructure Overview
Major Use Cases
● Mutable data store for batch processing
● Operational Intelligence
● Monitoring/Metrics
Mutable data store for batch
processing
● Tables copied from MySQL
○ Allowing for incremental loads
● MapReduce jobs over data in HBase
● Snapshot of data copied into HDFS for processing
○ HBASE-8369 will optimize this
Operational Intelligence
● DCEvents - Audit log for changes in production
● TCC big users of python!
○ HappyBase
○ Thrift Gateway
Monitoring/Metrics
https://github.com/twitter/hRaven
● Stores stats, configuration and timing for every map
reduce job on every cluster
● Structured around the full DAG of jobs from a Pig or
Scalding application
● Easily queryable for historical trending
● Allows for Pig reducer optimization based on historical
run stats
● Keep data online forever (12.6M jobs, 4.5B tasks +
attempts)
hRaven: Why?
● cluster - each cluster has a unique name mapping to
the Job Tracker
● user - map reduce jobs are run as a given user
● application - a Pig or Scalding script (or plain map
reduce job)
● flow - the combined DAG of jobs executed from a
single run of an application
● version - changes impacting the DAG are recorded as
a new version of the same application
hRaven: Key Concepts
hRaven: Application Flows
hRaven: Application Flows
● All jobs in a flow are ordered together
hRaven: Flow Storage
● Most recent flow is ordered first
hRaven: Flow Storage
● All jobs in a flow are ordered together
● Per-job metrics stored
○ Total map and reduce tasks
○ HDFS bytes read / written
○ File bytes read / written
○ Total map and reduce slot milliseconds
● Easy to aggregate stats for an entire flow
● Easy to scan the timeseries of each application’s flows
hRaven: Key Features
● Pig reducer optimizations
● Cluster utilization / capacity planning
● Application performance trending over time
● Identifying common job anti-patterns
● Ad-hoc analysis troubleshooting cluster problems
hRaven: Current Uses
hRaven: Current Uses
hRaven: Current Uses
● HBase 0.96 on Hadoop 2.0
● Flow centric hRaven UI
● Improvements to HBase replication
Future Work
Questions?
We are Hiring!
http://twitter.com/jobs
@JoinTheFlock

Más contenido relacionado

La actualidad más candente

Hug Hbase Presentation.
Hug Hbase Presentation.Hug Hbase Presentation.
Hug Hbase Presentation.
Jack Levin
 
[Hi c2011]building mission critical messaging system(guoqiang jerry)
[Hi c2011]building mission critical messaging system(guoqiang jerry)[Hi c2011]building mission critical messaging system(guoqiang jerry)
[Hi c2011]building mission critical messaging system(guoqiang jerry)
baggioss
 
HBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseHBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBase
enissoz
 
HBase Read High Availability Using Timeline Consistent Region Replicas
HBase  Read High Availability Using Timeline Consistent Region ReplicasHBase  Read High Availability Using Timeline Consistent Region Replicas
HBase Read High Availability Using Timeline Consistent Region Replicas
enissoz
 

La actualidad más candente (19)

HBase Accelerated: In-Memory Flush and Compaction
HBase Accelerated: In-Memory Flush and CompactionHBase Accelerated: In-Memory Flush and Compaction
HBase Accelerated: In-Memory Flush and Compaction
 
Hug Hbase Presentation.
Hug Hbase Presentation.Hug Hbase Presentation.
Hug Hbase Presentation.
 
Apache HBase for Architects
Apache HBase for ArchitectsApache HBase for Architects
Apache HBase for Architects
 
[Hi c2011]building mission critical messaging system(guoqiang jerry)
[Hi c2011]building mission critical messaging system(guoqiang jerry)[Hi c2011]building mission critical messaging system(guoqiang jerry)
[Hi c2011]building mission critical messaging system(guoqiang jerry)
 
HBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseHBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBase
 
HBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBase
HBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBaseHBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBase
HBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBase
 
Apache HBase Performance Tuning
Apache HBase Performance TuningApache HBase Performance Tuning
Apache HBase Performance Tuning
 
Operating and supporting HBase Clusters
Operating and supporting HBase ClustersOperating and supporting HBase Clusters
Operating and supporting HBase Clusters
 
HBaseCon 2012 | Low Latency OLAP with HBase - Cosmin Lehene, Adobe
HBaseCon 2012 | Low Latency OLAP with HBase - Cosmin Lehene, AdobeHBaseCon 2012 | Low Latency OLAP with HBase - Cosmin Lehene, Adobe
HBaseCon 2012 | Low Latency OLAP with HBase - Cosmin Lehene, Adobe
 
HBase Application Performance Improvement
HBase Application Performance ImprovementHBase Application Performance Improvement
HBase Application Performance Improvement
 
HBaseCon 2013: Compaction Improvements in Apache HBase
HBaseCon 2013: Compaction Improvements in Apache HBaseHBaseCon 2013: Compaction Improvements in Apache HBase
HBaseCon 2013: Compaction Improvements in Apache HBase
 
Apache HBase 1.0 Release
Apache HBase 1.0 ReleaseApache HBase 1.0 Release
Apache HBase 1.0 Release
 
HBase Advanced - Lars George
HBase Advanced - Lars GeorgeHBase Advanced - Lars George
HBase Advanced - Lars George
 
HBase Storage Internals
HBase Storage InternalsHBase Storage Internals
HBase Storage Internals
 
HBase Read High Availability Using Timeline Consistent Region Replicas
HBase  Read High Availability Using Timeline Consistent Region ReplicasHBase  Read High Availability Using Timeline Consistent Region Replicas
HBase Read High Availability Using Timeline Consistent Region Replicas
 
HBaseCon 2012 | HBase and HDFS: Past, Present, Future - Todd Lipcon, Cloudera
HBaseCon 2012 | HBase and HDFS: Past, Present, Future - Todd Lipcon, ClouderaHBaseCon 2012 | HBase and HDFS: Past, Present, Future - Todd Lipcon, Cloudera
HBaseCon 2012 | HBase and HDFS: Past, Present, Future - Todd Lipcon, Cloudera
 
HBase 0.20.0 Performance Evaluation
HBase 0.20.0 Performance EvaluationHBase 0.20.0 Performance Evaluation
HBase 0.20.0 Performance Evaluation
 
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...
 
HBaseCon 2015: Elastic HBase on Mesos
HBaseCon 2015: Elastic HBase on MesosHBaseCon 2015: Elastic HBase on Mesos
HBaseCon 2015: Elastic HBase on Mesos
 

Similar a HBase @ Twitter

Hadoop 2 @ Twitter, Elephant Scale
Hadoop 2 @ Twitter, Elephant ScaleHadoop 2 @ Twitter, Elephant Scale
Hadoop 2 @ Twitter, Elephant Scale
DataWorks Summit
 
Hadoop 2 @Twitter, Elephant Scale. Presented at
Hadoop 2 @Twitter, Elephant Scale. Presented at Hadoop 2 @Twitter, Elephant Scale. Presented at
Hadoop 2 @Twitter, Elephant Scale. Presented at
lohitvijayarenu
 
Marcel Kornacker: Impala tech talk Tue Feb 26th 2013
Marcel Kornacker: Impala tech talk Tue Feb 26th 2013Marcel Kornacker: Impala tech talk Tue Feb 26th 2013
Marcel Kornacker: Impala tech talk Tue Feb 26th 2013
Modern Data Stack France
 

Similar a HBase @ Twitter (20)

Hadoop 2 @ Twitter, Elephant Scale
Hadoop 2 @ Twitter, Elephant ScaleHadoop 2 @ Twitter, Elephant Scale
Hadoop 2 @ Twitter, Elephant Scale
 
Hadoop 2 @Twitter, Elephant Scale. Presented at
Hadoop 2 @Twitter, Elephant Scale. Presented at Hadoop 2 @Twitter, Elephant Scale. Presented at
Hadoop 2 @Twitter, Elephant Scale. Presented at
 
Apache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with HadoopApache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with Hadoop
 
NetFlow Data processing using Hadoop and Vertica
NetFlow Data processing using Hadoop and VerticaNetFlow Data processing using Hadoop and Vertica
NetFlow Data processing using Hadoop and Vertica
 
LAS16-305: Smart City Big Data Visualization on 96Boards
LAS16-305: Smart City Big Data Visualization on 96BoardsLAS16-305: Smart City Big Data Visualization on 96Boards
LAS16-305: Smart City Big Data Visualization on 96Boards
 
Smart City Big Data Visualization on 96Boards - Linaro Connect Las Vegas 2016
Smart City Big Data Visualization on 96Boards - Linaro Connect Las Vegas 2016Smart City Big Data Visualization on 96Boards - Linaro Connect Las Vegas 2016
Smart City Big Data Visualization on 96Boards - Linaro Connect Las Vegas 2016
 
Savanna - Elastic Hadoop on OpenStack
Savanna - Elastic Hadoop on OpenStackSavanna - Elastic Hadoop on OpenStack
Savanna - Elastic Hadoop on OpenStack
 
Apache Tez : Accelerating Hadoop Query Processing
Apache Tez : Accelerating Hadoop Query ProcessingApache Tez : Accelerating Hadoop Query Processing
Apache Tez : Accelerating Hadoop Query Processing
 
Big Data Processing
Big Data ProcessingBig Data Processing
Big Data Processing
 
Hadoop 3 @ Hadoop Summit San Jose 2017
Hadoop 3 @ Hadoop Summit San Jose 2017Hadoop 3 @ Hadoop Summit San Jose 2017
Hadoop 3 @ Hadoop Summit San Jose 2017
 
Apache Hadoop 3.0 Community Update
Apache Hadoop 3.0 Community UpdateApache Hadoop 3.0 Community Update
Apache Hadoop 3.0 Community Update
 
Glusterfs and Hadoop
Glusterfs and HadoopGlusterfs and Hadoop
Glusterfs and Hadoop
 
[Hadoop Meetup] Apache Hadoop 3 community update - Rohith Sharma
[Hadoop Meetup] Apache Hadoop 3 community update - Rohith Sharma[Hadoop Meetup] Apache Hadoop 3 community update - Rohith Sharma
[Hadoop Meetup] Apache Hadoop 3 community update - Rohith Sharma
 
Nicholas:hdfs what is new in hadoop 2
Nicholas:hdfs what is new in hadoop 2Nicholas:hdfs what is new in hadoop 2
Nicholas:hdfs what is new in hadoop 2
 
Upcoming features in Airflow 2
Upcoming features in Airflow 2Upcoming features in Airflow 2
Upcoming features in Airflow 2
 
RubiX
RubiXRubiX
RubiX
 
ApacheCon 2022_ Large scale unification of file format.pptx
ApacheCon 2022_ Large scale unification of file format.pptxApacheCon 2022_ Large scale unification of file format.pptx
ApacheCon 2022_ Large scale unification of file format.pptx
 
Spark Driven Big Data Analytics
Spark Driven Big Data AnalyticsSpark Driven Big Data Analytics
Spark Driven Big Data Analytics
 
Marcel Kornacker: Impala tech talk Tue Feb 26th 2013
Marcel Kornacker: Impala tech talk Tue Feb 26th 2013Marcel Kornacker: Impala tech talk Tue Feb 26th 2013
Marcel Kornacker: Impala tech talk Tue Feb 26th 2013
 
An Introduction to Impala – Low Latency Queries for Apache Hadoop
An Introduction to Impala – Low Latency Queries for Apache HadoopAn Introduction to Impala – Low Latency Queries for Apache Hadoop
An Introduction to Impala – Low Latency Queries for Apache Hadoop
 

Último

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Último (20)

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 

HBase @ Twitter

  • 2. Agenda ● Infrastructure overview ● Example use cases ● hRaven
  • 3. Infrastructure Overview ● HBase/Hadoop versions ○ HBase 0.94.x ○ Hadoop 2.0 ● PROC, DW, TST, EXP ● Puppet ○ Config management ○ Packaging/Deployment (RPMs) ○ Rolling Upgrades ● Using replication for data movement between PROC and DW
  • 5. Major Use Cases ● Mutable data store for batch processing ● Operational Intelligence ● Monitoring/Metrics
  • 6. Mutable data store for batch processing ● Tables copied from MySQL ○ Allowing for incremental loads ● MapReduce jobs over data in HBase ● Snapshot of data copied into HDFS for processing ○ HBASE-8369 will optimize this
  • 7. Operational Intelligence ● DCEvents - Audit log for changes in production ● TCC big users of python! ○ HappyBase ○ Thrift Gateway
  • 9. ● Stores stats, configuration and timing for every map reduce job on every cluster ● Structured around the full DAG of jobs from a Pig or Scalding application ● Easily queryable for historical trending ● Allows for Pig reducer optimization based on historical run stats ● Keep data online forever (12.6M jobs, 4.5B tasks + attempts) hRaven: Why?
  • 10. ● cluster - each cluster has a unique name mapping to the Job Tracker ● user - map reduce jobs are run as a given user ● application - a Pig or Scalding script (or plain map reduce job) ● flow - the combined DAG of jobs executed from a single run of an application ● version - changes impacting the DAG are recorded as a new version of the same application hRaven: Key Concepts
  • 13. ● All jobs in a flow are ordered together hRaven: Flow Storage
  • 14. ● Most recent flow is ordered first hRaven: Flow Storage
  • 15. ● All jobs in a flow are ordered together ● Per-job metrics stored ○ Total map and reduce tasks ○ HDFS bytes read / written ○ File bytes read / written ○ Total map and reduce slot milliseconds ● Easy to aggregate stats for an entire flow ● Easy to scan the timeseries of each application’s flows hRaven: Key Features
  • 16. ● Pig reducer optimizations ● Cluster utilization / capacity planning ● Application performance trending over time ● Identifying common job anti-patterns ● Ad-hoc analysis troubleshooting cluster problems hRaven: Current Uses
  • 19. ● HBase 0.96 on Hadoop 2.0 ● Flow centric hRaven UI ● Improvements to HBase replication Future Work