SlideShare una empresa de Scribd logo
1 de 38
Descargar para leer sin conexión
HBase Low Latency
Nick Dimiduk, Hortonworks (@xefyr)
Nicolas Liochon, Scaled Risk (@nkeywal)
Hadoop Summit June 4, 2014
Agenda
• Latency, what is it, how to measure it
• Write path
• Read path
• Next steps
What’s low latency
Latency is about percentiles
• Average != 50% percentile
• There are often order of magnitudes between « average » and « 95
percentile »
• Post 99% = « magical 1% ». Work in progress here.
• Meaning from micro seconds (High Frequency
Trading) to seconds (interactive queries)
• In this talk milliseconds
Measure latency
bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation
• More options related to HBase: autoflush, replicas, …
• Latency measured in micro second
• Easier for internal analysis
YCSB - Yahoo! Cloud Serving Benchmark
• Useful for comparison between databases
• Set of workload already defined
Write path
• Two parts
• Single put (WAL)
• The client just sends the put
• Multiple puts from the client (new behavior since 0.96)
• The client is much smarter
• Four stages to look at for latency
• Start (establish tcp connections, etc.)
• Steady: when expected conditions are met
• Machine failure: expected as well
• Overloaded system
Single put: communication & scheduling
• Client: TCP connection to the server
• Shared: multitheads on the same client are using the same TCP connection
• Pooling is possible and does improve the performances in some circonstances
• hbase.client.ipc.pool.size
• Server: multiple calls from multiple threads on multiple machines
• Can become thousand of simultaneous queries
• Scheduling is required
Single put: real work
• The server must
• Write into the WAL queue
• Sync the WAL queue (HDFS flush)
• Write into the memstore
• WALs queue is shared between all the regions/handlers
• Sync is avoided if another handlers did the work
• Your handler may flush more data than expected
Simple put: A small run
Percentile Time in ms
Mean 1.21
50% 0.95
95% 1.50
99% 2.12
Latency sources
• Candidate one: network
• 0.5ms within a datacenter
• Much less between nodes in the same rack
Percentile Time in ms
Mean 0.13
50% 0.12
95% 0.15
99% 0.47
Latency sources
• Candidate two: HDFS Flush
• We can still do better: HADOOP-7714 & sons.
Percentile Time in ms
Mean 0.33
50% 0.26
95% 0.59
99% 1.24
Latency sources
• Millisecond world: everything can go wrong
• JVM
• Network
• OS Scheduler
• File System
• All this goes into the post 99% percentile
• Requires monitoring
• Usually using the latest version helps
Latency sources
• Split (and presplits)
• Autosharding is great!
• Puts have to wait
• Impacts: seconds
• Balance
• Regions move
• Triggers a retry for the client
• hbase.client.pause = 100ms since HBase 0.96
• Garbage Collection
• Impacts: 10’s of ms, even with a good config
• Covered with the read path of this talk
From steady to loaded and overloaded
• Number of concurrent tasks is a factor of
• Number of cores
• Number of disks
• Number of remote machines used
• Difficult to estimate
• Queues are doomed to happen
• hbase.regionserver.handler.count
• So for low latency
• Replable scheduler since HBase 0.98 (HBASE-8884). Requires specific code.
• RPC Priorities: work in progress (HBASE-11048)
From loaded to overloaded
• MemStore takes too much room: flush, then blocksquite quickly
• hbase.regionserver.global.memstore.size.lower.limit
• hbase.regionserver.global.memstore.size
• hbase.hregion.memstore.block.multiplier
• Too many Hfiles: block until compactions keep up
• hbase.hstore.blockingStoreFiles
• Too many WALs files: Flush and block
• hbase.regionserver.maxlogs
Machine failure
• Failure
• Dectect
• Reallocate
• Replay WAL
• Replaying WAL is NOT required for puts
• hbase.master.distributed.log.replay
• (default true in 1.0)
• Failure = Dectect + Reallocate + Retry
• That’s in the range of ~1s for simple failures
• Silent failures leads puts you in the 10s range if the hardware does not help
• zookeeper.session.timeout
Single puts
• Millisecond range
• Spikes do happen in steady mode
• 100ms
• Causes: GC, load, splits
Streaming puts
Htable#setAutoFlushTo(false)
Htable#put
Htable#flushCommit
• As simple puts, but
• Puts are grouped and send in background
• Load is taken into account
• Does not block
Multiple puts
hbase.client.max.total.tasks (default 100)
hbase.client.max.perserver.tasks (default 5)
hbase.client.max.perregion.tasks (default 1)
• Decouple the client from a latency spike of a region server
• Increase the throughput by 50% compared to old multiput
• Makes split and GC more transparent
Conclusion on write path
• Single puts can be very fast
• It’s not a « hard real time » system: there are spikes
• Most latency spikes can be hidden when streaming puts
• Failure are NOT that difficult for the write path
• No WAL to replay
And now for the read path
Read path
• Get/short scan are assumed for low-latency operations
• Again, two APIs
• Single get: HTable#get(Get)
• Multi-get: HTable#get(List<Get>)
• Four stages, same as write path
• Start (tcp connection, …)
• Steady: when expected conditions are met
• Machine failure: expected as well
• Overloaded system: you may need to add machines or tune your workload
Multi get / Client
Group Gets by
RegionServer
Execute them
one by one
Multi get / Server
Multi get / Server
Access latency magnidesStorage hierarchy: a different view
Dean/2009
Memory is 100000x
faster than disk!
Disk seek = 10ms
Known unknowns
• For each candidate HFile
• Exclude by file metadata
• Timestamp
• Rowkey range
• Exclude by bloom filter
StoreFileScanner#
shouldUseScanner()
Unknown knowns
• Merge sort results polled from Stores
• Seek each scanner to a reference KeyValue
• Retrieve candidate data from disk
• Multiple HFiles => mulitple seeks
• hbase.storescanner.parallel.seek.enable=true
• Short Circuit Reads
• dfs.client.read.shortcircuit=true
• Block locality
• Happy clusters compact!
HFileBlock#
readBlockData()
BlockCache
• Reuse previously read data
• Maximize cache hit rate
• Larger cache
• Temporal access locality
• Physical access locality
BlockCache#getBlock()
BlockCache Showdown
• LruBlockCache
• Default, onheap
• Quite good most of the time
• Evictions impact GC
• BucketCache
• Offheap alternative
• Serialization overhead
• Large memory configurations
http://www.n10k.com/blog/block
cache-showdown/
L2 off-heap BucketCache
makes a strong showing
Latency enemies: Garbage Collection
• Use heap. Not too much. With CMS.
• Max heap
• 30GB (compressed pointers)
• 8-16GB if you care about 9’s
• Healthy cluster load
• regular, reliable collections
• 25-100ms pause on regular interval
• Overloaded RegionServer suffers GC overmuch
Off-heap to the rescue?
• BucketCache (0.96, HBASE-7404)
• Network interfaces (HBASE-9535)
• MemStore et al (HBASE-10191)
Latency enemies: Compactions
• Fewer HFiles => fewer seeks
• Evict data blocks!
• Evict Index blocks!!
• hfile.block.index.cacheonwrite
• Evict bloom blocks!!!
• hfile.block.bloom.cacheonwrite
• OS buffer cache to the rescue
• Compactected data is still fresh
• Better than going all the way back to disk
Failure
• Detect + Reassign + Replay
• Strong consistency requires replay
• Locality drops to 0
• Cache starts from scratch
Hedging our bets
• HDFS Hedged reads (2.4, HDFS-5776)
• Reads on secondary DataNodes
• Strongly consistent
• Works at the HDFS level
• Timeline consistency (HBASE-10070)
• Reads on « Replica Region »
• Not strongly consistent
Read latency in summary
• Steady mode
• Cache hit: < 1 ms
• Cache miss: + 10 ms per seek
• Writing while reading => cache churn
• GC: 25-100ms pause on regular interval
Network request + (1 - P(cache hit)) * (10 ms * seeks)
• Same long tail issues as write
• Overloaded: same scheduling issues as write
• Partial failures hurt a lot
HBase ranges for 99% latency
Put
Streamed
Multiput Get Timeline get
Steady milliseconds milliseconds milliseconds milliseconds
Failure seconds seconds seconds milliseconds
GC
10’s of
milliseconds milliseconds
10’s of
milliseconds milliseconds
What’s next
• Less GC
• Use less objects
• Offheap
• Compressed BlockCache (HBASE-8894)
• Prefered location (HBASE-4755)
• The « magical 1% »
• Most tools stops at the 99% latency
• What happens after is much more complex
Thanks!
Nick Dimiduk, Hortonworks (@xefyr)
Nicolas Liochon, Scaled Risk (@nkeywal)
Hadoop Summit June 4, 2014

Más contenido relacionado

La actualidad más candente

Apache HBase Performance Tuning
Apache HBase Performance TuningApache HBase Performance Tuning
Apache HBase Performance TuningLars Hofhansl
 
Cassandra Introduction & Features
Cassandra Introduction & FeaturesCassandra Introduction & Features
Cassandra Introduction & FeaturesDataStax Academy
 
Ozone: scaling HDFS to trillions of objects
Ozone: scaling HDFS to trillions of objectsOzone: scaling HDFS to trillions of objects
Ozone: scaling HDFS to trillions of objectsDataWorks Summit
 
Etsy Activity Feeds Architecture
Etsy Activity Feeds ArchitectureEtsy Activity Feeds Architecture
Etsy Activity Feeds ArchitectureDan McKinley
 
Facebook Messages & HBase
Facebook Messages & HBaseFacebook Messages & HBase
Facebook Messages & HBase强 王
 
HDFS on Kubernetes—Lessons Learned with Kimoon Kim
HDFS on Kubernetes—Lessons Learned with Kimoon KimHDFS on Kubernetes—Lessons Learned with Kimoon Kim
HDFS on Kubernetes—Lessons Learned with Kimoon KimDatabricks
 
Introduction to Storm
Introduction to Storm Introduction to Storm
Introduction to Storm Chandler Huang
 
Securing Hadoop with Apache Ranger
Securing Hadoop with Apache RangerSecuring Hadoop with Apache Ranger
Securing Hadoop with Apache RangerDataWorks Summit
 
Ozone- Object store for Apache Hadoop
Ozone- Object store for Apache HadoopOzone- Object store for Apache Hadoop
Ozone- Object store for Apache HadoopHortonworks
 
Time-Series Apache HBase
Time-Series Apache HBaseTime-Series Apache HBase
Time-Series Apache HBaseHBaseCon
 
HBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseHBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseenissoz
 
Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive

Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive

Cloudera, Inc.
 
Introduction to Apache ZooKeeper
Introduction to Apache ZooKeeperIntroduction to Apache ZooKeeper
Introduction to Apache ZooKeeperSaurav Haloi
 
Application Timeline Server - Past, Present and Future
Application Timeline Server - Past, Present and FutureApplication Timeline Server - Past, Present and Future
Application Timeline Server - Past, Present and FutureVARUN SAXENA
 
Best practices and lessons learnt from Running Apache NiFi at Renault
Best practices and lessons learnt from Running Apache NiFi at RenaultBest practices and lessons learnt from Running Apache NiFi at Renault
Best practices and lessons learnt from Running Apache NiFi at RenaultDataWorks Summit
 
HBase Application Performance Improvement
HBase Application Performance ImprovementHBase Application Performance Improvement
HBase Application Performance ImprovementBiju Nair
 

La actualidad más candente (20)

Apache HBase Performance Tuning
Apache HBase Performance TuningApache HBase Performance Tuning
Apache HBase Performance Tuning
 
Hive: Loading Data
Hive: Loading DataHive: Loading Data
Hive: Loading Data
 
Cassandra Introduction & Features
Cassandra Introduction & FeaturesCassandra Introduction & Features
Cassandra Introduction & Features
 
Ozone: scaling HDFS to trillions of objects
Ozone: scaling HDFS to trillions of objectsOzone: scaling HDFS to trillions of objects
Ozone: scaling HDFS to trillions of objects
 
Etsy Activity Feeds Architecture
Etsy Activity Feeds ArchitectureEtsy Activity Feeds Architecture
Etsy Activity Feeds Architecture
 
Facebook Messages & HBase
Facebook Messages & HBaseFacebook Messages & HBase
Facebook Messages & HBase
 
HDFS on Kubernetes—Lessons Learned with Kimoon Kim
HDFS on Kubernetes—Lessons Learned with Kimoon KimHDFS on Kubernetes—Lessons Learned with Kimoon Kim
HDFS on Kubernetes—Lessons Learned with Kimoon Kim
 
Node Labels in YARN
Node Labels in YARNNode Labels in YARN
Node Labels in YARN
 
Introduction to Storm
Introduction to Storm Introduction to Storm
Introduction to Storm
 
Securing Hadoop with Apache Ranger
Securing Hadoop with Apache RangerSecuring Hadoop with Apache Ranger
Securing Hadoop with Apache Ranger
 
Ozone- Object store for Apache Hadoop
Ozone- Object store for Apache HadoopOzone- Object store for Apache Hadoop
Ozone- Object store for Apache Hadoop
 
Time-Series Apache HBase
Time-Series Apache HBaseTime-Series Apache HBase
Time-Series Apache HBase
 
HBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseHBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBase
 
Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive

Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive


 
Intro to HBase
Intro to HBaseIntro to HBase
Intro to HBase
 
Introduction to Apache ZooKeeper
Introduction to Apache ZooKeeperIntroduction to Apache ZooKeeper
Introduction to Apache ZooKeeper
 
Application Timeline Server - Past, Present and Future
Application Timeline Server - Past, Present and FutureApplication Timeline Server - Past, Present and Future
Application Timeline Server - Past, Present and Future
 
Best practices and lessons learnt from Running Apache NiFi at Renault
Best practices and lessons learnt from Running Apache NiFi at RenaultBest practices and lessons learnt from Running Apache NiFi at Renault
Best practices and lessons learnt from Running Apache NiFi at Renault
 
HBase Application Performance Improvement
HBase Application Performance ImprovementHBase Application Performance Improvement
HBase Application Performance Improvement
 
Fast analytics kudu to druid
Fast analytics  kudu to druidFast analytics  kudu to druid
Fast analytics kudu to druid
 

Destacado

Apache HBase Low Latency
Apache HBase Low LatencyApache HBase Low Latency
Apache HBase Low LatencyNick Dimiduk
 
Scaling HBase (nosql store) to handle massive loads at Pinterest by Jeremy Carol
Scaling HBase (nosql store) to handle massive loads at Pinterest by Jeremy CarolScaling HBase (nosql store) to handle massive loads at Pinterest by Jeremy Carol
Scaling HBase (nosql store) to handle massive loads at Pinterest by Jeremy CarolHakka Labs
 
Large-scale Web Apps @ Pinterest
Large-scale Web Apps @ PinterestLarge-scale Web Apps @ Pinterest
Large-scale Web Apps @ PinterestHBaseCon
 
HBase at Bloomberg: High Availability Needs for the Financial Industry
HBase at Bloomberg: High Availability Needs for the Financial IndustryHBase at Bloomberg: High Availability Needs for the Financial Industry
HBase at Bloomberg: High Availability Needs for the Financial IndustryHBaseCon
 
Tokyo HBase Meetup - Realtime Big Data at Facebook with Hadoop and HBase (ja)
Tokyo HBase Meetup - Realtime Big Data at Facebook with Hadoop and HBase (ja)Tokyo HBase Meetup - Realtime Big Data at Facebook with Hadoop and HBase (ja)
Tokyo HBase Meetup - Realtime Big Data at Facebook with Hadoop and HBase (ja)tatsuya6502
 
Introduction To HBase
Introduction To HBaseIntroduction To HBase
Introduction To HBaseAnil Gupta
 
HBase at Xiaomi
HBase at XiaomiHBase at Xiaomi
HBase at XiaomiHBaseCon
 
HBase: Where Online Meets Low Latency
HBase: Where Online Meets Low LatencyHBase: Where Online Meets Low Latency
HBase: Where Online Meets Low LatencyHBaseCon
 
Нейронечёткая классификация слабо формализуемых данных | Тимур Гильмуллин
Нейронечёткая классификация слабо формализуемых данных | Тимур ГильмуллинНейронечёткая классификация слабо формализуемых данных | Тимур Гильмуллин
Нейронечёткая классификация слабо формализуемых данных | Тимур ГильмуллинPositive Hack Days
 
HBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL database
HBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL databaseHBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL database
HBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL databaseEdureka!
 
HBase for Architects
HBase for ArchitectsHBase for Architects
HBase for ArchitectsNick Dimiduk
 
Hadoop and HBase on Amazon Web Services
Hadoop and HBase on Amazon Web Services Hadoop and HBase on Amazon Web Services
Hadoop and HBase on Amazon Web Services Amazon Web Services
 
HBase Blockcache 101
HBase Blockcache 101HBase Blockcache 101
HBase Blockcache 101Nick Dimiduk
 

Destacado (14)

Apache HBase Low Latency
Apache HBase Low LatencyApache HBase Low Latency
Apache HBase Low Latency
 
Scaling HBase (nosql store) to handle massive loads at Pinterest by Jeremy Carol
Scaling HBase (nosql store) to handle massive loads at Pinterest by Jeremy CarolScaling HBase (nosql store) to handle massive loads at Pinterest by Jeremy Carol
Scaling HBase (nosql store) to handle massive loads at Pinterest by Jeremy Carol
 
Scheduling Policies in YARN
Scheduling Policies in YARNScheduling Policies in YARN
Scheduling Policies in YARN
 
Large-scale Web Apps @ Pinterest
Large-scale Web Apps @ PinterestLarge-scale Web Apps @ Pinterest
Large-scale Web Apps @ Pinterest
 
HBase at Bloomberg: High Availability Needs for the Financial Industry
HBase at Bloomberg: High Availability Needs for the Financial IndustryHBase at Bloomberg: High Availability Needs for the Financial Industry
HBase at Bloomberg: High Availability Needs for the Financial Industry
 
Tokyo HBase Meetup - Realtime Big Data at Facebook with Hadoop and HBase (ja)
Tokyo HBase Meetup - Realtime Big Data at Facebook with Hadoop and HBase (ja)Tokyo HBase Meetup - Realtime Big Data at Facebook with Hadoop and HBase (ja)
Tokyo HBase Meetup - Realtime Big Data at Facebook with Hadoop and HBase (ja)
 
Introduction To HBase
Introduction To HBaseIntroduction To HBase
Introduction To HBase
 
HBase at Xiaomi
HBase at XiaomiHBase at Xiaomi
HBase at Xiaomi
 
HBase: Where Online Meets Low Latency
HBase: Where Online Meets Low LatencyHBase: Where Online Meets Low Latency
HBase: Where Online Meets Low Latency
 
Нейронечёткая классификация слабо формализуемых данных | Тимур Гильмуллин
Нейронечёткая классификация слабо формализуемых данных | Тимур ГильмуллинНейронечёткая классификация слабо формализуемых данных | Тимур Гильмуллин
Нейронечёткая классификация слабо формализуемых данных | Тимур Гильмуллин
 
HBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL database
HBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL databaseHBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL database
HBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL database
 
HBase for Architects
HBase for ArchitectsHBase for Architects
HBase for Architects
 
Hadoop and HBase on Amazon Web Services
Hadoop and HBase on Amazon Web Services Hadoop and HBase on Amazon Web Services
Hadoop and HBase on Amazon Web Services
 
HBase Blockcache 101
HBase Blockcache 101HBase Blockcache 101
HBase Blockcache 101
 

Similar a HBase Low Latency

HBase Low Latency, StrataNYC 2014
HBase Low Latency, StrataNYC 2014HBase Low Latency, StrataNYC 2014
HBase Low Latency, StrataNYC 2014Nick Dimiduk
 
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsBig Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsEsther Kundin
 
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsBig Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsEsther Kundin
 
HBaseCon 2015: HBase Performance Tuning @ Salesforce
HBaseCon 2015: HBase Performance Tuning @ SalesforceHBaseCon 2015: HBase Performance Tuning @ Salesforce
HBaseCon 2015: HBase Performance Tuning @ SalesforceHBaseCon
 
004 architecture andadvanceduse
004 architecture andadvanceduse004 architecture andadvanceduse
004 architecture andadvanceduseScott Miao
 
Hadoop 3.0 - Revolution or evolution?
Hadoop 3.0 - Revolution or evolution?Hadoop 3.0 - Revolution or evolution?
Hadoop 3.0 - Revolution or evolution?Uwe Printz
 
HBase in Practice
HBase in PracticeHBase in Practice
HBase in Practicelarsgeorge
 
Hadoop Robot from eBay at China Hadoop Summit 2015
Hadoop Robot from eBay at China Hadoop Summit 2015Hadoop Robot from eBay at China Hadoop Summit 2015
Hadoop Robot from eBay at China Hadoop Summit 2015polo li
 
Select Stars: A DBA's Guide to Azure Cosmos DB (SQL Saturday Oslo 2018)
Select Stars: A DBA's Guide to Azure Cosmos DB (SQL Saturday Oslo 2018)Select Stars: A DBA's Guide to Azure Cosmos DB (SQL Saturday Oslo 2018)
Select Stars: A DBA's Guide to Azure Cosmos DB (SQL Saturday Oslo 2018)Bob Pusateri
 
HBASE by Nicolas Liochon - Meetup HUGFR du 22 Sept 2014
HBASE by  Nicolas Liochon - Meetup HUGFR du 22 Sept 2014HBASE by  Nicolas Liochon - Meetup HUGFR du 22 Sept 2014
HBASE by Nicolas Liochon - Meetup HUGFR du 22 Sept 2014Modern Data Stack France
 
Strata + Hadoop World 2012: Apache HBase Features for the Enterprise
Strata + Hadoop World 2012: Apache HBase Features for the EnterpriseStrata + Hadoop World 2012: Apache HBase Features for the Enterprise
Strata + Hadoop World 2012: Apache HBase Features for the EnterpriseCloudera, Inc.
 
Select Stars: A DBA's Guide to Azure Cosmos DB (Chicago Suburban SQL Server U...
Select Stars: A DBA's Guide to Azure Cosmos DB (Chicago Suburban SQL Server U...Select Stars: A DBA's Guide to Azure Cosmos DB (Chicago Suburban SQL Server U...
Select Stars: A DBA's Guide to Azure Cosmos DB (Chicago Suburban SQL Server U...Bob Pusateri
 
Apache hbase for the enterprise (Strata+Hadoop World 2012)
Apache hbase for the enterprise (Strata+Hadoop World 2012)Apache hbase for the enterprise (Strata+Hadoop World 2012)
Apache hbase for the enterprise (Strata+Hadoop World 2012)jmhsieh
 
Large-scale projects development (scaling LAMP)
Large-scale projects development (scaling LAMP)Large-scale projects development (scaling LAMP)
Large-scale projects development (scaling LAMP)Alexey Rybak
 
Dustin Black - Red Hat Storage Server Administration Deep Dive
Dustin Black - Red Hat Storage Server Administration Deep DiveDustin Black - Red Hat Storage Server Administration Deep Dive
Dustin Black - Red Hat Storage Server Administration Deep DiveGluster.org
 
Caching: A Guided Tour - 10/12/2010
Caching: A Guided Tour - 10/12/2010Caching: A Guided Tour - 10/12/2010
Caching: A Guided Tour - 10/12/2010Jason Ragsdale
 
Intro to big data choco devday - 23-01-2014
Intro to big data   choco devday - 23-01-2014Intro to big data   choco devday - 23-01-2014
Intro to big data choco devday - 23-01-2014Hassan Islamov
 
Elastic HBase on Mesos - HBaseCon 2015
Elastic HBase on Mesos - HBaseCon 2015Elastic HBase on Mesos - HBaseCon 2015
Elastic HBase on Mesos - HBaseCon 2015Cosmin Lehene
 
Sub-second-sql-on-hadoop-at-scale
Sub-second-sql-on-hadoop-at-scaleSub-second-sql-on-hadoop-at-scale
Sub-second-sql-on-hadoop-at-scaleYifeng Jiang
 

Similar a HBase Low Latency (20)

HBase Low Latency, StrataNYC 2014
HBase Low Latency, StrataNYC 2014HBase Low Latency, StrataNYC 2014
HBase Low Latency, StrataNYC 2014
 
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsBig Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
 
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsBig Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
 
HBaseCon 2015: HBase Performance Tuning @ Salesforce
HBaseCon 2015: HBase Performance Tuning @ SalesforceHBaseCon 2015: HBase Performance Tuning @ Salesforce
HBaseCon 2015: HBase Performance Tuning @ Salesforce
 
004 architecture andadvanceduse
004 architecture andadvanceduse004 architecture andadvanceduse
004 architecture andadvanceduse
 
Hadoop 3.0 - Revolution or evolution?
Hadoop 3.0 - Revolution or evolution?Hadoop 3.0 - Revolution or evolution?
Hadoop 3.0 - Revolution or evolution?
 
HBase in Practice
HBase in Practice HBase in Practice
HBase in Practice
 
HBase in Practice
HBase in PracticeHBase in Practice
HBase in Practice
 
Hadoop Robot from eBay at China Hadoop Summit 2015
Hadoop Robot from eBay at China Hadoop Summit 2015Hadoop Robot from eBay at China Hadoop Summit 2015
Hadoop Robot from eBay at China Hadoop Summit 2015
 
Select Stars: A DBA's Guide to Azure Cosmos DB (SQL Saturday Oslo 2018)
Select Stars: A DBA's Guide to Azure Cosmos DB (SQL Saturday Oslo 2018)Select Stars: A DBA's Guide to Azure Cosmos DB (SQL Saturday Oslo 2018)
Select Stars: A DBA's Guide to Azure Cosmos DB (SQL Saturday Oslo 2018)
 
HBASE by Nicolas Liochon - Meetup HUGFR du 22 Sept 2014
HBASE by  Nicolas Liochon - Meetup HUGFR du 22 Sept 2014HBASE by  Nicolas Liochon - Meetup HUGFR du 22 Sept 2014
HBASE by Nicolas Liochon - Meetup HUGFR du 22 Sept 2014
 
Strata + Hadoop World 2012: Apache HBase Features for the Enterprise
Strata + Hadoop World 2012: Apache HBase Features for the EnterpriseStrata + Hadoop World 2012: Apache HBase Features for the Enterprise
Strata + Hadoop World 2012: Apache HBase Features for the Enterprise
 
Select Stars: A DBA's Guide to Azure Cosmos DB (Chicago Suburban SQL Server U...
Select Stars: A DBA's Guide to Azure Cosmos DB (Chicago Suburban SQL Server U...Select Stars: A DBA's Guide to Azure Cosmos DB (Chicago Suburban SQL Server U...
Select Stars: A DBA's Guide to Azure Cosmos DB (Chicago Suburban SQL Server U...
 
Apache hbase for the enterprise (Strata+Hadoop World 2012)
Apache hbase for the enterprise (Strata+Hadoop World 2012)Apache hbase for the enterprise (Strata+Hadoop World 2012)
Apache hbase for the enterprise (Strata+Hadoop World 2012)
 
Large-scale projects development (scaling LAMP)
Large-scale projects development (scaling LAMP)Large-scale projects development (scaling LAMP)
Large-scale projects development (scaling LAMP)
 
Dustin Black - Red Hat Storage Server Administration Deep Dive
Dustin Black - Red Hat Storage Server Administration Deep DiveDustin Black - Red Hat Storage Server Administration Deep Dive
Dustin Black - Red Hat Storage Server Administration Deep Dive
 
Caching: A Guided Tour - 10/12/2010
Caching: A Guided Tour - 10/12/2010Caching: A Guided Tour - 10/12/2010
Caching: A Guided Tour - 10/12/2010
 
Intro to big data choco devday - 23-01-2014
Intro to big data   choco devday - 23-01-2014Intro to big data   choco devday - 23-01-2014
Intro to big data choco devday - 23-01-2014
 
Elastic HBase on Mesos - HBaseCon 2015
Elastic HBase on Mesos - HBaseCon 2015Elastic HBase on Mesos - HBaseCon 2015
Elastic HBase on Mesos - HBaseCon 2015
 
Sub-second-sql-on-hadoop-at-scale
Sub-second-sql-on-hadoop-at-scaleSub-second-sql-on-hadoop-at-scale
Sub-second-sql-on-hadoop-at-scale
 

Más de DataWorks Summit

Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisDataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiDataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal SystemDataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExampleDataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberDataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixDataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiDataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureDataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudDataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiDataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerDataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouDataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
 
Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...
Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...
Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...DataWorks Summit
 

Más de DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 
Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...
Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...
Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...
 

Último

IT Service Management (ITSM) Best Practices for Advanced Computing
IT Service Management (ITSM) Best Practices for Advanced ComputingIT Service Management (ITSM) Best Practices for Advanced Computing
IT Service Management (ITSM) Best Practices for Advanced ComputingMAGNIntelligence
 
CyberSecurity - Computers In Libraries 2024
CyberSecurity - Computers In Libraries 2024CyberSecurity - Computers In Libraries 2024
CyberSecurity - Computers In Libraries 2024Brian Pichman
 
Explore the UiPath Community and ways you can benefit on your journey to auto...
Explore the UiPath Community and ways you can benefit on your journey to auto...Explore the UiPath Community and ways you can benefit on your journey to auto...
Explore the UiPath Community and ways you can benefit on your journey to auto...DianaGray10
 
From the origin to the future of Open Source model and business
From the origin to the future of  Open Source model and businessFrom the origin to the future of  Open Source model and business
From the origin to the future of Open Source model and businessFrancesco Corti
 
Oracle Database 23c Security New Features.pptx
Oracle Database 23c Security New Features.pptxOracle Database 23c Security New Features.pptx
Oracle Database 23c Security New Features.pptxSatishbabu Gunukula
 
Flow Control | Block Size | ST Min | First Frame
Flow Control | Block Size | ST Min | First FrameFlow Control | Block Size | ST Min | First Frame
Flow Control | Block Size | ST Min | First FrameKapil Thakar
 
How to become a GDSC Lead GDSC MI AOE.pptx
How to become a GDSC Lead GDSC MI AOE.pptxHow to become a GDSC Lead GDSC MI AOE.pptx
How to become a GDSC Lead GDSC MI AOE.pptxKaustubhBhavsar6
 
Patch notes explaining DISARM Version 1.4 update
Patch notes explaining DISARM Version 1.4 updatePatch notes explaining DISARM Version 1.4 update
Patch notes explaining DISARM Version 1.4 updateadam112203
 
Introduction to RAG (Retrieval Augmented Generation) and its application
Introduction to RAG (Retrieval Augmented Generation) and its applicationIntroduction to RAG (Retrieval Augmented Generation) and its application
Introduction to RAG (Retrieval Augmented Generation) and its applicationKnoldus Inc.
 
Novo Nordisk's journey in developing an open-source application on Neo4j
Novo Nordisk's journey in developing an open-source application on Neo4jNovo Nordisk's journey in developing an open-source application on Neo4j
Novo Nordisk's journey in developing an open-source application on Neo4jNeo4j
 
Webinar: The Art of Prioritizing Your Product Roadmap by AWS Sr PM - Tech
Webinar: The Art of Prioritizing Your Product Roadmap by AWS Sr PM - TechWebinar: The Art of Prioritizing Your Product Roadmap by AWS Sr PM - Tech
Webinar: The Art of Prioritizing Your Product Roadmap by AWS Sr PM - TechProduct School
 
3 Pitfalls Everyone Should Avoid with Cloud Data
3 Pitfalls Everyone Should Avoid with Cloud Data3 Pitfalls Everyone Should Avoid with Cloud Data
3 Pitfalls Everyone Should Avoid with Cloud DataEric D. Schabell
 
The Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and InsightThe Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and InsightSafe Software
 
UiPath Studio Web workshop series - Day 2
UiPath Studio Web workshop series - Day 2UiPath Studio Web workshop series - Day 2
UiPath Studio Web workshop series - Day 2DianaGray10
 
Automation Ops Series: Session 2 - Governance for UiPath projects
Automation Ops Series: Session 2 - Governance for UiPath projectsAutomation Ops Series: Session 2 - Governance for UiPath projects
Automation Ops Series: Session 2 - Governance for UiPath projectsDianaGray10
 
Planetek Italia Srl - Corporate Profile Brochure
Planetek Italia Srl - Corporate Profile BrochurePlanetek Italia Srl - Corporate Profile Brochure
Planetek Italia Srl - Corporate Profile BrochurePlanetek Italia Srl
 
TrustArc Webinar - How to Live in a Post Third-Party Cookie World
TrustArc Webinar - How to Live in a Post Third-Party Cookie WorldTrustArc Webinar - How to Live in a Post Third-Party Cookie World
TrustArc Webinar - How to Live in a Post Third-Party Cookie WorldTrustArc
 
Design and Modeling for MySQL SCALE 21X Pasadena, CA Mar 2024
Design and Modeling for MySQL SCALE 21X Pasadena, CA Mar 2024Design and Modeling for MySQL SCALE 21X Pasadena, CA Mar 2024
Design and Modeling for MySQL SCALE 21X Pasadena, CA Mar 2024Alkin Tezuysal
 
EMEA What is ThousandEyes? Webinar
EMEA What is ThousandEyes? WebinarEMEA What is ThousandEyes? Webinar
EMEA What is ThousandEyes? WebinarThousandEyes
 
2024.03.12 Cost drivers of cultivated meat production.pdf
2024.03.12 Cost drivers of cultivated meat production.pdf2024.03.12 Cost drivers of cultivated meat production.pdf
2024.03.12 Cost drivers of cultivated meat production.pdfThe Good Food Institute
 

Último (20)

IT Service Management (ITSM) Best Practices for Advanced Computing
IT Service Management (ITSM) Best Practices for Advanced ComputingIT Service Management (ITSM) Best Practices for Advanced Computing
IT Service Management (ITSM) Best Practices for Advanced Computing
 
CyberSecurity - Computers In Libraries 2024
CyberSecurity - Computers In Libraries 2024CyberSecurity - Computers In Libraries 2024
CyberSecurity - Computers In Libraries 2024
 
Explore the UiPath Community and ways you can benefit on your journey to auto...
Explore the UiPath Community and ways you can benefit on your journey to auto...Explore the UiPath Community and ways you can benefit on your journey to auto...
Explore the UiPath Community and ways you can benefit on your journey to auto...
 
From the origin to the future of Open Source model and business
From the origin to the future of  Open Source model and businessFrom the origin to the future of  Open Source model and business
From the origin to the future of Open Source model and business
 
Oracle Database 23c Security New Features.pptx
Oracle Database 23c Security New Features.pptxOracle Database 23c Security New Features.pptx
Oracle Database 23c Security New Features.pptx
 
Flow Control | Block Size | ST Min | First Frame
Flow Control | Block Size | ST Min | First FrameFlow Control | Block Size | ST Min | First Frame
Flow Control | Block Size | ST Min | First Frame
 
How to become a GDSC Lead GDSC MI AOE.pptx
How to become a GDSC Lead GDSC MI AOE.pptxHow to become a GDSC Lead GDSC MI AOE.pptx
How to become a GDSC Lead GDSC MI AOE.pptx
 
Patch notes explaining DISARM Version 1.4 update
Patch notes explaining DISARM Version 1.4 updatePatch notes explaining DISARM Version 1.4 update
Patch notes explaining DISARM Version 1.4 update
 
Introduction to RAG (Retrieval Augmented Generation) and its application
Introduction to RAG (Retrieval Augmented Generation) and its applicationIntroduction to RAG (Retrieval Augmented Generation) and its application
Introduction to RAG (Retrieval Augmented Generation) and its application
 
Novo Nordisk's journey in developing an open-source application on Neo4j
Novo Nordisk's journey in developing an open-source application on Neo4jNovo Nordisk's journey in developing an open-source application on Neo4j
Novo Nordisk's journey in developing an open-source application on Neo4j
 
Webinar: The Art of Prioritizing Your Product Roadmap by AWS Sr PM - Tech
Webinar: The Art of Prioritizing Your Product Roadmap by AWS Sr PM - TechWebinar: The Art of Prioritizing Your Product Roadmap by AWS Sr PM - Tech
Webinar: The Art of Prioritizing Your Product Roadmap by AWS Sr PM - Tech
 
3 Pitfalls Everyone Should Avoid with Cloud Data
3 Pitfalls Everyone Should Avoid with Cloud Data3 Pitfalls Everyone Should Avoid with Cloud Data
3 Pitfalls Everyone Should Avoid with Cloud Data
 
The Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and InsightThe Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and Insight
 
UiPath Studio Web workshop series - Day 2
UiPath Studio Web workshop series - Day 2UiPath Studio Web workshop series - Day 2
UiPath Studio Web workshop series - Day 2
 
Automation Ops Series: Session 2 - Governance for UiPath projects
Automation Ops Series: Session 2 - Governance for UiPath projectsAutomation Ops Series: Session 2 - Governance for UiPath projects
Automation Ops Series: Session 2 - Governance for UiPath projects
 
Planetek Italia Srl - Corporate Profile Brochure
Planetek Italia Srl - Corporate Profile BrochurePlanetek Italia Srl - Corporate Profile Brochure
Planetek Italia Srl - Corporate Profile Brochure
 
TrustArc Webinar - How to Live in a Post Third-Party Cookie World
TrustArc Webinar - How to Live in a Post Third-Party Cookie WorldTrustArc Webinar - How to Live in a Post Third-Party Cookie World
TrustArc Webinar - How to Live in a Post Third-Party Cookie World
 
Design and Modeling for MySQL SCALE 21X Pasadena, CA Mar 2024
Design and Modeling for MySQL SCALE 21X Pasadena, CA Mar 2024Design and Modeling for MySQL SCALE 21X Pasadena, CA Mar 2024
Design and Modeling for MySQL SCALE 21X Pasadena, CA Mar 2024
 
EMEA What is ThousandEyes? Webinar
EMEA What is ThousandEyes? WebinarEMEA What is ThousandEyes? Webinar
EMEA What is ThousandEyes? Webinar
 
2024.03.12 Cost drivers of cultivated meat production.pdf
2024.03.12 Cost drivers of cultivated meat production.pdf2024.03.12 Cost drivers of cultivated meat production.pdf
2024.03.12 Cost drivers of cultivated meat production.pdf
 

HBase Low Latency

  • 1. HBase Low Latency Nick Dimiduk, Hortonworks (@xefyr) Nicolas Liochon, Scaled Risk (@nkeywal) Hadoop Summit June 4, 2014
  • 2. Agenda • Latency, what is it, how to measure it • Write path • Read path • Next steps
  • 3. What’s low latency Latency is about percentiles • Average != 50% percentile • There are often order of magnitudes between « average » and « 95 percentile » • Post 99% = « magical 1% ». Work in progress here. • Meaning from micro seconds (High Frequency Trading) to seconds (interactive queries) • In this talk milliseconds
  • 4. Measure latency bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation • More options related to HBase: autoflush, replicas, … • Latency measured in micro second • Easier for internal analysis YCSB - Yahoo! Cloud Serving Benchmark • Useful for comparison between databases • Set of workload already defined
  • 5. Write path • Two parts • Single put (WAL) • The client just sends the put • Multiple puts from the client (new behavior since 0.96) • The client is much smarter • Four stages to look at for latency • Start (establish tcp connections, etc.) • Steady: when expected conditions are met • Machine failure: expected as well • Overloaded system
  • 6. Single put: communication & scheduling • Client: TCP connection to the server • Shared: multitheads on the same client are using the same TCP connection • Pooling is possible and does improve the performances in some circonstances • hbase.client.ipc.pool.size • Server: multiple calls from multiple threads on multiple machines • Can become thousand of simultaneous queries • Scheduling is required
  • 7. Single put: real work • The server must • Write into the WAL queue • Sync the WAL queue (HDFS flush) • Write into the memstore • WALs queue is shared between all the regions/handlers • Sync is avoided if another handlers did the work • Your handler may flush more data than expected
  • 8. Simple put: A small run Percentile Time in ms Mean 1.21 50% 0.95 95% 1.50 99% 2.12
  • 9. Latency sources • Candidate one: network • 0.5ms within a datacenter • Much less between nodes in the same rack Percentile Time in ms Mean 0.13 50% 0.12 95% 0.15 99% 0.47
  • 10. Latency sources • Candidate two: HDFS Flush • We can still do better: HADOOP-7714 & sons. Percentile Time in ms Mean 0.33 50% 0.26 95% 0.59 99% 1.24
  • 11. Latency sources • Millisecond world: everything can go wrong • JVM • Network • OS Scheduler • File System • All this goes into the post 99% percentile • Requires monitoring • Usually using the latest version helps
  • 12. Latency sources • Split (and presplits) • Autosharding is great! • Puts have to wait • Impacts: seconds • Balance • Regions move • Triggers a retry for the client • hbase.client.pause = 100ms since HBase 0.96 • Garbage Collection • Impacts: 10’s of ms, even with a good config • Covered with the read path of this talk
  • 13. From steady to loaded and overloaded • Number of concurrent tasks is a factor of • Number of cores • Number of disks • Number of remote machines used • Difficult to estimate • Queues are doomed to happen • hbase.regionserver.handler.count • So for low latency • Replable scheduler since HBase 0.98 (HBASE-8884). Requires specific code. • RPC Priorities: work in progress (HBASE-11048)
  • 14. From loaded to overloaded • MemStore takes too much room: flush, then blocksquite quickly • hbase.regionserver.global.memstore.size.lower.limit • hbase.regionserver.global.memstore.size • hbase.hregion.memstore.block.multiplier • Too many Hfiles: block until compactions keep up • hbase.hstore.blockingStoreFiles • Too many WALs files: Flush and block • hbase.regionserver.maxlogs
  • 15. Machine failure • Failure • Dectect • Reallocate • Replay WAL • Replaying WAL is NOT required for puts • hbase.master.distributed.log.replay • (default true in 1.0) • Failure = Dectect + Reallocate + Retry • That’s in the range of ~1s for simple failures • Silent failures leads puts you in the 10s range if the hardware does not help • zookeeper.session.timeout
  • 16. Single puts • Millisecond range • Spikes do happen in steady mode • 100ms • Causes: GC, load, splits
  • 17. Streaming puts Htable#setAutoFlushTo(false) Htable#put Htable#flushCommit • As simple puts, but • Puts are grouped and send in background • Load is taken into account • Does not block
  • 18. Multiple puts hbase.client.max.total.tasks (default 100) hbase.client.max.perserver.tasks (default 5) hbase.client.max.perregion.tasks (default 1) • Decouple the client from a latency spike of a region server • Increase the throughput by 50% compared to old multiput • Makes split and GC more transparent
  • 19. Conclusion on write path • Single puts can be very fast • It’s not a « hard real time » system: there are spikes • Most latency spikes can be hidden when streaming puts • Failure are NOT that difficult for the write path • No WAL to replay
  • 20. And now for the read path
  • 21. Read path • Get/short scan are assumed for low-latency operations • Again, two APIs • Single get: HTable#get(Get) • Multi-get: HTable#get(List<Get>) • Four stages, same as write path • Start (tcp connection, …) • Steady: when expected conditions are met • Machine failure: expected as well • Overloaded system: you may need to add machines or tune your workload
  • 22. Multi get / Client Group Gets by RegionServer Execute them one by one
  • 23. Multi get / Server
  • 24. Multi get / Server
  • 25. Access latency magnidesStorage hierarchy: a different view Dean/2009 Memory is 100000x faster than disk! Disk seek = 10ms
  • 26. Known unknowns • For each candidate HFile • Exclude by file metadata • Timestamp • Rowkey range • Exclude by bloom filter StoreFileScanner# shouldUseScanner()
  • 27. Unknown knowns • Merge sort results polled from Stores • Seek each scanner to a reference KeyValue • Retrieve candidate data from disk • Multiple HFiles => mulitple seeks • hbase.storescanner.parallel.seek.enable=true • Short Circuit Reads • dfs.client.read.shortcircuit=true • Block locality • Happy clusters compact! HFileBlock# readBlockData()
  • 28. BlockCache • Reuse previously read data • Maximize cache hit rate • Larger cache • Temporal access locality • Physical access locality BlockCache#getBlock()
  • 29. BlockCache Showdown • LruBlockCache • Default, onheap • Quite good most of the time • Evictions impact GC • BucketCache • Offheap alternative • Serialization overhead • Large memory configurations http://www.n10k.com/blog/block cache-showdown/ L2 off-heap BucketCache makes a strong showing
  • 30. Latency enemies: Garbage Collection • Use heap. Not too much. With CMS. • Max heap • 30GB (compressed pointers) • 8-16GB if you care about 9’s • Healthy cluster load • regular, reliable collections • 25-100ms pause on regular interval • Overloaded RegionServer suffers GC overmuch
  • 31. Off-heap to the rescue? • BucketCache (0.96, HBASE-7404) • Network interfaces (HBASE-9535) • MemStore et al (HBASE-10191)
  • 32. Latency enemies: Compactions • Fewer HFiles => fewer seeks • Evict data blocks! • Evict Index blocks!! • hfile.block.index.cacheonwrite • Evict bloom blocks!!! • hfile.block.bloom.cacheonwrite • OS buffer cache to the rescue • Compactected data is still fresh • Better than going all the way back to disk
  • 33. Failure • Detect + Reassign + Replay • Strong consistency requires replay • Locality drops to 0 • Cache starts from scratch
  • 34. Hedging our bets • HDFS Hedged reads (2.4, HDFS-5776) • Reads on secondary DataNodes • Strongly consistent • Works at the HDFS level • Timeline consistency (HBASE-10070) • Reads on « Replica Region » • Not strongly consistent
  • 35. Read latency in summary • Steady mode • Cache hit: < 1 ms • Cache miss: + 10 ms per seek • Writing while reading => cache churn • GC: 25-100ms pause on regular interval Network request + (1 - P(cache hit)) * (10 ms * seeks) • Same long tail issues as write • Overloaded: same scheduling issues as write • Partial failures hurt a lot
  • 36. HBase ranges for 99% latency Put Streamed Multiput Get Timeline get Steady milliseconds milliseconds milliseconds milliseconds Failure seconds seconds seconds milliseconds GC 10’s of milliseconds milliseconds 10’s of milliseconds milliseconds
  • 37. What’s next • Less GC • Use less objects • Offheap • Compressed BlockCache (HBASE-8894) • Prefered location (HBASE-4755) • The « magical 1% » • Most tools stops at the 99% latency • What happens after is much more complex
  • 38. Thanks! Nick Dimiduk, Hortonworks (@xefyr) Nicolas Liochon, Scaled Risk (@nkeywal) Hadoop Summit June 4, 2014

Notas del editor

  1. Random read/write database, latency is very important
  2. Micro-seconds? Seconds? We are talking milliseconds Everyone stops looking after the 99% -- the literature calls this “magical 1%”
  3. How to measure latency in HBase
  4. Client connects to region server with TCP connection Connection is shared by client threads Server manages lots of client connections Schedules client queries – synchronization, locks, queues, &c
  5. WAL queue shared between regions - sometimes the sync work has already been done for you, can help - sometimes your small edit is sync’d with another larger one, can hurt you
  6. Small cluster, 4-yo machines 1 put, 1 put, 1put… 99% is double the mean Servers doing other work, but nothing major
  7. Where do we spend time? How about the network? 0.5ms is conservative, usually much less TCP round trip, same cluster as previous slide 99% 4x mean
  8. Where do we spend time? How about HDFS? Flushing, writing, &c Just HDFS, 2.4, 1kb, flush, flush, flush 99% 5x mean – not bad
  9. What else? Millisecond world means minor things start to matter: - JVM 1.7 is better at blocking queues than 1.6 - forget to configure naggle? 40ms - older linux scheduler bugs, 50ms - facebook literature talks a lot about filesystems
  10. Regular cluster operations also hurt Region split – 1’s of seconds Region move – 100ms (better on 0.96) GC – 10’s ms even after configuration tuning
  11. Now add some load Load == concurrency == queues: 1ms 0.98 adds pluggable scheduler, so you can influence these queues yourself Work in progress to expose this through configuration
  12. Contract: save you from cluster explosion How to protect? Stop writing - too many WALs? Stop writes. - too many hfiles to compact? Stop writes. Lasts indefinitely. New default settings.
  13. Puts do not require reads, so WAL replay doesn’t block writes Simple crash: quickly detected, 1s Hung/frozen: - conservative detection takes longer - configured to 5-15s, but look like long GC pause
  14. New since 0.96
  15. Settings for average cluster, unaggressive client Decoupled client from issue of a single RS YSCB, single empty region, 50% better throughput (HBASE-6295)
  16. This talk: assume get/short scan implies low-latency requirements No fancy streaming client like Puts; waiting for slowest RS.
  17. Gets grouped by RS, groups sent in parallel, block until all groups return. Network call: 10’s micros, in parallel
  18. Read path in full detail is quite complex See Lar’s HBaseCon 2012 talk for the nitty-gritty Complexity optimizing around one invariant: 10ms seek
  19. Read path in full detail is quite complex See Lar’s HBaseCon 2012 talk for the nitty-gritty Complexity optimizing around one invariant: 10ms seek
  20. Complexity optimizing around one invariant: 10ms seek Aiming for 100 microSec world; how to work around this? Goal: avoid disk at all cost!
  21. Goal: don’t go to disk unless absolutely necessary. Tactic: Candidate HFile elimination. Regular compactions => 3-5 candidates Bloom filters on by default in 0.96+
  22. Mergesort over multiple files, multiple seeks More spindles = parallel scanning SCR avoids proxy process (DataNode) But remember! Goal: don’t go to disk unless absolutely necessary.
  23. “block” is a segment of an HFile Data blocks, index blocks, and bloom blocks Read blocks retained in BlockCache Seek to same and adjacent data become cache hits
  24. HBase ships with a variety of cache implementations Happy with 95% stats? Stick with LruBlockCache and modest heapsize Pushing 99%? Lru still okay, but watch that heapsize. Spent money on RAM? BucketCache
  25. GC is a part of healthy operation BlockCache garbage is awkward size and age, which means pauses Pause time is a function of heap size More like ~16GiB if you’re really worried about 99% Overloaded: more cache evictions, time in GC
  26. Why generate garbage at all? GC are smart, but maybe we know our pain spots better? Don’t know until we try
  27. Necessary for fewer scan candidates, fewer seeks Buffer cache to the rescue “That’s steady state and overloaded; let’s talk about failure”
  28. Replaying WALs takes time Unlucky: no data locality, talking to remove DataNode Empty cache “Failure isn’t binary. What about the sick and the dying?”
  29. Don’t wait for a slow machine!
  30. Reads dominated by disk seek, so keep that data in memory After cache miss, GC is the next candidate cause of latency “Ideal formula” P(cache hit): fn (cache size :: db size, request locality) Sometimes the jitter dominates
  31. Standard deployment, well designed schema Millisecond responses, seconds for failure recovery, and GC at a regular interval Everything we’ve focused on here is impactful of the 99% Beyond that there’s a lot of interesting problems to solve
  32. There’s always more work to be done Generate less garbage Compressed BlockCache Improve recovery time and locality
  33. Questions!