Trends in Supporting Production Apache HBase Clusters

Headline Goes Here
Speaker Name or Subhead Goes Here
DO NOT USE PUBLICLY
PRIOR TO 10/23/12
Trends in Supporting Production
Apache HBase Clusters
Jonathan Hsieh | @jmhsieh | Software Engineer at Cloudera /
HBase PMC Member
Kevin O’Dell| kevin.odell@cloudera| Systems Engineer at Cloudera
June 26, 2013

Who are we?
Jonathan Hsieh
• Cloudera:
• Software Engineer
• Apache HBase committer /
PMC
• Apache Flume founder
Kevin O’Dell
• Cloudera:
• Systems Engineer
• Apache HBase contributor
• Cloudera HBase Support Lead
2 6/26/13 Hadoop Summit / O'Dell, Hsieh

What is Apache HBase?
Apache HBase is a
reliable, column-
oriented data store that
provides consistent, low-
latency, random
read/write access.
ZK HDFS
App MR

HBase Architecture
ZK HDFS
App MR
• HBase is designed to be fault tolerant
and highly available
• It depends on other systems to be as
well.
• Replication for fault tolerance
• Serve regions from any Region server
• Failover HMasters
• ZK Quorums
• HDFS Block replication on Data Nodes

From the trenches at Cloudera Customer Operations
Trends Supporting HBase

Customers in 2011-12 vs in 2012-13
0.90.x / CDH3 era
• Red Hat 5.x
• Java jvm 1.6.13
• 4-8 disk machines
• 24-48 GB RAM
• Dual 4-core HT
• CDH3
• Apache HBase 0.90
• Apache Hadoop 0.20.x
0.92.x/0.94.x / CDH4 era
• Red Hat 6.x
• Java jvm 1.6.31
• 12-15 disk machines
• 48-96 GB RAM
• Dual 6-core HT
• CDH4
• Apache HBase 0.92/0.94
• Apache Hadoop 2.0

Support Incidents 6/2011-6/2012
• Patched Bug
• Patched delivered, or
• Fixed in next version
• Operational Workaround
• Misconfiguration
• Schema design / tuning
• hbck used to fix
• Network/HW/OS
• Problems with underlying
systems.
7
Patched
12%
Workaround
(hbck)
28%
Workaround
(config)
44%
Net/HW/OS
16%
6/11-6/12 - CDH3 / 0.90.x HBase
Support Tickets
6/26/13 Hadoop Summit / O'Dell, Hsieh

Comparing 6/11-6/12 to 6/12-6/13
8
Patched
12%
Workaround
(hbck)
28%
Workaround
(config)
44%
Net/HW/OS
16%
6/11-6/12 - CDH3 / 0.90.x HBase
Support Tickets
Patched
14%
Workaround
(config/hbck)
36%
Net/HW/OS
42%
Documentation
8%
6/12-6/13 - CDH3+CDH4 HBase
Support Tickets
Much smaller!
Merged
config/hbck
New
category
This is
bigger!

Comparing 2011 to 2012
• Majority customers
upgraded to CDH4.
• More customers, but similar
volume of support incidents
• Shrunk the CDH3’s largest
trouble spots significantly.
• Larger number of issues
due to underlying systems.
• This is actually a good thing!
9
Patched
14%
Workaround
(config/hbck)
36%
Net/HW/OS
42%
Documentation
8%
Support Tickets

Operation’s pain points from 6/12 – 6/13
• Hardware (Net/OS/HW)
• Upgrade (0.90 -> 0.92)
• HBase configuration

Hardware / Network / Operating System
• Leap second
• Transparent Huge pages
• Bad 10GB Ethernet Firmware
12
Bug
14%
Workaround
(config/hbck)
36%
Net/HW/OS
42%
Documentation
8%
Support Tickets

Cloudera Manager (CM) system host checker

Upgrade Issues
• Old .edits (HBASE-6440)
• 0-length HLogs (HBASE-6443)
• Bad region refs (HBASE-7199)
• Invalid HFile (Heisenbug)
14
Bug
14%
Workaround
(config/hbck)
36%
Net/HW/OS
42%
Documentation
8%
Support Tickets

Upgrade Assistance
• Parcels
• simplified distribution
• flexibility of install location
• side by side installs for rolling upgrades
• Rolling upgrades via CM
• hot fixes
• minor version upgrades
• Automated tests for upgrades and compatibility

Configuration / Feature
• Continuous Bulk Load
• Avoid and Use Puts
• Region tuning
• Updated defaults + CM
• GC tuning
• Updated defaults + CM
• Balancer
• Manual / custom tools
• Bad Schema
• Trial and Error
16
Bug
14%
Workaround
(config/hbck)
36%
Net/HW/OS
42%
Documentation
8%
Support Tickets

CM helps
• Sanity checks on configurations
• Wizard based installation and setup
• Wizard based rolling upgrades (minor versions)
• Wizard based backup and disaster recovery strategies

Configuration Management

Support improvement wishlist
• Improved “Ergonomics”
• Better default configuration and guard rails
• “I’m sorry Dave, I can’t let you do that”
• Improved error messaging
• Suggest likely root causes in logs
• Improve log signal-to-noise ratio
• More improved ops tooling and frameworks for app development
6/26/13 Hadoop Summit / O'Dell, Hsieh19

Good news
• All bug fixes go into the Apache versions before CDH
• HBase is maturing
• Higher percentage of incidents by underlying OS/HW/NW
• More performance and tuning oriented questions
• Similar percentage of incidents caused by bugs
• We’re getting better
• Lower percentage of incidents managed with workarounds
• More tools in place to help operational support
• Hbck, CM, defaults
• We can still do better!

Getting rid of workarounds
Trends Developing HBase

Developer Community
• Vibrant, Highly
Active community!
• We’re Growing!

Upstream Development Improvements for 0.95+
• Improving Usability
• Improving Reliability
• Improving Predictability
Patched
14%
Workaround
(config/hbck)
36%
Net/HW/OS
42%
Documentation
8%
Support Tickets

Improving Usability
Metrics and Frameworks

Usability Concerns
• Administering HBase has been too hard.
• Difficult to see what is happening in HBase
• Easy to make bad design decisions early without realizing
• New Developments
• Metrics Revamp
• HTrace
• Frameworks for Schema design

Metrics Options
Cloudera Manager OpenTSDB
26
Ganglia
Ganglia Image From:http://www.flickr.com/photos/hongiiv/

HTrace
• Problem:
• Where is time being spent inside HBase?
• Solution: HTrace Framework
• Inspired by Google Dapper
• Threaded through HBase and HDFS
• Tracks time spent in calls in a distributed system by tracking spans*
on different machines.
*Some assembly still required.

HBase Schemas
• HBase Application developers must iterate to find a suitable HBase
schema
• Schema critical for Performance at Scale
• How can we make this easier?
• How can we reduce the expertise required to do this?
• Today:
• Lots of tuning knobs
• Developers need to understand Column Families, Rowkey design, Data
encoding, …
• Some are expensive to change after the fact

Row key design techniques
• Numeric Keys and lexicographic sort
• Store numbers big-endian.
• Pad ASCII numbers with 0’s.
• Use reversal to have most significant traits first.
• Reverse URL.
• Reverse timestamp to get most recent first.
• (MAX_LONG - ts) so “time” gets monotonically smaller.
• Use composite keys to make key distribute nicely and work
well with sub-scans
• Ex: User-ReverseTimeStamp
• Do not use current timestamp as first part of row key!
29
Row100
Row3
Row 31
Row003
Row031
Row100
vs.
blog.cloudera.com
hbase.apache.org
strataconf.com
vs.
com.cloudera.blog
com.strataconf
org.apache.hbase

Row key design techniques
• Numeric Keys and lexicographic sort
• Store numbers big-endian.
• Pad ASCII numbers with 0’s.
• Use reversal to have most significant traits first.
• Reverse URL.
• Reverse timestamp to get most recent first.
• (MAX_LONG - ts) so “time” gets monotonically smaller.
• Use composite keys to make key distribute nicely and work
well with sub-scans
• Ex: User-ReverseTimeStamp
• Do not use current timestamp as first part of row key!
30
Row100
Row3
Row 31
Row003
Row031
Row100
vs.
blog.cloudera.com
hbase.apache.org
strataconf.com
vs.
com.cloudera.blog
com.strataconf
org.apache.hbase

Reliable
Reliable / Highly Available
• Reliable:
• Ability to recover service if a
component fails, without losing data.
• Highly Available:
• Ability to quickly recover service if a
• Goal: Minimize downtime!
Highly Available

Mean Time To Recovery (MTTR)
• Average time taken to automatically recover from a failure.
• Detection time
• Repair Time
• Notification Time
• Measure: HTrace (Dapper) Infrastructure (0.96+)
Detect Repair Notify
time

Reduce Detection Time
• Proactive notification of HMaster failure (0.95)
• Proactive notification of RS failure (0.95)
• Fast server failover (Hardware)
Detect Notify
time
Repair

Reduce Detection Time
• Proactive notification of HMaster failure (0.95)
• Proactive notification of RS failure (0.95)
• Fast server failover (Hardware)
Repair Notify
time
Detect

Reduce Recovery Time
• Distributed Log Splitting (0.92)
• Distributed Log Replay (0.95)
• Fast Write recovery (0.95)
• Pristine Read recovery (0.96+)
Notify
time
Detect Repair

Reduce Recovery Time
• Distributed Log Splitting (0.92)
• Distributed Log Replay (0.95)
• Fast Write recovery (0.95)
• Pristine Read recovery (0.96+)
Repair Notify
time
Detect

Reduce Notification Time
• Notify client on recovery
• Async Client rewrite (0.96+)
Notify
time
Detect Repair

Reduce Notification Time
• Notify client on recovery
• Async Client rewrite (0.96+)
Repair Notify
time
Detect

Compactions
Improving Predictability

Reliable
Reliable / Highly Available
• Reliable:
• Ability to recover service if a component
fails, without losing data.
• Goal: Minimize downtime!
Highly Available

Reliable
Reliable / Highly Available / Latency Tolerant
• Reliable:
• Ability to recover service if a component
fails, without losing data.
• Latency Tolerant
• Ability to perform and recover in a
predictable amount of time, without
losing data
• New Goal: Predictable performance
Highly Available
42
Latency
Tolerant

Common causes of performance variability
• Compaction
• Garbage Collection
• Locality Loss

Compaction
• Compactions optimizing read layout by rewriting files
• Reduce the seeks required to read a row
• Improve random read performance
• Age off expired or deleted data
• Assumes uniformly distributed write workload
• But we have new workloads:
• Continuous Bulk load write pattern
• Time-series write pattern

Compactions: Put workload
• Minor compactions
• Optimizes a sub set of adjacent
files
• Major Compactions
• Optimizes all files
• Choosing:
• Assume: older files should be
larger than newer files.
• “New” files are “larger” than
“older” files? major compaction
• Else, look at newer files and
select files for a minor
compaction
Newly flushed HFiles
Minor
…
…
Minor
MajorMinor

Compactions: Bulkload workload
• Functionality for loading data en
masse
• Intended for Bootstrapping
HBase tables
• New write workload:
frequently ingest data only via
bulk load
• Problem:
• Breaks age/size assumption!
• Major Compaction Storms!
• Compactions unnecessarily
rewrite large files.
Newly bulk loaded HFiles
Major
MajorMajor

Bulkload: Exploring Compactor
• Explore all compaction
possibilities
• Choose minor compactions
that reduces # of files while
incurring least IO.
• “the best bang of the buck”
• Compaction workload is
more manageable
Newly bulk loaded HFiles
Explore
Minor
Minor

Comparing 6/11-6/12 to 6/12-6/13
49
Patched
12%
Workaround
(hbck)
28%
Workaround
(config)
44%
Net/HW/OS
16%
6/11-6/12 - CDH3 / 0.90.x HBase
Support Tickets
Patched
14%
Workaround
(config/hbck)
36%
Net/HW/OS
42%
Documentation
8%
Support Tickets
Development
and tooling
efforts continue
to reduce
HBase is
becoming more
robust
Improved
testing

Summary by Version
0.90 0.92 /0.94 0.95-dev / 0.96 0.98 /trunk
•HBase Developer
Expertise
• HBase Operational
Experience
• Distributed Systems Admin
Experience
• 
•True Durability • Consistency
• Performance
• MTTR
• Protobufs
• Snapshots
• Table locks
• (Predictability)
• (File Block Affinity†)
•Distributed log
splitting*
•Distributed log splitting • Distributed log splitting
• Distributed log replay†
• Fast Write Recovery†
•Distributed log splitting
•Distributed log replay†
•Fast Write Recovery†
•(Pristine Region Read Recovery)
•Metrics • CF+Region Granularity
Metrics
• CF+Region Granularity Metrics
• Improved failure detection time
•CF +Region Granularity Metrics
•Improved failure detection time
•(Htrace)
Recovery in Hours Recovery in Minutes Recovery in Seconds (for writes) Recovery in Seconds
† experimental (in progress) *backported in CDH

Questions?
@kevinrodell
@jmhsieh

Trends in Supporting Production Apache HBase Clusters

Recomendados

Recomendados

Más contenido relacionado

La actualidad más candente

La actualidad más candente (20)

Similar a Trends in Supporting Production Apache HBase Clusters

Similar a Trends in Supporting Production Apache HBase Clusters (20)

Más de DataWorks Summit

Más de DataWorks Summit (20)

Último

Último (20)

Trends in Supporting Production Apache HBase Clusters

Notas del editor