Hadoop Summit 2012 | HBase Consistency and Performance Improvements

June 13, 2012

HBase Consistency and
Performance Improvements
Esteban Gutierrez, Gregory Chanan
{esteban, gchanan}@cloudera.com

Who We Are

• Esteban Gutierrez
– Customer Operations Engineer
– Focused on HBase operations
• Gregory Chanan
– HBase developer
– Currently focused on wire compatibility

2
©2012 Cloudera, Inc. All Rights Reserved.

Apache HBase

Apache HBase is a
distributed, scalable
column-oriented data
store that runs on top
of HDFS. It provides
consistent, low
latency, random
read/write access.

3

HBase Data Format

RowKey       header:from       header:subject     body:text
greg_email1  sister@gmail.com  Father’s day card  <…>
greg_email2  friend@gmail.com  Taco night         <…>

4

HBase Data Format

Column names are family:qualifier



Column Families are a set of related columns
that are physically stored together on disk
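
As a rough illustration of the family:qualifier naming, the sample emails can be pictured as a nested map. This is only a sketch of the logical layout; the `table` dictionary and the `get` helper are hypothetical names for this example, not HBase client API.

```python
# Logical view only: row key -> column family -> qualifier -> value.
table = {
    "greg_email1": {
        "header": {"from": "sister@gmail.com", "subject": "Father's day card"},
        "body": {"text": "<...>"},
    },
    "greg_email2": {
        "header": {"from": "friend@gmail.com", "subject": "Taco night"},
        "body": {"text": "<...>"},
    },
}


def get(table, row_key, column):
    """Look up one cell by its 'family:qualifier' column name."""
    family, qualifier = column.split(":", 1)
    return table[row_key][family][qualifier]


subject = get(table, "greg_email2", "header:subject")
```

Grouping by family first mirrors the point above: columns of one family live together, so reading `header:*` never has to touch `body`.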

6

HBase Write Path

[Diagram: the HBase client sends a Put to the HBase Server, where it passes through the HLog, then the MemStore, and finally an HFile]

1. Write to HLog for disaster recovery
2. Write to MemStore (in-memory map)
3. Flush MemStore to disk as HFile
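
The three write-path steps can be sketched as a toy model. `RegionServerSketch` and `flush_threshold` are made-up names for this illustration; the real server flushes on memory size and does far more bookkeeping.

```python
class RegionServerSketch:
    """Toy model of the write path; not HBase's real classes."""

    def __init__(self, flush_threshold=2):
        self.hlog = []        # append-only log, replayed after a crash
        self.memstore = {}    # in-memory map: (row, column) -> value
        self.hfiles = []      # immutable, sorted files "on disk"
        self.flush_threshold = flush_threshold  # hypothetical knob

    def put(self, row, column, value):
        self.hlog.append((row, column, value))   # 1. write to HLog
        self.memstore[(row, column)] = value     # 2. write to MemStore
        if len(self.memstore) >= self.flush_threshold:
            self.flush()                         # 3. flush to an HFile

    def flush(self):
        # An HFile is written once, in sorted key order, and never modified.
        self.hfiles.append(sorted(self.memstore.items()))
        self.memstore = {}


server = RegionServerSketch(flush_threshold=2)
server.put("greg_email1", "header:from", "sister@gmail.com")
server.put("greg_email1", "header:subject", "Father's day card")
```

After the second Put the MemStore reaches the threshold, so both cells end up in one HFile while the HLog still holds both edits.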

14

HBase Write Path - Compactions

As we write and flush, we eventually get a lot of HFiles…

[Diagram: several small HFiles merged into one larger HFile]

Merge these together in a “compaction”
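
A compaction is essentially a merge of already-sorted files. The sketch below assumes each HFile is a sorted list of (key, value) pairs and that the newest file wins when a key repeats; real compactions also handle multiple versions, deletes, and TTLs.

```python
import heapq


def compact(hfiles):
    """Merge sorted HFiles (oldest first) into one sorted HFile.

    When a key appears in several files, the value from the newest file wins.
    """
    # Tag each entry with -age so that, for equal keys, newer files sort first.
    tagged = (
        [(key, -age, value) for key, value in hfile]
        for age, hfile in enumerate(hfiles)
    )
    merged = []
    for key, _, value in heapq.merge(*tagged):
        if not merged or merged[-1][0] != key:
            merged.append((key, value))
    return merged


older = [("row1", "a"), ("row3", "c")]
newer = [("row1", "A"), ("row2", "b")]
compacted = compact([older, newer])
```

Because every input is already sorted, the merge streams through the files once instead of re-sorting all the data.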

16

HBase ACID

• HBase 0.90 guarantees ACID transactions
within a single row, “with caveats”
• HBase 0.92 guarantees ACID compliance
within a single row

17

What are ACID Transactions?

• Atomicity
– Either all parts of a transaction complete or
none do
• Consistency
– Only valid data is written to the database
• Isolation
– Parallel transactions do not affect each other’s
execution
• Durability
– Once a transaction is committed, it stays committed

18

HBase ACID in 0.92

• “Any row returned by [a] scan will be a
consistent view (i.e. that version of the
complete row existed at some point in
time)”[1]

[1] http://hbase.apache.org/acid-semantics.html

19

Stories from the Trenches

We have seen…

• Atomic Bulk Uploads
• Read ACID Compliance

20

Atomic Bulk Upload

• A common pattern of use in HBase is to
upload data as fast as possible from
external sources
• HRegion.bulkLoadHFile() makes that
possible

21

Atomic Bulk Upload

• Unfortunately, importing HFiles for multiple
column families was not an atomic operation
(through HBase 0.90.5)

23

Atomic Bulk Upload

Row 1, HRegion.bulkLoadHFile(), ≤ HBase 0.90.5

      HFile1:     HFile2:      HFile3:    HFile4:
      header:to   meta:labels  body:text  attach:file
T1    sister@...
T2    sister@...  family
Scan
T3    sister@...  family       Hi…
T4    sister@...  family       Hi…        image/jpeg

24

Atomic Bulk Upload

Workarounds
• Implement application-level validation of
the imported data

25

Atomic Bulk Upload

Row 1, HRegion.bulkLoadHFiles(), ≥ HBase 0.92

T1
T2
Scan
T3
T4    sister@...  family  Hi…  image/jpeg  …
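
The difference between the two timelines can be pictured with a toy model. It assumes a single region-wide lock that both loads and scans take; `RegionSketch` and its methods are hypothetical names for this sketch and do not mirror the real `HRegion` code.

```python
import threading


class RegionSketch:
    """Toy region: one list of loaded files per column family."""

    def __init__(self, families):
        self._lock = threading.Lock()
        self._store_files = {family: [] for family in families}

    def bulk_load_one(self, family, hfile):
        # Older behaviour in this model: each family becomes visible on its own.
        with self._lock:
            self._store_files[family].append(hfile)

    def bulk_load_all(self, hfiles_by_family):
        # Newer behaviour in this model: all families become visible in one step.
        with self._lock:
            for family, hfile in hfiles_by_family.items():
                self._store_files[family].append(hfile)

    def scan_row(self, row):
        with self._lock:
            return {
                family: hfile[row]
                for family, store_files in self._store_files.items()
                for hfile in store_files
                if row in hfile
            }


families = ["header", "meta", "body", "attach"]
files = {
    "header": {"row1": "sister@..."},
    "meta": {"row1": "family"},
    "body": {"row1": "Hi"},
    "attach": {"row1": "image/jpeg"},
}

old_style = RegionSketch(families)
old_style.bulk_load_one("header", files["header"])
partial = old_style.scan_row("row1")   # a scan at T1 sees only one family

new_style = RegionSketch(families)
before = new_style.scan_row("row1")    # nothing is visible yet
new_style.bulk_load_all(files)
after = new_style.scan_row("row1")     # all four families appear together
```

In the second case a scan sees either no part of the row or all of it, which is the behaviour the ≥ 0.92 timeline shows.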

27

Read ACID Compliance

Issue
• Some records are missing
• Results are used to update a user-facing
application
• Customer is not happy
– “Where is my data?”

28


Symptoms

                      Run 1    Run 2    Run 3
…                     …        …        …
SPLIT_RAW_FILES       …        …        …
Map-Reduce Framework
Map output records    500,000  499,997  500,001

31


Symptoms
Scale testing shows that 0.5% to 2% of results are inconsistent between runs

                header:to    header:from  body:text
greg_email1     sister@...   greg@...     Hi…
greg_email2     sister@...
esteban_email3               esteban@...  Good news!..

esteban_email3  brother@...

33


• Seen only twice by Cloudera
Support
• Hard to detect if application
level monitoring is not
implemented

34


Workarounds
• Re-try scan if not all CFs are present
• Or use a single CF
• Re-submit job if any inconsistency is found
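
The first workaround can be sketched as a small retry loop. It only makes sense when every row is expected to carry all of the listed column families. `scan_with_retry` and the `scan_row` callable are hypothetical stand-ins for whatever client call the application already uses.

```python
def scan_with_retry(scan_row, row_key, expected_families, max_attempts=3):
    """Re-read a row until every expected column family is present."""
    for _ in range(max_attempts):
        row = scan_row(row_key)  # dict of 'family:qualifier' -> value
        families = {column.split(":", 1)[0] for column in row}
        if expected_families <= families:
            return row
    raise RuntimeError(
        "row %r still incomplete after %d attempts" % (row_key, max_attempts)
    )


# Fake scanner for the example: the first read is missing the body family.
responses = iter([
    {"header:to": "sister@..."},
    {"header:to": "sister@...", "body:text": "Hi"},
])
row = scan_with_retry(lambda key: next(responses), "greg_email2", {"header", "body"})
```

Raising after a bounded number of attempts lets the job fall back to the third workaround (re-submitting) instead of looping forever.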

35


Long-Term Solution
• Sometimes workarounds are not possible:
SLAs!
• Upgrade to 0.92+

36

Multiversion Concurrency Control (MVCC)

• HBase maintains ACID semantics using
Multiversion Concurrency Control
• Instead of overwriting state, create a new
version of the object with a timestamp (“memstoreTs”)

memstoreTs  RowKey  fam1:col1  fam2:col2
t2          row1    val2       val2
t1          row1    val1       val1

• Reads never have to block
• “memstoreTs” is not externally visible! It is different
from the external timestamp
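
A minimal sketch of the idea, assuming writes commit in order and that readers simply ignore any version newer than the current read point. `MvccMemStore` and its methods are made-up names for this illustration, not HBase classes.

```python
class MvccMemStore:
    """Toy multiversion store: writers add versions, readers never block."""

    def __init__(self):
        self.versions = []    # (memstore_ts, row, columns)
        self.read_point = 0   # highest committed memstore_ts
        self._next_ts = 0

    def begin_write(self, row, columns):
        # A new version is appended; existing versions are left untouched.
        self._next_ts += 1
        self.versions.append((self._next_ts, row, dict(columns)))
        return self._next_ts

    def commit(self, memstore_ts):
        # Simplification: assumes writes are committed in order.
        self.read_point = max(self.read_point, memstore_ts)

    def read(self, row):
        visible = [
            version for version in self.versions
            if version[1] == row and version[0] <= self.read_point
        ]
        return max(visible, key=lambda v: v[0])[2] if visible else None


store = MvccMemStore()
t1 = store.begin_write("row1", {"fam1:col1": "val1", "fam2:col2": "val1"})
store.commit(t1)
t2 = store.begin_write("row1", {"fam1:col1": "val2", "fam2:col2": "val2"})
during_write = store.read("row1")   # t2 is not committed: still the t1 version
store.commit(t2)
after_commit = store.read("row1")   # now the t2 version
```

The reader always gets a complete version of the row, never a mix of val1 and val2, and it never waits for the writer.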

38

Review: HBase Write Path

[Diagram: the HBase client sends a Put to the HBase Server, where it passes through the HLog, then the MemStore, and finally an HFile]

1. Write to HLog for disaster recovery
2. Write to MemStore (in-memory map)
3. Flush MemStore to disk as HFile

39

Putting it together

Let’s go back to the beginning…

MemStore
memstoreTs  RowKey      hdr:from  body:text
t1          greg_email  wife      pick up kids

40

Putting it together

And start a scan.
And concurrently put.
Which causes a flush.

MemStore
memstoreTs  RowKey      hdr:from  body:text
t2          greg_email  coworker  bug report

HFile
RowKey      body:text
greg_email  bug report
greg_email  pick up kids

43

Putting it together
Now, scan needs to make sense of this…
MemStore
memstoreTs  RowKey      hdr:from
t2          greg_email  coworker
t1          greg_email  wife

HFile
RowKey      body:text
greg_email  bug report

44

Putting it together
MemStore

t1 greg_email wife

HFile
RowKey body:text
greg_email bug report
But HFile has no timestamp!

45

Putting it together
MemStore
t1  greg_email  wife

HFile
RowKey      body:text
greg_email  bug report

Inconsistent Result
RowKey      hdr:from  body:text
greg_email  wife      bug report

47

Solution
Store the timestamp in the HFile

MemStore
memstoreTs  RowKey      hdr:from
t1          greg_email  wife

HFile
memstoreTs  RowKey      body:text
t2          greg_email  bug report
t1          greg_email  pick up kids

Correct Result
RowKey      hdr:from  body:text
greg_email  wife      pick up kids

Now we have all the information we need
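
The scan from this walkthrough can be replayed in a few lines. The sketch assumes cells arrive newest first and models an HFile written without the timestamp as cells whose `memstore_ts` is `None`; `scan_row` is a hypothetical helper, not the real scanner.

```python
def scan_row(cells, read_point):
    """Return the first visible value per column.

    `cells` is a list of (memstore_ts, column, value), newest first.
    A memstore_ts of None models an HFile cell stored without the timestamp,
    which the scanner cannot filter.
    """
    row = {}
    for memstore_ts, column, value in cells:
        if memstore_ts is not None and memstore_ts > read_point:
            continue  # written after the scan started: not visible
        row.setdefault(column, value)
    return row


memstore = [(2, "hdr:from", "coworker"), (1, "hdr:from", "wife")]
hfile_without_ts = [(None, "body:text", "bug report"),
                    (None, "body:text", "pick up kids")]
hfile_with_ts = [(2, "body:text", "bug report"),
                 (1, "body:text", "pick up kids")]

# The scan started when only t1 was committed, so its read point is 1.
inconsistent = scan_row(memstore + hfile_without_ts, read_point=1)
consistent = scan_row(memstore + hfile_with_ts, read_point=1)
```

With the timestamp kept in the HFile, the same filter applies to both sources, so the scan returns the row as it existed at t1.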

48

Consistency

• These are only some of the consistency issues in 0.90
– e.g. HBASE-5121: MajorCompaction may
affect scan's correctness
• Solution: Upgrade to 0.92/0.94

49

Consistency to Performance

• Initial community focus on correctness and
consistency
• HBase adoption growing
– Number of customers
– Size of deployment
• Newer focus on performance
– 0.94 dubbed the “performance release”

51

Performance Areas for Improvement

• Read Path
• Compactions
• Write Path
• HDFS level

52


• Read Path
– Support checksums in HFile format (HBASE-5047)
• Compactions
– Delete store files that are past their TTL before compactions
(HBASE-5199)
• Write Path
– HLog Compression (HBASE-4608)
• HDFS level
– Works with Hadoop 2.0
– See “HBase and HDFS: Past, Present, and Future” (Todd Lipcon) in References
• And much more!

53


Read Path Performance: Checksums
• HDFS stores checksums in a file separate from the data (here, the HFile)
• So each file read actually requires two disk I/O operations
• HBase is often bottlenecked by random disk I/O

55

Read Path Performance: Checksums
• Solution: Store checksum in HFile block
• Turn off HDFS-level checksum
[Diagram: each HFile block holds its data together with its checksum]
• On by default (“hbase.regionserver.checksum.verify”)
• Bytes per checksum (“hbase.hstore.bytes.per.checksum”) –
default is 16K
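
The idea of keeping the checksum next to the data can be sketched as follows. The block layout here (a length prefix, the data, then one CRC32 per 16K chunk) is invented for the example and is not the real HFile block format.

```python
import struct
import zlib

BYTES_PER_CHECKSUM = 16 * 1024  # mirrors the 16K default mentioned above


def write_block(data):
    """Pack data and its per-chunk CRC32 checksums into one block."""
    chunks = [data[i:i + BYTES_PER_CHECKSUM]
              for i in range(0, len(data), BYTES_PER_CHECKSUM)]
    checksums = b"".join(struct.pack(">I", zlib.crc32(chunk)) for chunk in chunks)
    return struct.pack(">I", len(data)) + data + checksums


def read_block(block):
    """Verify and return the data using only the bytes of this block."""
    (length,) = struct.unpack_from(">I", block, 0)
    data = block[4:4 + length]
    stored = block[4 + length:]
    for index, offset in enumerate(range(0, length, BYTES_PER_CHECKSUM)):
        (expected,) = struct.unpack_from(">I", stored, 4 * index)
        if zlib.crc32(data[offset:offset + BYTES_PER_CHECKSUM]) != expected:
            raise IOError("checksum mismatch in chunk %d" % index)
    return data


payload = bytes(range(256)) * 160   # 40,960 bytes, so three chunks
block = write_block(payload)
round_trip = read_block(block)
```

Since verification needs nothing outside the block, one read fetches both the data and its checksums, which is the saving the slide describes.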

56


Compaction Performance
• Recall: compactions merge many HFiles into one
• User can specify a TTL per column family
• If all values in the HFile have expired, delete the file rather than
compact it
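
The selection step can be sketched like this, assuming each HFile records the timestamps of its cells so the newest one is known without reading the values. `split_expired` and the dictionary layout are hypothetical.

```python
def split_expired(hfiles, ttl_seconds, now):
    """Separate files whose newest cell is already past the TTL."""
    expired, live = [], []
    for hfile in hfiles:
        newest = max(hfile["timestamps"])
        if now - newest > ttl_seconds:
            expired.append(hfile)   # every cell is expired: just delete the file
        else:
            live.append(hfile)      # at least one live cell: keep for compaction
    return expired, live


hfiles = [
    {"name": "hfile-old", "timestamps": [100, 200]},
    {"name": "hfile-mixed", "timestamps": [150, 950]},
]
expired, live = split_expired(hfiles, ttl_seconds=500, now=1000)
```

Checking only the newest timestamp is enough: if that cell has expired, every older cell in the file has too, so the file can be dropped without rewriting anything.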

59


HBase Performance Comparison

Test Setup:
• Compare CDH4 to CDH3u4
• 5-node cluster running Yahoo! Cloud Serving
Benchmark (YCSB)
• 5 million records
• Two distributions of operations:
– 100% write
– 50% read, 50% write

61

HBase Performance Results

• 100% write workload:
– 49% throughput improvement
– 28% latency improvement
• 50% write, 50% read workload:
– 14% throughput improvement
– 14% latency improvement

62

HBase Performance Conclusion

• Caveat: You need to run performance tests on your
own workload
• But there is a compelling case to upgrade to HBase 0.92/0.94
and Hadoop 2.0

63

Conclusion

• Many consistency improvements in 0.92 /
CDH4
• Performance improvements in 0.94
• 0.94 is wire compatible with 0.92, so it will
be in a CDH4 update

64

References
• HBase ACID Semantics.
http://hbase.apache.org/acid-semantics.html
• Apache HBase Meetup @ SU; Michael Stack.
http://files.meetup.com/1350427/20120327hbase_meetup.pdf
• HBase Internals; Lars Hofhansl.
http://www.cloudera.com/resource/hbasecon-2012-learning-hbase-internals/
• HBase and HDFS: Past, Present, and Future; Todd Lipcon.
http://www.cloudera.com/resource/hbasecon-2012-hbase-and-hdfs-past-present-future/

65

Questions?

Thanks for listening!

66
