June 13, 2012

HBase Consistency and
Performance Improvements
Esteban Gutierrez, Gregory Chanan
{esteban, gchanan}@cloudera.com
Who We Are

    • Esteban Gutierrez
      – Customer Operations Engineer
      – Focused on HBase operations
    • Gregory Chanan
      – HBase developer
      – Currently focused on wire compatibility




                      ©2012 Cloudera, Inc. All Rights Reserved.
Apache HBase

    Apache HBase is a distributed, scalable, column-oriented data store
    that runs on top of HDFS. It provides consistent, low-latency, random
    read/write access.


HBase Data Format


Column names are family:qualifier

RowKey        header:from        header:subject       body:text
greg_email1   sister@gmail.com   Father’s day card    <…>
greg_email2   friend@gmail.com   Taco night           <…>

A column family is a set of related columns
that are physically stored together on disk
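
To make the family:qualifier naming concrete, here is a minimal sketch that groups one row's cells by column family. It assumes a plain in-memory dict stands in for storage; the helper name `group_by_family` is hypothetical and not part of HBase.

```python
from collections import defaultdict


def group_by_family(row):
    """Group a row's cells by column family.

    `row` maps "family:qualifier" column names to values.
    """
    families = defaultdict(dict)
    for column, value in row.items():
        # Split only on the first colon: qualifiers may contain colons.
        family, qualifier = column.split(":", 1)
        families[family][qualifier] = value
    return dict(families)


greg_email1 = {
    "header:from": "sister@gmail.com",
    "header:subject": "Father's day card",
    "body:text": "<...>",
}

stores = group_by_family(greg_email1)
print(sorted(stores))      # the families present in this row
print(stores["header"])    # both header columns are grouped together
```

Each top-level key in `stores` plays the role of one column family, which is the unit HBase stores together on disk.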

HBase Write Path

HBase Client  --Put-->  HBase Server

1. Write to HLog for disaster recovery
2. Write to MemStore (in-memory map)
3. Flush MemStore to disk as HFile



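
The three write-path steps can be sketched as a toy, single-threaded model. Everything here is an illustrative assumption: the class `ToyRegionServer`, its `flush_threshold` parameter, and the use of Python lists and dicts in place of real log and file formats.

```python
class ToyRegionServer:
    """Toy model of the write path: HLog, then MemStore, then HFile."""

    def __init__(self, flush_threshold=2):
        self.hlog = []          # append-only log used for recovery
        self.memstore = {}      # in-memory map: (row, column) -> value
        self.hfiles = []        # immutable, sorted "files on disk"
        self.flush_threshold = flush_threshold  # hypothetical knob

    def put(self, row, column, value):
        self.hlog.append((row, column, value))      # 1. write to HLog
        self.memstore[(row, column)] = value        # 2. write to MemStore
        if len(self.memstore) >= self.flush_threshold:
            self.flush()                            # 3. flush to HFile

    def flush(self):
        if self.memstore:
            # An HFile is written once, sorted by key, and never modified.
            self.hfiles.append(sorted(self.memstore.items()))
            self.memstore = {}


server = ToyRegionServer(flush_threshold=2)
server.put("greg_email1", "header:from", "sister@gmail.com")
server.put("greg_email1", "header:subject", "Father's day card")
server.put("greg_email2", "header:from", "friend@gmail.com")
print(len(server.hlog), len(server.hfiles), len(server.memstore))
```

The second Put reaches the threshold and triggers a flush, so the third Put lands in a fresh MemStore while all three remain in the log.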
HBase Write Path - Compactions

 As we write and flush, we eventually get a lot of
 HFiles…

        HFile + HFile + HFile  ->  HFile

 Merge these together in a “compaction”

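
A compaction is essentially a merge of sorted files. The sketch below is a simplification under stated assumptions: each HFile is a list of `(key, sequence, value)` tuples sorted by key with one cell per key, and only the newest version of each key is kept (real HBase can retain several versions). The function name `compact` is hypothetical.

```python
import heapq


def compact(hfiles):
    """Merge sorted HFiles into one, keeping the newest value per key.

    A higher sequence number means a newer write.
    """
    merged = []
    # Order by key, then newest first, so the first cell seen per key wins.
    ordered = heapq.merge(*hfiles, key=lambda cell: (cell[0], -cell[1]))
    for key, seq, value in ordered:
        if merged and merged[-1][0] == key:
            continue  # an older version of a key we already emitted
        merged.append((key, seq, value))
    return merged


hfile_a = [("row1", 1, "a"), ("row3", 1, "c")]
hfile_b = [("row1", 2, "A"), ("row2", 2, "b")]
hfile_c = [("row3", 3, "C")]

compacted = compact([hfile_a, hfile_b, hfile_c])
print(compacted)
```

Because every input is already sorted, the merge streams through the files once instead of re-sorting all cells.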
HBase ACID

 • HBase 0.90 guarantees ACID transactions
   within a single row, “with caveats”
 • HBase 0.92 guarantees ACID compliance
   within a single row




What are ACID Transactions?

 • Atomicity
     – All parts of a transaction complete, or none
       do
 • Consistency
     – Only valid data is written to the database
 • Isolation
     – Parallel transactions do not impact each other’s
       execution
 • Durability
     – Once a transaction is committed, it stays committed

HBase ACID in 0.92


 • “Any row returned by [a] scan will be a
   consistent view (i.e. that version of the
   complete row existed at some point in
   time)” [1]

 [1] http://hbase.apache.org/acid-semantics.html



Stories from the Trenches

     We have seen…

 • Atomic Bulk Uploads
 • Read ACID Compliance




Atomic Bulk Upload



 • A common usage pattern in HBase is to
   load data from external sources as fast
   as possible
 • HRegion.bulkLoadHFile() makes that
   possible



Atomic Bulk Upload



 • Unfortunately, importing HFiles for
   multiple column families was not an
   atomic operation




Atomic Bulk Upload


          Row 1           HRegion.bulkLoadHFile() ≤ HBase 0.90.5

             HFile1:       HFile2:        HFile3:      HFile4:
             header:to     meta:labels    body:text    attach:file
     T1      sister@...
     T2      sister@...    family
     Scan    (runs while the load is in progress)
     T3      sister@...    family         Hi…
     T4      sister@...    family         Hi…          image/jpeg




Atomic Bulk Upload

 Workarounds
 • Implement application-level validation of
   the imported data




Atomic Bulk Upload


          Row 1           HRegion.bulkLoadHFiles() ≥ HBase 0.92

             HFile1:       HFile2:        HFile3:      HFile4:
             header:to     meta:labels    body:text    attach:file
     T1
     T2
     Scan
     T3
     T4      sister@...    family         Hi…          image/jpeg …




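
The difference between the two timelines can be modeled with a toy region. This is a sketch of the visible effect only, not the actual HBase fix: the class `ToyRegion` and its methods are hypothetical, and a single reference swap stands in for whatever locking the real implementation uses.

```python
class ToyRegion:
    """Toy region holding a list of loaded HFiles per column family."""

    def __init__(self):
        self.stores = {}  # column family -> list of loaded HFiles

    def bulk_load_one(self, family, hfile):
        """Non-atomic style: each family is visible as soon as it loads."""
        loaded = self.stores.get(family, []) + [hfile]
        self.stores = {**self.stores, family: loaded}

    def bulk_load_all(self, family_to_hfile):
        """Atomic style: build the new state aside, then publish it once."""
        new_stores = dict(self.stores)
        for family, hfile in family_to_hfile.items():
            new_stores[family] = new_stores.get(family, []) + [hfile]
        self.stores = new_stores  # readers see all families or none

    def scan_families(self):
        return sorted(self.stores)


files = {"header": "HFile1", "meta": "HFile2",
         "body": "HFile3", "attach": "HFile4"}

old = ToyRegion()
observed_old = []
for family, hfile in files.items():
    old.bulk_load_one(family, hfile)
    observed_old.append(old.scan_families())  # scan between loads

new = ToyRegion()
observed_new = [new.scan_families()]
new.bulk_load_all(files)
observed_new.append(new.scan_families())

print(observed_old[0], observed_new)
```

In the non-atomic case the first scan sees only the header family; in the atomic case a scan sees either an empty row or the complete one.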
Read ACID Compliance

 Issue
 • Some records are missing
 • Results are used to update a user-facing
   application
 • Customer is not happy
     – “Where is my data?”




Read ACID Compliance

 Symptoms
       Scale testing shows between 0.5% and 2% inconsistent results between runs

                                                   Run 1     Run 2     Run 3
      …                     …                      …         …         …
                            SPLIT_RAW_FILES        …         …         …
      Map-Reduce Framework  Map output records     500,000   499,997   500,001

                        header:to      header:from    body:text
      greg_email1       sister@...     greg@...       Hi…
      greg_email2       sister@...
      esteban_email3                   esteban@...    Good news!..
      esteban_email3    brother@...




Read ACID Compliance




 • Seen only twice by Cloudera Support
 • Hard to detect if application-level
   monitoring is not implemented


Read ACID Compliance

 Workarounds
 • Retry the scan if not all column families (CFs) are present
 • Or use a single CF
 • Resubmit the job if any inconsistency is found




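
The first workaround can be sketched as an application-level wrapper. This assumes every row is expected to carry all listed column families; `scan_fn`, `families_of`, and `scan_with_retry` are hypothetical names, and `scan_fn` stands in for whatever client call performs the scan.

```python
def families_of(cells):
    """Return the set of column families present in one row's cells."""
    return {column.split(":", 1)[0] for column in cells}


def scan_with_retry(scan_fn, expected_families, max_attempts=3):
    """Re-run the scan until every row contains all expected families."""
    expected = set(expected_families)
    for _ in range(max_attempts):
        rows = scan_fn()
        if all(expected <= families_of(cells) for cells in rows.values()):
            return rows
    raise RuntimeError(
        "rows still incomplete after %d attempts" % max_attempts)


# Fake scan: the first call returns a partial row, the second a full one.
responses = iter([
    {"greg_email1": {"header:to": "sister@..."}},
    {"greg_email1": {"header:to": "sister@...", "body:text": "Hi..."}},
])

result = scan_with_retry(lambda: next(responses), {"header", "body"})
print(result)
```

This only helps when the schema guarantees that every row has every family; otherwise a legitimately sparse row would be retried forever, which is one reason to prefer the upgrade.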
Read ACID Compliance

 Long-Term Solution
 • Sometimes workarounds are not possible:
   SLAs!
 • Upgrade to 0.92+




Multiversion Concurrency Control (MVCC)

 • HBase maintains ACID semantics using
   Multiversion Concurrency Control
 • Instead of overwriting state, create a new
   version of the object with a timestamp (“memstoreTs”)

     memstoreTs   RowKey   fam1:col1   fam2:col2
     t2           row1     val2        val2
     t1           row1     val1        val1

 • Reads never have to block
 • “memstoreTs” is not externally visible! It is different
   from the external timestamp


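
The idea of versions plus a read point can be sketched in a few lines. This is a single-threaded toy under stated assumptions: the class `ToyMvccStore` and its attribute names are hypothetical, writes commit in order, and the read point simply tracks the last completed write.

```python
class ToyMvccStore:
    """Toy MVCC store: writes add versions, reads pin a read point."""

    def __init__(self):
        self.versions = []    # (memstore_ts, row, column, value), ascending
        self.next_ts = 1
        self.read_point = 0   # highest fully committed memstore_ts

    def put(self, row, cells):
        ts = self.next_ts
        self.next_ts += 1
        for column, value in cells.items():
            self.versions.append((ts, row, column, value))
        # Publish only after every cell of the row has been written.
        self.read_point = ts
        return ts

    def open_scanner(self):
        """A scanner remembers the read point at the time it was opened."""
        return self.read_point

    def read(self, row, read_point):
        result = {}
        for ts, stored_row, column, value in self.versions:
            if stored_row == row and ts <= read_point:
                result[column] = value  # later versions overwrite earlier
        return result


store = ToyMvccStore()
store.put("row1", {"fam1:col1": "val1", "fam2:col2": "val1"})
snapshot = store.open_scanner()
store.put("row1", {"fam1:col1": "val2", "fam2:col2": "val2"})

old_view = store.read("row1", snapshot)
new_view = store.read("row1", store.open_scanner())
print(old_view, new_view)
```

The earlier scanner keeps seeing the complete t1 row even after the t2 write, and neither reader ever waits on the writer.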
Review: HBase Write Path

HBase Client  --Put-->  HBase Server

1. Write to HLog for disaster recovery
2. Write to MemStore (in-memory map)
3. Flush MemStore to disk as HFile



Putting it together

 Let’s go back to the beginning…

                  MemStore
     memstoreTs   RowKey       hdr:from    body:text
     t2           greg_email   coworker    bug report
     t1           greg_email   wife        pick up kids

 And start a scan.
 And concurrently put.
 Which causes a flush.

                  HFile
     RowKey       body:text
     greg_email   bug report
     greg_email   pick up kids




Putting it together
 Now, scan needs to make sense of this…

                  MemStore
     memstoreTs   RowKey       hdr:from
     t2           greg_email   coworker
     t1           greg_email   wife

                  HFile
     RowKey       body:text
     greg_email   bug report
     greg_email   pick up kids

 But HFile has no timestamp!

                  Inconsistent Result
     RowKey       hdr:from   body:text
     greg_email   wife       bug report




Solution
      Store the timestamp in the HFile

                  MemStore
     memstoreTs   RowKey       hdr:from
     t2           greg_email   coworker
     t1           greg_email   wife

                  HFile
     memstoreTs   RowKey       body:text
     t2           greg_email   bug report
     t1           greg_email   pick up kids

                  Correct Result
     RowKey       hdr:from   body:text
     greg_email   wife       pick up kids

      Now we have all the information we need


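
The scenario above can be replayed in a small sketch. It assumes cells are `(memstore_ts, column, value)` tuples listed newest first, and that an HFile cell with `memstore_ts` set to `None` models a file that lost the value on flush; the function `scan_row` is hypothetical.

```python
def scan_row(memstore, hfile, read_point):
    """Return the newest visible value per column for one row.

    Both inputs are ordered newest first, so the first visible cell
    for each column wins.
    """
    row = {}
    for ts, column, value in memstore + hfile:
        # A cell without a memstore_ts cannot be filtered by read point.
        visible = ts is None or ts <= read_point
        if visible and column not in row:
            row[column] = value
    return row


memstore = [(2, "hdr:from", "coworker"), (1, "hdr:from", "wife")]

# Flushed file without memstoreTs: the scan cannot tell t1 from t2.
hfile_without_ts = [(None, "body:text", "bug report"),
                    (None, "body:text", "pick up kids")]
# Flushed file with memstoreTs preserved.
hfile_with_ts = [(2, "body:text", "bug report"),
                 (1, "body:text", "pick up kids")]

read_point = 1  # the scan started before the t2 put
inconsistent = scan_row(memstore, hfile_without_ts, read_point)
correct = scan_row(memstore, hfile_with_ts, read_point)
print(inconsistent)
print(correct)
```

With the timestamp preserved in the file, the scan can apply the same read-point filter to both sources and returns the row exactly as it existed at t1.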
Consistency

 • These are only some of the consistency issues in 0.90
   – e.g. HBASE-5121: MajorCompaction may
     affect scan's correctness
 • Solution: Upgrade to 0.92/0.94




Consistency to Performance

 • Initial community focus on correctness and
   consistency
 • HBase adoption growing
     – Number of customers
     – Size of deployment
 • Newer focus on performance
     – 0.94 dubbed the “performance release”
Performance Areas for Improvement

 • Read Path
     – Support checksums in HFile format (HBASE-5047)
 • Compactions
     – Delete out-of-TTL store files before compactions
       (HBASE-5199)
 • Write Path
     – HLog Compression (HBASE-4608)
 • HDFS level
     – Works with Hadoop 2.0
     – See “HBase and HDFS: Past, Present, and Future”
 • And much more!

Read Path Performance: Checksums
 • HDFS stores the checksum in a separate file

            HFile              Checksum

 • So each file read actually requires two disk I/Os
 • HBase is often bottlenecked by random disk IOPS




Read Path Performance: Checksums
 • Solution: Store checksum in HFile block
 • Turn off HDFS-level checksum

            HFile  ->  HFile Block (Chksum + Data)

 • On by default (“hbase.regionserver.checksum.verify”)
 • Bytes per checksum (“hbase.hstore.bytes.per.checksum”),
   default is 16K



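
Storing checksums next to the data means one read fetches both. The sketch below illustrates the idea only: the dict layout is not the real HFile block format, CRC32 via `zlib` is an assumed algorithm, and `write_block` and `read_block` are hypothetical helpers. The 16K chunk size mirrors the default mentioned above.

```python
import zlib

BYTES_PER_CHECKSUM = 16 * 1024  # mirrors the 16K default


def write_block(data, bytes_per_checksum=BYTES_PER_CHECKSUM):
    """Build a block that carries one checksum per chunk of its data."""
    checksums = [
        zlib.crc32(data[start:start + bytes_per_checksum])
        for start in range(0, len(data), bytes_per_checksum)
    ]
    return {"data": data, "checksums": checksums}


def read_block(block, bytes_per_checksum=BYTES_PER_CHECKSUM):
    """Verify each chunk against its inline checksum, then return the data."""
    data = block["data"]
    for index, expected in enumerate(block["checksums"]):
        start = index * bytes_per_checksum
        actual = zlib.crc32(data[start:start + bytes_per_checksum])
        if actual != expected:
            raise IOError("checksum mismatch in chunk %d" % index)
    return data


payload = b"x" * 40000
block = write_block(payload)
print(len(block["checksums"]))        # number of 16K chunks covered
print(read_block(block) == payload)   # verification passes on clean data
```

A smaller bytes-per-checksum value localizes corruption more precisely at the cost of more checksum bytes per block.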
Compaction Performance
 • Recall: Compactions
 • User can specify TTL per column family
 • If all values in the HFile have expired, delete the file rather than
   compact it
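
The TTL shortcut can be sketched as a pre-compaction filter. This assumes each HFile records the timestamp of its newest cell; the dict fields `name` and `max_timestamp` and the function `select_for_compaction` are hypothetical names chosen for illustration.

```python
def select_for_compaction(hfiles, ttl_seconds, now):
    """Split HFiles into (expired, live) name lists.

    A file is expired when even its newest cell is older than the TTL,
    so it can be deleted without reading or rewriting it.
    """
    expired, live = [], []
    for hfile in hfiles:
        if now - hfile["max_timestamp"] > ttl_seconds:
            expired.append(hfile["name"])
        else:
            live.append(hfile["name"])
    return expired, live


hfiles = [
    {"name": "a", "max_timestamp": 850},
    {"name": "b", "max_timestamp": 950},
    {"name": "c", "max_timestamp": 899},
]

expired, live = select_for_compaction(hfiles, ttl_seconds=100, now=1000)
print(expired, live)
```

Only the live files go on to the merge step, which avoids the I/O of rewriting data that would be dropped anyway.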


HBase Performance Comparison

 Test Setup:
 • Compare CDH4 to CDH3u4
 • 5-node cluster running Yahoo Cloud Serving
   Benchmark (YCSB)
 • 5 million records
 • Two distributions of operations:
     – 100% write
     – 50% read, 50% write




HBase Performance Results

 • 100% write workload:
     – 49% throughput improvement
     – 28% latency improvement
 • 50% write, 50% read workload:
     – 14% throughput improvement
     – 14% latency improvement




HBase Performance Conclusion

 • Caveat: You need to run performance tests on your own
   workload
 • But there is a compelling case for upgrading HBase to 0.92/0.94
   and Hadoop to 2.0




Conclusion

 • Many consistency improvements in 0.92 /
   CDH4
 • Performance improvements in 0.94
 • 0.94 is wire compatible with 0.92, so will
   be in a CDH4 update




References
 • HBase ACID Semantics.
   http://hbase.apache.org/acid-semantics.html
 • Apache HBase Meetup @ SU; Michael Stack.
   http://files.meetup.com/1350427/20120327hbase_meetup.pdf
 • HBase Internals; Lars Hofhansl.
   http://www.cloudera.com/resource/hbasecon-2012-learning-hbase-internals/
 • HBase and HDFS: Past, Present, and Future; Todd Lipcon.
   http://www.cloudera.com/resource/hbasecon-2012-hbase-and-hdfs-past-present-future/



Questions?

 Thanks for listening!




 
Hadoop 1.x vs 2
Hadoop 1.x vs 2Hadoop 1.x vs 2
Hadoop 1.x vs 2
 

Destacado

HBaseCon 2013: Evolving a First-Generation Apache HBase Deployment to Second...
HBaseCon 2013:  Evolving a First-Generation Apache HBase Deployment to Second...HBaseCon 2013:  Evolving a First-Generation Apache HBase Deployment to Second...
HBaseCon 2013: Evolving a First-Generation Apache HBase Deployment to Second...Cloudera, Inc.
 
HBaseCon 2012 | Learning HBase Internals - Lars Hofhansl, Salesforce
HBaseCon 2012 | Learning HBase Internals - Lars Hofhansl, SalesforceHBaseCon 2012 | Learning HBase Internals - Lars Hofhansl, Salesforce
HBaseCon 2012 | Learning HBase Internals - Lars Hofhansl, SalesforceCloudera, Inc.
 
HBaseCon 2013: Deal Personalization Engine with HBase @ Groupon
HBaseCon 2013: Deal Personalization Engine with HBase @ GrouponHBaseCon 2013: Deal Personalization Engine with HBase @ Groupon
HBaseCon 2013: Deal Personalization Engine with HBase @ GrouponCloudera, Inc.
 
HBaseCon 2012 | Real-time Analytics with HBase - Sematext
HBaseCon 2012 | Real-time Analytics with HBase - SematextHBaseCon 2012 | Real-time Analytics with HBase - Sematext
HBaseCon 2012 | Real-time Analytics with HBase - SematextCloudera, Inc.
 
Realtime Analytics with Hadoop and HBase
Realtime Analytics with Hadoop and HBaseRealtime Analytics with Hadoop and HBase
Realtime Analytics with Hadoop and HBaselarsgeorge
 
Exploring BigData with Google BigQuery
Exploring BigData with Google BigQueryExploring BigData with Google BigQuery
Exploring BigData with Google BigQueryDharmesh Vaya
 

Destacado (6)

HBaseCon 2013: Evolving a First-Generation Apache HBase Deployment to Second...
HBaseCon 2013:  Evolving a First-Generation Apache HBase Deployment to Second...HBaseCon 2013:  Evolving a First-Generation Apache HBase Deployment to Second...
HBaseCon 2013: Evolving a First-Generation Apache HBase Deployment to Second...
 
HBaseCon 2012 | Learning HBase Internals - Lars Hofhansl, Salesforce
HBaseCon 2012 | Learning HBase Internals - Lars Hofhansl, SalesforceHBaseCon 2012 | Learning HBase Internals - Lars Hofhansl, Salesforce
HBaseCon 2012 | Learning HBase Internals - Lars Hofhansl, Salesforce
 
HBaseCon 2013: Deal Personalization Engine with HBase @ Groupon
HBaseCon 2013: Deal Personalization Engine with HBase @ GrouponHBaseCon 2013: Deal Personalization Engine with HBase @ Groupon
HBaseCon 2013: Deal Personalization Engine with HBase @ Groupon
 
HBaseCon 2012 | Real-time Analytics with HBase - Sematext
HBaseCon 2012 | Real-time Analytics with HBase - SematextHBaseCon 2012 | Real-time Analytics with HBase - Sematext
HBaseCon 2012 | Real-time Analytics with HBase - Sematext
 
Realtime Analytics with Hadoop and HBase
Realtime Analytics with Hadoop and HBaseRealtime Analytics with Hadoop and HBase
Realtime Analytics with Hadoop and HBase
 
Exploring BigData with Google BigQuery
Exploring BigData with Google BigQueryExploring BigData with Google BigQuery
Exploring BigData with Google BigQuery
 

Similar a Hadoop Summit 2012 | HBase Consistency and Performance Improvements

Intro to HBase Internals & Schema Design (for HBase users)
Intro to HBase Internals & Schema Design (for HBase users)Intro to HBase Internals & Schema Design (for HBase users)
Intro to HBase Internals & Schema Design (for HBase users)alexbaranau
 
Hadoop and HBase in the Real World
Hadoop and HBase in the Real WorldHadoop and HBase in the Real World
Hadoop and HBase in the Real WorldCloudera, Inc.
 
Introduction to Apache Hive
Introduction to Apache HiveIntroduction to Apache Hive
Introduction to Apache HiveTapan Avasthi
 
Data Science Day New York: The Platform for Big Data
Data Science Day New York: The Platform for Big DataData Science Day New York: The Platform for Big Data
Data Science Day New York: The Platform for Big DataCloudera, Inc.
 
Hadoop and h base in the real world
Hadoop and h base in the real worldHadoop and h base in the real world
Hadoop and h base in the real worldJoey Echeverria
 
HBase User Group #9: HBase and HDFS
HBase User Group #9: HBase and HDFSHBase User Group #9: HBase and HDFS
HBase User Group #9: HBase and HDFSCloudera, Inc.
 
Greenplum Database on HDFS
Greenplum Database on HDFSGreenplum Database on HDFS
Greenplum Database on HDFSDataWorks Summit
 
Hadoop in three use cases
Hadoop in three use casesHadoop in three use cases
Hadoop in three use casesJoey Echeverria
 
Future of HCatalog - Hadoop Summit 2012
Future of HCatalog - Hadoop Summit 2012Future of HCatalog - Hadoop Summit 2012
Future of HCatalog - Hadoop Summit 2012Hortonworks
 
Ria2010 workshop dev mobile
Ria2010 workshop dev mobileRia2010 workshop dev mobile
Ria2010 workshop dev mobileMichael Chaize
 
Protecting your Critical Hadoop Clusters Against Disasters
Protecting your Critical Hadoop Clusters Against DisastersProtecting your Critical Hadoop Clusters Against Disasters
Protecting your Critical Hadoop Clusters Against DisastersDataWorks Summit
 
HBaseCon 2012 | You’ve got HBase! How AOL Mail Handles Big Data
HBaseCon 2012 | You’ve got HBase! How AOL Mail Handles Big DataHBaseCon 2012 | You’ve got HBase! How AOL Mail Handles Big Data
HBaseCon 2012 | You’ve got HBase! How AOL Mail Handles Big DataCloudera, Inc.
 
Dancing elephants - efficiently working with object stores from Apache Spark ...
Dancing elephants - efficiently working with object stores from Apache Spark ...Dancing elephants - efficiently working with object stores from Apache Spark ...
Dancing elephants - efficiently working with object stores from Apache Spark ...DataWorks Summit
 
Hadoop Summit 2012 | Improving HBase Availability and Repair
Hadoop Summit 2012 | Improving HBase Availability and RepairHadoop Summit 2012 | Improving HBase Availability and Repair
Hadoop Summit 2012 | Improving HBase Availability and RepairCloudera, Inc.
 

Similar a Hadoop Summit 2012 | HBase Consistency and Performance Improvements (17)

Intro to HBase Internals & Schema Design (for HBase users)
Intro to HBase Internals & Schema Design (for HBase users)Intro to HBase Internals & Schema Design (for HBase users)
Intro to HBase Internals & Schema Design (for HBase users)
 
Hadoop and HBase in the Real World
Hadoop and HBase in the Real WorldHadoop and HBase in the Real World
Hadoop and HBase in the Real World
 
Introduction to Apache Hive
Introduction to Apache HiveIntroduction to Apache Hive
Introduction to Apache Hive
 
Data Science Day New York: The Platform for Big Data
Data Science Day New York: The Platform for Big DataData Science Day New York: The Platform for Big Data
Data Science Day New York: The Platform for Big Data
 
Hadoop and h base in the real world
Hadoop and h base in the real worldHadoop and h base in the real world
Hadoop and h base in the real world
 
HBase User Group #9: HBase and HDFS
HBase User Group #9: HBase and HDFSHBase User Group #9: HBase and HDFS
HBase User Group #9: HBase and HDFS
 
Greenplum Database on HDFS
Greenplum Database on HDFSGreenplum Database on HDFS
Greenplum Database on HDFS
 
Hadoop in three use cases
Hadoop in three use casesHadoop in three use cases
Hadoop in three use cases
 
Future of HCatalog - Hadoop Summit 2012
Future of HCatalog - Hadoop Summit 2012Future of HCatalog - Hadoop Summit 2012
Future of HCatalog - Hadoop Summit 2012
 
Ria2010 workshop dev mobile
Ria2010 workshop dev mobileRia2010 workshop dev mobile
Ria2010 workshop dev mobile
 
Future of HCatalog
Future of HCatalogFuture of HCatalog
Future of HCatalog
 
Protecting your Critical Hadoop Clusters Against Disasters
Protecting your Critical Hadoop Clusters Against DisastersProtecting your Critical Hadoop Clusters Against Disasters
Protecting your Critical Hadoop Clusters Against Disasters
 
HBaseCon 2012 | You’ve got HBase! How AOL Mail Handles Big Data
HBaseCon 2012 | You’ve got HBase! How AOL Mail Handles Big DataHBaseCon 2012 | You’ve got HBase! How AOL Mail Handles Big Data
HBaseCon 2012 | You’ve got HBase! How AOL Mail Handles Big Data
 
Dancing elephants - efficiently working with object stores from Apache Spark ...
Dancing elephants - efficiently working with object stores from Apache Spark ...Dancing elephants - efficiently working with object stores from Apache Spark ...
Dancing elephants - efficiently working with object stores from Apache Spark ...
 
Apache HDFS High Availability
Apache HDFS High AvailabilityApache HDFS High Availability
Apache HDFS High Availability
 
Hadoop Summit 2012 | Improving HBase Availability and Repair
Hadoop Summit 2012 | Improving HBase Availability and RepairHadoop Summit 2012 | Improving HBase Availability and Repair
Hadoop Summit 2012 | Improving HBase Availability and Repair
 
Mar 2012 HUG: Hive with HBase
Mar 2012 HUG: Hive with HBaseMar 2012 HUG: Hive with HBase
Mar 2012 HUG: Hive with HBase
 

Más de Cloudera, Inc.

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxCloudera, Inc.
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera, Inc.
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards FinalistsCloudera, Inc.
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Cloudera, Inc.
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Cloudera, Inc.
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Cloudera, Inc.
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Cloudera, Inc.
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Cloudera, Inc.
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Cloudera, Inc.
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Cloudera, Inc.
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Cloudera, Inc.
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Cloudera, Inc.
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformCloudera, Inc.
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Cloudera, Inc.
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Cloudera, Inc.
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Cloudera, Inc.
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Cloudera, Inc.
 

Más de Cloudera, Inc. (20)

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptx
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the Platform
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18
 

Último

Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Victor Rentea
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityWSO2
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfOrbitshub
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Orbitshub
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Angeliki Cooney
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Victor Rentea
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Bhuvaneswari Subramani
 

Último (20)

Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 

Hadoop Summit 2012 | HBase Consistency and Performance Improvements

  • 1. June 13, 2012 HBase Consistency and Performance Improvements Esteban Gutierrez, Gregory Chanan {esteban, gchanan}@cloudera.com
  • 2. Who We Are: Esteban Gutierrez (Customer Operations Engineer, focused on HBase operations); Gregory Chanan (HBase developer, currently focused on wire compatibility).
  • 3. Apache HBase: Apache HBase is a distributed, scalable, column-oriented data store that runs on top of HDFS. It provides consistent, low-latency, random read/write access.
  • 4. HBase Data Format. Columns: RowKey | header:from | header:subject | body:text. Rows: greg_email1 | sister@gmail.com | Father’s day card | <…>; greg_email2 | friend@gmail.com | Taco night | <…>.
  • 5. HBase Data Format. Column names are family:qualifier. Columns: RowKey | header:from | header:subject | body:text. Rows: greg_email1 | sister@gmail.com | Father’s day card | <…>; greg_email2 | friend@gmail.com | Taco night | <…>.
  • 6. HBase Data Format. Column names are family:qualifier. Columns: RowKey | header:from | header:subject | body:text. Rows: greg_email1 | sister@gmail.com | Father’s day card | <…>; greg_email2 | friend@gmail.com | Taco night | <…>. Column families are a set of related columns that are physically stored together on disk.
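The family:qualifier naming and the "stored together" idea from slides 4 to 6 can be illustrated with a toy model. This is only a sketch in plain Python, not the HBase API: the `stores`, `put`, and `get_row` names are hypothetical, and a dictionary per family stands in for the on-disk grouping.

```python
from collections import defaultdict

# Toy model (not the HBase API): one store per column family, mirroring the
# idea that a family's columns are kept together on disk.
stores = defaultdict(dict)  # family -> {(row_key, qualifier): value}


def put(row_key, column, value):
    """Store a cell; `column` uses the family:qualifier form."""
    family, qualifier = column.split(":", 1)
    stores[family][(row_key, qualifier)] = value


def get_row(row_key):
    """Reassemble a row by visiting every family store."""
    row = {}
    for family, cells in stores.items():
        for (key, qualifier), value in cells.items():
            if key == row_key:
                row[f"{family}:{qualifier}"] = value
    return row


# The two example rows from the slides.
put("greg_email1", "header:from", "sister@gmail.com")
put("greg_email1", "header:subject", "Father's day card")
put("greg_email1", "body:text", "<...>")
put("greg_email2", "header:from", "friend@gmail.com")
put("greg_email2", "header:subject", "Taco night")
put("greg_email2", "body:text", "<...>")
```

Reading a single row touches every family store, while a read that only needs the `header` columns would touch one store; that is the practical reason to group related columns into a family.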
  • 7. HBase Write Path. [Diagram: the HBase Client sends a Put to the HBase Server.]
  • 8. HBase Write Path. [Diagram: HBase Client, Put, HBase Server, HLog.] 1. Write to HLog for disaster recovery.
  • 9. HBase Write Path. [Diagram: HBase Client, Put, HBase Server, HLog, MemStore.] 1. Write to HLog for disaster recovery. 2. Write to MemStore (in-memory map).
  • 10. HBase Write Path. [Diagram: HBase Client, Put, HBase Server, HLog, MemStore.] 1. Write to HLog for disaster recovery. 2. Write to MemStore (in-memory map).
  • 11. HBase Write Path. [Diagram: HBase Client with two Puts, HBase Server, HLog, MemStore.]
  • 12. HBase Write Path. [Diagram: HBase Client with two Puts, HBase Server, HLog, MemStore.] 1. Write to HLog for disaster recovery.
  • 13. HBase Write Path. [Diagram: HBase Client with two Puts, HBase Server, HLog, two MemStores.] 1. Write to HLog for disaster recovery. 2. Write to MemStore (in-memory map).
  • 14. HBase Write Path. [Diagram: HBase Client with two Puts, HBase Server, HLog, two MemStores, HFile.] 1. Write to HLog for disaster recovery. 2. Write to MemStore (in-memory map). 3. Flush MemStore to disk as HFile.
  • 15. HBase Write Path - Compactions As we write and flush, we eventually get a lot of HFiles. [Diagram: HFile, HFile, HFile.]
  • 16. HBase Write Path - Compactions As we write and flush, we eventually get a lot of HFiles… [Diagram: HFile, HFile, HFile → HFile.] Merge these together in a “compaction”.
  • 17. HBase ACID • HBase 0.90 guarantees ACID transactions within a single row, “with caveats” • HBase 0.92 guarantees ACID compliance within a single row
  • 18. What are ACID Transactions? • Atomicity – All parts of transaction complete or none complete • Consistency – Only valid data written to database • Isolation – Parallel transactions do not impact each other’s execution • Durability – Once transaction committed, it remains
  • 19. HBase ACID in 0.92 • “Any row returned by [a] scan will be a consistent view (i.e. that version of the complete row existed at some point in time)”[1] [1] http://hbase.apache.org/acid-semantics.html
  • 20. Histories from the Trenches We have seen… • Atomic Bulk Uploads • Read ACID Compliance
  • 21. Atomic Bulk Upload • A common usage pattern in HBase is to upload data as fast as possible from external sources • HRegion.bulkLoadHFile() makes that possible
  • 22. Atomic Bulk Upload • Unfortunately, importing HFiles for multiple column families is not an atomic operation
  • 23. Atomic Bulk Upload • Unfortunately, importing HFiles for multiple column families was not an atomic operation
  • 24. Atomic Bulk Upload Row 1, HRegion.bulkLoadHFile() (≤ HBase 0.90.5). Scan results over time: Scan at | HFile1: header:to | HFile2: meta:labels | HFile3: body:text | HFile4: attach:file; T1 | sister@... | - | - | -; T2 | sister@... | family | - | -; T3 | sister@... | family | Hi… | -; T4 | sister@... | family | Hi… | image/jpeg
  • 25. Atomic Bulk Upload Workarounds • Implement application-level validation of the imported data
  • 26. Atomic Bulk Upload Row 1, HRegion.bulkLoadHFiles() (≥ HBase 0.92). Scan results over time: Scan at | HFile1: header:to | HFile2: meta:labels | HFile3: body:text | HFile4: attach:file; T1 | - | - | - | -; T2 | - | - | - | -; T3 | - | - | - | -; T4 | - | - | - | -
  • 27. Atomic Bulk Upload Row 1, HRegion.bulkLoadHFiles() (≥ HBase 0.92). Scan results over time: Scan at | HFile1: header:to | HFile2: meta:labels | HFile3: body:text | HFile4: attach:file; T1 | - | - | - | -; T2 | - | - | - | -; T3 | - | - | - | -; T4 | sister@... | family | Hi… | image/jpeg; …
  • 28. Read ACID Compliance Issue • Some records missing • Results are used to update a user-facing application • Customer is not happy: “Where is my data?”
  • 29. Read ACID Compliance Symptoms MapReduce job counters (… SPLIT_RAW_FILES … Map-Reduce Framework), Map output records: Run 1 = 500,000
  • 30. Read ACID Compliance Symptoms MapReduce job counters (… SPLIT_RAW_FILES … Map-Reduce Framework), Map output records: Run 1 = 500,000; Run 2 = 499,997
  • 31. Read ACID Compliance Symptoms MapReduce job counters (… SPLIT_RAW_FILES … Map-Reduce Framework), Map output records: Run 1 = 500,000; Run 2 = 499,997; Run 3 = 500,001
  • 32. Read ACID Compliance Symptoms MapReduce job counters (… SPLIT_RAW_FILES … Map-Reduce Framework), Map output records: Run 1 = 500,000; Run 2 = 499,997; Run 3 = 500,001. Rows returned (columns header:to, header:from, body:text; some rows are partial): greg_email1 | sister@... | greg@... | Hi…; greg_email2 | sister@...; esteban_email3 | esteban@... | Good news!..; esteban_email3 | brother@...
  • 33. Read ACID Compliance Symptoms Scale testing shows between 0.5% and 2% inconsistent results between runs. MapReduce job counters (… SPLIT_RAW_FILES … Map-Reduce Framework), Map output records: Run 1 = 500,000; Run 2 = 499,997; Run 3 = 500,001. Rows returned (columns header:to, header:from, body:text; some rows are partial): greg_email1 | sister@... | greg@... | Hi…; greg_email2 | sister@...; esteban_email3 | esteban@... | Good news!..; esteban_email3 | brother@...
  • 34. Read ACID Compliance • Seen only twice by Cloudera Support • Hard to detect if application-level monitoring is not implemented
  • 35. Read ACID Compliance Workarounds • Retry the scan if not all CFs are present • Or use a single CF • Resubmit the job if any inconsistency is found
  • 36. Read ACID Compliance Long-Term Solution • Sometimes workarounds are not possible: SLAs! • Upgrade to 0.92+
  • 37. MVCC • HBase maintains ACID semantics using Multiversion Concurrency Control • Instead of overwriting state, create a new version of object with timestamp. memstoreTs | RowKey | fam1:col1 | fam2:col2; t1 | row1 | val1 | val1
  • 38. Multi Version Concurrency Control • HBase maintains ACID semantics using Multiversion Concurrency Control • Instead of overwriting state, create a new version of object with timestamp (“memstoreTs”). memstoreTs | RowKey | fam1:col1 | fam2:col2; t2 | row1 | val2 | val2; t1 | row1 | val1 | val1 • Reads never have to block • “memstoreTs” is not externally visible! Different from external timestamp
  • 39. Review: HBase Write Path [Diagram: HBase Client → Put → HBase Server (HLog, MemStore, HFile).] 1. Write to HLog for disaster recovery 2. Write to MemStore (in-memory map) 3. Flush MemStore to disk as HFile
  • 40. Putting it together Let’s go back to the beginning… MemStore: memstoreTs | RowKey | hdr:from | body:text; t1 | greg_email | wife | pick up kids
  • 41. Putting it together Let’s go back to the beginning… MemStore: memstoreTs | RowKey | hdr:from | body:text; t1 | greg_email | wife | pick up kids. And start a scan.
  • 42. Putting it together Let’s go back to the beginning… MemStore: memstoreTs | RowKey | hdr:from | body:text; t2 | greg_email | coworker | bug report; t1 | greg_email | wife | pick up kids. And start a scan. And concurrently put.
  • 43. Putting it together Let’s go back to the beginning… MemStore: memstoreTs | RowKey | hdr:from | body:text; t2 | greg_email | coworker | bug report; t1 | greg_email | wife | pick up kids. And start a scan. And concurrently put. Which causes a flush. HFile: RowKey | body:text; greg_email | bug report; greg_email | pick up kids
  • 44. Putting it together Now, scan needs to make sense of this… MemStore: memstoreTs | RowKey | hdr:from; t2 | greg_email | coworker; t1 | greg_email | wife. HFile: RowKey | body:text; greg_email | bug report; greg_email | pick up kids
  • 45. Putting it together Now, scan needs to make sense of this… MemStore: memstoreTs | RowKey | hdr:from; t2 | greg_email | coworker; t1 | greg_email | wife. HFile: RowKey | body:text; greg_email | bug report; greg_email | pick up kids. But HFile has no timestamp!
  • 46. Putting it together Now, scan needs to make sense of this… MemStore: memstoreTs | RowKey | hdr:from; t2 | greg_email | coworker; t1 | greg_email | wife. HFile: RowKey | body:text; greg_email | bug report; greg_email | pick up kids. But HFile has no timestamp! Inconsistent Result: RowKey | hdr:from | body:text; greg_email | wife | bug report
  • 47. Putting it together Now, scan needs to make sense of this… MemStore: memstoreTs | RowKey | hdr:from; t2 | greg_email | coworker; t1 | greg_email | wife. HFile: RowKey | body:text; greg_email | bug report; greg_email | pick up kids. But HFile has no timestamp! Inconsistent Result: RowKey | hdr:from | body:text; greg_email | wife | bug report
  • 48. Solution Store the timestamp in the HFile. MemStore: memstoreTs | RowKey | hdr:from; t2 | greg_email | coworker; t1 | greg_email | wife. HFile: memstoreTs | RowKey | body:text; t2 | greg_email | bug report; t1 | greg_email | pick up kids. Correct Result: RowKey | hdr:from | body:text; greg_email | wife | pick up kids. Now we have all the information we need.
  • 49. Consistency • These are only some of the consistency issues in 0.90 – e.g. HBASE-5121: MajorCompaction may affect scan's correctness • Solution: Upgrade to 0.92/0.94
  • 50. Consistency to Performance • Initial community focus on correctness and consistency • HBase adoption growing – Number of customers – Size of deployment • Newer focus on performance
  • 51. Performance • Initial community focus on correctness and consistency • HBase adoption growing – Number of customers – Size of deployment • Newer focus on performance – 0.94 dubbed the “performance release”
  • 52. Performance Areas for Improvement • Read Path • Compactions • Write Path • HDFS level
  • 53. Performance Areas for Improvement • Read Path – Support checksums in HFile format (HBASE-5047) • Compactions – Delete out-of-TTL store files before compactions (HBASE-5199) • Write Path – HLog Compression (HBASE-4608) • HDFS level – Works with Hadoop 2.0 – See HBase and HDFS: Past, Present and Future • And much more!
  • 54. Performance Areas for Improvement • Read Path – Support checksums in HFile format (HBASE-5047) • Compactions – Delete out-of-TTL store files before compactions (HBASE-5199) • Write Path – HLog Compression (HBASE-4608) • HDFS level – Works with Hadoop 2.0 – See HBase and HDFS: Past, Present and Future • And much more!
  • 55. Read Path Performance: Checksums • HDFS stores the checksum in a separate file [Diagram: HFile, Checksum.] • So each file read actually requires two disk IOPS • HBase is often bottlenecked by random disk IOPS
  • 56. Read Path Performance: Checksums • Solution: Store the checksum in the HFile block • Turn off the HDFS-level checksum [Diagram: HFile → HFile block containing checksum and data.] • On by default (“hbase.regionserver.checksum.verify”) • Bytes per checksum (“hbase.hstore.bytes.per.checksum”) – default is 16K
  • 57. Performance Areas for Improvement • Read Path – Support checksums in HFile format (HBASE-5047) • Compactions – Delete out-of-TTL store files before compactions (HBASE-5199) • Write Path – HLog Compression (HBASE-4608) • HDFS level – Works with Hadoop 2.0 – See HBase and HDFS: Past, Present and Future • And much more!
  • 58. Compaction Performance • Recall: Compactions • User can specify TTL per column family
  • 59. Compaction Performance • Recall: Compactions • User can specify TTL per column family • If all values in the HFile have expired, delete the file rather than compact it
  • 60. Performance Areas for Improvement • Read Path – Support checksums in HFile format (HBASE-5047) • Compactions – Delete out-of-TTL store files before compactions (HBASE-5199) • Write Path – HLog Compression (HBASE-4608) • HDFS level – Works with Hadoop 2.0 – See HBase and HDFS: Past, Present and Future • And much more!
  • 61. HBase Performance Comparison Test Setup: • Compare CDH4 to CDH3u4 • 5-node cluster running Yahoo Cloud Serving Benchmark (YCSB) • 5 million records • Two distributions of operations: – 100% write – 50% read, 50% write
  • 62. HBase Performance Results • 100% write workload: – 49% throughput improvement – 28% latency improvement • 50% write, 50% read workload: – 14% throughput improvement – 14% latency improvement
  • 63. HBase Performance Conclusion • Caveat: Need to run performance tests on your workload • But there is a compelling case to upgrade to HBase 0.92/0.94 and Hadoop 2.0
  • 64. Conclusion • Many consistency improvements in 0.92 / CDH4 • Performance improvements in 0.94 • 0.94 is wire compatible with 0.92, so it will be in a CDH4 update
  • 65. References • HBase ACID Semantics, http://hbase.apache.org/acid-semantics.html • Apache HBase Meetup @ SU; Michael Stack. http://files.meetup.com/1350427/20120327hbase_meetup.pdf • HBase Internals; Lars Hofhansl. http://www.cloudera.com/resource/hbasecon-2012-learning-hbase-internals/ • HBase and HDFS: Past, Present, and Future; Todd Lipcon. http://www.cloudera.com/resource/hbasecon-2012-hbase-and-hdfs-past-present-future/
  • 66. Questions? Thanks for listening!
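
The write path and compaction slides (7 to 16) can be illustrated with a small toy model. This is only a sketch, not HBase code: the class ToyRegionServer, its flush_threshold parameter, and the method names are hypothetical and exist purely for illustration. It shows the three steps from the slides (append to the HLog, write to the MemStore, flush to an HFile) and a compaction that merges the accumulated files.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRegionServer:
    """Toy model of the write path from the slides; not HBase code."""
    flush_threshold: int = 2                       # hypothetical: flush after this many cells
    hlog: list = field(default_factory=list)       # write-ahead log entries
    memstore: dict = field(default_factory=dict)   # in-memory map: (row, column) -> value
    hfiles: list = field(default_factory=list)     # each flush appends one sorted file

    def put(self, row, column, value):
        self.hlog.append((row, column, value))     # 1. write to HLog for disaster recovery
        self.memstore[(row, column)] = value       # 2. write to MemStore
        if len(self.memstore) >= self.flush_threshold:
            self.flush()                           # 3. flush MemStore to disk as HFile

    def flush(self):
        if self.memstore:
            self.hfiles.append(sorted(self.memstore.items()))
            self.memstore = {}

    def compact(self):
        """Merge all files into one; files are visited oldest first, so newer values win."""
        merged = {}
        for hfile in self.hfiles:
            merged.update(hfile)
        self.hfiles = [sorted(merged.items())] if merged else []


server = ToyRegionServer()
server.put("greg_email1", "header:from", "sister@gmail.com")
server.put("greg_email1", "header:subject", "Father's day card")
server.put("greg_email2", "header:from", "friend@gmail.com")
server.put("greg_email2", "header:subject", "Taco night")
files_before = len(server.hfiles)   # two flushes have happened by now
server.compact()                    # the two files become one
```

The model leaves out everything that makes the real system hard (durability of the log, concurrency, per-family stores); it is meant only to make the ordering of the three steps and the effect of a compaction concrete.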

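The scan inconsistency walked through in slides 40 to 48 can also be reproduced with a toy model. The sketch below is an illustration under stated assumptions, not HBase's implementation: Cell, scan_row, store_state and the keep_ts_in_hfile flag are hypothetical names. A scan only sees cells whose memstoreTs is at or below its read point; a cell whose memstoreTs was lost at flush time (modelled as None) is always visible, which is what produces the mixed row.

```python
from collections import namedtuple

# memstore_ts is None when the store file did not persist it (the 0.90 case in the slides).
Cell = namedtuple("Cell", "row column value memstore_ts")


def scan_row(cells, read_point):
    """Return {column: value} using the newest cell per column visible at read_point.

    Cells are assumed to be listed newest first within each column.
    """
    result = {}
    for cell in cells:
        visible = cell.memstore_ts is None or cell.memstore_ts <= read_point
        if visible and cell.column not in result:
            result[cell.column] = cell.value
    return result


def store_state(keep_ts_in_hfile):
    """State from the slides: hdr:from still in the MemStore, body:text flushed to an HFile."""
    memstore = [
        Cell("greg_email", "hdr:from", "coworker", 2),
        Cell("greg_email", "hdr:from", "wife", 1),
    ]
    hfile = [
        Cell("greg_email", "body:text", "bug report", 2 if keep_ts_in_hfile else None),
        Cell("greg_email", "body:text", "pick up kids", 1 if keep_ts_in_hfile else None),
    ]
    return memstore + hfile


# The scan started at read point t1, before the concurrent put at t2.
inconsistent = scan_row(store_state(keep_ts_in_hfile=False), read_point=1)
consistent = scan_row(store_state(keep_ts_in_hfile=True), read_point=1)
print(inconsistent)  # hdr:from from t1 mixed with body:text from t2
print(consistent)    # both columns from t1
```

With the timestamp kept in the file, the scan can filter flushed cells by the same read point it applies to the MemStore, so the row it returns existed at a single point in time.
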
Editor's notes

  1. We are going to talk about recent improvements in HBase for ACID consistency and performance. We are going to discuss customer cases, and also look at the internals of HBase to give you a taste of these issues.
  2. This is what the data format looks like. How do we write it?
  10. At Cloudera Support we have seen a few issues where HBase consistency can be a problem.
  11. In some workflows it is desirable to upload data (ETL) directly into HBase instead of invoking Put() to add new records. Depending on the use case, it may also have performance advantages.
  12. It was fixed in 0.92 (HBASE-4552) and backported to HBase 0.90.5 (for convenience, it has also been available since CDH3u3).
  13. Each read returns partial content for the same row. It can be empty data or an old version of the data.
  14. It is also possible to monitor the logs and metrics before exposing the new data to users.
  15. With HBASE-4552, a read-write lock was implemented so that the data becomes available to readers only once the bulk upload is complete. The old method was also deprecated and a new one was implemented.
  16. Once this lock is released, the data is available to readers (Scan).
  17. In our example, the system is an HBase-based email store that holds millions of emails, and a MapReduce task runs concurrently to classify emails as spam.
  18. MapReduce users will find these counters familiar. In this example we are running a filter that scans for only a 500K-record dataset from a table with 50M rows.
  19. Remember, the filter should always return 500K records.
  21. This behavior is not limited to empty rows: depending on the number of versions, you can get old data too!
  22. This is production, so you can’t stop the service just to try a workaround.
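
The locking idea described in the notes on HBASE-4552 can be sketched as follows. This is a toy illustration under loud assumptions, not the HBase implementation: ToyRegion, bulk_load, try_scan and the on_each_file hook are hypothetical names, and a single mutex stands in for the read-write lock (a real read-write lock would let scans run concurrently with each other, and a real scan would wait rather than give up). The point is only that the loader holds the lock across all column families, so a reader can never observe a partially loaded row.

```python
import threading


class ToyRegion:
    """Toy region: maps column family -> list of loaded store file names. Not HBase code."""

    def __init__(self, families):
        self.store_files = {family: [] for family in families}
        # Simplification: one mutex stands in for the read-write lock.
        self._lock = threading.Lock()

    def bulk_load(self, family_to_file, on_each_file=None):
        """Load one file per family while holding the lock for the whole batch."""
        with self._lock:
            for family, name in family_to_file.items():
                self.store_files[family].append(name)
                if on_each_file:
                    on_each_file()   # hook used below to observe the region mid-load

    def try_scan(self):
        """Return a snapshot of the store files, or None if a bulk load is in progress."""
        if not self._lock.acquire(blocking=False):
            return None
        try:
            return {family: list(files) for family, files in self.store_files.items()}
        finally:
            self._lock.release()


region = ToyRegion(["header", "meta", "body", "attach"])
mid_load_views = []
region.bulk_load(
    {"header": "HFile1", "meta": "HFile2", "body": "HFile3", "attach": "HFile4"},
    on_each_file=lambda: mid_load_views.append(region.try_scan()),
)
after_load = region.try_scan()   # all four families are visible together
```

Every scan attempted while the load is in progress is turned away, and the first scan after the lock is released sees all four files at once, which mirrors the T1 to T4 picture on slides 26 and 27.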