Storage Infrastructure Behind Facebook Messages
HBase/HDFS/ZK/MapReduce

Kannan Muthukkaruppan
Software Engineer, Facebook

Big Data Experiences & Scars, HPTS 2011
Oct 24th, 2011
The New Facebook Messages

Messages · Chats · Emails · SMS
Why we chose HBase
▪   High write throughput
▪   Good random read performance
▪   Horizontal scalability
▪   Automatic Failover
▪   Strong consistency
▪   Benefits of HDFS
    ▪   Fault tolerant, scalable, checksums, MapReduce
    ▪   internal dev & ops expertise
What do we store in HBase
▪   HBase
    ▪   Small messages
    ▪   Message metadata (thread/message indices)
    ▪   Search index


▪   Haystack (our photo store)
    ▪   Attachments
    ▪   Large messages
HBase-HDFS System Overview
[Diagram] Database layer: HBASE, with a Master (and a Backup Master) coordinating many Region Servers. Coordination service: a Zookeeper Quorum of ZK Peers. Storage layer: HDFS, with a Namenode, a Secondary Namenode, and many Datanodes.
Facebook Messages Architecture
[Diagram] Clients (Front End, MTA, etc.) ask the User Directory Service “What’s the cell for this user?” and are routed to the right cell (Cell 1, Cell 2, Cell 3, ...). Each cell is a set of Application Servers backed by an HBase/HDFS/ZK cluster holding message data, metadata, and the search index. Attachments (and large messages) live in Haystack.
Typical Cluster Layout
 ▪   Multiple clusters/cells for messaging
     ▪   20 servers/rack; 5 or more racks per cluster
 ▪   Controllers (master/Zookeeper) spread across racks
[Diagram] Each of the 5 racks runs a ZooKeeper Peer plus one controller: Rack #1 the HDFS Namenode, Rack #2 the Backup Namenode, Rack #3 the Job Tracker, Rack #4 the HBase Master, Rack #5 the Backup HBase Master. The remaining 19 nodes in each rack each run a Region Server, Data Node, and Task Tracker.
Facebook Messages: Quick Stats
▪   6B+ messages/day

▪   Traffic to HBase
    ▪   75+ billion R+W ops/day
    ▪   At peak: 1.5M ops/sec
    ▪   ~55% read vs. 45% write ops
    ▪   Avg write op inserts ~16 records across multiple column
        families (see the sketch below)
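For a concrete picture of one such multi-column-family write, here is a minimal sketch using the 0.9x-era HBase Java client API; the table, family, and qualifier names are hypothetical.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class MessageWriteExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // "messages", "actions", and "metadata" are hypothetical names.
    HTable table = new HTable(conf, "messages");

    // One user action becomes a single Put whose edits span several
    // column families; all edits to the row are applied atomically.
    Put put = new Put(Bytes.toBytes("user12345"));
    put.add(Bytes.toBytes("actions"), Bytes.toBytes("addMessage"), Bytes.toBytes("{...}"));
    put.add(Bytes.toBytes("metadata"), Bytes.toBytes("threadIndex"), Bytes.toBytes("{...}"));
    put.add(Bytes.toBytes("metadata"), Bytes.toBytes("messageIndex"), Bytes.toBytes("{...}"));
    table.put(put);

    table.close();
  }
}
```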
Facebook Messages: Quick Stats (contd.)
▪   2PB+ of online data in HBase (6PB+ with replication;
    excludes backups)
    ▪ message data, metadata, search index

▪   All data LZO compressed
▪   Growing at 250TB/month
Facebook Messages: Quick Stats (contd.)
Timeline:
▪   Started in Dec 2009
▪   Roll out started in Nov 2010
▪   Fully rolled out by July 2011 (migrated 1B+ accounts from
    legacy messages!)


While in production:
▪   Schema changes: not once, but twice!
▪   Implemented & rolled out HFile V2 and numerous other
    optimizations in an upward compatible manner!
Shadow Testing: Before Rollout
▪   Both product and infrastructure were changing.
▪   Shadowed the old messages product + chat while the new one was
    under development

[Diagram] Production side: Messages V1 Frontend and Chat Frontend, backed by Cassandra, MySQL, and ChatLogger. Shadow side: Application Servers backed by an HBase/HDFS cluster. Data was backfilled from MySQL for a percentage of users, and their production traffic was mirrored to the shadow.
Shadow Testing: After Rollout
▪   Shadows the new version of the Messages product.
▪   All backend changes go through the shadow cluster before a prod push

[Diagram] The New Messages Frontend drives the production Application Servers and HBase/HDFS cluster, and mirrors traffic to a parallel shadow set of Application Servers and an HBase/HDFS cluster.
Backup/Recovery (V1)
•   During early phase, concerned about potential bugs in HBase.
•   Off-line backups: written to HDFS via Scribe
•   Recovery tools; testing of recovery tools

[Diagram] Application Servers behind the Message Frontend write to the HBase/HDFS cluster and also log to Scribe/HDFS. In the local datacenter’s HDFS, backups are merged & deduped, stored locally, and copied to the remote datacenter’s HDFS. Logging from both the AppServer and HBase (double logging) reduces the probability of data loss.
Backups (V2)
▪   Now does periodic HFile-level backups.
▪   Working on:
    ▪   Moving to HFile + Commit Log based backups, to be able to recover to
        finer-grained points in time
        ▪   Avoids the need to log data to Scribe.
    ▪   Zero-copy (hard link based) fast backups
Messages Schema & Evolution
▪   “Actions” (data) Column Family is the source of truth
    ▪   Log of all user actions (addMessage, markAsRead, etc.)

▪   Metadata (thread index, message index, search index) etc. in other
    column families
▪   Metadata portion of schema underwent 3 changes:
        ▪   Coarse grained snapshots (early development; rollout up to 1M users)
        ▪   Hybrid (up to full rollout – 1B+ accounts; 800M+ active)
        ▪   Fine-grained metadata (after rollout)
▪   MapReduce jobs against production clusters!
        ▪   Ran in throttled way
        ▪   Heavy use of HBase bulk import features
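A hedged sketch of that bulk import flow using HBase’s standard MapReduce utilities (the table name and path are hypothetical; a prior MapReduce job, configured via HFileOutputFormat.configureIncrementalLoad, is assumed to have produced sorted HFiles):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;

public class BulkImportExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "messages");  // hypothetical table name

    // Directory of pre-sorted HFiles written by an earlier MapReduce job.
    Path hfileDir = new Path("/output/hfiles");   // hypothetical path

    // Hand the prepared HFiles to the regions that own their key ranges;
    // this bypasses the normal write path (WAL + memstore) entirely.
    new LoadIncrementalHFiles(conf).doBulkLoad(hfileDir, table);
  }
}
```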
Write Path Overview
[Diagram] A Region Server hosts many regions (Region #1, Region #2, ...); each region has, per column family (ColumnFamily #1, ColumnFamily #2, ...), a Memstore plus HFiles in HDFS. Every write is appended & synced to a Write Ahead Log (in HDFS) before being applied to the Memstore.
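For intuition, here is a toy model of that ordering (a sketch only, not HBase’s actual code): every edit is appended and synced to the log before it is applied to the sorted in-memory memstore.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model of a region's write path: WAL first, then memstore.
// Purely illustrative; HBase's real WAL lives in HDFS and is appended/synced.
class ToyRegion {
  private final List<String> wal = new ArrayList<>();                     // stand-in for the HDFS WAL
  private final NavigableMap<String, String> memstore = new TreeMap<>();  // sorted in-memory buffer

  void put(String rowKey, String value) {
    wal.add(rowKey + "=" + value);  // 1. append + sync: durable before acknowledging
    memstore.put(rowKey, value);    // 2. apply the edit to the memstore
  }
}
```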
Flushes: Memstore -> HFile
[Diagram] When a column family’s Memstore fills up, it is flushed to a new HFile in HDFS. Data in an HFile is sorted and carries a block index for efficient retrieval.
Read Path Overview
[Diagram] Same layout as the write path; a Get consults the column family’s Memstore and its HFiles.
Compactions
[Diagram] Same region server layout as above; compactions merge a column family’s HFiles into fewer, larger files.
Reliability: Early Work
▪   HDFS sync support for durability of transactions
▪   Multi-CF transaction atomicity
▪   Several bug fixes in log recovery
▪   New block placement policy in HDFS
    ▪   To reduce probability of data loss
Availability: Early Work
▪   Common reasons for unavailability:
    ▪   S/W upgrades
        ▪   Solution: rolling upgrades
    ▪   Schema Changes
        ▪   Applications need new Column Families
        ▪   Need to change settings for a CF
        ▪   Solution: online “alter table” (see the sketch after this list)
    ▪   Load balancing or cluster restarts took forever
        ▪   Upon investigation: stuck waiting for compactions to finish
        ▪   Solution: Interruptible Compactions!
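A sketch of what such a schema change looks like through the admin API (table and family names are hypothetical, and admin method signatures vary somewhat across HBase versions; the point of the online “alter table” work was that these changes no longer require taking the table offline):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class AlterTableExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);

    // Add a new column family to an existing table
    // ("messages" and "searchIndex" are hypothetical names).
    admin.addColumn("messages", new HColumnDescriptor("searchIndex"));

    // Change settings on an existing column family.
    HColumnDescriptor cf = new HColumnDescriptor("metadata");
    cf.setMaxVersions(3);
    admin.modifyColumn("messages", cf);
  }
}
```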
Performance: Early Work
▪   Read optimizations:
    ▪   Seek optimizations for rows with a large number of cells
    ▪   Bloom Filters (configured per column family; see the sketch below)
        ▪   minimize HFile lookups
    ▪   Timerange hints on HFiles (great for temporal data)
    ▪   Multigets
    ▪   Improved handling of compressed HFiles
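Bloom filters, like compression, are per-column-family settings. A sketch under roughly 0.92-era APIs (class locations moved between versions; the family name is illustrative):

```java
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.io.hfile.Compression;
import org.apache.hadoop.hbase.regionserver.StoreFile;

public class ColumnFamilyTuning {
  public static HColumnDescriptor tunedFamily() {
    HColumnDescriptor cf = new HColumnDescriptor("actions");  // hypothetical family
    // ROW blooms let a read skip HFiles that cannot contain the row at all.
    cf.setBloomFilterType(StoreFile.BloomType.ROW);
    // All Messages data was LZO compressed.
    cf.setCompressionType(Compression.Algorithm.LZO);
    return cf;
  }
}
```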
Performance: Compactions
▪   Critical for read performance
▪   Old algorithm (“Was”):
    #1. Start from the newest file (file 0); include the next file if:
        size[i] < size[i-1] * C (good!)
    #2. Always compact at least 4 files, even if rule #1 isn’t met.

▪   Solution (“Now”):
    #1. Compact at least 4 files, but only if eligible files are found.
    #2. Also, new file selection based on summation of sizes:
        size[i] < (size[0] + size[1] + … + size[i-1]) * C
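The new selection rule is easy to express in code. A sketch (not the actual HBase implementation), with files ordered newest first:

```java
import java.util.ArrayList;
import java.util.List;

public class CompactionSelection {
  /**
   * Summation-based file selection as described above. 'sizes' lists
   * HFile sizes, newest first. File i stays eligible while
   * size[i] < (size[0] + ... + size[i-1]) * c, and a compaction is
   * triggered only if at least minFiles eligible files are found.
   */
  static List<Long> selectFiles(List<Long> sizes, double c, int minFiles) {
    List<Long> selected = new ArrayList<>();
    long sumOfNewer = 0;  // running total of already-selected (newer) files
    for (long size : sizes) {
      if (selected.isEmpty() || size < sumOfNewer * c) {
        selected.add(size);
        sumOfNewer += size;
      } else {
        break;  // stop at the first file that is too large relative to the newer ones
      }
    }
    return selected.size() >= minFiles ? selected : new ArrayList<Long>();
  }
}
```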
Performance: Compactions (contd.)
▪   More problems!
    ▪   Read performance dips during peak
    ▪   Major compaction storms
    ▪   Large compactions bottleneck
▪   Enhancements/fixes:
    ▪   Staggered major compactions
    ▪   Multi-threaded compactions; separate queues for small & big
        compactions (see the sketch below)
    ▪   Aggressive off-peak compactions

[Diagram] A Small Compaction Q and a Big Compaction Q, each draining its own backlog of compactions.
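A minimal sketch of the separate-queues idea (thresholds and pool sizes are illustrative): route large compactions to their own thread pool so they cannot starve the frequent small ones.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class CompactionQueues {
  // Illustrative threshold: compactions over this total size go to the big queue.
  private static final long BIG_COMPACTION_BYTES = 1L << 30;  // 1 GB, hypothetical

  private final ExecutorService smallQueue = Executors.newFixedThreadPool(2);
  private final ExecutorService bigQueue = Executors.newFixedThreadPool(1);

  void submit(long totalBytes, Runnable compaction) {
    // Small compactions keep flowing even while a large one is running.
    (totalBytes >= BIG_COMPACTION_BYTES ? bigQueue : smallQueue).execute(compaction);
  }
}
```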
Metrics, metrics, metrics…
▪   Initially, we only had coarse-level overall metrics (get/put latency/ops;
    block cache counters).
▪   Slow query logging
▪   Added per Column Family stats for:
    ▪   ops counts, latency
    ▪   block cache usage & hit ratio
    ▪   memstore usage
    ▪   on-disk file sizes
    ▪   file counts
    ▪   bytes returned, bytes flushed, compaction statistics
    ▪   stats by block type (data block vs. index blocks vs. bloom blocks, etc.)
    ▪   bloom filter stats
Metrics (contd.)
▪   HBase Master Statistics:
    ▪   Number of region servers alive
    ▪   Number of regions
    ▪   Load balancing statistics
    ▪   ..
▪   All stats stored in Facebook’s Operational Data Store (ODS).
▪   Lots of ODS dashboards for debugging issues
    ▪   Side note: ODS is planning to use HBase for storage pretty soon!
Need to keep up as data grows on you!
▪   Rapidly iterated on several new features while in production:
    ▪   Block indexes up to 6GB per server! Cluster startup taking longer and
        longer. Block cache hit ratio on the decline.
        ▪   Solution: HFile V2
                ▪   Multi-level block index, Sharded Bloom Filters
    ▪   Network pegged after restarts
        ▪   Solution: Locality on full & rolling restart
    ▪   High disk utilization during peak
        ▪   Solution: Several “seek” optimizations to reduce disk IOPS
            ▪   Lazy Seeks (use time hints to avoid seeking into older HFiles)
            ▪   Special bloom filter for deletes to avoid an additional seek
            ▪   Utilize off-peak IOPS to do more aggressive compactions during
                off-peak hours
Scares & Scars!
▪   Not without our share of scares and incidents:
    ▪   s/w bugs (e.g., deadlocks, incompatible LZO used for bulk imported data, etc.)
        ▪   found an edge-case bug in log recovery as recently as last week!
    ▪   performance spikes every 6 hours (even off-peak)!
            ▪   cleanup of HDFS’s Recycle bin was sub-optimal! Needed a code and config fix.
    ▪   transient rack switch failures
    ▪   Zookeeper leader election took more than 10 minutes when one member of the
        quorum died. Fixed in a more recent version of ZK.
    ▪   HDFS Namenode – SPOF
    ▪   flapping servers (repeated failures)
Scares & Scars! (contd.)
▪   Sometimes, tried things which hadn’t been tested in dark launch!
        ▪   Added a rack of servers to help with performance issue
            ▪   Pegged top of the rack network bandwidth!
                ▪   Had to add the servers at a much slower pace. Very manual.
                ▪   Intelligent load balancing needed to make this more automated.
▪   A high % of issues caught in shadow/stress testing
▪   Lots of alerting mechanisms in place to detect failure cases
    ▪   Automated recovery for a lot of common ones
    ▪   Treat alerts on shadow cluster as hi-pri too!
▪   Sharding the service across multiple HBase cells also paid off
Future Work
▪   Reliability, Availability, Scalability!
▪   Lots of new use cases on top of HBase in the works.


      ▪   HDFS Namenode HA
      ▪   Recovering gracefully from transient issues
      ▪   Fast hot-backups
      ▪   Delta-encoding in block cache
      ▪   Replication
      ▪   Performance (HBase and HDFS)
      ▪   HBase as a service / multi-tenancy
      ▪   Features: coprocessors, secondary indices
Acknowledgements
Work of a lot of people spanning HBase, HDFS, Migrations,
Backup/Recovery pipeline, and Ops!

   Karthik Ranganathan            Mikhail Bautin
   Nicolas Spiegelberg            Pritam Damania
   Aravind Menon                  Prakash Khemani
   Dhruba Borthakur               Doug Porter
   Hairong Kuang                  Alex Laslavic
   Jonathan Gray                  Kannan Muthukkaruppan
   Rodrigo Schmidt                + Apache HBase
   Amitanand Aiyer                + Interns
   Guoqiang (Jerry) Chen          + Messages FrontEnd Team
   Liyin Tang                     + Messages AppServer Team
   Madhu Vaidya                   + many more folks…
Thanks! Questions?
facebook.com/engineering
