Facebook has one of the largest Apache Hadoop data warehouses in the world, primarily queried through Apache Hive for offline data processing and analytics. However, the need for realtime analytics and end-user access has led Facebook to build several new systems on Apache HBase. This talk covers specific use cases and the work done at Facebook to build large-scale, low-latency, high-throughput realtime services with Hadoop and HBase, including several significant contributions to existing projects as well as the release of new open source projects.
Hadoop World 2011: Building Realtime Big Data Services at Facebook with Hadoop and HBase - Jonathan Gray, Facebook
2. Realtime Big Data at Facebook with Hadoop and HBase
Jonathan Gray
November 2011
Hadoop World NYC
3. Agenda
Why Hadoop and HBase?
Applications of HBase at Facebook
Future of HBase at Facebook
4. About Me: Jonathan Gray
▪ Previous life as Co-Founder of Streamy.com
▪ Realtime Social News Aggregator
▪ Big Data problems led us to Hadoop/HBase
▪ HBase committer and Hadoop user/complainer
▪ Software Engineer at Facebook
▪ Develop, support, and evangelize HBase across teams
▪ Recently joined Database Infrastructure Engineering
▪ MySQL and HBase together at last!
6. (Diagram: the existing stack at Facebook, including the OS, web server, language runtime, database, cache, and data analysis tiers)
7. Problems with existing stack
▪ MySQL is stable, but...
▪ Limited throughput
▪ Not inherently distributed
▪ Table size limits
▪ Inflexible schema
▪ Memcached is fast, but...
▪ Only key-value so data is opaque
▪ No write-through
8. Problems with existing stack
▪ Hadoop is scalable, but...
▪ MapReduce is slow
▪ Writing MapReduce is difficult
▪ Does not support random writes
▪ Poor support for random reads
9. Specialized solutions
▪ Inbox Search
▪ Cassandra
▪ High-throughput, persistent key-value
▪ Tokyo Cabinet
▪ Large scale data warehousing
▪ Hive
▪ Custom C++ servers for lots of other stuff
10. Finding a new online data store
▪ Consistent patterns emerge
▪ Massive datasets, often largely inactive
▪ Lots of writes
▪ Fewer reads
▪ Dictionaries and lists
▪ Entity-centric schemas
▪ per-user, per-domain, per-app
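The entity-centric pattern above maps naturally onto a sorted store: one row per entity, with that entity's dictionaries and lists stored as columns in its row. A minimal sketch of the idea on a toy in-memory table; all names (row keys, qualifiers) are illustrative, not Facebook's schema.

```python
# Hedged sketch of entity-centric storage: one row per entity, with
# the entity's dictionary data held as columns of that row.
class EntityStore:
    def __init__(self):
        self.rows = {}  # row key -> {column qualifier: value}

    def put(self, row, column, value):
        self.rows.setdefault(row, {})[column] = value

    def scan_columns(self, row, prefix):
        # Column qualifiers sort lexically, so a time-prefixed
        # qualifier returns a user's items in chronological order.
        cols = self.rows.get(row, {})
        return sorted((c, v) for c, v in cols.items() if c.startswith(prefix))

store = EntityStore()
store.put("user:1001", "msg:2011-11-08T10:05", "world")
store.put("user:1001", "msg:2011-11-08T10:00", "hello")
store.put("user:1002", "msg:2011-11-08T10:01", "hi")
```

Because each user's data lives in a single row, a per-user read touches one contiguous region of the sorted key space.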
11. Finding a new online data store
▪ Other requirements laid out
▪ Elasticity
▪ High availability
▪ Strong consistency within a datacenter
▪ Fault isolation
▪ Some non-requirements
▪ Network partitions within a single datacenter
▪ Active-active serving from multiple datacenters
12. Finding a new online data store
▪ Engineers at FB compared several databases
▪ Apache Cassandra, Apache HBase, Sharded MySQL
▪ Compared performance, scalability, and features
▪ HBase gave excellent write performance, good reads
▪ HBase already included many nice-to-have features
▪ Atomic read-modify-write operations
▪ Multiple shards per server
▪ Bulk importing
▪ Range scans
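Two of the features listed above can be illustrated on a toy in-memory table (this is a hedged sketch, not the HBase client API): an atomic read-modify-write implemented under a lock, and range scans over keys kept in sorted order.

```python
import bisect
import threading

# Toy table sketch (illustrative only): atomic counters plus
# range scans over a sorted key space.
class ToyTable:
    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}  # row key -> counter value

    def increment(self, row, amount=1):
        # Atomic read-modify-write: read, add, and store as one step.
        with self._lock:
            self._data[row] = self._data.get(row, 0) + amount
            return self._data[row]

    def range_scan(self, start, stop):
        # Because keys are sorted, a key range is a contiguous slice.
        keys = sorted(self._data)
        lo = bisect.bisect_left(keys, start)
        hi = bisect.bisect_left(keys, stop)
        return [(k, self._data[k]) for k in keys[lo:hi]]

table = ToyTable()
table.increment("likes:a.com")
table.increment("likes:a.com", 2)
table.increment("likes:b.com")
```

Server-side atomic increments matter because the alternative, a client-side read followed by a write, loses updates under concurrency.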
13. HBase uses HDFS
We get the benefits of HDFS as a storage system for free
▪ Fault tolerance
▪ Scalability
▪ Checksums fix corruptions
▪ MapReduce
▪ Fault isolation of disks
▪ HDFS battle tested at petabyte scale at Facebook
▪ Lots of existing operational experience
14. HBase in a nutshell
▪ Sorted and column-oriented
▪ High write throughput
▪ Horizontal scalability
▪ Automatic failover
▪ Regions sharded dynamically
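Dynamic region sharding, the last point above, can be sketched as a sorted list of split points that cuts the table's key space into contiguous ranges; splitting a hot region is just inserting a new split point. Keys and split points below are illustrative.

```python
import bisect

# Hedged sketch of dynamic range sharding: each region owns a
# contiguous key range, and any row key routes to exactly one region.
split_points = ["g", "p"]  # three regions: [.., "g"), ["g", "p"), ["p", ..)

def region_for(row_key):
    # bisect_right maps a key to the index of the region holding it.
    return bisect.bisect_right(split_points, row_key)

def split_region(at_key):
    # A region that grows too large is split in two at a middle key.
    bisect.insort(split_points, at_key)
```

This is why the store scales horizontally without manual resharding: adding a split point and moving one region is a local operation.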
18. Facebook Messaging
▪ Largest engineering effort in the history of FB
▪ A large team of engineers working over more than a year
▪ Incorporates many infrastructure technologies
▪ Hadoop, HBase, Haystack, ZooKeeper, etc...
▪ A product at massive scale on day one
▪ Hundreds of millions of active users
▪ Billions of messages a month
▪ Thousands of instant messages a second on average
19. Messaging Challenges
▪ High write throughput
▪ Every message, instant message, SMS, and e-mail
▪ Search indexes and metadata for all of the above
▪ Denormalized schema
▪ Massive clusters
▪ So much data and usage requires a large server footprint
▪ Do not want outages to impact availability
▪ Must be able to easily scale out
20. High Write Throughput
(Diagram: each write is appended sequentially to the Commit Log for durability, and inserted into the Memstore, where key-value pairs are kept sorted in memory.)
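The write path pictured above can be sketched in a few lines (a hedged, illustrative sketch, not HBase's implementation): every write is a sequential append to a commit log, then an insert into an in-memory structure that reads back in sorted order, so flushing it to disk is one more sequential write.

```python
import json
import os
import tempfile

# Hedged sketch of a log-structured write path: sequential log append
# for durability, plus a sorted in-memory memstore. No random disk I/O
# occurs on the write path; the file name below is illustrative.
class WritePath:
    def __init__(self, log_path):
        self.log = open(log_path, "a")  # sequential appends only
        self.memstore = {}

    def put(self, key, value):
        self.log.write(json.dumps([key, value]) + "\n")
        self.log.flush()  # durable before acknowledging the write
        self.memstore[key] = value

    def flush_memstore(self):
        # Keys come out sorted regardless of arrival order, so a flush
        # produces one sorted file with a single sequential write.
        return sorted(self.memstore.items())

fd, log_file = tempfile.mkstemp()
os.close(fd)
wp = WritePath(log_file)
wp.put("row-b", "2")
wp.put("row-a", "1")
```

Turning random writes into sequential ones is what gives this design its high write throughput on spinning disks.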
23. Facebook Messages Stats
▪ Billions of messages per day
▪ Billions of read/write ops to HBase per day
▪ Millions of ops/sec at peak
▪ A mix of reads and writes
▪ Many columns per operation across multiple families
▪ Petabytes of online data in HBase
▪ LZO compressed and un-replicated (more with replication)
▪ Growing by terabytes per month
27. Puma as Realtime MapReduce
▪ Map phase with PTail
▪ Divide the input log stream into N shards
▪ First version supported random bucketing
▪ Now supports application-level bucketing
▪ Reduce phase with HBase
▪ Every row+column in HBase is an output key
▪ Aggregate key counts using atomic counters
▪ Can also maintain per-key lists or other structures
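The map/reduce split above can be sketched as follows (illustrative, not Puma's code): the "map" phase buckets a log stream into shards by key, and the "reduce" phase keeps a running count per key, the role HBase atomic counters play in the real system.

```python
import zlib
from collections import Counter

# Hedged sketch of the Puma pattern: shard the input stream by key,
# then aggregate each key's count with counter increments.
NUM_SHARDS = 4
shards = [Counter() for _ in range(NUM_SHARDS)]

def shard_for(key):
    # Application-level bucketing: the same key always lands in the
    # same shard, so each key has a single point of aggregation.
    return zlib.crc32(key.encode()) % NUM_SHARDS

def process(stream):
    # "Map": route each event to its shard.
    # "Reduce": increment that key's counter (atomic in the real system).
    for url in stream:
        shards[shard_for(url)][url] += 1

def count(url):
    return shards[shard_for(url)][url]

process(["a.com/x", "b.com/y", "a.com/x", "a.com/x"])
```

Because the bucketing is deterministic, counts for one key never need to be merged across shards.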
28. Puma for Facebook Insights
▪ Realtime URL/Domain Insights
▪ Domain owners can see deep analytics for their site
▪ Clicks, Likes, Shares, Comments, Impressions
▪ Detailed demographic breakdowns (anonymized)
▪ Top URLs calculated per-domain and globally
▪ Massive Throughput
▪ Billions of URLs
▪ More than a million counter increments per second
31. Future of Puma
▪ Centrally managed service for many products
▪ Several other applications in production
▪ Commerce Tracking
▪ Ad Insights
▪ Making Puma generic
▪ Dynamically configured by product teams
▪ Custom query language
33. ODS
▪ Operational Data Store
▪ System metrics (CPU, Memory, IO, Network)
▪ Application metrics (Web, DB, Caches)
▪ Facebook metrics (Usage, Revenue)
▪ Easily graph this data over time
▪ Supports complex aggregation, transformations, etc.
▪ Difficult to scale with MySQL
▪ Millions of unique time-series with billions of points
▪ Irregular data growth patterns
37. Why now?
▪ MySQL+Memcached hard to replace, but...
▪ Joins and other RDBMS functionality are gone
▪ From writing SQL to using APIs
▪ Next generation of services and caches makes the persistent storage engine transparent to www
▪ Primarily a financially motivated decision
▪ MySQL works, but can HBase save us money?
▪ Also, are there things we just couldn’t do before?
38. HBase vs. MySQL
▪ MySQL at Facebook
▪ Tier size determined solely by IOPS
▪ Heavy on random IO for reads and writes
▪ Rely on fast disks or flash to scale individual nodes
▪ HBase showing promise of cost savings
▪ Fewer IOPS on write-heavy workloads
▪ Larger tables on denser, cheaper nodes
▪ Simpler operations and replication “for free”
39. HBase vs. MySQL
▪ MySQL is not going anywhere soon
▪ It works!
▪ But HBase is a great addition to the tool belt
▪ Different set of trade-offs
▪ Great at storing key-values, dictionaries, and lists
▪ Products with heavy write requirements
▪ Generated data
▪ Potential capital and operational cost savings
40. UDB Challenges
▪ MySQL has a head start of more than a decade
▪ HBase is still a pre-1.0 database system
▪ Insane Requirements
▪ Zero data loss, low latency, very high throughput
▪ Reads, writes, and atomic read-modify-writes
▪ WAN replication, backups w/ point-in-time recovery
▪ Live migration of critical user data w/ existing shards
▪ queryf() and other fun edge cases to deal with
41. Technical/Developer oriented talk tomorrow:
Apache HBase Road Map
A short history of nearly everything HBase. Past, present, and future.
Wednesday @ 1PM in the Met Ballroom
42. Check out the HBase at Facebook Page:
facebook.com/UsingHbase
Thanks! Questions?