1
David MacKenzie
Box Engineering
@davrmac @BoxEng
/events @ Box: Using HBase
as a message queue
2
Share, manage and access your content from any device, anywhere
3
What is the /events API?
• Real-time stream of all activity happening within a user’s account
• GET /events?stream_position=1234&stream_type=all
• Persistent and re-playable
[Diagram: a client reading a stream of events at positions 1 to 5]
4
Why did we build it?
• Main use-case was sync → switch from batch to incremental diffs
• Several requirements arose from the sync use case:
‒ Guaranteed delivery
‒ Clients can be offline for days at a time
‒ Arbitrary number of clients consuming each user’s stream
• Resulting properties: persistence and re-playability
5
How is it implemented?
• Each user is assigned a separate section of the HBase key-space
• Messages are stored in order from oldest to newest within a user’s
section of the key-space
• Reads map directly to scans from the provided position to the user’s end
key
• Row key structure: <pseudo-random prefix>_<user_id>_<position>
‒ Pseudo-random prefix: 2 bytes of the SHA-1 of the user_id
‒ Position: millisecond timestamp
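The row key layout can be sketched in a few lines of Python. This is only an illustration: the slides give the three components but not their byte encoding, so the fixed-width big-endian integers, the decimal-string input to SHA-1, and the helper names make_row_key and scan_range are all assumptions.

```python
import hashlib
import struct


def make_row_key(user_id: int, position_ms: int) -> bytes:
    """Build <prefix>_<user_id>_<position> as bytes (encoding is assumed)."""
    # Prefix: first 2 bytes of the SHA-1 of the user id, which spreads users
    # across the key-space while keeping each user's rows contiguous.
    prefix = hashlib.sha1(str(user_id).encode("ascii")).digest()[:2]
    # Fixed-width big-endian integers make byte order match numeric order,
    # so a user's rows sort from oldest to newest position.
    return (prefix + b"_" + struct.pack(">Q", user_id)
            + b"_" + struct.pack(">Q", position_ms))


def scan_range(user_id: int, from_position_ms: int) -> tuple[bytes, bytes]:
    """Start and upper-bound keys for reading a user's stream from a position."""
    start = make_row_key(user_id, from_position_ms)
    upper = make_row_key(user_id, 2**64 - 1)  # largest position for this user
    return start, upper
```

Because the prefix depends only on the user_id, every key for one user shares the same leading bytes, and a read becomes a single range scan between the two keys returned by scan_range.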
6
Using a timestamp as a queue position
• Pro: Allows for allocating roughly monotonically increasing positions
with no co-ordination between write requests
• Con: Isn’t sufficient to guarantee append-only semantics in the presence
of parallel writes
[Diagram: parallel writes at positions 1 and 2 interleaved with two reads]
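The drawback of timestamp positions can be reproduced with a small in-memory simulation. The dict below stands in for one user's section of the key-space; the function names and the integer positions 1 and 2 are illustrative only, not taken from the Box implementation.

```python
def read(store, after_position):
    """Return (position, event) pairs newer than after_position, oldest first."""
    return [(p, store[p]) for p in sorted(store) if p > after_position]


def demo_missed_event():
    store = {}  # position -> event; stands in for one user's key range

    # Two parallel writers are handed positions 1 and 2 with no coordination.
    # The writer holding position 2 happens to finish first.
    store[2] = "event B"

    # A client reads from position 0 and advances to the newest position seen.
    delivered = read(store, after_position=0)
    client_position = delivered[-1][0]

    # The slower write at position 1 now lands behind the client.
    store[1] = "event A"

    # The next read starts after position 2, so "event A" is never returned.
    later = read(store, after_position=client_position)
    return delivered, later
```

In this run the client receives "event B", advances to position 2, and never sees "event A", which is the missed-delivery case that time-bounding and back-scanning are meant to rule out.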
7
Time-bounding and Back-scanning
• Need to ensure that clients don’t advance their stream positions past
writes that will eventually succeed
‒ But clients do need to advance position eventually
‒ How do we know when it’s safe?
• Solution: time-bound writes and back-scan reads
‒ Time-bounding: every write to HBase must complete within a fixed time-bound to be
considered successful
‒ No guaranteed delivery for unsuccessful writes.
‒ Clients should retry failed writes at higher stream positions.
‒ Back-scanning: clients cannot advance their stream positions further than (current
time – back-scan interval)
‒ Back-scan interval >= write time-bound
• Provides guaranteed delivery but at the cost of duplicate events
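A minimal sketch of the two rules working together, assuming a 5-second write time-bound and an equal back-scan interval (both values are made up for illustration, as are the class and function names). Time is passed in explicitly so the scenario is deterministic.

```python
WRITE_TIME_BOUND_MS = 5_000    # hypothetical value
BACK_SCAN_INTERVAL_MS = 5_000  # must be >= WRITE_TIME_BOUND_MS


class UserQueue:
    """In-memory stand-in for one user's section of the key-space."""

    def __init__(self):
        self.rows = {}  # position (ms timestamp) -> event

    def write(self, position_ms, event, completed_at_ms):
        """Time-bounding: a write that finishes too late is treated as failed."""
        if completed_at_ms - position_ms > WRITE_TIME_BOUND_MS:
            return False  # the caller should retry at a higher position
        self.rows[position_ms] = event
        return True

    def read(self, stream_position_ms, now_ms):
        """Return the events after the position plus the client's next position."""
        events = [(p, self.rows[p]) for p in sorted(self.rows)
                  if p > stream_position_ms]
        # Back-scanning: never advance past (now - back-scan interval).
        ceiling = now_ms - BACK_SCAN_INTERVAL_MS
        newest = events[-1][0] if events else stream_position_ms
        next_position = max(stream_position_ms, min(newest, ceiling))
        return events, next_position


def demo_back_scan():
    q = UserQueue()
    q.write(10_001, "B", completed_at_ms=10_050)   # fast write
    first, pos = q.read(0, now_ms=10_100)          # position held at 5_100
    q.write(10_000, "A", completed_at_ms=12_000)   # slow write, within bound
    second, pos2 = q.read(pos, now_ms=16_000)      # "A" arrives, "B" repeats
    seen = {p for p, _ in first}
    fresh = [(p, e) for p, e in second if p not in seen]  # client-side dedup
    return first, pos, second, pos2, fresh
```

The second read returns "B" again alongside the late "A". That repeat is the duplicate-delivery cost named on the slide, and the client here filters it out by remembering positions it has already seen.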
8
[Diagram: writes at positions 1 to 4 interleaved with back-scanning reads]
9
Replication
• Master/slave architecture
‒ One cluster per DC
‒ Master cluster handles all reads and writes
‒ Slave clusters are passive replicas
• On promotion, clients transparently fail over to the new master cluster
• Can’t use native HBase replication directly
‒ Could cause clients to miss events when failing over to a lagging cluster
[Diagram: replication and failover between clusters, followed by a new write (3) and a read on the new master]
10
Replication Contd.
• Replication system needs to be aware of master/slave failovers
‒ On failover, stop replicating messages at their exact positions and start appending them to the current ends of the queues.
• Currently, use a client-level replication system piggybacking on MySQL
replication
• Plan to switch to a system that hooks into HBase replication by
configuring itself as a slave HBase cluster
[Diagram: after failover, messages 3 and 4 are appended to the end of the queue and picked up by a read]
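The failover-aware behaviour can be sketched as follows. This is a toy model, not Box's replication system: the class name, the promote method, and the rule for choosing the appended position (the later of "one past the current end" and the current time) are assumptions made for illustration.

```python
class ReplicaQueue:
    """Toy model of one user's queue on a replica cluster."""

    def __init__(self):
        self.rows = {}        # position -> event
        self.promoted = False

    def end_position(self):
        return max(self.rows, default=0)

    def promote(self):
        """Called when this cluster becomes the new master."""
        self.promoted = True

    def write_local(self, event, now_ms):
        """A new client write handled by this cluster after promotion."""
        position = max(self.end_position() + 1, now_ms)
        self.rows[position] = event
        return position

    def apply_replicated(self, position, event, now_ms):
        """Apply a message that arrives through replication."""
        if self.promoted:
            # After failover, do not replicate at the exact position: clients
            # may already have read past it. Append at the current end instead.
            position = max(self.end_position() + 1, now_ms)
        self.rows[position] = event
        return position

    def read(self, after_position):
        return [(p, self.rows[p]) for p in sorted(self.rows)
                if p > after_position]
```

For example, if the replica holds only message 1 when it is promoted, then accepts a local write at position 3, a late-arriving copy of the old master's message 2 is stored at position 4. A client that has already advanced to position 3 still receives it on its next read.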
11
Why HBase?
• Closest off-the-rack queuing system is Kafka
‒ Developed at LinkedIn. Open sourced in 2011.
‒ Originally built to power LinkedIn’s analytics pipeline
‒ Very similar model built around “ordered commit logs”
‒ Allows for easy addition of new subscribers
‒ Allows for varying subscriber consumption patterns → slow subscribers don’t back up the
pipeline
12
Why HBase and not Kafka?
• HBase offers better consistency vs. availability tradeoffs. Kafka limitations:
‒ No automatic rack aware replica placement
‒ No automatic replica re-assignment upon replica failure
‒ On replica failure, no fast failover of new writes to new replicas.
‒ Can’t require minimum replication factor for new writes without significantly impacting
availability on replica failure
• Replication support
‒ Not enough control over Kafka queue positions to implement transparent client
failovers between replica clusters
• Unable to scale to millions of topics
‒ Currently tops out in the tens of thousands of topics.
‒ Design requires very granular topic tracking. Barrier to scale.
13
In conclusion…
• We were able to leverage HBase to store millions of guaranteed delivery
message queues, each of which was:
‒ replicated between data centers
‒ independently consumable by an arbitrary number of clients
• Cluster metrics:
‒ ~30 nodes per cluster
‒ 15K writes/sec at peak. Bursts of up to 40K writes/sec.
‒ 50K-60K requests/sec at peak.
14
Questions?
Twitter @davrmac
@BoxEng
Engineering Blog tech.blog.box.com
Platform developers.box.com
Open Source opensource.box.com

HBaseCon 2015: Events @ Box - Using HBase as a Message Queue
