hbaseconasia2017: Removable singularity: a story of HBase upgrade in Pinterest

Tianying Chang

HBase serves online user-facing traffic at Pinterest, which means no downtime is allowed. However, we were on HBase 0.94. To upgrade to the latest version, we needed to figure out a way to upgrade live while keeping the Pinterest site up. Recently, we successfully upgraded our 0.94 HBase clusters to 1.2 with no downtime. We made changes to both AsyncHBase and the HBase server side. This talk covers what we did and how we did it, as well as the findings from the config and performance tuning we did to achieve low latency.

hbaseconasia2017 hbasecon hbase https://www.eventbrite.com/e/hbasecon-asia-2017-tickets-34935546159#


hbaseconasia2017: Removable singularity: a story of HBase upgrade in Pinterest

  1. 1. Tian-Ying Chang, Storage & Caching, Engineering @ Pinterest. Removable Singularity: A Story of HBase Upgrade at Pinterest
  2. 2. Agenda: 1. Usage of HBase in Pinterest 2. Upgrading Situation and Challenges 3. Migration Steps 4. Performance Tuning 5. Final Notes
  3. 3. HBase @ Pinterest §Early applications: homefeed, search & discovery §UserMetaStore: a generic store layer for all applications using user KV data §Zen: a graph service layer • Many applications can be modeled as graph • Usage of HBase flourished after Zen release §Other misc. use cases, e.g., analytics, OpenTSDB
  4. 4. 40+ HBase clusters on 0.94
  5. 5. Need Upgrading §HBase 0.94 is not supported by the community anymore §Newer versions have better reliability, availability and performance §Easier to contribute back to the community
  6. 6. Singularity of HBase 0.96 §RPC protocol changes §Data format changes §HDFS folder structure changes §API changes §Generally considered “impossible” to live upgrade without downtime http://blog.cloudera.com/blog/2012/06/the-singularity-hbase-compatibility-and-extensibility/
  7. 7. The Dilemma §Singularity: cannot live upgrade from HBase 0.94 to later version • Need downtime §Pinterest hosts critical online real time services on HBase • Cannot afford downtime §Stuck?
  8. 8. Fast Forward §Successfully upgraded production clusters last year • Chose one of the most loaded clusters as the pilot • No application redeploy needed on the day of switching to the 1.2 cluster • Live switch with no interruption to the Pinterest site §Big performance improvement
  9. 9. [Chart: P99 latency of different APIs, marking the point of live switch]
  10. 10. How did we do it?
  11. 11. ZK Client read/write HBase 0.94
  12. 12. ZK Client read/write HBase 0.94 HBase 0.94 native replication
  13. 13. ZK Client read/write HBase 0.94 HBase 0.94 native replication
  14. 14. ZK Client read/write HBase 0.94
  15. 15. ZK Client read/write HBase 0.94 HBase 0.94 1. Build empty cluster
  16. 16. ZK Client read/write HBase 0.94 HBase 0.94 native replication 1. Build empty cluster 2. Setup replication
  17. 17. ZK Client read/write HBase 0.94 HBase 0.94 native replication 1. Build empty cluster 2. Setup replication 3. Export snapshot
  18. 18. ZK Client read/write HBase 0.94 HBase 0.94 native replication 1. Build empty cluster 2. Setup replication 3. Export snapshot 4. Recover table from snapshot
  19. 19. ZK Client read/write HBase 0.94 HBase 0.94 native replication 1. Build empty cluster 2. Setup replication 3. Export snapshot 4. Recover table from snapshot 5. Replication drain
  20. 20. ZK Client read/write HBase 0.94 HBase 0.94 native replication
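  For the 0.94-to-0.94 flow above, step 2 (set up replication) relies on HBase's built-in replication. As a minimal sketch only, assuming the 0.94-era ReplicationAdmin Java API (the shell equivalent is add_peer); the peer id, ZooKeeper quorums and znode path are placeholders, not Pinterest's actual values:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.replication.ReplicationAdmin;

    public class SetupNativeReplication {
      public static void main(String[] args) throws Exception {
        // Configuration pointing at the source (live) 0.94 cluster.
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "zk-source-1,zk-source-2,zk-source-3");

        ReplicationAdmin replicationAdmin = new ReplicationAdmin(conf);
        // Peer key format: <zk quorum>:<zk client port>:<zk parent znode>.
        // This points at the new, empty destination cluster built in step 1.
        replicationAdmin.addPeer("1", "zk-dest-1,zk-dest-2,zk-dest-3:2181:/hbase");
        // Column families to be replicated must also have REPLICATION_SCOPE => 1
        // set on their table descriptors (e.g. via alter in the HBase shell).
      }
    }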
  21. 21. ZK Client read/write HBase 0.94 HBase 1.2 replication
  22. 22. ZK Client read/write HBase 0.94 HBase 1.2 replication
  23. 23. ZK Client read/write HBase 0.94 HBase 0.94
  24. 24. ZK Client read/write HBase 0.94 HBase 1.2 1. Build empty cluster HBase 0.94
  25. 25. ZK Client read/write HBase 0.94 HBase 1.2 non-native replication 1. Build empty cluster 2. Setup replication HBase 0.94
  26. 26. ZK Client read/write HBase 0.94 HBase 1.2 non-native replication 1. Build empty cluster 2. Setup replication 3. Export snapshot HBase 0.94
  27. 27. ZK Client read/write HBase 0.94 HBase 1.2 non-native replication 1. Build empty cluster 2. Setup replication 3. Export snapshot 4. Recover 1.2 table from 0.94 snapshot HBase 0.94
  28. 28. ZK Client read/write HBase 0.94 HBase 1.2 non-native replication 1. Build empty cluster 2. Setup replication 3. Export snapshot 4. Recover table from snapshot 5. Replication drain HBase 0.94
  29. 29. ZK Client read/write HBase 0.94 HBase 1.2 non-native replication HBase 0.94
  30. 30. Major Problems to Solve §Client able to talk to both 0.94 and 1.2 automatically §Data can be replicated between HBase 0.94 and 1.2 bidirectionally
  31. 31. AsyncHBase Client §Chose AsyncHBase due to better throughput and latency • Stock AsyncHBase 1.7 can talk to both 0.94 and later versions by detecting the HBase version and using the corresponding protocol §But cannot directly use stock AsyncHBase 1.7 • We made many private improvements internally • Need to make those private features work with the 1.2 cluster
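  As a reminder of what the client layer looks like, here is a minimal stock AsyncHBase read, independent of the private Pinterest extensions; the quorum, table and row key are placeholders. The same calling code works whether the cluster behind it is 0.94 or 1.2, since AsyncHBase negotiates the wire protocol when it connects:

    import java.util.ArrayList;
    import org.hbase.async.GetRequest;
    import org.hbase.async.HBaseClient;
    import org.hbase.async.KeyValue;

    public class AsyncHBaseGetExample {
      public static void main(String[] args) throws Exception {
        // AsyncHBase discovers region servers through ZooKeeper and detects
        // the server version, so the caller does not care if it is 0.94 or 1.2.
        HBaseClient client = new HBaseClient("zk-1,zk-2,zk-3");
        try {
          GetRequest get = new GetRequest("usermetastore", "user:12345");
          // Deferred is AsyncHBase's callback-based future; join() blocks for demo purposes.
          ArrayList<KeyValue> row = client.get(get).join();
          for (KeyValue kv : row) {
            System.out.println(new String(kv.qualifier()) + " = " + new String(kv.value()));
          }
        } finally {
          client.shutdown().join();
        }
      }
    }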
  32. 32. AsyncHBase Client Improvements § BatchGet (open sourced in tsdb 2.4) § SmallScan § Ping feature to handle AWS network issues § Pluggable metric framework § Metrics broken down by RPC/region/RS • Useful for debugging issues with better slice and dice § Rate limiting feature • Automatically throttle/blacklist requests based on, e.g., latency • Easier and better place to do throttling than on the HBase RS side § Open sourcing in progress
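  The rate-limiting improvement is internal and not yet open source, so the following is a purely hypothetical illustration of the idea of throttling at the client layer rather than on the RegionServer: a thin wrapper that caps outbound request rate with a Guava RateLimiter. The class and parameter names are made up for the example; Pinterest's actual feature also blacklists requests based on observed latency.

    import java.util.ArrayList;
    import com.google.common.util.concurrent.RateLimiter;
    import com.stumbleupon.async.Deferred;
    import org.hbase.async.GetRequest;
    import org.hbase.async.HBaseClient;
    import org.hbase.async.KeyValue;

    /** Hypothetical throttling wrapper around the AsyncHBase client. */
    public class ThrottledHBaseClient {
      private final HBaseClient client;
      private final RateLimiter limiter;

      public ThrottledHBaseClient(HBaseClient client, double maxRequestsPerSecond) {
        this.client = client;
        this.limiter = RateLimiter.create(maxRequestsPerSecond);
      }

      public Deferred<ArrayList<KeyValue>> get(GetRequest request) {
        // Blocks the caller until a permit is available, smoothing the load
        // sent to the region servers.
        limiter.acquire();
        return client.get(request);
      }
    }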
  33. 33. Live Data Migration §Export snapshots from 0.94, recover tables in 1.2 • Relatively easy since we were already doing it between 0.94 and 0.94 • Modified our existing DR/backup tool to work between 0.94 and 1.2 §Bidirectional live replication between 0.94 and 1.2 • Breaking changes in the RPC protocol mean native replication does not work • Used thrift replication to overcome the issue
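  The snapshot half of the migration uses the standard HBase snapshot tooling; the cross-version (0.94 to 1.2) part needed the tool modifications mentioned above. A minimal sketch of the generic flow only, with the snapshot name, table name and cluster addresses as placeholders: take the snapshot on the source cluster, copy it with ExportSnapshot, then clone it into a live table on the 1.2 side (which afterwards catches up via the replication stream).

    // On the source cluster (HBase shell):
    //   snapshot 'usermetastore', 'usermetastore-snap'
    // Copy the snapshot files to the destination cluster's HDFS:
    //   hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
    //     -snapshot usermetastore-snap -copy-to hdfs://dest-namenode:8020/hbase -mappers 16

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Admin;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;

    public class RecoverFromSnapshot {
      public static void main(String[] args) throws Exception {
        // Configuration pointing at the new 1.2 cluster.
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "zk-dest-1,zk-dest-2,zk-dest-3");

        try (Connection connection = ConnectionFactory.createConnection(conf);
             Admin admin = connection.getAdmin()) {
          // Materialize the exported snapshot as a live table on the 1.2 cluster.
          admin.cloneSnapshot("usermetastore-snap", TableName.valueOf("usermetastore"));
        }
      }
    }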
  34. 34. Thrift Replication §Patch from Flurry HBASE-12814 §Fixed a bug in the 0.98/1.2 version • Threading bug exposed during prod data testing with high write QPS • Fixed by implementing a thrift client connection pool for each replication sink • The fix also made replication more performant §Bidirectional replication is needed for potential rollback §Verification!! • Chckr: tool for checking data replication consistency between the 0.94 and 1.2 clusters • Used a configurable timestamp parameter to eliminate false positives caused by replication delay
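  Chckr itself is internal; the following is only a hypothetical sketch of the timestamp idea: scan each side with an upper time bound that is safely in the past, so writes still in flight through replication do not show up as false mismatches. The helper below (names made up) computes per-row fingerprints on one cluster with such a cutoff; running the same logic against both clusters and diffing the maps gives the comparison. It is written against the 1.2 client API; the 0.94 side would need the equivalent calls in the 0.94 client.

    import java.io.IOException;
    import java.util.Arrays;
    import java.util.HashMap;
    import java.util.Map;
    import org.apache.hadoop.hbase.Cell;
    import org.apache.hadoop.hbase.CellUtil;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class RowFingerprints {
      /** Hashes every cell written before cutoffMillis, keyed by row key. Cells newer
       *  than the cutoff may not have replicated yet, so they are ignored. */
      public static Map<String, Integer> fingerprint(Connection conn, String table,
                                                     long cutoffMillis) throws IOException {
        Map<String, Integer> hashes = new HashMap<>();
        Scan scan = new Scan();
        scan.setTimeRange(0L, cutoffMillis);  // exclude in-flight (recent) writes
        try (Table t = conn.getTable(TableName.valueOf(table));
             ResultScanner scanner = t.getScanner(scan)) {
          for (Result result : scanner) {
            int h = 1;
            for (Cell cell : result.rawCells()) {
              h = 31 * h + Arrays.hashCode(CellUtil.cloneQualifier(cell));
              h = 31 * h + Arrays.hashCode(CellUtil.cloneValue(cell));
            }
            hashes.put(Bytes.toString(result.getRow()), h);
          }
        }
        return hashes;
      }
    }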
  35. 35. Upgrade Steps (Recap) §Build new 1.2 empty cluster §Set up master/master thrift replication between 0.94 and 1.2 §Export snapshot from 0.94 into 1.2 cluster §Recover table in 1.2 cluster §Replication draining §Monitor health metrics §Switch client to use 1.2 cluster
  36. 36. Performance §Use production dark read/write traffic to verify performance §Measured round-trip latency from the AsyncHBase layer • Cannot compare server-side latency directly since 1.2 reports p999 server-side latency while 0.94 does not • Use metric breakdown by RPC/region/RS with the Ostrich implementation to compare performance
  37. 37. [Charts: latency comparison for Get, BatchGet, Put, CompareAndSet, Delete]
  38. 38. Read Performance §Default config has worse p99 latency ** • Bucket cache hurts p99 latency due to bad GC §Short circuit read helps latency §Use CMS instead of G1GC §Native checksum helps latency **The read path off-heap feature from 2.0 should help a lot. HBASE-17138
  39. 39. Write Performance §Use a write-heavy use case to expose perf issues §Metrics showed much higher WAL sync ops than 0.94 §The Disruptor-based WAL sync implementation caused too many WAL sync operations §hbase.regionserver.hlog.syncer.count defaults to 5; changed it to 1
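  The knobs discussed on the read- and write-tuning slides live in hbase-site.xml (and the GC choice in hbase-env.sh) on the RegionServers; purely as a compact way to document them, here they are set on a Configuration object. Treat the socket path and the exact values as assumptions that worked for this workload, not universal recommendations.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class TuningNotes {
      public static Configuration tunedConf() {
        Configuration conf = HBaseConfiguration.create();
        // Write path: the Disruptor-based WAL in 1.2 defaults to 5 syncer threads,
        // which produced far more sync ops than 0.94; dropping to 1 helped here.
        conf.setInt("hbase.regionserver.hlog.syncer.count", 1);
        // Read path: HDFS short-circuit reads skip the DataNode hop for local blocks.
        conf.setBoolean("dfs.client.read.shortcircuit", true);
        conf.set("dfs.domain.socket.path", "/var/run/hadoop-hdfs/dn_socket");
        // GC: CMS (set in hbase-env.sh, e.g. -XX:+UseConcMarkSweepGC) gave better p99
        // than G1 for this workload, and the off-heap bucket cache was left disabled.
        return conf;
      }
    }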
  40. 40. Thanks! Community help from Michael Stack and Rahul Gidwani
  41. 41. © Copyright, All Rights Reserved, Pinterest Inc. 2017 We are hiring!
