YFROG!
10,000 Concurrent requests per second
Super fast!
Super huge datastore – 2 billion rows.
Backend is scalable
Does not lose data
Why? – HBase is used for 99% of the backend
HBase Best Practices, or Taming the Beast
ImageShack: 25 million monthly uniques

Hug Hbase Presentation.

  • 4. Super huge datastore – 2 billion rows.
  • 7. Why? – HBase is used for 99% of the backend
  • 9. ImageShack: 25 million monthly uniques
  • 10. Yfrog: 33 million monthly uniques
  • 11. 4 HBase clusters of various sizes (50 TB to 1 PB)
  • 12. Storing and serving 250 million photos (500 KB average per file) on 60 servers
  • 13. Yfrog is powered by the smaller 50 TB cluster, with 2 billion rows on 20 servers
  • 14. Using the 0.89.x and 0.90.x versions
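As a rough sanity check on the photo-store figures above, the arithmetic can be sketched as follows. The photo count, average size, and server count come from the slides; the replication factor of 3 is an assumption (it is the HDFS default, but the slides do not state what was used).

```python
# Back-of-the-envelope capacity estimate for the photo cluster.
PHOTOS = 250_000_000            # from the slides
AVG_PHOTO_BYTES = 500 * 1000    # 500 KB average, decimal units
SERVERS = 60                    # from the slides
REPLICATION = 3                 # ASSUMPTION: HDFS default, not stated in the slides


def terabytes(n_bytes: int) -> float:
    """Convert bytes to decimal terabytes."""
    return n_bytes / 1000**4


raw_tb = terabytes(PHOTOS * AVG_PHOTO_BYTES)
replicated_tb = raw_tb * REPLICATION
per_server_tb = replicated_tb / SERVERS

print(f"raw data:        {raw_tb:.0f} TB")
print(f"with replicas:   {replicated_tb:.0f} TB")
print(f"per server:      {per_server_tb:.2f} TB")
```

Under these assumptions the photos alone occupy about 125 TB before replication, which sits inside the 50 TB to 1 PB range quoted for the clusters.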
  • 16. Lots of RAM is good, but only up to a point; just avoid swap.
  • 17. We use sub-$1k desktop-grade servers; they work great!
  • 18. Check your network hardware for packet drops (we had ifOutDiscards interrupting ZooKeeper messages, and region servers would suicide during packet loss). Just use ping -f to test for packet loss between core nodes.
  • 19. JVM GC takes lots of CPU when misconfigured, e.g. a small NewSize.
  • 20. Single NameNode? No problem: just build two clusters and have your app tier do log query replication and replays when needed.
  • 21. Inexpensive 2 TB Hitachi disks (~$100) work great; get more units for your money.
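The ping -f check mentioned above could be scripted roughly like this. It is a minimal sketch, assuming the Linux iputils ping summary line ("... X% packet loss ..."); the function names are hypothetical, and flood ping normally needs root.

```python
import re
import subprocess

# Matches the summary line printed by Linux iputils ping, e.g.
# "10000 packets transmitted, 9987 received, 0.13% packet loss, time 1520ms"
LOSS_RE = re.compile(r"([\d.]+)% packet loss")


def parse_packet_loss(ping_output: str) -> float:
    """Return the packet-loss percentage found in ping's summary output."""
    match = LOSS_RE.search(ping_output)
    if match is None:
        raise ValueError("no packet-loss summary found in ping output")
    return float(match.group(1))


def flood_ping_loss(host: str, count: int = 10000) -> float:
    """Flood-ping `host` and return the loss percentage (usually requires root)."""
    result = subprocess.run(
        ["ping", "-f", "-c", str(count), host],
        capture_output=True,
        text=True,
        check=False,  # ping exits non-zero on loss; we still want the summary
    )
    return parse_packet_loss(result.stdout)
```

Running flood_ping_loss between each pair of core nodes and flagging any non-zero result is one way to catch the kind of drops described in the slides before they reach ZooKeeper.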
  • 23. 2. Set up HDFS to work flawlessly (pay attention to ulimits, thread limits, hardware stats, graphs, iowait, etc.)
  • 24. 3. Adjust JVM GC NewSize to be at least 100 MB (if young-generation GC is too slow for 100 MB, you need faster CPUs).
  • 25. 4. For metadata rows (small rows), adjust your HBase block size to 4 or 8 KB; you will see less IO and more blocks will fit into RAM.
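To see why a smaller block size helps for small rows, here is a short arithmetic sketch. The 64 KB figure is HBase's default HFile block size; the 1 GiB block cache is an assumed value chosen only for illustration.

```python
KB = 1024
DEFAULT_BLOCK = 64 * KB   # HBase's default block size
CACHE_BYTES = 1024**3     # ASSUMPTION: 1 GiB of block cache, for illustration


def blocks_in_cache(block_bytes: int, cache_bytes: int = CACHE_BYTES) -> int:
    """How many whole blocks of the given size fit into the cache."""
    return cache_bytes // block_bytes


for block in (64 * KB, 8 * KB, 4 * KB):
    # A random read of one small row pulls in roughly one whole block.
    print(
        f"block {block // KB:>2} KB: "
        f"{blocks_in_cache(block):>7} blocks cached, "
        f"~{block // KB} KB read per random get"
    )
```

With 8 KB blocks the same cache holds eight times as many distinct blocks as with the default, and each random get reads about an eighth of the data. The usual trade-off is a larger block index, so this fits small-row tables better than large-value ones.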
  • 28. Memstore size graph should be fairly flat with even flushes over time.
  • 29. Iowait graphs should not go over 70-80% during major compactions, or 20% during minor compactions. Otherwise, add more disks and/or nodes.
  • 30. Monitor and graph Thrift threads (via ps -eLf | grep PID). If your thread count ends up over 25,000, you may run out of RAM. We have dedicated Thrift boxes so that we don't accidentally kill RS nodes.
  • 31. We use Nagios to monitor and alert on DN, RS, ZK, NN, etc. via their web TCP ports – very helpful.
  • 32. Run hbck to check for consistency of meta structures.
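The Thrift thread check above can also be done without ps. The sketch below is Linux-only and counts entries under /proc/<pid>/task, which should give the same number as counting matching lines in ps -eLf output; the 25,000 threshold is the one from the slides, and the function names are hypothetical.

```python
import os

THREAD_LIMIT = 25_000  # threshold quoted in the slides


def count_threads(pid: int) -> int:
    """Linux only: every thread of a process is a directory in /proc/<pid>/task."""
    return len(os.listdir(f"/proc/{pid}/task"))


def over_limit(thread_count: int, limit: int = THREAD_LIMIT) -> bool:
    """True when the thread count has passed the alerting threshold."""
    return thread_count > limit


if __name__ == "__main__" and os.path.isdir("/proc/self/task"):
    # Demo on the current process; in practice pass the Thrift server's PID.
    n = count_threads(os.getpid())
    print(f"threads: {n}, alert: {over_limit(n)}")
```

Feeding count_threads into whatever graphing or alerting system is already in place (the slides mention Ganglia and Nagios) would give the trend line as well as the alert.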
  • 34. Various RAM brands – boxes would crash for no reason.
  • 35. Glibc in FC13 had a race condition bug that would lock up nodes and crash JVM processes under high load. Solution: yum -y update glibc (invalid binfree)
  • 36. When running in a mixed hardware environment, some boxes were slow enough to affect HDFS for the whole cluster – looking at “runnable threads” and “fsreadlatency” in Ganglia always pointed to which boxes were 'slow'.
  • 37. Running Cloudera HDFS under user 'hadoop', which was restricted to 1024 threads by default, would crash DataNodes, but only during compactions. Setting hadoop soft (and hard) nproc to 32,000 in limits.conf resolved it.
  • 38. GC sometimes autotuned NewSize to 20 MB, which caused GC to run 20 or 30 times per second, causing CPU to flatline at 100% and kill the RS. Manually setting it to 128 MB resolved this issue.
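The nproc problem above is easy to check for up front. This is a minimal sketch using Python's standard resource module (Unix only), meant to be run as the user that owns the DataNode process; the 32,000 target is the value from the slides and the function name is hypothetical.

```python
import resource

REQUIRED_NPROC = 32_000  # value the slides set in limits.conf


def limit_ok(soft: int, hard: int, required: int = REQUIRED_NPROC) -> bool:
    """True when both soft and hard limits allow at least `required` processes/threads."""
    def enough(value: int) -> bool:
        return value == resource.RLIM_INFINITY or value >= required
    return enough(soft) and enough(hard)


# Limits for the current user; run this as the 'hadoop' user to check the DataNode.
soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
print(f"nproc soft={soft} hard={hard} ok={limit_ok(soft, hard)}")

# limits.conf entries matching the fix described in the slides:
#   hadoop soft nproc 32000
#   hadoop hard nproc 32000
```

Because the crash only showed up during compactions, a check like this at node startup is cheaper than discovering the limit under load.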
  • 42. Fast – 0.5 ms puts, 2-3 ms reads, 10 ms disk reads.
  • 43. Recovers quickly when nodes are taken down
  • 44. On-call team can finally relax
  • 46. Load test HBase with YCSB – just leave it running for a week; if nothing crashes, you are good. Best not to test with live user traffic :)
  • 47. Do not worry about NameNode redundancy; just back up the /name dir frequently. Set up a secondary HBase cluster with the money you save by not buying 'server'-grade nodes.
  • 48. Burn in your disks, even if they are new.
  • 49. Put Memcached between your app tier and HBase. App bugs will hit Memcached first, keeping HBase safe from the assault, which could otherwise drive up your utilization.
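The Memcached recommendation above is the usual read-through (cache-aside) pattern. The sketch below illustrates the idea only: a plain dict stands in for Memcached, a second dict stands in for HBase, and the class and attribute names are hypothetical rather than taken from the talk.

```python
from typing import Callable, Dict, Optional


class ReadThroughCache:
    """Serve reads from a cache and fall back to the backend only on a miss."""

    def __init__(self, backend_get: Callable[[str], Optional[bytes]]):
        self._backend_get = backend_get
        self._cache: Dict[str, bytes] = {}  # stand-in for a Memcached client
        self.backend_reads = 0              # how often the backend was actually hit

    def get(self, key: str) -> Optional[bytes]:
        if key in self._cache:
            return self._cache[key]
        self.backend_reads += 1
        value = self._backend_get(key)
        if value is not None:
            self._cache[key] = value        # missing keys are not cached here
        return value


# A buggy app loop hammering one key reaches the backend only once.
fake_hbase = {"row1": b"photo-metadata"}    # stand-in for an HBase table
cache = ReadThroughCache(fake_hbase.get)
for _ in range(1000):
    cache.get("row1")
print(cache.backend_reads)
```

Note that this sketch does not cache misses, so a bug that repeatedly requests nonexistent keys would still reach the backend; a real deployment may want short-lived negative entries as well.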
  • 53. And everyone else on the HBase user list who helped us out during the rough times.