Scaling Uber’s Elasticsearch as a Geo-Temporal Database
Danny Yuan @ Uber
InfoQ.com: News & Community Site
Watch the video with slide synchronization on InfoQ.com!
https://www.infoq.com/presentations/uber-elasticsearch-clusters
Presented at QCon London
www.qconlondon.com
Use Cases for a Geo-Temporal Database
Real-time Decisions on Global Scale
Dynamic Pricing: Every Hexagon, Every Minute
Metrics: how many UberXs were in a trip in the past 10 minutes
Market Analysis: Travel Times
Forecasting: Granular Forecasting of Rider Demand
How Can We Produce Geo-Temporal Data for Ever Changing Business Needs?
Key Question: What Is the Right Abstraction?
Abstraction: Single-Table OLAP on Geo-Temporal Data
SELECT   <agg functions>, <dimensions>
FROM     <data_source>
WHERE    <boolean filter>
GROUP BY <dimensions>
HAVING   <boolean filter>
ORDER BY <sorting criteria>
LIMIT    <n>
Why Elasticsearch?
- Arbitrary boolean query
- Sub-second response time
- Built-in distributed aggregation functions
- High-cardinality queries
- Idempotent insertion to deduplicate data
- Second-level data freshness
- Scales with data volume
- Operable by small team
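One common way to get idempotent insertion is to derive the document ID deterministically from the fields that identify an event, so that a re-delivered message overwrites the same document instead of creating a duplicate. The sketch below models an index as a plain Python dict. The field names (`trip_id`, `event_type`, `ts`) and the helpers `doc_id` and `index_event` are hypothetical and not taken from the talk.

```python
import hashlib
import json

def doc_id(event: dict, key_fields=("trip_id", "event_type", "ts")) -> str:
    """Derive a stable document ID from the fields that identify an event."""
    key = json.dumps([event[f] for f in key_fields], separators=(",", ":"))
    return hashlib.sha1(key.encode("utf-8")).hexdigest()

def index_event(store: dict, event: dict) -> None:
    """Write the event under its derived ID; a replayed event overwrites itself."""
    store[doc_id(event)] = event

# A dict stands in for the index (assumption for illustration only).
store = {}
event = {"trip_id": 42, "event_type": "completed", "ts": "2018-02-03T10:00:00Z"}
index_event(store, event)
index_event(store, dict(event))  # duplicate delivery of the same message
```

Because the ID depends only on the event's content, at-least-once delivery upstream does not inflate document counts.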
Current Scale: An Important Context
- Ingestion: 850K to 1.3M messages/second
- Ingestion volume: 12 TB/day
- Doc scans: 100M to 4B docs/second
- Data size: 1 PB
- Cluster size: 700 Elasticsearch machines
- Ingestion pipeline: 100+ data pipeline jobs
Our Story of Scaling Elasticsearch
Three Dimensions of Scale
Ingestion
Query
Operation
Driving Principles
- Optimize for fast iteration
- Optimize for simple operations
- Optimize for automation and tools
- Optimize for being reasonably fast
The Past: We Started Small
Constraints for Being Small
- Three-person team
- Two data centers
- Small set of requirements: common analytics for machines
First Order of Business: Take Care of the Basics
Get Single-Node Right: Follow the 20-80 Rule
- One table <-> multiple indices by time range
- Disable _source field
- Disable _all field
- Use doc_values for storage
- Disable analyzed field
- Tune JVM parameters
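The "one table, multiple indices by time range" idea can be sketched as a helper that lists the index names covering a query window. Daily granularity, the `prefix-YYYYMMDD` naming scheme, and the function name `indices_for_range` are assumptions made for illustration; the deck does not state the layout used at this stage.

```python
from datetime import date, timedelta
from typing import List

def indices_for_range(prefix: str, start: date, end: date) -> List[str]:
    """Return the daily index names covering [start, end], inclusive."""
    if end < start:
        raise ValueError("end must not precede start")
    days = (end - start).days
    return [f"{prefix}-{start + timedelta(days=i):%Y%m%d}" for i in range(days + 1)]

# Hypothetical table name; a query over three days touches three indices.
names = indices_for_range("trips", date(2018, 2, 27), date(2018, 3, 1))
```

Splitting by time lets old data be dropped by deleting whole indices and keeps a query from touching data outside its window.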
Make Decisions with Numbers
- What’s the maximum number of recovery threads?
- What’s the maximum size of request queue?
- What should the refresh rate be?
- How many shards should an index have?
- What’s the throttling threshold?
- Solution: Set up end-to-end stress testing framework
Deployment in Two Data Centers
- Each data center has an exclusive set of cities
- Should tolerate failure of a single data center
- Ingestion should continue to work
- Querying any city should return correct results
Deployment in Two Data Centers: trade space for availability
Discretize Geo Locations: H3
Optimizations to Ingestion
Dealing with Large Volume of Data
- An event source produces more than 3 TB every day
- Key insight: humans do not need overly granular data
- Key insight: stream data usually has a lot of redundancy
- Prune unnecessary fields
- Devise algorithms to remove redundancy
- 3 TB -> 42 GB, a more than 70x reduction!
- Bulk write
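The deck does not say which redundancy-removal algorithms were used. As one plausible sketch, the code below prunes fields that analytics do not need, drops an event when it repeats the previous state for the same key, and groups the survivors into batches for bulk writes. The field names, the `driver_id` key, and all helper names are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List

KEEP_FIELDS = ("driver_id", "status", "hexagon")  # hypothetical field names

def prune(event: dict) -> dict:
    """Keep only the fields that downstream queries need."""
    return {k: event[k] for k in KEEP_FIELDS if k in event}

def drop_unchanged(events: Iterable[dict]) -> Iterator[dict]:
    """Yield a pruned event only when it differs from the last one for its key."""
    last_seen: Dict[object, dict] = {}
    for raw in events:
        event = prune(raw)
        key = event["driver_id"]
        if last_seen.get(key) != event:
            last_seen[key] = event
            yield event

def batches(events: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Group events into lists of at most `size` for bulk writes."""
    batch: List[dict] = []
    for event in events:
        batch.append(event)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch

events = [
    {"driver_id": 1, "status": "open", "hexagon": "a", "gps_accuracy": 3.1},
    {"driver_id": 1, "status": "open", "hexagon": "a", "gps_accuracy": 2.9},
    {"driver_id": 2, "status": "open", "hexagon": "b", "gps_accuracy": 5.0},
    {"driver_id": 1, "status": "on_trip", "hexagon": "a", "gps_accuracy": 3.0},
]
reduced = list(drop_unchanged(events))
```

Pruning first matters here: once the noisy `gps_accuracy` field is gone, consecutive pings with the same status and hexagon compare equal and can be dropped.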
Data Modeling Matters
Example: Efficient and Reliable Join
- Example: Calculate Completed/Requested ratio with two different event streams
Example: Efficient and Reliable Join: Use Elasticsearch
- Calculate Completed/Requested ratio from two Kafka topics
- Can we use streaming join?
- Can we join on the query side?
- Solution: rendezvous at Elasticsearch on trip ID
TripID  Pickup Time    Completed
1       2018-02-03T…   TRUE
2       2018-02-3T…    FALSE
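The rendezvous idea can be sketched with a dict standing in for the index: each stream writes only its own fields into the document keyed by trip ID, so the join happens at write time and arrival order does not matter. In Elasticsearch this would roughly correspond to partial updates with upsert semantics. The event shapes and the helpers `upsert` and `completed_ratio` are hypothetical.

```python
def upsert(store: dict, trip_id: int, partial: dict) -> None:
    """Merge partial fields into the document for trip_id, creating it if absent."""
    doc = store.setdefault(trip_id, {"completed": False})
    doc.update(partial)

def completed_ratio(store: dict) -> float:
    """Completed/Requested over documents that have seen a request event."""
    requested = [d for d in store.values() if "pickup_time" in d]
    if not requested:
        return 0.0
    return sum(d["completed"] for d in requested) / len(requested)

store = {}
requested_events = [
    {"trip_id": 1, "pickup_time": "2018-02-03T10:00:00Z"},
    {"trip_id": 2, "pickup_time": "2018-02-03T10:05:00Z"},
]
completed_events = [{"trip_id": 1}]

# The completion event is applied first on purpose: order does not matter.
for e in completed_events:
    upsert(store, e["trip_id"], {"completed": True})
for e in requested_events:
    upsert(store, e["trip_id"], {"pickup_time": e["pickup_time"]})

ratio = completed_ratio(store)
```

After both streams are applied, the ratio is a single-table aggregation over one document per trip, with no streaming or query-side join.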
Example: aggregation on state transitions
Optimize Querying Elasticsearch
Hide Query Optimization from Users
- Do we really expect every user to write Elasticsearch queries?
- What if someone issues a very expensive query?
- Solution: Isolation with a query layer
Query Layer with Multiple Clusters
- Generate efficient Elasticsearch queries
- Reject expensive queries
- Route queries (hardcoded at first)
Efficient Query Generation
- “GROUP BY a, b”
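The slide does not show how the query layer translates a multi-dimension GROUP BY. One straightforward translation, sketched here as an assumption, nests one Elasticsearch `terms` aggregation per dimension and puts the metric in the innermost level. The function name `group_by_aggs`, the `sum` metric, and the field name `trip_count` are hypothetical.

```python
def group_by_aggs(dimensions, metric_field, bucket_limit=1000):
    """Build a request body with nested terms aggregations, one per dimension."""
    aggs = {"total": {"sum": {"field": metric_field}}}
    # Wrap from the innermost dimension outward so the first dimension ends up on top.
    for dim in reversed(dimensions):
        aggs = {
            f"by_{dim}": {
                "terms": {"field": dim, "size": bucket_limit},
                "aggs": aggs,
            }
        }
    # size 0: return aggregation buckets only, no individual hits.
    return {"size": 0, "aggs": aggs}

body = group_by_aggs(["a", "b"], "trip_count")
```

Generating the body in one place lets the query layer choose bucket limits and aggregation order without users writing Elasticsearch queries themselves.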
Rejecting Expensive Queries
- 10,000 hexagons/city x 1,440 minutes/day x 800 cities
- Cardinality: about 11.5 billion (!) buckets -> out-of-memory error
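The arithmetic behind this slide is a product of per-dimension cardinalities, which a query layer can estimate before sending anything to Elasticsearch. The sketch below is a minimal version of such a guard; the threshold `MAX_BUCKETS` and the function names are hypothetical, and a real estimate would also account for filters.

```python
from math import prod

MAX_BUCKETS = 1_000_000  # hypothetical limit, not a figure from the talk

def estimated_buckets(cardinalities: dict) -> int:
    """Worst-case bucket count: the product of the GROUP BY cardinalities."""
    return prod(cardinalities.values())

def check_query(cardinalities: dict, max_buckets: int = MAX_BUCKETS) -> int:
    """Return the estimate, or raise if the query would create too many buckets."""
    buckets = estimated_buckets(cardinalities)
    if buckets > max_buckets:
        raise ValueError(f"query rejected: ~{buckets:,} buckets exceeds {max_buckets:,}")
    return buckets

# The dimensions from the slide: hexagons per city, minutes per day, cities.
dims = {"hexagon": 10_000, "minute": 1440, "city": 800}
worst_case = estimated_buckets(dims)  # 10,000 x 1,440 x 800
```

Rejecting up front is cheaper than letting the cluster discover the problem by running out of memory.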
Routing Queries
"DEMAND": {
  "CLUSTERS": {
    "TIER0": {
      "CLUSTERS": ["ES_CLUSTER_TIER0"]
    },
    "TIER2": {
      "CLUSTERS": ["ES_CLUSTER_TIER2"]
    }
  },
  "INDEX": "MARKETPLACE_DEMAND-",
  "SUFFIXFORMAT": "YYYYMM.WW",
  "ROUTING": "PRODUCT_ID"
}
Summary of First Iteration
Evolution: Success Breeds Failures
Unexpected Surges
Applications Went Haywire
Solution: Distributed Rate Limiting
- Per-cluster rate limit
- Per-instance rate limit
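The slides do not describe the rate-limiting algorithm. A token bucket is a common choice and is used here purely for illustration, together with the simplifying assumption that a per-cluster budget is split evenly across query-layer instances. The class `TokenBucket` and the helper `per_instance_rate` are hypothetical names.

```python
import time

class TokenBucket:
    """Allow up to `rate` requests per second, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.tokens = capacity
        self.updated = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Refill based on elapsed time, then spend `cost` tokens if available."""
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

def per_instance_rate(cluster_rate: float, instances: int) -> float:
    """Split a per-cluster budget evenly across instances (an assumption)."""
    return cluster_rate / instances

# Each of 4 instances enforces its share of a 100 requests/second cluster budget.
limiter = TokenBucket(rate=per_instance_rate(100, 4), capacity=25)
```

An even split needs no coordination between instances, at the cost of under-using the budget when traffic is skewed toward a few of them.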
Workload Evolved
- Users query months of data for modeling and complex analytics
- Key insight: Data can be a little stale for long-range queries
- Solution: Caching layer and delayed execution
Time Series Cache
- Redis as the cache store
- Cache key is based on normalized query content and time range
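A cache key built from normalized query content and a time range can be sketched as follows. Normalizing by serializing the query with sorted keys, splitting the range into fixed hourly buckets, and the `tsc:` key prefix are all assumptions for illustration; the resulting strings would be used as ordinary Redis keys.

```python
import hashlib
import json
from datetime import datetime, timedelta, timezone
from typing import List

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def normalize(query: dict) -> str:
    """Serialize with sorted keys so equivalent queries produce the same text."""
    return json.dumps(query, sort_keys=True, separators=(",", ":"))

def cache_key(query: dict, bucket_start: datetime, bucket: timedelta) -> str:
    """Combine a digest of the normalized query with one time bucket."""
    digest = hashlib.sha256(normalize(query).encode("utf-8")).hexdigest()[:16]
    return f"tsc:{digest}:{bucket_start:%Y%m%dT%H%M}:{int(bucket.total_seconds())}"

def bucket_keys(query: dict, start: datetime, end: datetime,
                bucket: timedelta = timedelta(hours=1)) -> List[str]:
    """Return one key per bucket overlapping [start, end)."""
    current = start - ((start - EPOCH) % bucket)  # align down to a bucket boundary
    keys = []
    while current < end:
        keys.append(cache_key(query, current, bucket))
        current += bucket
    return keys

query = {"table": "demand", "group_by": ["hexagon"]}
start = datetime(2018, 2, 3, 10, 30, tzinfo=timezone.utc)
end = datetime(2018, 2, 3, 12, 10, tzinfo=timezone.utc)
keys = bucket_keys(query, start, end)
```

Keying per bucket rather than per request lets overlapping time ranges share cached buckets, so only the missing ones need to be fetched from Elasticsearch.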
Delayed Execution
- Allow registering long-running queries
- Provide cached but stale data for such queries
- Dedicated cluster and queued executions
- Rationale: three months of data vs a few hours of staleness
- Example: [-30d, 0d] -> [-30d, -1d]
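The window rewrite in the example amounts to pulling the end of the range back by the accepted staleness, so a long-range query only touches data that no longer changes and can be served from cache. A minimal sketch, with the function name `stale_window` and its defaults chosen for illustration:

```python
from datetime import datetime, timedelta, timezone

def stale_window(now: datetime,
                 lookback: timedelta = timedelta(days=30),
                 staleness: timedelta = timedelta(days=1)):
    """Map [now - lookback, now] to [now - lookback, now - staleness]."""
    if staleness >= lookback:
        raise ValueError("staleness must be smaller than the lookback window")
    return now - lookback, now - staleness

now = datetime(2018, 3, 5, tzinfo=timezone.utc)
start, end = stale_window(now)  # corresponds to [-30d, -1d] relative to now
```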
Scale Operations
Driving Principles
- Make the system transparent
- Optimize for MTTR (mean time to recover)
- Strive for consistency
- Automation is the most effective way to get consistency
Challenge: Diagnosis
- Cluster slowed down with all metrics being normal
- Requires additional instrumentation
- ES Plugin as a solution
Challenge: Cluster Size Becomes an Enemy
- Elasticsearch cluster becomes harder to operate as its size increases
- MTTR increases as cluster size increases
- Multi-tenancy becomes a huge issue
- Can’t have too many shards
Federation
- 3 clusters -> many smaller clusters
- Dynamic routing
- Meta-data driven
How Can We Trust the Data?
Self-Serving Trust System
Too Much Manual Maintenance Work
- Adjusting queue size
- Restarting machines
- Relocating shards
Auto Ops
Ongoing Work for the Future
Future Work
- Strong reliability
- Strong consistency among replicas
- Multi-tenancy
Summary
- Three dimensions of scaling: ingestion, query, and operations
- Be simple and practical: successful systems emerge from simple ones
- Abstraction and data modeling matter
- Invest in thorough instrumentation
- Invest in automation and tools
Watch the video with slide
synchronization on InfoQ.com!
https://www.infoq.com/presentations/
uber-elasticsearch-clusters

More Related Content

More from C4Media

Shifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CDShifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CDC4Media
 
CI/CD for Machine Learning
CI/CD for Machine LearningCI/CD for Machine Learning
CI/CD for Machine LearningC4Media
 
Fault Tolerance at Speed
Fault Tolerance at SpeedFault Tolerance at Speed
Fault Tolerance at SpeedC4Media
 
Architectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep SystemsArchitectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep SystemsC4Media
 
ML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.jsML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.jsC4Media
 
Build Your Own WebAssembly Compiler
Build Your Own WebAssembly CompilerBuild Your Own WebAssembly Compiler
Build Your Own WebAssembly CompilerC4Media
 
User & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix ScaleUser & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix ScaleC4Media
 
Scaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's EdgeScaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's EdgeC4Media
 
Make Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home EverywhereMake Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home EverywhereC4Media
 
The Talk You've Been Await-ing For
The Talk You've Been Await-ing ForThe Talk You've Been Await-ing For
The Talk You've Been Await-ing ForC4Media
 
Future of Data Engineering
Future of Data EngineeringFuture of Data Engineering
Future of Data EngineeringC4Media
 
Automated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and MoreAutomated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and MoreC4Media
 
Navigating Complexity: High-performance Delivery and Discovery Teams
Navigating Complexity: High-performance Delivery and Discovery TeamsNavigating Complexity: High-performance Delivery and Discovery Teams
Navigating Complexity: High-performance Delivery and Discovery TeamsC4Media
 
High Performance Cooperative Distributed Systems in Adtech
High Performance Cooperative Distributed Systems in AdtechHigh Performance Cooperative Distributed Systems in Adtech
High Performance Cooperative Distributed Systems in AdtechC4Media
 
Rust's Journey to Async/await
Rust's Journey to Async/awaitRust's Journey to Async/await
Rust's Journey to Async/awaitC4Media
 
Opportunities and Pitfalls of Event-Driven Utopia
Opportunities and Pitfalls of Event-Driven UtopiaOpportunities and Pitfalls of Event-Driven Utopia
Opportunities and Pitfalls of Event-Driven UtopiaC4Media
 
Datadog: a Real-Time Metrics Database for One Quadrillion Points/Day
Datadog: a Real-Time Metrics Database for One Quadrillion Points/DayDatadog: a Real-Time Metrics Database for One Quadrillion Points/Day
Datadog: a Real-Time Metrics Database for One Quadrillion Points/DayC4Media
 
Are We Really Cloud-Native?
Are We Really Cloud-Native?Are We Really Cloud-Native?
Are We Really Cloud-Native?C4Media
 
CockroachDB: Architecture of a Geo-Distributed SQL Database
CockroachDB: Architecture of a Geo-Distributed SQL DatabaseCockroachDB: Architecture of a Geo-Distributed SQL Database
CockroachDB: Architecture of a Geo-Distributed SQL DatabaseC4Media
 
A Dive into Streams @LinkedIn with Brooklin
A Dive into Streams @LinkedIn with BrooklinA Dive into Streams @LinkedIn with Brooklin
A Dive into Streams @LinkedIn with BrooklinC4Media
 

More from C4Media (20)

Shifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CDShifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CD
 
CI/CD for Machine Learning
CI/CD for Machine LearningCI/CD for Machine Learning
CI/CD for Machine Learning
 
Fault Tolerance at Speed
Fault Tolerance at SpeedFault Tolerance at Speed
Fault Tolerance at Speed
 
Architectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep SystemsArchitectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep Systems
 
ML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.jsML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.js
 
Build Your Own WebAssembly Compiler
Build Your Own WebAssembly CompilerBuild Your Own WebAssembly Compiler
Build Your Own WebAssembly Compiler
 
User & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix ScaleUser & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix Scale
 
Scaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's EdgeScaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's Edge
 
Make Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home EverywhereMake Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home Everywhere
 
The Talk You've Been Await-ing For
The Talk You've Been Await-ing ForThe Talk You've Been Await-ing For
The Talk You've Been Await-ing For
 
Future of Data Engineering
Future of Data EngineeringFuture of Data Engineering
Future of Data Engineering
 
Automated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and MoreAutomated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and More
 
Navigating Complexity: High-performance Delivery and Discovery Teams
Navigating Complexity: High-performance Delivery and Discovery TeamsNavigating Complexity: High-performance Delivery and Discovery Teams
Navigating Complexity: High-performance Delivery and Discovery Teams
 
High Performance Cooperative Distributed Systems in Adtech
High Performance Cooperative Distributed Systems in AdtechHigh Performance Cooperative Distributed Systems in Adtech
High Performance Cooperative Distributed Systems in Adtech
 
Rust's Journey to Async/await
Rust's Journey to Async/awaitRust's Journey to Async/await
Rust's Journey to Async/await
 
Opportunities and Pitfalls of Event-Driven Utopia
Opportunities and Pitfalls of Event-Driven UtopiaOpportunities and Pitfalls of Event-Driven Utopia
Opportunities and Pitfalls of Event-Driven Utopia
 
Datadog: a Real-Time Metrics Database for One Quadrillion Points/Day
Datadog: a Real-Time Metrics Database for One Quadrillion Points/DayDatadog: a Real-Time Metrics Database for One Quadrillion Points/Day
Datadog: a Real-Time Metrics Database for One Quadrillion Points/Day
 
Are We Really Cloud-Native?
Are We Really Cloud-Native?Are We Really Cloud-Native?
Are We Really Cloud-Native?
 
CockroachDB: Architecture of a Geo-Distributed SQL Database
CockroachDB: Architecture of a Geo-Distributed SQL DatabaseCockroachDB: Architecture of a Geo-Distributed SQL Database
CockroachDB: Architecture of a Geo-Distributed SQL Database
 
A Dive into Streams @LinkedIn with Brooklin
A Dive into Streams @LinkedIn with BrooklinA Dive into Streams @LinkedIn with Brooklin
A Dive into Streams @LinkedIn with Brooklin
 

Recently uploaded

Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Victor Rentea
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Orbitshub
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontologyjohnbeverley2021
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamUiPathCommunity
 

Recently uploaded (20)

Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 

Scaling Uber's Elasticsearch Clusters

  • 1. Scaling Uber’s Elasticsearch as an Geo-Temporal Database Danny Yuan @ Uber
  • 2. InfoQ.com: News & Community Site Watch the video with slide synchronization on InfoQ.com! https://www.infoq.com/presentations/ uber-elasticsearch-clusters • Over 1,000,000 software developers, architects and CTOs read the site world- wide every month • 250,000 senior developers subscribe to our weekly newsletter • Published in 4 languages (English, Chinese, Japanese and Brazilian Portuguese) • Post content from our QCon conferences • 2 dedicated podcast channels: The InfoQ Podcast, with a focus on Architecture and The Engineering Culture Podcast, with a focus on building • 96 deep dives on innovative topics packed as downloadable emags and minibooks • Over 40 new content items per week
  • 3. Purpose of QCon - to empower software development by facilitating the spread of knowledge and innovation Strategy - practitioner-driven conference designed for YOU: influencers of change and innovation in your teams - speakers and topics driving the evolution and innovation - connecting and catalyzing the influencers and innovators Highlights - attended by more than 12,000 delegates since 2007 - held in 9 cities worldwide Presented at QCon London www.qconlondon.com
  • 4. Use Cases for a Geo-Temporal Database
  • 5. Real-time Decisions on Global Scale
  • 6. Dynamic Pricing: Every Hexagon, Every Minute
  • 7. Dynamic Pricing: Every Hexagon, Every Minute
  • 8. Metrics: how many UberXs were in a trip in the past 10 minutes
  • 9. Metrics: how many UberXs were in a trip in the past 10 minutes
  • 12. How Can We Produce Geo-Temporal Data for Ever Changing Business Needs?
  • 13. Key Question: What Is the Right Abstraction?
  • 14. Abstraction: Single-Table OLAP on Geo-Temporal Data
  • 15. Abstraction: Single-Table OLAP on Geo-Temporal Data SELECT <agg functions>, <dimensions> 
 FROM <data_source>
 WHERE <boolean filter>
 GROUP BY <dimensions>
 HAVING <boolean filter>
 ORDER BY <sorting criterial>
 LIMIT <n>

  • 16. Abstraction: Single-Table OLAP on Geo-Temporal Data SELECT <agg functions>, <dimensions> 
 FROM <data_source>
 WHERE <boolean filter>
 GROUP BY <dimensions>
 HAVING <boolean filter>
 ORDER BY <sorting criterial>
 LIMIT <n>

  • 17. Why Elasticsearch? - Arbitrary boolean query - Sub-second response time - Built-in distributed aggregation functions - High-cardinality queries - Idempotent insertion to deduplicate data - Second-level data freshness - Scales with data volume - Operable by small team
  • 18. Current Scale: An Important Context - Ingestion: 850K to 1.3M messages/second - Ingestion volume: 12TB / day - Doc scans: 100M to 4B docs/ second - Data size: 1 PB - Cluster size: 700 ElasticSearch Machines - Ingestion pipeline: 100+ Data Pipeline Jobs
  • 19. Our Story of Scaling Elasticsearch
  • 20. Three Dimensions of Scale Ingestion Query Operation
  • 21. Driving Principles - Optimize for fast iteration - Optimize for simple operations - Optimize for automation and tools - Optimize for being reasonably fast
  • 22. The Past: We Started Small
  • 23. Constraints for Being Small - Three-person team - Two data centers - Small set of requirements: common analytics for machines
  • 24. First Order of Business: Take Care of the Basics
  • 25. Get Single-Node Right: Follow the 20-80 Rule - One table <—> multiple indices by time range - Disable _source field - Disable _all field - Use doc_values for storage - Disable analyzed field - Tune JVM parameters
  • 26. Make Decisions with Numbers - What’s the maximum number of recovery threads? - What’s the maximum size of request queue? - What should the refresh rate be? - How many shards should an index have? - What’s the throttling threshold? - Solution: Set up end-to-end stress testing framework
  • 27. Deployment in Two Data Centers - Each data center has exclusive set of cities - Should tolerate failure of a single data center - Ingestion should continue to work - Querying any city should return correct results
  • 28. Deployment in Two Data Centers: trade space for availability
  • 29. Deployment in Two Data Centers: trade space for availability
  • 30. Deployment in Two Data Centers: trade space for availability
  • 34. Dealing with Large Volume of Data - An event source produces more than 3TB every day - Key insight: human does not need too granular data - Key insight: stream data usually has lots of redundancy
  • 35. - Pruning unnecessary fields - Devise algorithms to remove redundancy - 3TB —> 42 GB, more than 70x of reduction! - Bulk write Dealing with Large Volume of Data
  • 37. Example: Efficient and Reliable Join - Example: Calculate Completed/Requested ratio with two different event streams
  • 38. Example: Efficient and Reliable Join: Use Elasticsearch - Calculate Completed/Requested ratio from two Kafka topics - Can we use streaming join? - Can we join on the query side? - Solution: rendezvous at Elasticsearch on trip ID TripID Pickup Time Completed 1 2018-02-03T… TRUE 2 2018-02-3T… FALSE
  • 39. Example: aggregation on state transitions
  • 41. Hide Query Optimization from Users - Do we really expect every user to write Elasticsearch queries? - What if someone issues a very expensive query? - Solution: Isolation with a query layer
  • 42. Query Layer with Multiple Clusters
  • 43. Query Layer with Multiple Clusters
  • 44. Query Layer with Multiple Clusters - Generate efficient Elasticsearch queries - Reject expensive queries - Route queries (hardcoded at first)
  • 45. Efficient Query Generation - “GROUP BY a, b”
  • 46. Rejecting Expensive Queries - 10,000 hexagons per city x 1,440 minutes per day x 800 cities - Cardinality: more than 11 billion (!) buckets -> out-of-memory error
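The arithmetic on this slide suggests a simple guard in the query layer: multiply the cardinalities of the GROUP BY dimensions and refuse the query when the product exceeds a budget. The sketch below assumes the query layer knows (or can estimate) each dimension's cardinality; the `MAX_BUCKETS` value and the exception name are hypothetical.

```python
from math import prod
from typing import Sequence

# Hypothetical budget; a real limit would come from stress testing.
MAX_BUCKETS = 1_000_000


class QueryTooExpensive(Exception):
    """Raised when a query would create more aggregation buckets than allowed."""


def estimate_buckets(cardinalities: Sequence[int]) -> int:
    """Worst-case bucket count: the product of all dimension cardinalities."""
    return prod(cardinalities)


def check_query(cardinalities: Sequence[int], limit: int = MAX_BUCKETS) -> int:
    """Return the estimated bucket count, or raise if it exceeds the limit."""
    buckets = estimate_buckets(cardinalities)
    if buckets > limit:
        raise QueryTooExpensive(
            f"estimated {buckets:,} buckets exceeds limit of {limit:,}")
    return buckets


# The slide's example: hexagons per city x minutes per day x cities.
SLIDE_EXAMPLE = estimate_buckets([10_000, 1_440, 800])  # 11,520,000,000
```

The product is a worst-case bound, since not every combination of dimension values occurs in practice, but it is cheap to compute before any query reaches the cluster.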
  • 47. Routing Queries "DEMAND": { "CLUSTERS": { "TIER0": { "CLUSTERS": ["ES_CLUSTER_TIER0"] }, "TIER2": { "CLUSTERS": ["ES_CLUSTER_TIER2"] } }, "INDEX": "MARKETPLACE_DEMAND-", "SUFFIXFORMAT": "YYYYMM.WW", "ROUTING": "PRODUCT_ID" }
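One way a query layer might consume a routing entry like the one above is sketched here. The configuration keys are taken from the slide; everything else is an assumption, in particular that `WW` in `SUFFIXFORMAT` means the ISO week number, that a query names its tier explicitly, and that the `ROUTING` field is looked up in the query's filters. The function names are hypothetical.

```python
from datetime import date, timedelta
from typing import Any, Dict, List

CONFIG: Dict[str, Any] = {
    "DEMAND": {
        "CLUSTERS": {
            "TIER0": {"CLUSTERS": ["ES_CLUSTER_TIER0"]},
            "TIER2": {"CLUSTERS": ["ES_CLUSTER_TIER2"]},
        },
        "INDEX": "MARKETPLACE_DEMAND-",
        "SUFFIXFORMAT": "YYYYMM.WW",
        "ROUTING": "PRODUCT_ID",
    }
}


def index_suffix(day: date) -> str:
    """Render YYYYMM.WW, assuming WW is the ISO week number (an assumption)."""
    return f"{day.year:04d}{day.month:02d}.{day.isocalendar()[1]:02d}"


def indices_for_range(table: str, start: date, end: date) -> List[str]:
    """List the distinct time-based indices covering [start, end], in order."""
    prefix = CONFIG[table]["INDEX"]
    names: List[str] = []
    day = start
    while day <= end:
        name = prefix + index_suffix(day)
        if name not in names:
            names.append(name)
        day += timedelta(days=1)
    return names


def route(table: str, tier: str, start: date, end: date,
          filters: Dict[str, Any]) -> Dict[str, Any]:
    """Resolve clusters, indices, and routing value for one query."""
    entry = CONFIG[table]
    return {
        "clusters": entry["CLUSTERS"][tier]["CLUSTERS"],
        "indices": indices_for_range(table, start, end),
        "routing": filters.get(entry["ROUTING"]),
    }
```

Keeping this mapping in metadata rather than in code means a table can be moved to another cluster or tier by editing configuration only.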
  • 51. Summary of First Iteration
  • 56. Solution: Distributed Rate Limiting - Per-cluster rate limit
  • 57. Solution: Distributed Rate Limiting - Per-instance rate limit
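The slides name a per-cluster and a per-instance rate limit without giving the mechanism. One plausible reading, sketched below purely as an assumption, is that each query-layer instance enforces an equal share of the cluster-wide budget with a local token bucket, so no coordination is needed on the request path. The class and function names are hypothetical, and the clock is injectable so the behavior can be tested deterministically.

```python
import time
from typing import Callable


class TokenBucket:
    """Token bucket: refills at rate_per_sec, holds at most capacity tokens."""

    def __init__(self, rate_per_sec: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate_per_sec
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Consume cost tokens if available; return whether the request may run."""
        now = self.clock()
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


def per_instance_limiter(cluster_rate: float, num_instances: int,
                         clock: Callable[[], float] = time.monotonic) -> TokenBucket:
    """Give each instance an equal share of the per-cluster rate (an assumption)."""
    share = cluster_rate / num_instances
    return TokenBucket(rate_per_sec=share, capacity=share, clock=clock)
```

An equal split is the simplest policy; it wastes budget when traffic is unevenly spread across instances, which a shared counter or periodic rebalancing would address at the cost of extra coordination.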
  • 58. Workload Evolved - Users query months of data for modeling and complex analytics - Key insight: Data can be a little stale for long-range queries - Solution: Caching layer and delayed execution
  • 60. Time Series Cache - Redis as the cache store - Cache key is based on normalized query content and time range
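A cache key built from normalized query content plus a time range could look like the sketch below. The details are assumptions rather than the talk's design: the query is normalized by serializing it as JSON with sorted keys, the time range is split into one-day buckets so overlapping ranges can share entries, and a plain dictionary stands in for Redis. `run_query` is a hypothetical callback that executes the query for a single day.

```python
import hashlib
import json
from datetime import date, timedelta
from typing import Any, Callable, Dict, List


def cache_key(query: Dict[str, Any], day: date) -> str:
    """Key = hash of the normalized query (time range excluded) + day bucket."""
    normalized = json.dumps(query, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
    return f"{digest}:{day.isoformat()}"


def fetch_range(query: Dict[str, Any], start: date, end: date,
                cache: Dict[str, Any],
                run_query: Callable[[Dict[str, Any], date], Any]) -> List[Any]:
    """Return one result per day in [start, end], querying only uncached days."""
    results: List[Any] = []
    day = start
    while day <= end:
        key = cache_key(query, day)
        if key not in cache:
            cache[key] = run_query(query, day)
        results.append(cache[key])
        day += timedelta(days=1)
    return results
```

With per-day buckets, a query for days 2 through 4 issued after a query for days 1 through 3 only needs to compute day 4; the rest is served from the cache.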
  • 66. Delayed Execution - Allow registering long-running queries - Provide cached but stale data for such queries - Dedicated cluster and queued executions - Rationale: three months of data vs a few hours of staleness - Example: [-30d, 0d] -> [-30d, -1d]
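The range rewrite in the example, [-30d, 0d] to [-30d, -1d], amounts to clamping the end of a relative time range so that it never reaches into the most recent, still-changing data. A minimal sketch, assuming ranges are expressed as day offsets relative to now and that the staleness window is one day (both assumptions; the function name is hypothetical):

```python
from typing import Tuple


def clamp_range(start_days: int, end_days: int,
                staleness_days: int = 1) -> Tuple[int, int]:
    """Pull the range end back so it is at least staleness_days in the past.

    Offsets are relative to now: -30 means 30 days ago, 0 means now.
    """
    clamped_end = min(end_days, -staleness_days)
    if clamped_end < start_days:
        raise ValueError("range lies entirely inside the staleness window")
    return (start_days, clamped_end)
```

A range that already ends in the past is left unchanged, so the rewritten query stays cacheable for as long as the staleness window allows.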
  • 68. Driving Principles - Make the system transparent - Optimize for MTTR (mean time to recover) - Strive for consistency - Automation is the most effective way to get consistency
  • 69. Challenge: Diagnosis - Cluster slowed down with all metrics being normal - Requires additional instrumentation - ES Plugin as a solution
  • 70. Challenge: Cluster Size Becomes an Enemy - Elasticsearch cluster becomes harder to operate as its size increases - MTTR increases as cluster size increases - Multi-tenancy becomes a huge issue - Can’t have too many shards
  • 71. Federation - 3 clusters -> many smaller clusters - Dynamic routing - Meta-data driven
  • 78. How Can We Trust the Data?
  • 83. Too Much Manual Maintenance Work
  • 84. Too Much Manual Maintenance Work - Adjusting queue sizes - Restarting machines - Relocating shards
  • 87. Ongoing Work for the Future
  • 88. Future Work - Strong reliability - Strong consistency among replicas - Multi-tenancy
  • 89. Summary - Three dimensions of scaling: ingestion, query, and operations - Be simple and practical: successful systems emerge from simple ones - Abstraction and data modeling matter - Invest in thorough instrumentation - Invest in automation and tools