The Weather of the Century
Part II: High Performance
André Spiegel
Consulting Engineer, MongoDB
#MongoDBWorld
What was the weather when you were born?
Data Format: Raw and in MongoDB
0303725053947282013060322517+40779-073969FM-15+0048KNYC
V0309999C00005030485MN0080475N5+02115+02005100975
ADDAA101000095AU100001015AW1105GA1025+016765999GA2045+024385999
GA3075+030485999GD11991+0167659GD22991+0243859GD33991+0304859...
{
"st" : "u725053",
"ts" : ISODate("2013-06-03T22:51:00Z"),
"airTemperature" : {
"value" : 21.1,
"quality" : "5"
},
"atmosphericPressure" : {
"value" : 1009.7,
"quality" : "5"
}
}
Station Identifier
(»NYC Central Park«)
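A minimal sketch of how the fixed-width record above might map to the document shown. It assumes the raw record is one contiguous line (the slide wraps it), that the call-letter field `KNYC` is padded with one trailing space, and that the character offsets, which are inferred from this single sample, hold in general. `parseRecord` is a hypothetical name, not the loader used in the talk.

```javascript
// Hypothetical parser: all offsets are inferred from the sample record on the slide.
function parseRecord(line) {
  const date = line.slice(15, 23); // "20130603"
  const time = line.slice(23, 27); // "2251"
  return {
    st: "u" + line.slice(4, 10), // "u" prefix as in the slide's document
    ts: new Date(Date.UTC(
      Number(date.slice(0, 4)),
      Number(date.slice(4, 6)) - 1, // JavaScript months are zero-based
      Number(date.slice(6, 8)),
      Number(time.slice(0, 2)),
      Number(time.slice(2, 4)))),
    airTemperature: {
      value: Number(line.slice(87, 92)) / 10, // "+0211" is stored in tenths
      quality: line.slice(92, 93),
    },
    atmosphericPressure: {
      value: Number(line.slice(99, 104)) / 10, // "10097" is stored in tenths
      quality: line.slice(104, 105),
    },
  };
}

// First two wrapped lines of the sample, joined with the assumed padding space.
const sampleRecord =
  "0303725053947282013060322517+40779-073969FM-15+0048KNYC " +
  "V0309999C00005030485MN0080475N5+02115+02005100975";
const parsed = parseRecord(sampleRecord);
console.log(JSON.stringify(parsed, null, 2));
```

With these assumptions the sample yields the same station, timestamp, temperature, and pressure as the document on the slide.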
How Big Is It?
• 2.5 billion data points
• 4 terabytes (1.6 KB per document)
• “moderately big”
How to do this with MongoDB?
First Deployment
• A single server with a really big disk
• Application: c3.8xlarge
• mongod: i2.8xlarge, 251 GB RAM, 6 TB SSD
Second Deployment
• A really big cluster where everything is in RAM
• Application / mongos: c3.8xlarge
• mongod: 100 x r3.2xlarge, each with 61 GB RAM and 100 GB disk
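A quick arithmetic check of the "everything is in RAM" claim, using only figures from the slides. Decimal units (1 KB = 1,000 bytes) are an assumption, and the estimate ignores indexes and the memory the server needs for itself.

```javascript
// Figures from the slides; decimal units are an assumption.
const dataPoints = 2.5e9;        // "2.5 billion data points"
const bytesPerDocument = 1.6e3;  // "1.6k per document"
const shardCount = 100;          // "100 x r3.2xlarge"
const ramPerShardBytes = 61e9;   // "61 GB RAM"

const dataSetBytes = dataPoints * bytesPerDocument;    // total raw data size
const clusterRamBytes = shardCount * ramPerShardBytes; // aggregate RAM of all shards
const fitsInRam = dataSetBytes <= clusterRamBytes;

console.log("data set (TB):", dataSetBytes / 1e12);
console.log("cluster RAM (TB):", clusterRamBytes / 1e12);
console.log("fits in RAM:", fitsInRam);
```

The data set comes out at 4 TB against 6.1 TB of aggregate RAM, whereas the single server has 251 GB of RAM for the same 4 TB.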
Now... how much would you pay?
• Single server: $60,000 / yr
• Cluster: $700,000 / yr
Use Cases
• Bulk loading
– getting all data into the system
• Latency and throughput for queries
– point in space-time
– one station, one year
– the whole world, once upon a time
• Aggregation and Exploration
– warmest and coldest day ever, etc.
Bulk Loading: Principles
• On the application side:
– batch size
– number of client threads
– use unordered bulk writes
• On the server side:
– Journaling off (temporarily!)
– Index later
– In cluster: pre-split, no balancing
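The application-side principles above can be sketched roughly as follows. This is an illustration under assumptions, not the loader used for the measurements: `toBatches` and `bulkLoad` are hypothetical names, the "threads" are concurrent async workers rather than operating system threads, and `collection` is assumed to expose `insertMany(docs, options)` as the Node.js driver and mongosh do.

```javascript
// Split an array of documents into batches of at most `batchSize`.
function toBatches(docs, batchSize) {
  const batches = [];
  for (let i = 0; i < docs.length; i += batchSize) {
    batches.push(docs.slice(i, i + batchSize));
  }
  return batches;
}

// Load documents with unordered bulk writes, keeping up to `threads`
// batches in flight at once. Returns the number of documents sent.
async function bulkLoad(collection, docs, batchSize, threads) {
  const queue = toBatches(docs, batchSize);
  let inserted = 0;

  async function worker() {
    while (queue.length > 0) {
      const batch = queue.shift();
      // ordered: false lets the server continue past individual failures
      // and process the batch without preserving insertion order.
      await collection.insertMany(batch, { ordered: false });
      inserted += batch.length;
    }
  }

  await Promise.all(Array.from({ length: threads }, worker));
  return inserted;
}
```

For the single-server settings quoted later in the deck, the call would look like `bulkLoad(collection, docs, 100, 8)`.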
Bulk Loading: Single Server
[Chart: throughput by batch size and number of threads]
8 threads, batch size 100 → 85,000 doc/s
Bulk Loading: Single Server
• Settings: 8 threads, batch size 100
• Total loading time: 10 h 20 min
• Documents per second: 70,000
• Index build time: 7 h 40 min (ts_1_st_1)
Bulk Loading: Cluster
144 threads, batch size 200 → 220,000 doc/s
Bulk Loading: Cluster
• Shard Key: Station ID, hashed
• Settings: 10 mongos @ 144 threads, batch size 200
• Total loading time: 3 h 10 min
• Documents per second: 228,000
• Index build time: 5 min (ts_1_st_1)
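As a sanity check, the reported loading times can be compared with what the document rates imply. The sketch below uses only figures from the slides; `loadHours` is a hypothetical helper, and the 2.5 billion total is a rounded number, so small deviations are expected.

```javascript
// Hours needed to load `docs` documents at a sustained `docsPerSecond`.
function loadHours(docs, docsPerSecond) {
  return docs / docsPerSecond / 3600;
}

const totalDocs = 2.5e9; // "2.5 billion data points"

// Slide figures: 70,000 doc/s (single server), 228,000 doc/s (cluster).
const singleServerHours = loadHours(totalDocs, 70000);
const clusterHours = loadHours(totalDocs, 228000);

console.log("single server:", singleServerHours.toFixed(1), "h (slide: 10 h 20 min)");
console.log("cluster:", clusterHours.toFixed(1), "h (slide: 3 h 10 min)");
```

Both estimates land slightly under the reported totals, which is consistent with the data point count being rounded down.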
Queries: Point in Space-Time
db.data.find({"st" : "u747940",
"ts" : ISODate("1969-07-16T12:00:00Z")})
Queries: Point in Space-Time
[Chart: query latency in ms (avg, 95th, 99th percentile), single server vs. cluster; y-axis 0 to 1.6 ms]
Max. throughput: 40,000/s (single server), 610,000/s (cluster, 10 mongos)
Queries: One Station, One Year
db.data.find({"st" : "u103840",
"ts" : {"$gte": ISODate("1989-01-01"),
"$lt" : ISODate("1990-01-01")}})
Queries: One Station, One Year
[Chart: query latency in ms (avg, 95th, 99th percentile), single server vs. cluster; y-axis 0 to 5000 ms]
Max. throughput: 20/s (single server), 430/s (cluster, 10 mongos)
Targeted query
Queries: The Whole World, Once Upon...
db.data.find({"ts" : ISODate("2000-01-01T00:00:00Z")})
Queries: The Whole World, Once Upon...
[Chart: query latency in ms (avg, 95th, 99th percentile), single server vs. cluster; y-axis 0 to 10000 ms]
Max. throughput: 8/s (single server), 310/s (cluster, 10 mongos)
Scatter/gather query
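The difference between the "targeted" and "scatter/gather" labels follows from the shard key (station ID, hashed). A toy model, assuming a filter is targeted only when it pins the shard key field to a single value; `routeQuery` is a hypothetical function and is far simpler than what mongos actually does.

```javascript
// Toy routing model: a filter that fixes the shard key to one value can be
// sent to a single shard; any other filter must be sent to every shard.
function routeQuery(filter, shardKeyField, shardCount) {
  const value = filter[shardKeyField];
  const isEquality = value !== undefined &&
    (typeof value !== "object" || value === null || value instanceof Date);
  return isEquality
    ? { kind: "targeted", shardsContacted: 1 }
    : { kind: "scatter/gather", shardsContacted: shardCount };
}

// One station, one year: the filter includes the shard key field "st".
const oneStation = routeQuery(
  { st: "u103840", ts: { $gte: new Date("1989-01-01"), $lt: new Date("1990-01-01") } },
  "st", 100);

// The whole world at one instant: no shard key in the filter.
const wholeWorld = routeQuery({ ts: new Date("2000-01-01T00:00:00Z") }, "st", 100);

console.log(oneStation, wholeWorld);
```

In this model the station query touches one of the 100 shards, while the whole-world query is answered by all of them in parallel.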
Analytics and Exploration
• Analytics means ad-hoc queries for which
we do not have an index
– Find all tornados
– Maximum reported temperature
• We cannot just index everything
– memory
– write performance
Analytics: Find all Tornados
db.data.find ({
"presentWeatherObservation.condition" : "99"
})
• Cluster: 47 s
• Single server: 1 h 28 min
Analytics: Maximum Temperature
db.data.aggregate ([
{ "$match" : { "airTemperature.quality" :
{ "$in" : [ "1", "5" ] } } },
{ "$group" : { "_id" : null,
"maxTemp" : { "$max" :
"$airTemperature.value" } } }
])
• Result: 61.8 °C = 143 °F
• Cluster: 2 min
• Single server: 4 h 45 min
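To make the pipeline's logic concrete, here is an in-memory equivalent of the `$match` and `$group` stages. It is only an illustration of what the aggregation computes: `maxTemperature` is a hypothetical function and the sample documents are made up, not taken from the data set.

```javascript
// In-memory equivalent of the $match + $group/$max pipeline.
function maxTemperature(docs) {
  const acceptedQuality = ["1", "5"];
  let maxTemp = null;
  for (const doc of docs) {
    const t = doc.airTemperature;
    // $match stage: keep only readings with quality code "1" or "5".
    if (!t || !acceptedQuality.includes(t.quality)) continue;
    // $group stage with $max over airTemperature.value.
    if (maxTemp === null || t.value > maxTemp) maxTemp = t.value;
  }
  return { _id: null, maxTemp };
}

// Made-up sample documents for illustration only.
const sampleDocs = [
  { airTemperature: { value: 21.1, quality: "5" } },
  { airTemperature: { value: 99.9, quality: "9" } },        // dropped by the quality filter
  { airTemperature: { value: 30.4, quality: "1" } },
  { atmosphericPressure: { value: 1009.7, quality: "5" } }, // no temperature reading
];
const result = maxTemperature(sampleDocs);
console.log(result);
```

Because every document has to be inspected, the running time is governed by how fast the whole data set can be read, which is where the cluster's aggregate RAM pays off.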
Summary: Single Server
Pro
• Cost-effective
• Very good latency for single queries
Con
• Some operations are prohibitive:
– Indexing
– Table Scans
Summary: Cluster
Con
• High cost
Pro
• High throughput
• Very good latency for single queries
• Scatter-gather yields significant speed-up
• Analytics are possible
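The trade-off in the two summaries can be put into numbers using the figures quoted earlier in the deck. The sketch assumes that $60,000 / yr refers to the single server and $700,000 / yr to the cluster; `ratio` and `toSeconds` are hypothetical helpers.

```javascript
// Cluster-to-single-server ratios, using figures quoted on the slides.
function ratio(cluster, singleServer) {
  return cluster / singleServer;
}

// Convert hours, minutes, seconds to seconds.
function toSeconds(h, m, s) {
  return h * 3600 + m * 60 + s;
}

const ratios = {
  // Assumes $60,000/yr is the single server and $700,000/yr the cluster.
  cost: ratio(700000, 60000),
  pointQueryThroughput: ratio(610000, 40000),
  stationYearThroughput: ratio(430, 20),
  wholeWorldThroughput: ratio(310, 8),
  // For run times, the speed-up is single-server time over cluster time.
  tornadoScanSpeedup: toSeconds(1, 28, 0) / toSeconds(0, 0, 47),
  maxTempScanSpeedup: toSeconds(4, 45, 0) / toSeconds(0, 2, 0),
};

for (const [name, value] of Object.entries(ratios)) {
  console.log(name, value.toFixed(1));
}
```

Under that cost assumption the cluster costs roughly 12 times as much, while the unindexed scans run more than 100 times faster.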
Thank you.

