SlideShare una empresa de Scribd logo
1 de 21
Descargar para leer sin conexión
MongoDB 3.0, Wired Tiger,
and the era of pluggable storage engines
Kenny Gorman
Chief Technologist; Data at Rackspace
Co-Founder, ObjectRocket
@rackspace @kennygorman
Whats up with MongoDB 3.0+
● Storage Engines
○ Pluggable storage engine API
○ Wired Tiger, mmapV1,
TokuSE, …
○ MongoDB ships with
mmapV1, and Wired Tiger
● Why do I care?
● It’s early
Wired Tiger
● Acquisition of Wired Tiger Inc
● Technology plus Architects
○ Sleepycat Software
○ Berkeley DB
● Proper DB Engine
○ MVCC
○ Durability
○ Compression
○ Performance
○ Full docs
○ LSM and Btree indexing
Wired Tiger and MongoDB
● New configuration options
● Different file format and layout
● Allocates files at write time.
● Write ahead log (WAL)
● Wire protocol stays same
● Oplog format changes
● Memory allocated for page cache
● Subset of WT features implemented
● Traditional tools are aware of different
engines
New WT stuff to consider
● files allocated at write time, no preallocation.
● no padding factor, no powerof2sizes
● indexes and data locations may be separate
● no huge pages, keep it off (mostly like before)
● NUMA configs: zone_reclaim == 0
● SSD or spinning disk, you have choices now
● A whole new set of configurables
● oh %^$! faults gone in mongostat!
● no LSM indexing yet
● driver updates
Wired Tiger configuration
--storageEngine wiredtiger
--wiredTigerCacheSizeGB 100
--wiredTigerEngineConfig "<option>=<setting>,<option>=<setting>"
--wiredTigerCollectionConfig "<option>=<setting>,<option>=<setting>"
--wiredTigerIndexConfig "<option>=<setting>,<option>=<setting>"
db.createCollection("<collectionName>",
{storageEngine: {wiredtiger: {configString:"<option>=<setting>,<option>=<setting>"}}});
Wired Tiger configuration
storage:
dbPath: "/data/mongodb"
journal:
enabled: true
engine: "wiredTiger"
wiredTiger:
engineConfig:
cacheSizeGB: 99
journalCompressor: none
directoryForIndexes: "/indexes/mongodb/"
collectionConfig:
blockCompressor: snappy
indexConfig:
prefixCompression: true
systemLog:
destination: file
path: "/tmp/mongodb.log"
logAppend: true
processManagement:
fork: true
net:
port: 9005
unixDomainSocket:
enabled : true
Wired Tiger Performance
● sysbench-mongodb
● 2.6.8 vs 3.0.0
● Mix I/U/D/Q
● Concurrency == Awesome
Wired Tiger Performance
sysbench-mongodb benchmark.
https://gist.github.com/kgorman/13bd5613a2c45e131edf
~120GB data, 99GB cache, PCIe Flash, Bare Metal
Wired Tiger Performance
sysbench-mongodb benchmark.
https://gist.github.com/kgorman/13bd5613a2c45e131edf
~120GB data, 99GB cache, PCIe Flash, Bare Metal
Wired Tiger Compression
● snappy / zlib for data (page/block level data compression)
● copy on write … ish
● prefix for indexes
● no dictionary compression nor Huffman encoding
● memory implications, expansion at read and CPU
The cache generally stores uncompressed changes (the exception is for very large documents). The default snappy compression is
fairly straightforward: it gathers data up to a maximum of 32KB, compresses it, and if compression is successful, writes the block
rounded up to the nearest 4KB.
The alternative zlib compression works a little differently: it will gather more data and compress enough to fill a 32KB block on disk.
This is more CPU intensive but generally results in better compression ratios (independent of the inherent differences between snappy
and zlib).
-Michael Cahill
Compression Tradeoffs
● CPU vs on disk compression vs performance
● memory usage profile, expansion of data in memory
● SSD vs spinning disk
YMMV
● Test yer shit
Compression Realities?
{
"_id" : ObjectId("53e4fcc42239c23dce3cb7bc"),
"station_id" : 8461490,
"loc" : {"type" : "Point","coordinates" : [-72.09,41.3614]},
"name" : "New London",
"products" : [
{"v" : 69.4,"t" : ISODate("2014-08-08T16:24:00Z"),"name" : "water_temperature","f" : "0,0,0"},
{"v" : 77,"t" : ISODate("2014-08-08T16:24:00Z"),"name" : "air_temperature","f" : "0,0,0"},
{"d" : "360.00","g" : "8.75","f" : "0,0","s" : "4.08","t" : ISODate("2014-08-08T16:24:00Z"),"dr" : "N","name" : "wind"},
{"v" : 1015.8,"t" : ISODate("2014-08-08T16:24:00Z"),"name" : "air_pressure","f" : "0,0,0"}
],
"fetch_date" : ISODate("2014-08-08T16:37:22.640Z"),
"id" : 8461490
}
● Sensor data, typical type of workload
● Compresses to 78% in memory, and 26% on disk
● https://gist.github.com/kgorman/7d48f37ec5b23efb1475
Various compression results
as measured with:
db.<collection>.stats[“size”] #memory size
db.<collection>.stats[“storageSize”] # on disk size
Wired Tiger Fragmentation
● Say it isn’t so!
● block_allocation and file_extend tunables
● in-place updates don’t exist in WT, so is fragmentation worse?
● Compaction w/o blocking?
http://source.wiredtiger.com/2.5.0/compact.html
Demo
Getting there from here
● Play today; *but wait a while for gods sake*
● 2.4 -> 2.6 -> 3.0
● Use replica sets to migrate seamlessly
● Can fallback if needed using replica sets
● http://docs.mongodb.org/manual/release-notes/3.0-upgrade/
● Go slow, be methodical
Key take-aways
● It’s a whole new world
● Start testing now()
● Understand the tradeoffs of compression, performance, and
CPU usage for *your* workloads and datasets
● Performance likely easy win
● Compression likely not
● YMMV
ObjectRocket/Rackspace + MongoDB 3.0
● Play today on cloud servers:
https://data.rackspace.com/blog/mongodb-3.0-getting-started/
● ObjectRocket is evaluating 3.0 just like everyone else
● Test in dev, move to prod
More reading
● http://docs.mongodb.org/v3.0/release-notes/3.0/
● http://www.wiredtiger.org
● https://data.rackspace.com/blog/mongodb-3.0-getting-started/
Contact
@kennygorman
@rackspace
kenny.gorman@rackspace.com
http://data.rackspace.com/databases/

Más contenido relacionado

La actualidad más candente

Storage talk
Storage talkStorage talk
Storage talk
christkv
 

La actualidad más candente (20)

WiredTiger & What's New in 3.0
WiredTiger & What's New in 3.0WiredTiger & What's New in 3.0
WiredTiger & What's New in 3.0
 
MongoDB Internals
MongoDB InternalsMongoDB Internals
MongoDB Internals
 
Storage talk
Storage talkStorage talk
Storage talk
 
Running MongoDB 3.0 on AWS
Running MongoDB 3.0 on AWSRunning MongoDB 3.0 on AWS
Running MongoDB 3.0 on AWS
 
WiredTiger Overview
WiredTiger OverviewWiredTiger Overview
WiredTiger Overview
 
Concurrency Control in MongoDB 3.0
Concurrency Control in MongoDB 3.0Concurrency Control in MongoDB 3.0
Concurrency Control in MongoDB 3.0
 
A Technical Introduction to WiredTiger
A Technical Introduction to WiredTigerA Technical Introduction to WiredTiger
A Technical Introduction to WiredTiger
 
A Technical Introduction to WiredTiger
A Technical Introduction to WiredTigerA Technical Introduction to WiredTiger
A Technical Introduction to WiredTiger
 
Ambry : Linkedin's Scalable Geo-Distributed Object Store
Ambry : Linkedin's Scalable Geo-Distributed Object StoreAmbry : Linkedin's Scalable Geo-Distributed Object Store
Ambry : Linkedin's Scalable Geo-Distributed Object Store
 
Mashing the data
Mashing the dataMashing the data
Mashing the data
 
Fluentd and Docker - running fluentd within a docker container
Fluentd and Docker - running fluentd within a docker containerFluentd and Docker - running fluentd within a docker container
Fluentd and Docker - running fluentd within a docker container
 
Improvements in Bitsy 1.5
Improvements in Bitsy 1.5Improvements in Bitsy 1.5
Improvements in Bitsy 1.5
 
Inside CynosDB: MariaDB optimized for the cloud at Tencent
Inside CynosDB: MariaDB optimized for the cloud at TencentInside CynosDB: MariaDB optimized for the cloud at Tencent
Inside CynosDB: MariaDB optimized for the cloud at Tencent
 
Hybrid collaborative tiered storage with alluxio
Hybrid collaborative tiered storage with alluxioHybrid collaborative tiered storage with alluxio
Hybrid collaborative tiered storage with alluxio
 
M|18 How to use MyRocks with MariaDB Server
M|18 How to use MyRocks with MariaDB ServerM|18 How to use MyRocks with MariaDB Server
M|18 How to use MyRocks with MariaDB Server
 
Presto in my_use_case
Presto in my_use_casePresto in my_use_case
Presto in my_use_case
 
https://docs.google.com/presentation/d/1DcL4zK6i3HZRDD4xTGX1VpSOwyu2xBeWLT6a_...
https://docs.google.com/presentation/d/1DcL4zK6i3HZRDD4xTGX1VpSOwyu2xBeWLT6a_...https://docs.google.com/presentation/d/1DcL4zK6i3HZRDD4xTGX1VpSOwyu2xBeWLT6a_...
https://docs.google.com/presentation/d/1DcL4zK6i3HZRDD4xTGX1VpSOwyu2xBeWLT6a_...
 
MongoDB Pros and Cons
MongoDB Pros and ConsMongoDB Pros and Cons
MongoDB Pros and Cons
 
MongodB Internals
MongodB InternalsMongodB Internals
MongodB Internals
 
ScyllaDB: NoSQL at Ludicrous Speed
ScyllaDB: NoSQL at Ludicrous SpeedScyllaDB: NoSQL at Ludicrous Speed
ScyllaDB: NoSQL at Ludicrous Speed
 

Destacado

Destacado (8)

Introduction to new high performance storage engines in mongodb 3.0
Introduction to new high performance storage engines in mongodb 3.0Introduction to new high performance storage engines in mongodb 3.0
Introduction to new high performance storage engines in mongodb 3.0
 
Back to Basics: My First MongoDB Application
Back to Basics: My First MongoDB ApplicationBack to Basics: My First MongoDB Application
Back to Basics: My First MongoDB Application
 
Back to Basics Webinar 3: Introduction to Replica Sets
Back to Basics Webinar 3: Introduction to Replica SetsBack to Basics Webinar 3: Introduction to Replica Sets
Back to Basics Webinar 3: Introduction to Replica Sets
 
Back to Basics 2017: Introduction to Sharding
Back to Basics 2017: Introduction to ShardingBack to Basics 2017: Introduction to Sharding
Back to Basics 2017: Introduction to Sharding
 
Back to Basics Webinar 1: Introduction to NoSQL
Back to Basics Webinar 1: Introduction to NoSQLBack to Basics Webinar 1: Introduction to NoSQL
Back to Basics Webinar 1: Introduction to NoSQL
 
Webinar: Working with Graph Data in MongoDB
Webinar: Working with Graph Data in MongoDBWebinar: Working with Graph Data in MongoDB
Webinar: Working with Graph Data in MongoDB
 
Seminario Web MongoDB-Paradigma: Cree aplicaciones más escalables utilizando ...
Seminario Web MongoDB-Paradigma: Cree aplicaciones más escalables utilizando ...Seminario Web MongoDB-Paradigma: Cree aplicaciones más escalables utilizando ...
Seminario Web MongoDB-Paradigma: Cree aplicaciones más escalables utilizando ...
 
Webinar: 10-Step Guide to Creating a Single View of your Business
Webinar: 10-Step Guide to Creating a Single View of your BusinessWebinar: 10-Step Guide to Creating a Single View of your Business
Webinar: 10-Step Guide to Creating a Single View of your Business
 

Similar a Mongo db3.0 wired_tiger_storage_engine

Similar a Mongo db3.0 wired_tiger_storage_engine (20)

Elasticsearch on Kubernetes
Elasticsearch on KubernetesElasticsearch on Kubernetes
Elasticsearch on Kubernetes
 
Logs @ OVHcloud
Logs @ OVHcloudLogs @ OVHcloud
Logs @ OVHcloud
 
Introduction to Apache Tajo: Data Warehouse for Big Data
Introduction to Apache Tajo: Data Warehouse for Big DataIntroduction to Apache Tajo: Data Warehouse for Big Data
Introduction to Apache Tajo: Data Warehouse for Big Data
 
High performance json- postgre sql vs. mongodb
High performance json- postgre sql vs. mongodbHigh performance json- postgre sql vs. mongodb
High performance json- postgre sql vs. mongodb
 
PGConf APAC 2018 - High performance json postgre-sql vs. mongodb
PGConf APAC 2018 - High performance json  postgre-sql vs. mongodbPGConf APAC 2018 - High performance json  postgre-sql vs. mongodb
PGConf APAC 2018 - High performance json postgre-sql vs. mongodb
 
Log analytics with ELK stack
Log analytics with ELK stackLog analytics with ELK stack
Log analytics with ELK stack
 
Second Skin: Real-Time Retheming a Legacy Web Application with Diazo in the C...
Second Skin: Real-Time Retheming a Legacy Web Application with Diazo in the C...Second Skin: Real-Time Retheming a Legacy Web Application with Diazo in the C...
Second Skin: Real-Time Retheming a Legacy Web Application with Diazo in the C...
 
Patroni - HA PostgreSQL made easy
Patroni - HA PostgreSQL made easyPatroni - HA PostgreSQL made easy
Patroni - HA PostgreSQL made easy
 
Welcome to icehouse
Welcome to icehouseWelcome to icehouse
Welcome to icehouse
 
Varnish - PLNOG 4
Varnish - PLNOG 4Varnish - PLNOG 4
Varnish - PLNOG 4
 
BlueStore: a new, faster storage backend for Ceph
BlueStore: a new, faster storage backend for CephBlueStore: a new, faster storage backend for Ceph
BlueStore: a new, faster storage backend for Ceph
 
Ceph Tech Talk: Bluestore
Ceph Tech Talk: BluestoreCeph Tech Talk: Bluestore
Ceph Tech Talk: Bluestore
 
Time series denver an introduction to prometheus
Time series denver   an introduction to prometheusTime series denver   an introduction to prometheus
Time series denver an introduction to prometheus
 
Fluent Bit: Log Forwarding at Scale
Fluent Bit: Log Forwarding at ScaleFluent Bit: Log Forwarding at Scale
Fluent Bit: Log Forwarding at Scale
 
Benchmarking your cloud performance with top 4 global public clouds
Benchmarking your cloud performance with top 4 global public cloudsBenchmarking your cloud performance with top 4 global public clouds
Benchmarking your cloud performance with top 4 global public clouds
 
AWS Big Data Demystified #1: Big data architecture lessons learned
AWS Big Data Demystified #1: Big data architecture lessons learned AWS Big Data Demystified #1: Big data architecture lessons learned
AWS Big Data Demystified #1: Big data architecture lessons learned
 
Microservices with Micronaut
Microservices with MicronautMicroservices with Micronaut
Microservices with Micronaut
 
PLNOG 4: Leszek Urbański - A modern HTTP accelerator for content providers
PLNOG 4: Leszek Urbański - A modern HTTP accelerator for content providersPLNOG 4: Leszek Urbański - A modern HTTP accelerator for content providers
PLNOG 4: Leszek Urbański - A modern HTTP accelerator for content providers
 
Speedrunning the Open Street Map osm2pgsql Loader
Speedrunning the Open Street Map osm2pgsql LoaderSpeedrunning the Open Street Map osm2pgsql Loader
Speedrunning the Open Street Map osm2pgsql Loader
 
Sorry - How Bieber broke Google Cloud at Spotify
Sorry - How Bieber broke Google Cloud at SpotifySorry - How Bieber broke Google Cloud at Spotify
Sorry - How Bieber broke Google Cloud at Spotify
 

Último

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 

Último (20)

Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 

Mongo db3.0 wired_tiger_storage_engine

  • 1. MongoDB 3.0, Wired Tiger, and the era of pluggable storage engines Kenny Gorman Chief Technologist; Data at Rackspace Co-Founder, ObjectRocket @rackspace @kennygorman
  • 2. Whats up with MongoDB 3.0+ ● Storage Engines ○ Pluggable storage engine API ○ Wired Tiger, mmapV1, TokuSE, … ○ MongoDB ships with mmapV1, and Wired Tiger ● Why do I care? ● It’s early
  • 3. Wired Tiger ● Acquisition of Wired Tiger Inc ● Technology plus Architects ○ Sleepycat Software ○ Berkeley DB ● Proper DB Engine ○ MVCC ○ Durability ○ Compression ○ Performance ○ Full docs ○ LSM and Btree indexing
  • 4. Wired Tiger and MongoDB ● New configuration options ● Different file format and layout ● Allocates files at write time. ● Write ahead log (WAL) ● Wire protocol stays same ● Oplog format changes ● Memory allocated for page cache ● Subset of WT features implemented ● Traditional tools are aware of different engines
  • 5. New WT stuff to consider ● files allocated at write time, no preallocation. ● no padding factor, no powerof2sizes ● indexes and data locations may be separate ● no huge pages, keep it off (mostly like before) ● NUMA configs: zone_reclaim == 0 ● SSD or spinning disk, you have choices now ● A whole new set of configurables ● oh %^$! faults gone in mongostat! ● no LSM indexing yet ● driver updates
  • 6. Wired Tiger configuration --storageEngine wiredtiger --wiredTigerCacheSizeGB 100 --wiredTigerEngineConfig "<option>=<setting>,<option>=<setting>" --wiredTigerCollectionConfig "<option>=<setting>,<option>=<setting>" --wiredTigerIndexConfig "<option>=<setting>,<option>=<setting>" db.createCollection("<collectionName>", {storageEngine: {wiredtiger: {configString:"<option>=<setting>,<option>=<setting>"}}});
  • 7. Wired Tiger configuration storage: dbPath: "/data/mongodb" journal: enabled: true engine: "wiredTiger" wiredTiger: engineConfig: cacheSizeGB: 99 journalCompressor: none directoryForIndexes: "/indexes/mongodb/" collectionConfig: blockCompressor: snappy indexConfig: prefixCompression: true systemLog: destination: file path: "/tmp/mongodb.log" logAppend: true processManagement: fork: true net: port: 9005 unixDomainSocket: enabled : true
  • 8. Wired Tiger Performance ● sysbench-mongodb ● 2.6.8 vs 3.0.0 ● Mix I/U/D/Q ● Concurrency == Awesome
  • 9. Wired Tiger Performance sysbench-mongodb benchmark. https://gist.github.com/kgorman/13bd5613a2c45e131edf ~120GB data, 99GB cache, PCIe Flash, Bare Metal
  • 10. Wired Tiger Performance sysbench-mongodb benchmark. https://gist.github.com/kgorman/13bd5613a2c45e131edf ~120GB data, 99GB cache, PCIe Flash, Bare Metal
  • 11. Wired Tiger Compression ● snappy / zlib for data (page/block level data compression) ● copy on write … ish ● prefix for indexes ● no dictionary compression nor Huffman encoding ● memory implications, expansion at read and CPU The cache generally stores uncompressed changes (the exception is for very large documents). The default snappy compression is fairly straightforward: it gathers data up to a maximum of 32KB, compresses it, and if compression is successful, writes the block rounded up to the nearest 4KB. The alternative zlib compression works a little differently: it will gather more data and compress enough to fill a 32KB block on disk. This is more CPU intensive but generally results in better compression ratios (independent of the inherent differences between snappy and zlib). -Michael Cahill
  • 12. Compression Tradeoffs ● CPU vs on disk compression vs performance ● memory usage profile, expansion of data in memory ● SSD vs spinning disk YMMV ● Test yer shit
  • 13. Compression Realities? { "_id" : ObjectId("53e4fcc42239c23dce3cb7bc"), "station_id" : 8461490, "loc" : {"type" : "Point","coordinates" : [-72.09,41.3614]}, "name" : "New London", "products" : [ {"v" : 69.4,"t" : ISODate("2014-08-08T16:24:00Z"),"name" : "water_temperature","f" : "0,0,0"}, {"v" : 77,"t" : ISODate("2014-08-08T16:24:00Z"),"name" : "air_temperature","f" : "0,0,0"}, {"d" : "360.00","g" : "8.75","f" : "0,0","s" : "4.08","t" : ISODate("2014-08-08T16:24:00Z"),"dr" : "N","name" : "wind"}, {"v" : 1015.8,"t" : ISODate("2014-08-08T16:24:00Z"),"name" : "air_pressure","f" : "0,0,0"} ], "fetch_date" : ISODate("2014-08-08T16:37:22.640Z"), "id" : 8461490 } ● Sensor data, typical type of workload ● Compresses to 78% in memory, and 26% on disk ● https://gist.github.com/kgorman/7d48f37ec5b23efb1475
  • 14. Various compression results as measured with: db.<collection>.stats[“size”] #memory size db.<collection>.stats[“storageSize”] # on disk size
  • 15. Wired Tiger Fragmentation ● Say it isn’t so! ● block_allocation and file_extend tunables ● in-place updates don’t exist in WT, so is fragmentation worse? ● Compaction w/o blocking? http://source.wiredtiger.com/2.5.0/compact.html
  • 16. Demo
  • 17. Getting there from here ● Play today; *but wait a while for gods sake* ● 2.4 -> 2.6 -> 3.0 ● Use replica sets to migrate seamlessly ● Can fallback if needed using replica sets ● http://docs.mongodb.org/manual/release-notes/3.0-upgrade/ ● Go slow, be methodical
  • 18. Key take-aways ● It’s a whole new world ● Start testing now() ● Understand the tradeoffs of compression, performance, and CPU usage for *your* workloads and datasets ● Performance likely easy win ● Compression likely not ● YMMV
  • 19. ObjectRocket/Rackspace + MongoDB 3.0 ● Play today on cloud servers: https://data.rackspace.com/blog/mongodb-3.0-getting-started/ ● ObjectRocket is evaluating 3.0 just like everyone else ● Test in dev, move to prod
  • 20. More reading ● http://docs.mongodb.org/v3.0/release-notes/3.0/ ● http://www.wiredtiger.org ● https://data.rackspace.com/blog/mongodb-3.0-getting-started/