SlideShare una empresa de Scribd logo
1 de 30
Descargar para leer sin conexión
Approaching 1 Billion
       Documents in MongoDB




                    David Mytton
1/30      david@boxedice.com / @davidmytton
Server Density Monitoring


       Processing       Database            UI




2/30
                    www.serverdensity.com
Cache / Data Store
                      Postback




       checksLatest              checksHistorical



3/30
db.stats()
       Documents                  937,393,315

       Collections                      27,566

       Indexes                          45,277

       Stored data                      638GB

       Inserts                    5000-8000/s


4/30
                 As of 17th Jun 2010.
13 months ago




5/30
       Why we moved: http://bit.ly/mysqltomongo
Initial Setup

                     Replication




       Master                       Slave
         DC1                         DC2
       8GB RAM                     8GB RAM

6/30
Vertical Scaling

                  Replication




        Master                   Slave
         DC1                      DC2
       72GB RAM                 8GB RAM

7/30
Tip #1

       Keep your indexes in
       memory at all times.

           db.stats()


8/30
i/o not an issue




9/30
Tip #2
         Data is flushed to disk every 60s.


        db.runCommand({fsync:1});


             --syncdelay [60]

10/30
Sharding solves
          everything




11/30
Manual Partitioning
                      Replication




           Master A                   Slave A
            DC1                        DC2
          16GB RAM                  16GB RAM


                      Replication




           Master B                   Slave B
            DC1                        DC2
12/30     16GB RAM                  16GB RAM
Sustained Traffic



                  Master                      Slave
        Avg out:       2.4Mbit/s   Avg out:       4.0Mbit/s
        Avg in:        3.8Mbit/s   Avg in:      111.2Kbit/s

13/30
Database vs collections


        • Many databases = many data files (small but
          quickly get large).
        • Many collections = watch namespace limit.

14/30
Namespaces = Number of collections +
                number of indexes




15/30
Tip #3


        Monitor the 24,000
         namespace limit.


16/30
Using Server Density




17/30
Console

        db.system.namespaces.count()




18/30
Replica Pairs = Failover
                        Replica Pair




             Master A                    Slave A
              DC1                         DC2
            16GB RAM                   16GB RAM


                        Replica Pair




             Master B                    Slave B
              DC1                         DC2
19/30       16GB RAM                   16GB RAM
Tip #4


        Pre-provision your oplog files.



20/30
A shell script to generate 75GB oplog files




          for i in {0..40}
        do echo $i
        head -c 2146435072 /dev/zero > local.$i
        done




21/30
Tip #5


        Expect slower performance
         during initial replica sync.


22/30
Tip #6


        You can rotate your log files
             from the console.


23/30
Rotating your log files

         db.runCommand("logRotate")




24/30
Tip #7

            Index creation blocks by
            default. Use background
              indexing if necessary.



25/30
        MongoDB Manual: http://bit.ly/mongobgindex
Tip #8

         Increase your OS file
         descriptor limit + use
        persistent connections.


26/30
Too many open files!
                 /etc/security/limits.conf
         mongo hard nofile 10000
         mongo soft nofile 10000
          user                 type          limit




                  /etc/ssh/sshd_config

                     UsePAM yes
27/30
Space is not reused
        Data + indexes                  551GB

        Actual disk usage               638GB


                         Fixed in
        1.1.4 1.3.x 1.5.0 1.5.1 1.5.2 1.5.3 1.5.4?

28/30
                     JIRA: SERVER-366
Summary
        1. Keep indexes in memory.
        2. Data is flushed to disk every 60s.
        3. Monitor the 24k namespace limit.
        4. Pre-provision oplog files.
        5. Expect slower performance on replica sync.
        6. Rotate logs from the console.
        7. Index creation blocks by default.
29/30   8. OS file descriptor limit + persistent connections.
Slides
          blog.boxedice.com/mongodb




                  David Mytton
30/30   david@boxedice.com / @davidmytton

Más contenido relacionado

La actualidad más candente

深入了解Redis
深入了解Redis深入了解Redis
深入了解Redisiammutex
 
Sphinx at Craigslist in 2012
Sphinx at Craigslist in 2012Sphinx at Craigslist in 2012
Sphinx at Craigslist in 2012Jeremy Zawodny
 
分析mysql acid 设计实现
分析mysql acid 设计实现分析mysql acid 设计实现
分析mysql acid 设计实现rfyiamcool
 
No sql but even less security
No sql but even less securityNo sql but even less security
No sql but even less securityiammutex
 
Strategies for Backing Up MongoDB
Strategies for Backing Up MongoDBStrategies for Backing Up MongoDB
Strategies for Backing Up MongoDBMongoDB
 
Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to RedisArnab Mitra
 
Oreilly Webcast 01 19 10
Oreilly Webcast 01 19 10Oreilly Webcast 01 19 10
Oreilly Webcast 01 19 10Sean Hull
 
Hypertable Berlin Buzzwords
Hypertable Berlin BuzzwordsHypertable Berlin Buzzwords
Hypertable Berlin Buzzwordshypertable
 
XtraDB 5.6 and 5.7: Key Performance Algorithms
XtraDB 5.6 and 5.7: Key Performance AlgorithmsXtraDB 5.6 and 5.7: Key Performance Algorithms
XtraDB 5.6 and 5.7: Key Performance AlgorithmsLaurynas Biveinis
 
This is redis - feature and usecase
This is redis - feature and usecaseThis is redis - feature and usecase
This is redis - feature and usecaseKris Jeong
 
Introduction to MongoDB with PHP
Introduction to MongoDB with PHPIntroduction to MongoDB with PHP
Introduction to MongoDB with PHPfwso
 
5 Tips for Getting Started with Pivotal GemFire
5 Tips for Getting Started with Pivotal GemFire5 Tips for Getting Started with Pivotal GemFire
5 Tips for Getting Started with Pivotal GemFireVMware Tanzu
 
Boosting I/O Performance with KVM io_uring
Boosting I/O Performance with KVM io_uringBoosting I/O Performance with KVM io_uring
Boosting I/O Performance with KVM io_uringShapeBlue
 
Frontera распределенный робот для обхода веба в больших объемах / Александр С...
Frontera распределенный робот для обхода веба в больших объемах / Александр С...Frontera распределенный робот для обхода веба в больших объемах / Александр С...
Frontera распределенный робот для обхода веба в больших объемах / Александр С...Ontico
 
Oreilly Webcast Jun17
Oreilly Webcast Jun17Oreilly Webcast Jun17
Oreilly Webcast Jun17Sean Hull
 
TokuDB internals / Лесин Владислав (Percona)
TokuDB internals / Лесин Владислав (Percona)TokuDB internals / Лесин Владислав (Percona)
TokuDB internals / Лесин Владислав (Percona)Ontico
 

La actualidad más candente (20)

深入了解Redis
深入了解Redis深入了解Redis
深入了解Redis
 
Sphinx at Craigslist in 2012
Sphinx at Craigslist in 2012Sphinx at Craigslist in 2012
Sphinx at Craigslist in 2012
 
分析mysql acid 设计实现
分析mysql acid 设计实现分析mysql acid 设计实现
分析mysql acid 设计实现
 
No sql but even less security
No sql but even less securityNo sql but even less security
No sql but even less security
 
Strategies for Backing Up MongoDB
Strategies for Backing Up MongoDBStrategies for Backing Up MongoDB
Strategies for Backing Up MongoDB
 
Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to Redis
 
Oreilly Webcast 01 19 10
Oreilly Webcast 01 19 10Oreilly Webcast 01 19 10
Oreilly Webcast 01 19 10
 
Hypertable Berlin Buzzwords
Hypertable Berlin BuzzwordsHypertable Berlin Buzzwords
Hypertable Berlin Buzzwords
 
MongoDB Shard Cluster
MongoDB Shard ClusterMongoDB Shard Cluster
MongoDB Shard Cluster
 
XtraDB 5.6 and 5.7: Key Performance Algorithms
XtraDB 5.6 and 5.7: Key Performance AlgorithmsXtraDB 5.6 and 5.7: Key Performance Algorithms
XtraDB 5.6 and 5.7: Key Performance Algorithms
 
This is redis - feature and usecase
This is redis - feature and usecaseThis is redis - feature and usecase
This is redis - feature and usecase
 
Introduction to MongoDB with PHP
Introduction to MongoDB with PHPIntroduction to MongoDB with PHP
Introduction to MongoDB with PHP
 
5 Tips for Getting Started with Pivotal GemFire
5 Tips for Getting Started with Pivotal GemFire5 Tips for Getting Started with Pivotal GemFire
5 Tips for Getting Started with Pivotal GemFire
 
Containers > VMs
Containers > VMsContainers > VMs
Containers > VMs
 
Boosting I/O Performance with KVM io_uring
Boosting I/O Performance with KVM io_uringBoosting I/O Performance with KVM io_uring
Boosting I/O Performance with KVM io_uring
 
Frontera распределенный робот для обхода веба в больших объемах / Александр С...
Frontera распределенный робот для обхода веба в больших объемах / Александр С...Frontera распределенный робот для обхода веба в больших объемах / Александр С...
Frontera распределенный робот для обхода веба в больших объемах / Александр С...
 
Erlang scheduler
Erlang schedulerErlang scheduler
Erlang scheduler
 
Oreilly Webcast Jun17
Oreilly Webcast Jun17Oreilly Webcast Jun17
Oreilly Webcast Jun17
 
Redis acc 2015_eng
Redis acc 2015_engRedis acc 2015_eng
Redis acc 2015_eng
 
TokuDB internals / Лесин Владислав (Percona)
TokuDB internals / Лесин Владислав (Percona)TokuDB internals / Лесин Владислав (Percona)
TokuDB internals / Лесин Владислав (Percona)
 

Similar a Approaching 1 Billion Documents in MongoDB

Open Source Data Deduplication
Open Source Data DeduplicationOpen Source Data Deduplication
Open Source Data DeduplicationRedWireServices
 
Some analysis of BlueStore and RocksDB
Some analysis of BlueStore and RocksDBSome analysis of BlueStore and RocksDB
Some analysis of BlueStore and RocksDBXiao Yan Li
 
Loadays MySQL
Loadays MySQLLoadays MySQL
Loadays MySQLlefredbe
 
XT Best Practices
XT Best PracticesXT Best Practices
XT Best PracticesJeff Larkin
 
High Performance Scaling Techniques in Golang Using Go Assembly
High Performance Scaling Techniques in Golang Using Go AssemblyHigh Performance Scaling Techniques in Golang Using Go Assembly
High Performance Scaling Techniques in Golang Using Go AssemblyMinio
 
Database performance tuning for SSD based storage
Database  performance tuning for SSD based storageDatabase  performance tuning for SSD based storage
Database performance tuning for SSD based storageAngelo Rajadurai
 
Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...
Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...
Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...Виталий Стародубцев
 
SSD based storage tuning for databases
SSD based storage tuning for databasesSSD based storage tuning for databases
SSD based storage tuning for databasesAngelo Rajadurai
 
Ivan mamontov and Mikhail Khuldnev Present : Fast Decompression Lucene Codec
Ivan mamontov and Mikhail Khuldnev Present : Fast Decompression Lucene CodecIvan mamontov and Mikhail Khuldnev Present : Fast Decompression Lucene Codec
Ivan mamontov and Mikhail Khuldnev Present : Fast Decompression Lucene CodecGrid Dynamics
 
Backing up Wikipedia Databases
Backing up Wikipedia DatabasesBacking up Wikipedia Databases
Backing up Wikipedia DatabasesJaime Crespo
 
Speedrunning the Open Street Map osm2pgsql Loader
Speedrunning the Open Street Map osm2pgsql LoaderSpeedrunning the Open Street Map osm2pgsql Loader
Speedrunning the Open Street Map osm2pgsql LoaderGregSmith458515
 
SDC20 ScaleFlux.pptx
SDC20 ScaleFlux.pptxSDC20 ScaleFlux.pptx
SDC20 ScaleFlux.pptxssuserabc741
 
ZFS and MySQL on Linux, the Sweet Spots
ZFS and MySQL on Linux, the Sweet SpotsZFS and MySQL on Linux, the Sweet Spots
ZFS and MySQL on Linux, the Sweet SpotsJervin Real
 
Dmx3 950-technical specifications
Dmx3 950-technical specificationsDmx3 950-technical specifications
Dmx3 950-technical specificationsRaghul P
 
The Proper Care and Feeding of MySQL Databases
The Proper Care and Feeding of MySQL DatabasesThe Proper Care and Feeding of MySQL Databases
The Proper Care and Feeding of MySQL DatabasesDave Stokes
 
Ceph Performance and Sizing Guide
Ceph Performance and Sizing GuideCeph Performance and Sizing Guide
Ceph Performance and Sizing GuideJose De La Rosa
 

Similar a Approaching 1 Billion Documents in MongoDB (20)

Open Source Data Deduplication
Open Source Data DeduplicationOpen Source Data Deduplication
Open Source Data Deduplication
 
Some analysis of BlueStore and RocksDB
Some analysis of BlueStore and RocksDBSome analysis of BlueStore and RocksDB
Some analysis of BlueStore and RocksDB
 
Stabilizing Ceph
Stabilizing CephStabilizing Ceph
Stabilizing Ceph
 
Loadays MySQL
Loadays MySQLLoadays MySQL
Loadays MySQL
 
XT Best Practices
XT Best PracticesXT Best Practices
XT Best Practices
 
High Performance Scaling Techniques in Golang Using Go Assembly
High Performance Scaling Techniques in Golang Using Go AssemblyHigh Performance Scaling Techniques in Golang Using Go Assembly
High Performance Scaling Techniques in Golang Using Go Assembly
 
Database performance tuning for SSD based storage
Database  performance tuning for SSD based storageDatabase  performance tuning for SSD based storage
Database performance tuning for SSD based storage
 
Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...
Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...
Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...
 
SSD based storage tuning for databases
SSD based storage tuning for databasesSSD based storage tuning for databases
SSD based storage tuning for databases
 
Lrz kurs: big data analysis
Lrz kurs: big data analysisLrz kurs: big data analysis
Lrz kurs: big data analysis
 
Ivan mamontov and Mikhail Khuldnev Present : Fast Decompression Lucene Codec
Ivan mamontov and Mikhail Khuldnev Present : Fast Decompression Lucene CodecIvan mamontov and Mikhail Khuldnev Present : Fast Decompression Lucene Codec
Ivan mamontov and Mikhail Khuldnev Present : Fast Decompression Lucene Codec
 
Backing up Wikipedia Databases
Backing up Wikipedia DatabasesBacking up Wikipedia Databases
Backing up Wikipedia Databases
 
Speedrunning the Open Street Map osm2pgsql Loader
Speedrunning the Open Street Map osm2pgsql LoaderSpeedrunning the Open Street Map osm2pgsql Loader
Speedrunning the Open Street Map osm2pgsql Loader
 
SDC20 ScaleFlux.pptx
SDC20 ScaleFlux.pptxSDC20 ScaleFlux.pptx
SDC20 ScaleFlux.pptx
 
ZFS and MySQL on Linux, the Sweet Spots
ZFS and MySQL on Linux, the Sweet SpotsZFS and MySQL on Linux, the Sweet Spots
ZFS and MySQL on Linux, the Sweet Spots
 
Dmx3 950-technical specifications
Dmx3 950-technical specificationsDmx3 950-technical specifications
Dmx3 950-technical specifications
 
The Smug Mug Tale
The Smug Mug TaleThe Smug Mug Tale
The Smug Mug Tale
 
The Proper Care and Feeding of MySQL Databases
The Proper Care and Feeding of MySQL DatabasesThe Proper Care and Feeding of MySQL Databases
The Proper Care and Feeding of MySQL Databases
 
NoSQL with MySQL
NoSQL with MySQLNoSQL with MySQL
NoSQL with MySQL
 
Ceph Performance and Sizing Guide
Ceph Performance and Sizing GuideCeph Performance and Sizing Guide
Ceph Performance and Sizing Guide
 

Más de Boxed Ice

MongoDB Tokyo - Monitoring and Queueing
MongoDB Tokyo - Monitoring and QueueingMongoDB Tokyo - Monitoring and Queueing
MongoDB Tokyo - Monitoring and QueueingBoxed Ice
 
MongoUK 2011 - Rplacing RabbitMQ with MongoDB
MongoUK 2011 - Rplacing RabbitMQ with MongoDBMongoUK 2011 - Rplacing RabbitMQ with MongoDB
MongoUK 2011 - Rplacing RabbitMQ with MongoDBBoxed Ice
 
MongoDB - Monitoring and queueing
MongoDB - Monitoring and queueingMongoDB - Monitoring and queueing
MongoDB - Monitoring and queueingBoxed Ice
 
MongoDB - Monitoring & queueing
MongoDB - Monitoring & queueingMongoDB - Monitoring & queueing
MongoDB - Monitoring & queueingBoxed Ice
 
Monitoring MongoDB (MongoUK)
Monitoring MongoDB (MongoUK)Monitoring MongoDB (MongoUK)
Monitoring MongoDB (MongoUK)Boxed Ice
 
Monitoring MongoDB (MongoSV)
Monitoring MongoDB (MongoSV)Monitoring MongoDB (MongoSV)
Monitoring MongoDB (MongoSV)Boxed Ice
 
MongoUK - PHP Development
MongoUK - PHP DevelopmentMongoUK - PHP Development
MongoUK - PHP DevelopmentBoxed Ice
 
MongoUK - PHP Development
MongoUK - PHP DevelopmentMongoUK - PHP Development
MongoUK - PHP DevelopmentBoxed Ice
 

Más de Boxed Ice (8)

MongoDB Tokyo - Monitoring and Queueing
MongoDB Tokyo - Monitoring and QueueingMongoDB Tokyo - Monitoring and Queueing
MongoDB Tokyo - Monitoring and Queueing
 
MongoUK 2011 - Rplacing RabbitMQ with MongoDB
MongoUK 2011 - Rplacing RabbitMQ with MongoDBMongoUK 2011 - Rplacing RabbitMQ with MongoDB
MongoUK 2011 - Rplacing RabbitMQ with MongoDB
 
MongoDB - Monitoring and queueing
MongoDB - Monitoring and queueingMongoDB - Monitoring and queueing
MongoDB - Monitoring and queueing
 
MongoDB - Monitoring & queueing
MongoDB - Monitoring & queueingMongoDB - Monitoring & queueing
MongoDB - Monitoring & queueing
 
Monitoring MongoDB (MongoUK)
Monitoring MongoDB (MongoUK)Monitoring MongoDB (MongoUK)
Monitoring MongoDB (MongoUK)
 
Monitoring MongoDB (MongoSV)
Monitoring MongoDB (MongoSV)Monitoring MongoDB (MongoSV)
Monitoring MongoDB (MongoSV)
 
MongoUK - PHP Development
MongoUK - PHP DevelopmentMongoUK - PHP Development
MongoUK - PHP Development
 
MongoUK - PHP Development
MongoUK - PHP DevelopmentMongoUK - PHP Development
MongoUK - PHP Development
 

Último

GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 

Último (20)

GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 

Approaching 1 Billion Documents in MongoDB

  • 1. Approaching 1 Billion Documents in MongoDB David Mytton 1/30 david@boxedice.com / @davidmytton
  • 2. Server Density Monitoring Processing Database UI 2/30 www.serverdensity.com
  • 3. Cache / Data Store Postback checksLatest checksHistorical 3/30
  • 4. db.stats() Documents 937,393,315 Collections 27,566 Indexes 45,277 Stored data 638GB Inserts 5000-8000/s 4/30 As of 17th Jun 2010.
  • 5. 13 months ago 5/30 Why we moved: http://bit.ly/mysqltomongo
  • 6. Initial Setup Replication Master Slave DC1 DC2 8GB RAM 8GB RAM 6/30
  • 7. Vertical Scaling Replication Master Slave DC1 DC2 72GB RAM 8GB RAM 7/30
  • 8. Tip #1 Keep your indexes in memory at all times. db.stats() 8/30
  • 9. i/o not an issue 9/30
  • 10. Tip #2 Data is flushed to disk every 60s. db.runCommand({fsync:1}); --syncdelay [60] 10/30
  • 11. Sharding solves everything 11/30
  • 12. Manual Partitioning Replication Master A Slave A DC1 DC2 16GB RAM 16GB RAM Replication Master B Slave B DC1 DC2 12/30 16GB RAM 16GB RAM
  • 13. Sustained Traffic Master Slave Avg out: 2.4Mbit/s Avg out: 4.0Mbit/s Avg in: 3.8Mbit/s Avg in: 111.2Kbit/s 13/30
  • 14. Database vs collections • Many databases = many data files (small but quickly get large). • Many collections = watch namespace limit. 14/30
  • 15. Namespaces = Number of collections + number of indexes 15/30
  • 16. Tip #3 Monitor the 24,000 namespace limit. 16/30
  • 18. Console db.system.namespaces.count() 18/30
  • 19. Replica Pairs = Failover Replica Pair Master A Slave A DC1 DC2 16GB RAM 16GB RAM Replica Pair Master B Slave B DC1 DC2 19/30 16GB RAM 16GB RAM
  • 20. Tip #4 Pre-provision your oplog files. 20/30
  • 21. A shell script to generate 75GB oplog files for i in {0..40} do echo $i head -c 2146435072 /dev/zero > local.$i done 21/30
  • 22. Tip #5 Expect slower performance during initial replica sync. 22/30
  • 23. Tip #6 You can rotate your log files from the console. 23/30
  • 24. Rotating your log files db.runCommand("logRotate") 24/30
  • 25. Tip #7 Index creation blocks by default. Use background indexing if necessary. 25/30 MongoDB Manual: http://bit.ly/mongobgindex
  • 26. Tip #8 Increase your OS file descriptor limit + use persistent connections. 26/30
  • 27. Too many open files! /etc/security/limits.conf mongo hard nofile 10000 mongo soft nofile 10000 user type limit /etc/ssh/sshd_config UsePAM yes 27/30
  • 28. Space is not reused Data + indexes 551GB Actual disk usage 638GB Fixed in 1.1.4 1.3.x 1.5.0 1.5.1 1.5.2 1.5.3 1.5.4? 28/30 JIRA: SERVER-366
  • 29. Summary 1. Keep indexes in memory. 2. Data is flushed to disk every 60s. 3. Monitor the 24k namespace limit. 4. Pre-provision oplog files. 5. Expect slower performance on replica sync. 6. Rotate logs from the console. 7. Index creation blocks by default. 29/30 8. OS file descriptor limit + persistent connections.
  • 30. Slides blog.boxedice.com/mongodb David Mytton 30/30 david@boxedice.com / @davidmytton