SlideShare una empresa de Scribd logo
1 de 30
Descargar para leer sin conexión
Approaching 1 Billion
       Documents in MongoDB




                    David Mytton
1/30      david@boxedice.com / @davidmytton
Server Density Monitoring


       Processing       Database            UI




2/30
                    www.serverdensity.com
Cache / Data Store
                      Postback




       checksLatest              checksHistorical



3/30
db.stats()
       Documents                  937,393,315

       Collections                      27,566

       Indexes                          45,277

       Stored data                      638GB

       Inserts                    5000-8000/s


4/30
                 As of 17th Jun 2010.
13 months ago




5/30
       Why we moved: http://bit.ly/mysqltomongo
Initial Setup

                     Replication




       Master                       Slave
         DC1                         DC2
       8GB RAM                     8GB RAM

6/30
Vertical Scaling

                  Replication




        Master                   Slave
         DC1                      DC2
       72GB RAM                 8GB RAM

7/30
Tip #1

       Keep your indexes in
       memory at all times.

           db.stats()


8/30
i/o not an issue




9/30
Tip #2
         Data is flushed to disk every 60s.


        db.runCommand({fsync:1});


             --syncdelay [60]

10/30
Sharding solves
          everything




11/30
Manual Partitioning
                      Replication




           Master A                   Slave A
            DC1                        DC2
          16GB RAM                  16GB RAM


                      Replication




           Master B                   Slave B
            DC1                        DC2
12/30     16GB RAM                  16GB RAM
Sustained Traffic



                  Master                      Slave
        Avg out:       2.4Mbit/s   Avg out:       4.0Mbit/s
        Avg in:        3.8Mbit/s   Avg in:      111.2Kbit/s

13/30
Database vs collections


        • Many databases = many data files (small but
          quickly get large).
        • Many collections = watch namespace limit.

14/30
Namespaces = Number of collections +
                number of indexes




15/30
Tip #3


        Monitor the 24,000
         namespace limit.


16/30
Using Server Density




17/30
Console

        db.system.namespaces.count()




18/30
Replica Pairs = Failover
                        Replica Pair




             Master A                    Slave A
              DC1                         DC2
            16GB RAM                   16GB RAM


                        Replica Pair




             Master B                    Slave B
              DC1                         DC2
19/30       16GB RAM                   16GB RAM
Tip #4


        Pre-provision your oplog files.



20/30
A shell script to generate 75GB oplog files




          for i in {0..40}
        do echo $i
        head -c 2146435072 /dev/zero > local.$i
        done




21/30
Tip #5


        Expect slower performance
         during initial replica sync.


22/30
Tip #6


        You can rotate your log files
             from the console.


23/30
Rotating your log files

         db.runCommand("logRotate")




24/30
Tip #7

            Index creation blocks by
            default. Use background
              indexing if necessary.



25/30
        MongoDB Manual: http://bit.ly/mongobgindex
Tip #8

         Increase your OS file
         descriptor limit + use
        persistent connections.


26/30
Too many open files!
                 /etc/security/limits.conf
         mongo hard nofile 10000
         mongo soft nofile 10000
          user                 type          limit




                  /etc/ssh/sshd_config

                     UsePAM yes
27/30
Space is not reused
        Data + indexes                  551GB

        Actual disk usage               638GB


                         Fixed in
        1.1.4 1.3.x 1.5.0 1.5.1 1.5.2 1.5.3 1.5.4?

28/30
                     JIRA: SERVER-366
Summary
        1. Keep indexes in memory.
        2. Data is flushed to disk every 60s.
        3. Monitor the 24k namespace limit.
        4. Pre-provision oplog files.
        5. Expect slower performance on replica sync.
        6. Rotate logs from the console.
        7. Index creation blocks by default.
29/30   8. OS file descriptor limit + persistent connections.
Slides
          blog.boxedice.com/mongodb




                  David Mytton
30/30   david@boxedice.com / @davidmytton

Más contenido relacionado

La actualidad más candente

深入了解Redis
深入了解Redis深入了解Redis
深入了解Redisiammutex
 
Sphinx at Craigslist in 2012
Sphinx at Craigslist in 2012Sphinx at Craigslist in 2012
Sphinx at Craigslist in 2012Jeremy Zawodny
 
分析mysql acid 设计实现
分析mysql acid 设计实现分析mysql acid 设计实现
分析mysql acid 设计实现rfyiamcool
 
No sql but even less security
No sql but even less securityNo sql but even less security
No sql but even less securityiammutex
 
Strategies for Backing Up MongoDB
Strategies for Backing Up MongoDBStrategies for Backing Up MongoDB
Strategies for Backing Up MongoDBMongoDB
 
Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to RedisArnab Mitra
 
Oreilly Webcast 01 19 10
Oreilly Webcast 01 19 10Oreilly Webcast 01 19 10
Oreilly Webcast 01 19 10Sean Hull
 
Hypertable Berlin Buzzwords
Hypertable Berlin BuzzwordsHypertable Berlin Buzzwords
Hypertable Berlin Buzzwordshypertable
 
XtraDB 5.6 and 5.7: Key Performance Algorithms
XtraDB 5.6 and 5.7: Key Performance AlgorithmsXtraDB 5.6 and 5.7: Key Performance Algorithms
XtraDB 5.6 and 5.7: Key Performance AlgorithmsLaurynas Biveinis
 
This is redis - feature and usecase
This is redis - feature and usecaseThis is redis - feature and usecase
This is redis - feature and usecaseKris Jeong
 
Introduction to MongoDB with PHP
Introduction to MongoDB with PHPIntroduction to MongoDB with PHP
Introduction to MongoDB with PHPfwso
 
5 Tips for Getting Started with Pivotal GemFire
5 Tips for Getting Started with Pivotal GemFire5 Tips for Getting Started with Pivotal GemFire
5 Tips for Getting Started with Pivotal GemFireVMware Tanzu
 
Boosting I/O Performance with KVM io_uring
Boosting I/O Performance with KVM io_uringBoosting I/O Performance with KVM io_uring
Boosting I/O Performance with KVM io_uringShapeBlue
 
Frontera распределенный робот для обхода веба в больших объемах / Александр С...
Frontera распределенный робот для обхода веба в больших объемах / Александр С...Frontera распределенный робот для обхода веба в больших объемах / Александр С...
Frontera распределенный робот для обхода веба в больших объемах / Александр С...Ontico
 
Oreilly Webcast Jun17
Oreilly Webcast Jun17Oreilly Webcast Jun17
Oreilly Webcast Jun17Sean Hull
 
TokuDB internals / Лесин Владислав (Percona)
TokuDB internals / Лесин Владислав (Percona)TokuDB internals / Лесин Владислав (Percona)
TokuDB internals / Лесин Владислав (Percona)Ontico
 

La actualidad más candente (20)

深入了解Redis
深入了解Redis深入了解Redis
深入了解Redis
 
Sphinx at Craigslist in 2012
Sphinx at Craigslist in 2012Sphinx at Craigslist in 2012
Sphinx at Craigslist in 2012
 
分析mysql acid 设计实现
分析mysql acid 设计实现分析mysql acid 设计实现
分析mysql acid 设计实现
 
No sql but even less security
No sql but even less securityNo sql but even less security
No sql but even less security
 
Strategies for Backing Up MongoDB
Strategies for Backing Up MongoDBStrategies for Backing Up MongoDB
Strategies for Backing Up MongoDB
 
Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to Redis
 
Oreilly Webcast 01 19 10
Oreilly Webcast 01 19 10Oreilly Webcast 01 19 10
Oreilly Webcast 01 19 10
 
Hypertable Berlin Buzzwords
Hypertable Berlin BuzzwordsHypertable Berlin Buzzwords
Hypertable Berlin Buzzwords
 
MongoDB Shard Cluster
MongoDB Shard ClusterMongoDB Shard Cluster
MongoDB Shard Cluster
 
XtraDB 5.6 and 5.7: Key Performance Algorithms
XtraDB 5.6 and 5.7: Key Performance AlgorithmsXtraDB 5.6 and 5.7: Key Performance Algorithms
XtraDB 5.6 and 5.7: Key Performance Algorithms
 
This is redis - feature and usecase
This is redis - feature and usecaseThis is redis - feature and usecase
This is redis - feature and usecase
 
Introduction to MongoDB with PHP
Introduction to MongoDB with PHPIntroduction to MongoDB with PHP
Introduction to MongoDB with PHP
 
5 Tips for Getting Started with Pivotal GemFire
5 Tips for Getting Started with Pivotal GemFire5 Tips for Getting Started with Pivotal GemFire
5 Tips for Getting Started with Pivotal GemFire
 
Containers > VMs
Containers > VMsContainers > VMs
Containers > VMs
 
Boosting I/O Performance with KVM io_uring
Boosting I/O Performance with KVM io_uringBoosting I/O Performance with KVM io_uring
Boosting I/O Performance with KVM io_uring
 
Frontera распределенный робот для обхода веба в больших объемах / Александр С...
Frontera распределенный робот для обхода веба в больших объемах / Александр С...Frontera распределенный робот для обхода веба в больших объемах / Александр С...
Frontera распределенный робот для обхода веба в больших объемах / Александр С...
 
Erlang scheduler
Erlang schedulerErlang scheduler
Erlang scheduler
 
Oreilly Webcast Jun17
Oreilly Webcast Jun17Oreilly Webcast Jun17
Oreilly Webcast Jun17
 
Redis acc 2015_eng
Redis acc 2015_engRedis acc 2015_eng
Redis acc 2015_eng
 
TokuDB internals / Лесин Владислав (Percona)
TokuDB internals / Лесин Владислав (Percona)TokuDB internals / Лесин Владислав (Percona)
TokuDB internals / Лесин Владислав (Percona)
 

Similar a Approaching 1 Billion Documents in MongoDB

Open Source Data Deduplication
Open Source Data DeduplicationOpen Source Data Deduplication
Open Source Data DeduplicationRedWireServices
 
Some analysis of BlueStore and RocksDB
Some analysis of BlueStore and RocksDBSome analysis of BlueStore and RocksDB
Some analysis of BlueStore and RocksDBXiao Yan Li
 
Loadays MySQL
Loadays MySQLLoadays MySQL
Loadays MySQLlefredbe
 
XT Best Practices
XT Best PracticesXT Best Practices
XT Best PracticesJeff Larkin
 
High Performance Scaling Techniques in Golang Using Go Assembly
High Performance Scaling Techniques in Golang Using Go AssemblyHigh Performance Scaling Techniques in Golang Using Go Assembly
High Performance Scaling Techniques in Golang Using Go AssemblyMinio
 
Database performance tuning for SSD based storage
Database  performance tuning for SSD based storageDatabase  performance tuning for SSD based storage
Database performance tuning for SSD based storageAngelo Rajadurai
 
Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...
Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...
Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...Виталий Стародубцев
 
SSD based storage tuning for databases
SSD based storage tuning for databasesSSD based storage tuning for databases
SSD based storage tuning for databasesAngelo Rajadurai
 
Ivan mamontov and Mikhail Khuldnev Present : Fast Decompression Lucene Codec
Ivan mamontov and Mikhail Khuldnev Present : Fast Decompression Lucene CodecIvan mamontov and Mikhail Khuldnev Present : Fast Decompression Lucene Codec
Ivan mamontov and Mikhail Khuldnev Present : Fast Decompression Lucene CodecGrid Dynamics
 
Backing up Wikipedia Databases
Backing up Wikipedia DatabasesBacking up Wikipedia Databases
Backing up Wikipedia DatabasesJaime Crespo
 
Speedrunning the Open Street Map osm2pgsql Loader
Speedrunning the Open Street Map osm2pgsql LoaderSpeedrunning the Open Street Map osm2pgsql Loader
Speedrunning the Open Street Map osm2pgsql LoaderGregSmith458515
 
SDC20 ScaleFlux.pptx
SDC20 ScaleFlux.pptxSDC20 ScaleFlux.pptx
SDC20 ScaleFlux.pptxssuserabc741
 
ZFS and MySQL on Linux, the Sweet Spots
ZFS and MySQL on Linux, the Sweet SpotsZFS and MySQL on Linux, the Sweet Spots
ZFS and MySQL on Linux, the Sweet SpotsJervin Real
 
Dmx3 950-technical specifications
Dmx3 950-technical specificationsDmx3 950-technical specifications
Dmx3 950-technical specificationsRaghul P
 
The Proper Care and Feeding of MySQL Databases
The Proper Care and Feeding of MySQL DatabasesThe Proper Care and Feeding of MySQL Databases
The Proper Care and Feeding of MySQL DatabasesDave Stokes
 
Ceph Performance and Sizing Guide
Ceph Performance and Sizing GuideCeph Performance and Sizing Guide
Ceph Performance and Sizing GuideJose De La Rosa
 

Similar a Approaching 1 Billion Documents in MongoDB (20)

Open Source Data Deduplication
Open Source Data DeduplicationOpen Source Data Deduplication
Open Source Data Deduplication
 
Some analysis of BlueStore and RocksDB
Some analysis of BlueStore and RocksDBSome analysis of BlueStore and RocksDB
Some analysis of BlueStore and RocksDB
 
Stabilizing Ceph
Stabilizing CephStabilizing Ceph
Stabilizing Ceph
 
Loadays MySQL
Loadays MySQLLoadays MySQL
Loadays MySQL
 
XT Best Practices
XT Best PracticesXT Best Practices
XT Best Practices
 
High Performance Scaling Techniques in Golang Using Go Assembly
High Performance Scaling Techniques in Golang Using Go AssemblyHigh Performance Scaling Techniques in Golang Using Go Assembly
High Performance Scaling Techniques in Golang Using Go Assembly
 
Database performance tuning for SSD based storage
Database  performance tuning for SSD based storageDatabase  performance tuning for SSD based storage
Database performance tuning for SSD based storage
 
Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...
Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...
Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...
 
SSD based storage tuning for databases
SSD based storage tuning for databasesSSD based storage tuning for databases
SSD based storage tuning for databases
 
Lrz kurs: big data analysis
Lrz kurs: big data analysisLrz kurs: big data analysis
Lrz kurs: big data analysis
 
Ivan mamontov and Mikhail Khuldnev Present : Fast Decompression Lucene Codec
Ivan mamontov and Mikhail Khuldnev Present : Fast Decompression Lucene CodecIvan mamontov and Mikhail Khuldnev Present : Fast Decompression Lucene Codec
Ivan mamontov and Mikhail Khuldnev Present : Fast Decompression Lucene Codec
 
Backing up Wikipedia Databases
Backing up Wikipedia DatabasesBacking up Wikipedia Databases
Backing up Wikipedia Databases
 
Speedrunning the Open Street Map osm2pgsql Loader
Speedrunning the Open Street Map osm2pgsql LoaderSpeedrunning the Open Street Map osm2pgsql Loader
Speedrunning the Open Street Map osm2pgsql Loader
 
SDC20 ScaleFlux.pptx
SDC20 ScaleFlux.pptxSDC20 ScaleFlux.pptx
SDC20 ScaleFlux.pptx
 
ZFS and MySQL on Linux, the Sweet Spots
ZFS and MySQL on Linux, the Sweet SpotsZFS and MySQL on Linux, the Sweet Spots
ZFS and MySQL on Linux, the Sweet Spots
 
Dmx3 950-technical specifications
Dmx3 950-technical specificationsDmx3 950-technical specifications
Dmx3 950-technical specifications
 
The Smug Mug Tale
The Smug Mug TaleThe Smug Mug Tale
The Smug Mug Tale
 
The Proper Care and Feeding of MySQL Databases
The Proper Care and Feeding of MySQL DatabasesThe Proper Care and Feeding of MySQL Databases
The Proper Care and Feeding of MySQL Databases
 
NoSQL with MySQL
NoSQL with MySQLNoSQL with MySQL
NoSQL with MySQL
 
Ceph Performance and Sizing Guide
Ceph Performance and Sizing GuideCeph Performance and Sizing Guide
Ceph Performance and Sizing Guide
 

Más de Boxed Ice

MongoDB Tokyo - Monitoring and Queueing
MongoDB Tokyo - Monitoring and QueueingMongoDB Tokyo - Monitoring and Queueing
MongoDB Tokyo - Monitoring and QueueingBoxed Ice
 
MongoUK 2011 - Rplacing RabbitMQ with MongoDB
MongoUK 2011 - Rplacing RabbitMQ with MongoDBMongoUK 2011 - Rplacing RabbitMQ with MongoDB
MongoUK 2011 - Rplacing RabbitMQ with MongoDBBoxed Ice
 
MongoDB - Monitoring and queueing
MongoDB - Monitoring and queueingMongoDB - Monitoring and queueing
MongoDB - Monitoring and queueingBoxed Ice
 
MongoDB - Monitoring & queueing
MongoDB - Monitoring & queueingMongoDB - Monitoring & queueing
MongoDB - Monitoring & queueingBoxed Ice
 
Monitoring MongoDB (MongoUK)
Monitoring MongoDB (MongoUK)Monitoring MongoDB (MongoUK)
Monitoring MongoDB (MongoUK)Boxed Ice
 
Monitoring MongoDB (MongoSV)
Monitoring MongoDB (MongoSV)Monitoring MongoDB (MongoSV)
Monitoring MongoDB (MongoSV)Boxed Ice
 
MongoUK - PHP Development
MongoUK - PHP DevelopmentMongoUK - PHP Development
MongoUK - PHP DevelopmentBoxed Ice
 
MongoUK - PHP Development
MongoUK - PHP DevelopmentMongoUK - PHP Development
MongoUK - PHP DevelopmentBoxed Ice
 

Más de Boxed Ice (8)

MongoDB Tokyo - Monitoring and Queueing
MongoDB Tokyo - Monitoring and QueueingMongoDB Tokyo - Monitoring and Queueing
MongoDB Tokyo - Monitoring and Queueing
 
MongoUK 2011 - Rplacing RabbitMQ with MongoDB
MongoUK 2011 - Rplacing RabbitMQ with MongoDBMongoUK 2011 - Rplacing RabbitMQ with MongoDB
MongoUK 2011 - Rplacing RabbitMQ with MongoDB
 
MongoDB - Monitoring and queueing
MongoDB - Monitoring and queueingMongoDB - Monitoring and queueing
MongoDB - Monitoring and queueing
 
MongoDB - Monitoring & queueing
MongoDB - Monitoring & queueingMongoDB - Monitoring & queueing
MongoDB - Monitoring & queueing
 
Monitoring MongoDB (MongoUK)
Monitoring MongoDB (MongoUK)Monitoring MongoDB (MongoUK)
Monitoring MongoDB (MongoUK)
 
Monitoring MongoDB (MongoSV)
Monitoring MongoDB (MongoSV)Monitoring MongoDB (MongoSV)
Monitoring MongoDB (MongoSV)
 
MongoUK - PHP Development
MongoUK - PHP DevelopmentMongoUK - PHP Development
MongoUK - PHP Development
 
MongoUK - PHP Development
MongoUK - PHP DevelopmentMongoUK - PHP Development
MongoUK - PHP Development
 

Último

Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 

Último (20)

Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 

Approaching 1 Billion Documents in MongoDB

  • 1. Approaching 1 Billion Documents in MongoDB David Mytton 1/30 david@boxedice.com / @davidmytton
  • 2. Server Density Monitoring Processing Database UI 2/30 www.serverdensity.com
  • 3. Cache / Data Store Postback checksLatest checksHistorical 3/30
  • 4. db.stats() Documents 937,393,315 Collections 27,566 Indexes 45,277 Stored data 638GB Inserts 5000-8000/s 4/30 As of 17th Jun 2010.
  • 5. 13 months ago 5/30 Why we moved: http://bit.ly/mysqltomongo
  • 6. Initial Setup Replication Master Slave DC1 DC2 8GB RAM 8GB RAM 6/30
  • 7. Vertical Scaling Replication Master Slave DC1 DC2 72GB RAM 8GB RAM 7/30
  • 8. Tip #1 Keep your indexes in memory at all times. db.stats() 8/30
  • 9. i/o not an issue 9/30
  • 10. Tip #2 Data is flushed to disk every 60s. db.runCommand({fsync:1}); --syncdelay [60] 10/30
  • 11. Sharding solves everything 11/30
  • 12. Manual Partitioning Replication Master A Slave A DC1 DC2 16GB RAM 16GB RAM Replication Master B Slave B DC1 DC2 12/30 16GB RAM 16GB RAM
  • 13. Sustained Traffic Master Slave Avg out: 2.4Mbit/s Avg out: 4.0Mbit/s Avg in: 3.8Mbit/s Avg in: 111.2Kbit/s 13/30
  • 14. Database vs collections • Many databases = many data files (small but quickly get large). • Many collections = watch namespace limit. 14/30
  • 15. Namespaces = Number of collections + number of indexes 15/30
  • 16. Tip #3 Monitor the 24,000 namespace limit. 16/30
  • 18. Console db.system.namespaces.count() 18/30
  • 19. Replica Pairs = Failover Replica Pair Master A Slave A DC1 DC2 16GB RAM 16GB RAM Replica Pair Master B Slave B DC1 DC2 19/30 16GB RAM 16GB RAM
  • 20. Tip #4 Pre-provision your oplog files. 20/30
  • 21. A shell script to generate 75GB oplog files for i in {0..40} do echo $i head -c 2146435072 /dev/zero > local.$i done 21/30
  • 22. Tip #5 Expect slower performance during initial replica sync. 22/30
  • 23. Tip #6 You can rotate your log files from the console. 23/30
  • 24. Rotating your log files db.runCommand("logRotate") 24/30
  • 25. Tip #7 Index creation blocks by default. Use background indexing if necessary. 25/30 MongoDB Manual: http://bit.ly/mongobgindex
  • 26. Tip #8 Increase your OS file descriptor limit + use persistent connections. 26/30
  • 27. Too many open files! /etc/security/limits.conf mongo hard nofile 10000 mongo soft nofile 10000 user type limit /etc/ssh/sshd_config UsePAM yes 27/30
  • 28. Space is not reused Data + indexes 551GB Actual disk usage 638GB Fixed in 1.1.4 1.3.x 1.5.0 1.5.1 1.5.2 1.5.3 1.5.4? 28/30 JIRA: SERVER-366
  • 29. Summary 1. Keep indexes in memory. 2. Data is flushed to disk every 60s. 3. Monitor the 24k namespace limit. 4. Pre-provision oplog files. 5. Expect slower performance on replica sync. 6. Rotate logs from the console. 7. Index creation blocks by default. 29/30 8. OS file descriptor limit + persistent connections.
  • 30. Slides blog.boxedice.com/mongodb David Mytton 30/30 david@boxedice.com / @davidmytton