#mongodbdays




Deployment Best Practices
Sandeep Parikh
Solutions Architect, 10gen

The Cycle of Deployment
[Figure: the deployment cycle: Prototype, Test, Monitor, Scale, Script]
Prep

Prototype Your Deployment
• You have to start somewhere
• Development is complete, deployment is next
• Sketch out some initial deployment parameters
   – Hardware sizing
   – Operating system
   – Disk setup
   – Storage layout: data vs. journal vs. log

Prototyping Considerations
• Additional considerations
   – Horizontal vs. vertical scale options
   – Multiple datacenters

• Start thinking about data growth
   – Do you know how your data will evolve?
   – Does your data live in multiple collections/databases?
   – Read-centric, write-centric or both?

• The more you start thinking about it, the better
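A back-of-the-envelope projection can make "how will your data evolve?" concrete. The sketch below assumes compound growth at a constant monthly rate (real growth is rarely that smooth) and reports the first month in which the data set would outgrow a given budget, such as available RAM. The function name and all numbers are hypothetical.

```python
from typing import Optional

def months_until_exceeds(current_gb: float, monthly_growth: float,
                         limit_gb: float, max_months: int = 120) -> Optional[int]:
    """First month (1-based) in which the projected size exceeds limit_gb.

    Assumes compound growth at a constant monthly rate (0.10 = 10%/month).
    Returns 0 if already over the limit, None if not reached within max_months.
    """
    if current_gb > limit_gb:
        return 0
    size = current_gb
    for month in range(1, max_months + 1):
        size *= 1 + monthly_growth
        if size > limit_gb:
            return month
    return None

# Hypothetical numbers: 20 GB today, growing 10% a month, 64 GB of RAM.
print(months_until_exceeds(20, 0.10, 64))  # 13
```

Swapping in a different growth curve (linear, seasonal) only changes the loop body; the point is to have a date on the calendar before the data outgrows the hardware.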

Test, Test, Test
• Generate a lot of data
   – Write tests to measure bulk loading throughput
   – Scaffolding can be reused for staging and validation

• Build your indexes (try both ways)
   – All at the beginning
   – On the fly

• Script your app
   – Can you simulate “expected” usage?
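A load test needs plenty of realistic-looking data and a way to time bulk inserts. A minimal sketch, assuming you plug in your own insert function; the document shape and the `insert_batch` callback are hypothetical placeholders so the harness runs without a database.

```python
import random
import string
import time
from typing import Callable, Dict, Iterable, Iterator, List

def generate_docs(n: int, seed: int = 42) -> Iterator[Dict]:
    """Yield n synthetic documents with a hypothetical, app-like shape."""
    rng = random.Random(seed)  # fixed seed so test runs are repeatable
    for i in range(n):
        yield {
            "user_id": i,
            "name": "".join(rng.choices(string.ascii_lowercase, k=8)),
            "score": rng.randint(0, 1000),
            "tags": rng.sample(["a", "b", "c", "d", "e"], k=2),
        }

def measure_bulk_load(docs: Iterable[Dict],
                      insert_batch: Callable[[List[Dict]], None],
                      batch_size: int = 1000) -> float:
    """Insert docs in batches through insert_batch; return documents per second."""
    batch: List[Dict] = []
    count = 0
    start = time.perf_counter()
    for doc in docs:
        batch.append(doc)
        if len(batch) == batch_size:
            insert_batch(batch)
            count += len(batch)
            batch = []
    if batch:  # flush the final partial batch
        insert_batch(batch)
        count += len(batch)
    elapsed = time.perf_counter() - start
    return count / elapsed if elapsed > 0 else float("inf")

# Dry run against an in-memory list instead of a database.
sink: List[Dict] = []
print(f"{measure_bulk_load(generate_docs(10_000), sink.extend):.0f} docs/s (in-memory)")
```

With pymongo, for instance, you might pass `lambda batch: collection.insert_many(batch)` as `insert_batch`, then vary `batch_size` and index setup between runs.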

Monitor Your Resources
• Watch everything
• The goal is to understand the numbers before deploying
• Monitor using
   – SNMP, Munin, Nagios
   – mongostat, mongotop, iostat, cpustat
   – MongoDB Monitoring Service (MMS)

• Other stats
   – Database and collection level (db.stats(), db.collection.stats())

Monitoring Key Metrics
• Op counters
   – Inserts, updates, deletes, reads (more is generally better)
   – Primaries and secondaries count differently (replicated ops on a secondary show up under opcountersRepl)

• Resident memory
   – Want this lower than available physical memory
   – Correlated with page faults and index misses

• Queues
  – Readers and writers
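Op counters in `serverStatus` are cumulative since the process started, so the number worth watching is the rate between two samples (roughly what mongostat prints). A minimal sketch, assuming you already have two `opcounters` sub-documents taken `interval_s` seconds apart (for example via pymongo's `db.command("serverStatus")`), plus the `globalLock.currentQueue` readers/writers counts. The helper names and sample values are hypothetical.

```python
from typing import Any, Dict

OP_FIELDS = ("insert", "query", "update", "delete", "getmore", "command")

def opcounter_rates(before: Dict[str, int], after: Dict[str, int],
                    interval_s: float) -> Dict[str, float]:
    """Per-second rate for each op counter between two cumulative snapshots."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return {op: (after.get(op, 0) - before.get(op, 0)) / interval_s
            for op in OP_FIELDS}

def queue_depth(server_status: Dict[str, Any]) -> Dict[str, int]:
    """Readers/writers waiting for the lock, from globalLock.currentQueue."""
    queue = server_status.get("globalLock", {}).get("currentQueue", {})
    return {"readers": queue.get("readers", 0), "writers": queue.get("writers", 0)}

# Made-up samples taken 10 seconds apart.
before = {"insert": 1000, "query": 5000, "update": 200, "delete": 0, "getmore": 10, "command": 300}
after = {"insert": 1500, "query": 9000, "update": 260, "delete": 0, "getmore": 10, "command": 400}
print(opcounter_rates(before, after, 10))
# {'insert': 50.0, 'query': 400.0, 'update': 6.0, 'delete': 0.0, 'getmore': 0.0, 'command': 10.0}
```

A negative rate usually means the server restarted between samples and the counters were reset.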

Monitoring Key Metrics
• Page faults and B-tree misses
   – How often do you have to hit the disk?
   – Persistently non-zero? Your working set might not fit in memory.

• Lock percentage
   – If high and queues are filling up, you are hitting write capacity

• IO and CPU stats
   – IO sustained (or fluctuating) at a high level => IO bound
   – CPU hitting IOWAITs
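Two of these checks are easy to script. Lock percentage is the share of elapsed time the lock was held between two samples, and "persistently non-zero" page faults or b-tree misses means most intervals in a window saw an increase. A minimal sketch on plain numbers: in MMAPv1-era `serverStatus` output these came from fields such as `globalLock.lockTime`/`totalTime`, `extra_info.page_faults` and `indexCounters`, but field names vary by version, so treat that mapping as an assumption. The 80% threshold is an arbitrary example.

```python
from typing import List, Sequence

def lock_percentage(lock_time_delta_us: float, total_time_delta_us: float) -> float:
    """Percent of elapsed time the lock was held between two samples."""
    if total_time_delta_us <= 0:
        raise ValueError("total_time_delta_us must be positive")
    return 100.0 * lock_time_delta_us / total_time_delta_us

def per_interval_deltas(cumulative: Sequence[int]) -> List[int]:
    """Turn cumulative counter samples (e.g. page faults) into per-interval increments."""
    return [later - earlier for earlier, later in zip(cumulative, cumulative[1:])]

def persistently_nonzero(deltas: Sequence[float], min_fraction: float = 0.8) -> bool:
    """True if at least min_fraction of the intervals saw a non-zero increment."""
    if not deltas:
        return False
    nonzero = sum(1 for d in deltas if d > 0)
    return nonzero / len(deltas) >= min_fraction

# Made-up cumulative page-fault samples, one per minute.
faults = [100, 130, 170, 170, 220, 260]
print(per_interval_deltas(faults))                        # [30, 40, 0, 50, 40]
print(persistently_nonzero(per_interval_deltas(faults)))  # True (4 of 5 intervals)
```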

Scale Your Setup
• Monitor those metrics while testing
• They should tell you where to add capacity
   – CPU, RAM, disks

• Storage configuration
   – RAID level (RAID 10 preferred)
   – Filesystem selection
   – Block sizing
   – Readahead setting
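Readahead is reported and set by `blockdev --getra`/`--setra` in 512-byte sectors. A rough worked example of why it matters: with small, randomly accessed documents, each page fault pulls the whole readahead window into the page cache, so most of what gets cached is never used. The sketch assumes one fault per random document read and ignores page-cache subtleties; 256 sectors (128 KB) is a common Linux default and 32 is just a smaller comparison point.

```python
SECTOR_BYTES = 512  # blockdev --getra/--setra count in 512-byte sectors

def readahead_kb(sectors: int) -> float:
    """Convert a readahead setting in sectors to kilobytes."""
    return sectors * SECTOR_BYTES / 1024

def useful_fraction(doc_bytes: int, readahead_sectors: int) -> float:
    """Share of one readahead window occupied by the document actually requested.

    Simplified model: random access, one document read per page fault.
    """
    window_bytes = readahead_sectors * SECTOR_BYTES
    return min(1.0, doc_bytes / window_bytes)

for sectors in (256, 32):
    print(sectors, "sectors =", readahead_kb(sectors), "KB;",
          "useful share for a 4 KB document:", useful_fraction(4096, sectors))
# 256 sectors = 128.0 KB; useful share for a 4 KB document: 0.03125
# 32 sectors = 16.0 KB; useful share for a 4 KB document: 0.25
```

Under this model, the larger window fills RAM with roughly 31 bytes of unrequested data for every byte you asked for, which is one way a readahead setting shows up as low resident memory and extra page faults.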

Script Your Plays
• Backups
• Restores (backups alone are not enough)
• Maintenance and Upgrades
• Replica Set operations
   – Stepping primaries down, adding new secondaries

• Sharding operations
   – Consistent backups, balancer operations

• Check out the Backup talk later today
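Replica set maintenance such as an upgrade is commonly done as a rolling operation: secondaries first, then step the primary down and handle it last, so the set keeps a primary throughout. A minimal sketch that turns that plan into an ordered checklist; the member names are hypothetical and the steps are text for a runbook, not calls to a real API (in the mongo shell, the step-down itself is `rs.stepDown()`).

```python
from typing import List

def rolling_maintenance_plan(members: List[str], primary: str) -> List[str]:
    """Ordered runbook steps for rolling maintenance of a replica set."""
    if primary not in members:
        raise ValueError(f"primary {primary!r} is not a member")
    steps: List[str] = []
    for member in members:
        if member != primary:  # secondaries first, one at a time
            steps.append(f"maintain secondary {member}")
            steps.append(f"wait for {member} to return to SECONDARY and catch up")
    # The primary goes last, after an election moves the role elsewhere.
    steps.append(f"step down primary {primary}")
    steps.append(f"maintain former primary {primary}")
    steps.append(f"wait for {primary} to return to SECONDARY and catch up")
    return steps

for step in rolling_maintenance_plan(["db1", "db2", "db3"], primary="db2"):
    print(step)
```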

Lather, Rinse, Repeat
Perfect. I know what to do.
How Do I Do It?

Balancing Priorities
[Figure: Product Development vs. Infrastructure Development]
The Scale Tips To One Side
• Product development is the priority
   – As it should be, but…

• Infrastructure development can’t be overlooked
• Know the downsides of not being prepared
   – Downtime
   – Data safety

• Disaster will strike
Integrate With The Dev Cycle
• Why are ops typically skipped over until it’s too late?
   – Lack of planning

• Make operations development a part of the dev cycle
   – Put it into the schedule
   – Make it a development milestone

• Use it to your advantage
   – Script deployment of development and test systems
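Scripting dev and test environments can start as small as generating the `mongod` invocations for a throwaway replica set. A sketch using the real `mongod` options `--replSet`, `--port`, `--dbpath`, `--logpath` and `--fork`; the set name, base port and directory layout are hypothetical choices. The directories must exist before `mongod` starts, and after launching you would still run `rs.initiate()` with a matching configuration.

```python
import os
import shlex
from typing import List

def replica_set_commands(set_name: str, members: int, base_port: int,
                         root: str) -> List[List[str]]:
    """Build one mongod argument list per member of a local test replica set."""
    cmds = []
    for i in range(members):
        member_dir = os.path.join(root, f"{set_name}-{i}")  # create this first
        cmds.append([
            "mongod",
            "--replSet", set_name,
            "--port", str(base_port + i),
            "--dbpath", os.path.join(member_dir, "data"),
            "--logpath", os.path.join(member_dir, "mongod.log"),
            "--fork",  # run in the background; requires --logpath (or syslog)
        ])
    return cmds

if __name__ == "__main__":
    for cmd in replica_set_commands("rs-test", 3, 27017, "/tmp/rs-test"):
        print(shlex.join(cmd))
```

Generating argument lists rather than shell strings keeps the script safe to feed into `subprocess.run` and easy to check in tests.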
That’s all well and good, but we are already deployed
Let’s Avoid This Situation

Start The Cycle Again

Start With Monitoring
• Monitor your deployment
   – Munin, Nagios
   – MMS

• Instrument your app
   – Know your queries
   – Read/write/update/delete behaviors
   – Index utilization

• Database and collection stats
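Instrumenting the app can be as simple as wrapping each data-access function so you record what kind of operation it performs and how long it takes. A minimal sketch with a hypothetical `instrumented` decorator and an in-memory stats table; in practice you would ship these numbers to your monitoring system, and MongoDB's database profiler can complement them from the server side.

```python
import functools
import time
from collections import defaultdict
from typing import Callable, Dict

# Per "op_type:function" call count and cumulative latency in seconds.
STATS: Dict[str, Dict[str, float]] = defaultdict(lambda: {"count": 0, "total_s": 0.0})

def instrumented(op_type: str) -> Callable:
    """Decorator recording call count and cumulative latency per op_type."""
    def wrap(fn: Callable) -> Callable:
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:  # record even if the call raises
                entry = STATS[f"{op_type}:{fn.__name__}"]
                entry["count"] += 1
                entry["total_s"] += time.perf_counter() - start
        return inner
    return wrap

@instrumented("read")
def find_user(user_id):  # hypothetical data-access function
    return {"_id": user_id}

find_user(1)
find_user(2)
print(STATS["read:find_user"]["count"])  # 2
```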

Scaling Deployment
• The numbers don’t lie
   – But individual measurements don’t always tell the whole story
• Are you hardware bound?
   – Memory, disks, CPU

• Is your app the problem?
• What about system settings?
   – Low resident memory? Check readahead; a high setting can also drive up page faults

Basic Solutions
• Low opcounters + high page faults
   – More memory

• High paddingFactor and fragmentation
   – Data model changes

• Balancer running a lot, chunks always migrating
   – Better shard key

• Persistent b-tree misses, high page faults
   – Queries aren’t hitting the indexes or aren’t using them
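The symptom-to-fix pairs above can be encoded as a first-pass triage checklist. A sketch only: the boolean symptom flags are hypothetical inputs you would derive from your own thresholds, and a match is a hint about where to look, not a diagnosis.

```python
from typing import Dict, List

# (symptoms that must all be present, area to investigate), mirroring the slide.
RULES = [
    ({"low_opcounters", "high_page_faults"}, "more memory"),
    ({"high_padding_factor", "fragmentation"}, "data model changes"),
    ({"balancer_busy", "chunks_always_migrating"}, "better shard key"),
    ({"persistent_btree_misses", "high_page_faults"}, "check index usage"),
]

def triage(symptoms: Dict[str, bool]) -> List[str]:
    """Return every suggested fix whose required symptoms are all present."""
    present = {name for name, flag in symptoms.items() if flag}
    return [fix for required, fix in RULES if required <= present]

print(triage({"low_opcounters": True, "high_page_faults": True}))  # ['more memory']
```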

Continue Through the Cycle
• Script your setup
   – This will save time as you iterate

• Prototype the fixes
   – Evaluate queries, how documents change, expected usage
• Test the new setup
   – Scripts to build the deployment and model usage

Deployment is about not being surprised
Questions?
How To Get Help
• Ask the Experts sessions
• We are here to help, come find us
• Refer to our docs: docs.mongodb.org (hint: they’re great!)
• Other things we monitor
   – mongodb-user Google group
   – Stack Overflow

• Submit a ticket
Backup
Problem > Diagnosis > Solution
Problem 1: Social Networking
• Suboptimal write throughput
• Where is the bottleneck?
   – Check the metrics
Diagnosis 1
• Are opcounters in line with what you’d expect?
• Check the queues
• Examine lock percentages
• How does resident memory look?
• How large are your indexes?
Solution 1
• Opcounters aren’t as high as you’d expect, but memory is saturated
• Correlated with high page faults
• You might need more memory
• MongoDB performs best when your working set fits in memory
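A rough way to sanity-check "does the working set fit?" is to add the total index size to the share of data that is actively accessed and compare the result against RAM, leaving headroom for the OS and connections. A back-of-the-envelope sketch: the hot-data fraction and the 80% usable-RAM factor are assumptions you would tune, and the sizes could come from `db.stats()` (`dataSize`, `indexSize`).

```python
def working_set_gb(data_gb: float, index_gb: float, hot_fraction: float) -> float:
    """Estimate: all indexes plus the actively used share of the data."""
    if not 0.0 <= hot_fraction <= 1.0:
        raise ValueError("hot_fraction must be between 0 and 1")
    return index_gb + data_gb * hot_fraction

def fits_in_ram(ws_gb: float, ram_gb: float, usable_fraction: float = 0.8) -> bool:
    """Leave (1 - usable_fraction) of RAM for the OS, connections, etc."""
    return ws_gb <= ram_gb * usable_fraction

# Hypothetical: 200 GB of data, 20 GB of indexes, a quarter of the data is hot.
ws = working_set_gb(data_gb=200, index_gb=20, hot_fraction=0.25)
print(ws, fits_in_ram(ws, ram_gb=64), fits_in_ram(ws, ram_gb=96))  # 70.0 False True
```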
Problem 2: Tracking FB Friends
• Update-heavy workload is slow
• Document paddingFactor is increasing
Diagnosis 2
• High paddingFactor
   – Fragmentation!

• More memory/disk is taken up by new documents
   – Inefficient space usage

• Documents have to be relocated regularly
Solution 2
• Check your queries
   – Are your documents growing because of arrays or added fields?
• Pre-create the required document structure, or…
• Move growing elements into individual documents in a separate collection
   – Data model changes, app changes
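Moving a growing array out of the parent document keeps the parent's size stable, so it no longer needs to be relocated as it grows. A sketch of the reshaping step on plain Python dicts; the `friends` shape and the `user_id`/`friend_id` child layout are hypothetical, and a real migration would also have to update the application's queries and indexes.

```python
import copy
from typing import Any, Dict, List, Tuple

def split_growing_array(doc: Dict[str, Any], array_field: str, item_key: str,
                        parent_ref: str = "parent_id") -> Tuple[Dict[str, Any], List[Dict[str, Any]]]:
    """Return (parent without the array, one child document per array element).

    Each child carries a reference back to the parent's _id.
    """
    parent = copy.deepcopy(doc)  # leave the caller's document untouched
    items = parent.pop(array_field, [])
    children = [{parent_ref: doc["_id"], item_key: item} for item in items]
    return parent, children

# Hypothetical "tracking FB friends" document whose friends array keeps growing.
user = {"_id": 1, "name": "ana", "friends": [2, 3]}
parent, friendships = split_growing_array(user, "friends", "friend_id", parent_ref="user_id")
print(parent)       # {'_id': 1, 'name': 'ana'}
print(friendships)  # [{'user_id': 1, 'friend_id': 2}, {'user_id': 1, 'friend_id': 3}]
```

Adding a friend then becomes an insert of a small fixed-size document instead of an in-place update that grows the parent.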
Problem 3: Status Updates
• Write-heavy sharded deployment
   – Is one shard running hot?
   – Balancer lock held all the time

• Balancer is constantly migrating chunks
Diagnosis 3
• Check the mongos logs
   – How often is migration occurring?
   – Are chunks constantly moving from one shard to the next?

• Shard key distribution
   – Sequential keys?
   – One shard always getting new writes?
Solution 3
• If there is no “natural” key that distributes well, consider hashing, byte swapping, etc.
   – Avoids the “hot” shard problem

• High writes and high balancer lock
   – Manage balancer window
   – Run it during low utilization
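Why a monotonically increasing key creates a "hot" shard, and why hashing helps: under range partitioning every new key is larger than the last, so all new writes land in the top range. The toy simulation below is not how MongoDB's balancer or its native hashed shard keys (available in newer releases) are implemented; it splits a hypothetical key space into fixed ranges and compares raw sequential keys with MD5-hashed keys. MD5 is used only as a convenient, roughly uniform hash.

```python
import hashlib
from bisect import bisect_right
from collections import Counter
from typing import Iterable, List

def range_shard(key: int, boundaries: List[int]) -> int:
    """Shard index for a key, given sorted split points (range partitioning)."""
    return bisect_right(boundaries, key)

def hashed_key(key: int) -> int:
    """Map a key to a 64-bit integer via MD5 (illustration only)."""
    digest = hashlib.md5(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big")

def distribution(keys: Iterable[int], boundaries: List[int]) -> Counter:
    """Count how many keys land on each shard."""
    return Counter(range_shard(k, boundaries) for k in keys)

# Existing data covered 0..9999 split at 2500/5000/7500; new ids keep increasing.
new_keys = range(10_000, 11_000)
print(distribution(new_keys, [2_500, 5_000, 7_500]))  # Counter({3: 1000}): one hot shard

# Same writes after hashing, with the 64-bit hash space split into 4 equal ranges.
hash_bounds = [(2**64 // 4) * i for i in (1, 2, 3)]
print(distribution((hashed_key(k) for k in new_keys), hash_bounds))  # roughly even
```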
Problem 4: File Sharing
• Storing files in GridFS
• Uploads are taking too long
Diagnosis 4
• Check CPU and IO stats
• Is the CPU stuck in iowait?
• High sustained IO operations
• Lots of queued operations
• IO bound workload
Solution 4
• Ensure storage is in good health
   – RAID status
   – SAN or NAS devices functioning properly
   – Virtualized disks

• Consider separating data and journal
   – --directoryperdb
   – Symlink journal to another location

• Ensure other processes aren’t hitting storage
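Putting the journal on its own device via a symlink is a filesystem operation, done while `mongod` is cleanly shut down. A sketch in Python, assuming the journal lives in the `journal` subdirectory of `dbpath` (MongoDB's default layout); `dbpath` and the target directory are hypothetical, and the demo runs against a throwaway temp directory, not a real data directory.

```python
import os
import shutil
import tempfile

def relocate_journal(dbpath: str, target_dir: str) -> str:
    """Move <dbpath>/journal under target_dir and leave a symlink behind.

    Run only while mongod is stopped. Returns the new journal location.
    """
    journal = os.path.join(dbpath, "journal")
    if os.path.islink(journal):
        raise RuntimeError(f"{journal} is already a symlink")
    new_location = os.path.join(target_dir, "journal")
    if os.path.exists(new_location):
        raise FileExistsError(new_location)
    os.makedirs(target_dir, exist_ok=True)
    if os.path.isdir(journal):
        shutil.move(journal, new_location)  # copies across devices if needed
    else:
        os.makedirs(new_location)
    os.symlink(new_location, journal)  # mongod keeps using <dbpath>/journal
    return new_location

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        dbpath = os.path.join(tmp, "data")
        os.makedirs(os.path.join(dbpath, "journal"))
        print(relocate_journal(dbpath, os.path.join(tmp, "journal-disk")))
```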
Problem 5: Reading Logs
• Indexes are underperforming
• Queries are using indexes but yielding quite a bit
Diagnosis 5
• Use .explain() and .hint() with your queries
• Check out the b-tree metrics
   – Persistent non-zero misses?
   – Correlated with memory, page faults, IO stats

• B-trees are best for range queries over a single dimension
   – Range queries on {A} if the index is {A,B} could be suboptimal
Solution 5
• Revisit your indexing strategy
• Consider data model changes to optimize queries and indexes
• Some functionality doesn’t hit the index
   – $where JavaScript clauses
   – $mod, $not, $ne
   – Complex regular expressions
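A quick lint pass over query documents can flag these constructs before they reach production. A sketch that walks a query (as a nested dict) and reports `$where`, `$mod`, `$not`, `$ne` and `$regex` operators plus compiled Python regexes. Flagging every regex is a simplification (the slide only warns about complex expressions, and a case-sensitive prefix regex such as `^abc` can use an index), so confirm with `.explain()` rather than trusting the lint. The function name is hypothetical.

```python
import re
from typing import Any, List

FLAGGED = {"$where", "$mod", "$not", "$ne", "$regex"}

def flag_non_index_friendly(query: Any, path: str = "") -> List[str]:
    """Return dotted paths of operators/values that may prevent efficient index use."""
    found: List[str] = []
    if isinstance(query, dict):
        for key, value in query.items():
            here = f"{path}.{key}" if path else key
            if key in FLAGGED:
                found.append(here)
            found.extend(flag_non_index_friendly(value, here))
    elif isinstance(query, list):  # e.g. the clauses of $or / $and
        for i, item in enumerate(query):
            found.extend(flag_non_index_friendly(item, f"{path}[{i}]"))
    elif isinstance(query, re.Pattern):  # drivers accept compiled regexes as values
        found.append(path)
    return found

query = {"status": {"$ne": "done"}, "$where": "this.a > this.b", "name": re.compile("foo.*bar")}
print(flag_non_index_friendly(query))  # ['status.$ne', '$where', 'name']
```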

Miscellaneous Deployment Notes
• Warm the cache
   – Use touch via db.runCommand()

• Dynamically change log levels
• Synchronize all clocks to the same NTP server
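Warming the cache and changing the log level are both ordinary database commands. A sketch that only builds the command documents, so it runs without a server; with pymongo you would pass them to `Database.command`, for example `client.admin.command(set_log_level(1))`. `touch` was specific to the MMAPv1 storage engine and is not available in recent releases, so check the documentation for your version. The collection name is hypothetical.

```python
from typing import Dict, Union

Command = Dict[str, Union[str, int, bool]]

def touch_command(collection: str, data: bool = True, index: bool = True) -> Command:
    """Command document asking mongod to load a collection's data/indexes into memory."""
    # The command name must be the first key; Python dicts keep insertion order.
    return {"touch": collection, "data": data, "index": index}

def set_log_level(level: int) -> Command:
    """Command document for changing log verbosity at runtime (run against admin)."""
    if not 0 <= level <= 5:
        raise ValueError("logLevel ranges from 0 to 5")
    return {"setParameter": 1, "logLevel": level}

print(touch_command("events"))  # {'touch': 'events', 'data': True, 'index': True}
print(set_log_level(1))         # {'setParameter': 1, 'logLevel': 1}
```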
