2. The Cycle of Deployment
(Diagram: Prep, Prototype, Script, Test, Scale, Monitor)
3. Prototype Your Deployment
• You have to start somewhere
• Development is complete, deployment is next
• Sketch out some initial deployment parameters
– Hardware sizing
– Operating system
– Disk setup
– Storage layout, data vs. journal vs. log
4. Prototyping Considerations
• Additional considerations
– Horizontal vs. vertical scale options
– Multiple datacenters
• Start thinking about data growth
– Do you know how your data will evolve?
– Does your data live in multiple collections/databases?
– Read-centric, write-centric or both?
• The more you start thinking about it, the better
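One way to start thinking about data growth is a back-of-the-envelope projection. The sketch below is illustrative only: the function name, the linear-growth assumption, the flat index overhead ratio, and the sample numbers are all hypothetical and should be replaced with measurements from your own workload.

```python
def project_storage_bytes(avg_doc_bytes, docs_per_day, days, index_overhead=0.2):
    """Estimate data plus index size after `days` of steady inserts.

    Assumes linear growth and a flat index overhead ratio. Both are
    simplifying assumptions, not properties of any particular deployment.
    """
    data_bytes = avg_doc_bytes * docs_per_day * days
    return int(data_bytes * (1 + index_overhead))


# Hypothetical workload: 2 KB documents, 500,000 inserts per day, one year.
estimate = project_storage_bytes(2048, 500_000, 365)
print(f"Projected size after a year: {estimate / 1024**3:.1f} GiB")
```

Rerunning the projection with pessimistic and optimistic inputs gives a range to size hardware against rather than a single number.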
5. Test, Test, Test
• Generate a lot of data
– Write tests to measure bulk loading throughput
– Scaffolding can be used for staging, validation
• Build your indexes
– All in the beginning
– On the fly
• Script your app
– Can you simulate “expected” usage?
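A bulk-loading throughput test can be sketched as a data generator plus a timed batching loop. This is a minimal sketch, not a benchmark harness: the document fields are made up, and `insert_batch` is a hypothetical stand-in for whatever bulk insert call your driver provides.

```python
import random
import string
import time


def make_documents(count, seed=0):
    """Yield synthetic documents; the field names are made up for illustration."""
    rng = random.Random(seed)
    for i in range(count):
        yield {
            "user_id": i,
            "name": "".join(rng.choices(string.ascii_lowercase, k=8)),
            "score": rng.randint(0, 1000),
        }


def measure_bulk_load(documents, insert_batch, batch_size=1000):
    """Send documents to `insert_batch` in batches; return (count, docs_per_second).

    `insert_batch` is any callable that accepts a list of documents. In a real
    test it would wrap the driver's bulk insert; here it is left abstract.
    """
    total = 0
    batch = []
    start = time.perf_counter()
    for doc in documents:
        batch.append(doc)
        if len(batch) >= batch_size:
            insert_batch(batch)
            total += len(batch)
            batch = []
    if batch:  # flush the final partial batch
        insert_batch(batch)
        total += len(batch)
    elapsed = time.perf_counter() - start
    rate = total / elapsed if elapsed > 0 else float("inf")
    return total, rate


# Stand-in sink so the sketch runs without a database.
loaded = []
count, rate = measure_bulk_load(make_documents(10_000), loaded.extend)
print(f"Loaded {count} documents at about {rate:.0f} docs/sec")
```

Running the same loop before and after building indexes is one way to compare the "all in the beginning" and "on the fly" options.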
6. Monitor Your Resources
• Watch everything
• The goal is to understand the numbers before deploying
• Monitor using
– SNMP, munin, nagios
– mongostat, mongotop, iostat, cpustat
– MongoDB Monitoring Service (MMS)
• Other stats
– Database, Collection level
7. Monitoring Key Metrics
• Op Counters
– Inserts, updates, deletes, reads (more is generally better)
– Some differences in primary vs. secondary ops
• Resident memory
– Want this lower than available physical memory
– Correlated with page faults and index misses
• Queues
– Readers and writers
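Op counters are cumulative since server start, so they only become useful as rates. The sketch below assumes you already have two samples of the `opcounters` sub-document from serverStatus taken a known number of seconds apart; the function name and the restart handling are illustrative choices.

```python
def opcounter_rates(earlier, later, interval_seconds):
    """Return operations per second for each counter between two snapshots.

    Each snapshot is the `opcounters` sub-document of a serverStatus result,
    for example {"insert": 10, "query": 50, "update": 5, "delete": 1}.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    rates = {}
    for name, later_value in later.items():
        delta = later_value - earlier.get(name, 0)
        # A negative delta suggests the server restarted and counters reset;
        # report zero for that interval rather than a negative rate.
        rates[name] = max(delta, 0) / interval_seconds
    return rates
```

For example, `opcounter_rates({"insert": 100}, {"insert": 160}, 60)` reports one insert per second over that minute.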
8. Monitoring Key Metrics
• Page faults and B-tree misses
– How often are you having to hit the disk?
– Persistently non-zero? Working set might not fit.
• Lock Percentage
– If high and queues are filled, you are hitting write capacity
• IO and CPU Stats
– IO sustained or fluctuating => IO bound
– CPU hitting IOWAITs
9. Scale Your Setup
• Monitor those metrics while testing
• Should tell you where to add capacity
– CPU, RAM, Disks
• Storage configuration
– RAID levels (10 preferred)
– Filesystem selection
– Block sizing
– Readahead setting
10. Script Your Plays
• Backups
• Restores (backups are not enough)
• Maintenance and Upgrades
• Replica Set operations
– Stepping primaries down, adding new secondaries
• Sharding operations
– Consistent backups, balancer operations
• Check out the Backup talk later today
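Scripting the backup and the restore together makes it harder to forget the second half. The sketch below only builds command lines for `mongodump` and `mongorestore`; the host names and paths are hypothetical, and nothing is executed. The `--oplog` and `--oplogReplay` options apply to dumps taken from a replica set member.

```python
import shlex


def backup_command(host, out_dir, use_oplog=True):
    """Build a mongodump command line as a list of arguments."""
    cmd = ["mongodump", "--host", host, "--out", out_dir]
    if use_oplog:
        cmd.append("--oplog")  # capture oplog entries written during the dump
    return cmd


def restore_command(host, dump_dir, replay_oplog=True):
    """Build the matching mongorestore command line."""
    cmd = ["mongorestore", "--host", host]
    if replay_oplog:
        cmd.append("--oplogReplay")  # replay the oplog captured by --oplog
    cmd.append(dump_dir)
    return cmd


def format_command(cmd):
    """Render an argument list as a shell-safe string for logs or runbooks."""
    return " ".join(shlex.quote(part) for part in cmd)


# Hypothetical hosts and paths, shown only to illustrate the output shape.
print(format_command(backup_command("db1.example.com:27017", "/backups/nightly")))
print(format_command(restore_command("staging.example.com:27017", "/backups/nightly")))
```

Passing the argument lists to `subprocess.run` and restoring into a staging host on a schedule is one way to verify that the backups are actually usable.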
11. Lather, Rinse, Repeat
13. Balancing Priorities
Product Development vs. Infrastructure Development
14. The Scale Tips To One Side
• Product development is the priority
– As it should be, but…
• Infrastructure development can’t be overlooked
• Know the downsides of not being prepared
– Downtime
– Data safety
• Disaster will strike
15. Integrate With The Dev Cycle
• Why are ops typically skipped over until it’s too late?
– Planning
• Make operations development a part of the dev cycle
– Put it into the schedule
– Make it a development milestone
• Use it to your advantage
– Script deployment of development and test systems
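Scripted deployment of a development or test system can start very small. The sketch below generates `mongod` command lines for a local replica set; the set name, base directory, and port numbering are hypothetical, and the commands are only built, not run.

```python
from pathlib import PurePosixPath


def replica_set_commands(name, base_dir, first_port=27017, members=3):
    """Build one mongod command line per replica set member.

    The data directories must exist before mongod starts, and the set still
    has to be initiated (for example with rs.initiate() in the shell) once
    the processes are up. Neither step is handled here.
    """
    commands = []
    for i in range(members):
        member_dir = PurePosixPath(base_dir) / f"{name}-{i}"
        commands.append([
            "mongod",
            "--replSet", name,
            "--port", str(first_port + i),
            "--dbpath", str(member_dir / "data"),
            "--logpath", str(member_dir / "mongod.log"),
            "--fork",  # run in the background; requires --logpath
        ])
    return commands


for cmd in replica_set_commands("rs0", "/tmp/mongo-dev"):
    print(" ".join(cmd))
```

Keeping a generator like this in the same repository as the application lets developers and the test environment start from the same topology.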
18. Start The Cycle Again
19. Start With Monitoring
• Monitor your deployment
– Munin, nagios
– MMS
• Instrument your app
– Know your queries
– Read/write/update/delete behaviors
– Index utilization
• Database and collection stats
20. Scaling Deployment
• The numbers don’t lie
– But individual measurements don’t always tell the whole story
• Are you hardware bound?
– Memory, Disks, CPU
• Is your app the problem?
• What about system settings?
– Low resident memory plus page faults can point to a readahead setting that is too high
21. Basic Solutions
• Low opcounters + high page faults
– More memory
• High paddingFactor and fragmentation
– Data model changes
• Balancer running a lot, chunks always migrating
– Better shard key
• Persistent b-tree misses, high page faults
– Queries aren’t hitting the indexes or aren’t using them
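The symptom-to-fix pairs above can be written down as a small rule table. This is a sketch under loud assumptions: the metric names are hypothetical, and every threshold is a placeholder to be replaced with a baseline from your own monitoring.

```python
# Placeholder thresholds; none of these numbers come from the talk.
DEFAULT_THRESHOLDS = {
    "low_ops_per_sec": 500,
    "high_page_faults_per_sec": 100,
    "high_padding_factor": 1.5,
    "high_migrations_per_hour": 20,
    "high_btree_miss_ratio": 0.05,
}


def suggest_fixes(metrics, thresholds=None):
    """Map a dict of observed metrics to the basic solutions listed above."""
    t = thresholds or DEFAULT_THRESHOLDS
    high_faults = metrics.get("page_faults_per_sec", 0) > t["high_page_faults_per_sec"]
    suggestions = []
    # Low opcounters + high page faults -> more memory
    if metrics.get("ops_per_sec", float("inf")) < t["low_ops_per_sec"] and high_faults:
        suggestions.append("more memory")
    # High paddingFactor and fragmentation -> data model changes
    if metrics.get("padding_factor", 1.0) > t["high_padding_factor"]:
        suggestions.append("data model changes")
    # Chunks always migrating -> better shard key
    if metrics.get("chunk_migrations_per_hour", 0) > t["high_migrations_per_hour"]:
        suggestions.append("better shard key")
    # Persistent b-tree misses + high page faults -> review indexes and queries
    if metrics.get("btree_miss_ratio", 0) > t["high_btree_miss_ratio"] and high_faults:
        suggestions.append("review indexes and queries")
    return suggestions
```

A table like this is a starting point for triage, not a diagnosis; individual measurements still need to be read together.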
22. Continue Through the Cycle
• Script your setup
– This will save time as you iterate
• Prototype the fixes
– Evaluate queries, how documents change, expected usage
• Test the new setup
– Scripts to build the deployment and model usage
25. How To Get Help
• Ask the Experts sessions
• We are here to help, come find us
• Refer to our docs: docs.mongodb.org (hint: they’re great!)
• Other things we monitor
– mongodb-user Google group
– Stack Overflow
• Submit a ticket
27. Problem 1: Social Networking
• Suboptimal write throughput
• Where is the bottleneck?
– Check the metrics
28. Diagnosis 1
• Are opcounters reasonably accurate?
• Check the queues
• Examine lock percentages
• How does resident memory look?
• How large are your indexes?
29. Solution 1
• Opcounters aren’t as high as you’d expect but memory is saturated
• Correlated with high page faults
• You might need more memory
• MongoDB wants to fit your working set into memory
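A rough check of whether the working set fits in memory can be made from the `dataSize` and `indexSize` fields of a db.stats() result. The sketch below is an estimate only: the fraction of data that is "hot" and the share of RAM available to the database are assumptions you would need to supply from knowledge of your own access patterns.

```python
def working_set_fits(db_stats, ram_bytes, hot_data_fraction=0.3, headroom=0.8):
    """Return (fits, estimated_working_set_bytes).

    Assumes all indexes plus `hot_data_fraction` of the data are accessed
    regularly, and that `headroom` of physical RAM is usable for them.
    Both parameters are guesses, not measured values.
    """
    working_set = db_stats["indexSize"] + db_stats["dataSize"] * hot_data_fraction
    budget = ram_bytes * headroom
    return working_set <= budget, working_set


# Hypothetical numbers: 200 GiB of data, 30 GiB of indexes, 64 GiB of RAM.
GIB = 1024**3
fits, estimate = working_set_fits(
    {"dataSize": 200 * GIB, "indexSize": 30 * GIB}, ram_bytes=64 * GIB
)
print(f"Estimated working set: {estimate / GIB:.0f} GiB, fits: {fits}")
```

If the estimate says the working set fits but page faults stay high, the hot-data assumption is the first thing to revisit.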
30. Problem 2: Tracking FB Friends
• Update-heavy workload is slow
• Document paddingFactor is increasing
31. Diagnosis 2
• High paddingFactor
– Fragmentation!
• More memory/disk is taken up by new documents
– Inefficient space usage
• Documents are having to be relocated regularly
32. Solution 2
• Check your queries
– Are your documents growing because of arrays or added fields?
• Pre-create required document structure or…
• Move growing elements out into individual documents in a separate collection
– Data model changes, app changes
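Both options can be illustrated with plain dictionaries standing in for documents. The shapes below are hypothetical (field names such as `hourly_counts`, `parent_id`, and `value` are invented for the example); the point is the difference between a document created at its final size and an unbounded array split into child documents.

```python
def preallocated_day_document(user_id, day):
    """Create the full document shape up front so later updates set
    existing fields instead of making the document grow."""
    return {
        "user_id": user_id,
        "day": day,
        "hourly_counts": [0] * 24,  # fixed-size slots, filled in by updates
    }


def split_growing_array(doc, array_field, parent_key="_id"):
    """Move the elements of an unbounded array into separate child documents.

    Returns the parent without the array, plus one child per element that
    refers back to the parent. The children would live in their own collection.
    """
    parent = {k: v for k, v in doc.items() if k != array_field}
    children = [
        {"parent_id": doc[parent_key], "value": item}
        for item in doc.get(array_field, [])
    ]
    return parent, children


user = {"_id": 1, "name": "alice", "friends": [2, 3, 5]}
parent, friend_docs = split_growing_array(user, "friends")
print(parent)
print(friend_docs)
```

The second approach trades one read for two, which is why it shows up as both a data model change and an app change.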
33. Problem 3: Status Updates
• Write-heavy sharded deployment
– Is one shard getting burned?
– Is the balancer locked all the time?
• Balancer is constantly migrating chunks
34. Diagnosis 3
• Check the mongos logs
– How often is migration occurring?
– Are chunks constantly moving from one shard to the next?
• Shard key distribution
– Sequential keys?
– One shard always getting new writes?
35. Solution 3
• Consider hashing, byte swapping, etc. if there is no “natural” key that distributes well
– Avoids the “hot” shard problem
• High writes and high balancer lock
– Manage balancer window
– Run it during low utilization
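The "hot" shard effect of sequential keys, and how hashing avoids it, can be simulated without a cluster. This is a toy model, assuming range-based partitioning with fixed split points; the MD5-based hash, the split points, and the key ranges are illustrative choices and not a description of how any particular server version hashes keys.

```python
import hashlib
from bisect import bisect_right
from collections import Counter


def range_shard(value, split_points):
    """Index of the range that `value` falls in, given sorted split points."""
    return bisect_right(split_points, value)


def hashed_value(key):
    """Map a key to a 64-bit integer using MD5 (an illustrative choice)."""
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def distribution(keys, split_points, transform=lambda k: k):
    """Count how many keys land in each range after applying `transform`."""
    return Counter(range_shard(transform(k), split_points) for k in keys)


NUM_SHARDS = 4
new_keys = range(1_000_000, 1_010_000)  # sequential ids, all above older data

# Ranges on the raw key were split around older data (ids below 1,000,000),
# so every new id lands in the last range: one shard takes all the writes.
raw_splits = [250_000, 500_000, 750_000]
raw = distribution(new_keys, raw_splits)

# Ranges on the hashed key divide the 64-bit hash space evenly.
hash_splits = [(2**64 // NUM_SHARDS) * i for i in range(1, NUM_SHARDS)]
hashed = distribution(new_keys, hash_splits, transform=hashed_value)

print("raw key:   ", dict(raw))
print("hashed key:", dict(sorted(hashed.items())))
```

The cost of hashing is that range queries on the original key no longer map to a contiguous range of chunks, so the choice depends on the read pattern as well as the write pattern.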
36. Problem 4: File Sharing
• Storing files in GridFS
• Uploads are taking too long
37. Diagnosis 4
• Check CPU and IO stats
• Is the CPU stuck in IOWAITS?
• High sustained IO operations
• Lots of queued operations
• IO bound workload
38. Solution 4
• Ensure storage is in good health
– RAID status
– SAN or NAS devices functioning properly
– Virtualized disks
• Consider separating data and journal
– --directoryperdb
– Symlink journal to another location
• Ensure other processes aren’t hitting storage
39. Problem 5: Reading Logs
• Indexes are underperforming
• Queries are using indexes but yielding quite a bit
40. Diagnosis 5
• Use .explain() and .hint() with your queries
• Check out the b-tree metrics
– Persistent non-zero misses?
– Correlated with memory, page faults, IO stats
• B-trees best for range queries over single dimension
– Range queries on {A} if index is {A,B} could be suboptimal
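When reading .explain() output, a few fields carry most of the signal. The sketch below assumes the legacy explain format, with fields such as `cursor`, `n`, `nscanned`, `nYields`, and `millis`; newer server versions use a different output structure, so treat the field names as an assumption tied to that format. The sample values are made up.

```python
def summarize_explain(explain):
    """Pull the index-related signals out of a legacy explain() document."""
    cursor = explain.get("cursor", "")
    n = explain.get("n", 0)
    nscanned = explain.get("nscanned", 0)
    return {
        # "BtreeCursor <index>" means an index was used; "BasicCursor" is a scan.
        "uses_index": cursor.startswith("BtreeCursor"),
        # Index entries examined per document returned; close to 1 is ideal.
        "scanned_per_returned": nscanned / n if n else None,
        "yields": explain.get("nYields", 0),
        "millis": explain.get("millis", 0),
    }


# Made-up explain output for a query that uses an index inefficiently.
sample = {"cursor": "BtreeCursor a_1_b_1", "n": 10, "nscanned": 1000,
          "nYields": 7, "millis": 42}
print(summarize_explain(sample))
```

A query that uses an index but scans far more entries than it returns, with many yields, matches the "using indexes but yielding quite a bit" symptom.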
41. Solution 5
• Revisit your indexing strategy
• Consider data model changes to optimize
queries and indexes
• Some functionality doesn’t hit the index
– $where javascript clauses
– $mod, $not, $ne
– Complex regular expressions
42. Miscellaneous Deployment Notes
• Warm the cache
– Use touch via db.runCommand()
• Dynamically change log levels
• Synchronize all clocks to the same NTP server
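The first two notes correspond to database commands, which can be sketched as the command documents you would pass to db.runCommand() or a driver's equivalent. The helper names and the collection name are hypothetical; the `touch` command and the `logLevel` server parameter depend on server version and storage engine, so check that your version supports them.

```python
def touch_command(collection, data=True, index=True):
    """Command document that loads a collection's data and/or indexes into memory.

    The command name must be the first key, which Python dicts preserve.
    """
    return {"touch": collection, "data": data, "index": index}


def set_log_level_command(level):
    """Command document that changes log verbosity at runtime.

    Intended to be run against the admin database.
    """
    if not 0 <= level <= 5:
        raise ValueError("log level must be between 0 and 5")
    return {"setParameter": 1, "logLevel": level}


print(touch_command("records"))
print(set_log_level_command(2))
```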