14. assign a shard name per cluster, per role
treat them like ordinary replica sets
Thursday, June 20, 13
15. Arbiters
• Mongod processes that do nothing but vote
• Highly reliable
• To provision an arbiter, use the LWRP
• Easy to run multiple arbiters on a single host
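The LWRP above is a Chef resource; done by hand, an arbiter is just a small mongod plus `rs.addArb()`. A minimal sketch — host names, ports, and paths are examples:

```shell
# Start a tiny mongod that stores no data and only votes:
mongod --replSet rs0 --port 27021 --dbpath /data/arb1 \
       --fork --logpath /data/arb1/mongod.log
# From the primary, add it to the set as an arbiter:
mongo --eval 'rs.addArb("arbiter-host:27021")'
```

Because arbiters hold no data, several can safely share one small host.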
22. Provisioning tips
• Memory is your primary scaling constraint
• Your working set must fit into memory
• in 2.4, estimate with:
• Page faults? Your working set may not fit
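A sketch of both checks against a local mongod (2.4 added the working set estimator to serverStatus; the page-fault counter exists in 2.2 as well):

```shell
# 2.4+ working set estimator (reports pages touched and overSeconds):
mongo --quiet --eval 'printjson(db.serverStatus({workingSet: 1}).workingSet)'
# Page-fault counter; a steadily climbing value under load suggests the
# working set no longer fits in RAM:
mongo --quiet --eval 'db.serverStatus().extra_info.page_faults'
```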
23. Disk options
• If you’re on Amazon:
• EBS
• Dedicated SSD
• Provisioned IOPS
• Ephemeral
• If not:
• use SSDs!
25. SSD
(hi1.4xlarge)
• 8 cores
• 60 gigs RAM
• 2 1-TB SSD drives
• 120k random reads/sec
• 85k random writes/sec
• expensive! $2300/mo on demand
26. PIOPS
• Up to 2000 IOPS/volume
• Up to 1024 GB/volume
• Variability of < 0.1%
• Costs double regular EBS
• Supports snapshots
• RAID together multiple volumes for more storage/performance
27. Estimating PIOPS
• estimate how many IOPS to provision with the “tps” column of sar -d 1
• multiply that by 2-3x depending on your spikiness
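The estimate above can be sketched with a little awk. The sample data here is synthetic (real `sar -d 1` output has more columns); the idea is to take the peak observed tps and apply the 2-3x headroom multiplier:

```shell
# Synthetic (device, tps) samples standing in for `sar -d 1` output:
cat > /tmp/sar_sample.txt <<'EOF'
dev202-1 640.00
dev202-1 810.00
dev202-1 725.00
EOF
# Peak tps times 2.5 as a middle-of-the-road spikiness multiplier:
awk '{ if ($2 > max) max = $2 } END { printf "%d\n", max * 2.5 }' /tmp/sar_sample.txt
# → 2025
```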
28. Ephemeral Storage
• Cheap
• Fast
• No network latency
• No snapshot capability
• Data is lost forever if you stop or resize the instance
29. Filesystem and limits
• Raise file descriptor limits
• Raise connection limits
• Mount with noatime and nodiratime
• Consider putting the journal on a separate volume
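A sketch of those settings; file paths, device names, and limit values are examples to adapt, not prescriptions:

```shell
# /etc/security/limits.d/mongod.conf — raise file descriptor limits:
#   mongod  soft  nofile  64000
#   mongod  hard  nofile  64000
# /etc/fstab — mount the data volume without atime updates:
#   /dev/xvdf  /data  ext4  defaults,noatime,nodiratime  0 2
# Journal on a separate volume, via a symlink inside the dbpath:
#   ln -s /journal-volume/journal /data/db/journal
```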
30. Blockdev
• Your default blockdev readahead is probably wrong
• Too large? You will underuse memory
• Too small? You will hit the disk too much
• Experiment.
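Inspecting and tuning readahead looks roughly like this; the device name is an example, and the right value depends on your access pattern, so measure before and after:

```shell
# Current readahead, in 512-byte sectors:
blockdev --getra /dev/xvdf
# Try a small value for random-access workloads (32 sectors = 16 KB),
# then benchmark:
blockdev --setra 32 /dev/xvdf
```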
31. Snapshot best practices
• Set priority = 0
• Set hidden = 1
• Consider setting votes = 0
• Lock mongo or stop mongod before snapshot
• Consider running continuous compaction on snapshot node
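A sketch of configuring a dedicated snapshot member and quiescing it before the snapshot. The member index (2) is an assumption — pick the member you actually snapshot:

```shell
mongo --eval '
  var cfg = rs.conf();
  cfg.members[2].priority = 0;   // never becomes primary
  cfg.members[2].hidden = true;  // invisible to clients
  cfg.members[2].votes = 0;      // optional: does not vote
  rs.reconfig(cfg);
'
# Flush and block writes for the duration of the snapshot:
mongo --eval 'db.fsyncLock()'
# ... take the EBS/LVM snapshot here ...
mongo --eval 'db.fsyncUnlock()'
```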
32. Restoring from snapshot
• EBS snapshot will lazily-load blocks from S3
• run “dd” on each of the data files to pull blocks down
• Always warm up a secondary before promoting
• warm up both indexes and data
• http://blog.parse.com/2013/03/07/techniques-for-warming-up-mongodb/
• in mongodb 2.2 and above you can use the touch command:
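Both warm-up steps, sketched; the dbpath, database, and collection names are examples:

```shell
# Force the lazily-loaded EBS snapshot blocks down from S3 by reading
# every data file end to end:
for f in /data/db/mydb.*; do
  dd if="$f" of=/dev/null bs=1M
done
# 2.2+: pull a collection's data and indexes into RAM with touch:
mongo mydb --eval \
  'printjson(db.runCommand({touch: "mycoll", data: true, index: true}))'
```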
33. Fragmentation
• Your RAM gets fragmented too!
• Leads to underuse of memory
• Deletes are not the only source of fragmentation
• Repair, compact, or resync regularly
34. 3 ways to fix fragmentation:
• Re-sync a secondary from scratch
• hard on your primary; rs.syncFrom() a secondary
• Repair a secondary
• can cause small discrepancies in your data
• Run continuous compaction on your snapshot node
• won’t reset padding factors
• not appropriate if you do lots of deletes
38. Finding bad queries
• db.currentOp()
• mongodb.log
• profiling collection
39. db.currentOp()
• Check the queue size
• Any indexes building?
• Sort by num_seconds
• Sort by num_yields, locktype
• Consider adding comments to your queries
• Run explain() on queries that are long-running
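A sketch of the triage loop: surface long-running queries from `db.currentOp()` (field names as in 2.2+; the 10-second threshold is an example):

```shell
mongo --eval '
  db.currentOp().inprog.forEach(function (op) {
    // Only user queries, running longer than 10 seconds:
    if (op.op === "query" && op.secs_running > 10) printjson(op);
  });
'
```

Tagging queries with `$comment` in your application makes the `query` field here traceable back to the code path that issued it.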
40. mongodb.log
• Configure output with --slowms
• Look for high execution time, nscanned, ntoreturn
• See which queries are holding long locks
• Match connection ids to IPs
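Slow-query lines in the 2.x log end in the execution time, so they can be filtered with awk. The sample lines below are synthetic (real log lines carry more fields); the 100ms threshold is an example:

```shell
cat > /tmp/mongodb_sample.log <<'EOF'
Thu Jun 20 12:00:01 [conn41] query app.users query: { email: 1 } nscanned:120000 nreturned:1 250ms
Thu Jun 20 12:00:02 [conn42] query app.users query: { _id: 1 } nscanned:1 nreturned:1 0ms
EOF
# Keep lines whose trailing "NNNms" is at least 100ms:
awk '{ ms = $NF; sub(/ms$/, "", ms); if (ms + 0 >= 100) print }' /tmp/mongodb_sample.log
```

The `[connNN]` tag in each line matches the connection-accepted lines earlier in the log, which is how you map a slow query back to a client IP.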
41. system.profile collection
• Enable profiling with db.setProfilingLevel()
• Does not persist through restarts
• Like mongodb.log, but queryable
• Writes to this collection incur some cost
• Use db.system.profile.find() to get slow queries for a certain collection, time range, execution time, etc
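Sketch of enabling the profiler and querying it; the database, collection, threshold, and time window are all examples:

```shell
# Level 1 records only operations slower than the threshold (100ms here):
mongo mydb --eval 'db.setProfilingLevel(1, 100)'
# Slow ops against one collection in the last hour, newest first:
mongo mydb --eval '
  db.system.profile.find({
    ns: "mydb.mycoll",
    millis: { $gt: 100 },
    ts: { $gt: new Date(Date.now() - 3600 * 1000) }
  }).sort({ ts: -1 }).forEach(printjson);
'
```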
42. ... when queries pile up ...
• Know what your tipping point looks like
• Don’t switch your primary or restart
• Do kill queries before the tipping point
• Write your kill script before you need it
• Don’t kill internal mongo operations, only queries.
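A kill script along those lines might look like this sketch — the 15-second threshold is an example, and the guards (queries only, skip the local database) are the important part:

```shell
mongo --eval '
  db.currentOp().inprog.forEach(function (op) {
    if (op.op !== "query") return;                  // user queries only
    if (!op.secs_running || op.secs_running < 15) return;
    if (op.ns && op.ns.indexOf("local.") === 0) return; // never oplog tailers
    print("killing opid " + op.opid + " on " + op.ns);
    db.killOp(op.opid);
  });
'
```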
43. can’t elect a master?
• Never run with an even number of votes (max 7)
• You need > 50% of votes to elect a primary
• Set your priority levels explicitly if you need warmup
• Consider delegating voting to arbiters
• Set snapshot nodes to be nonvoting if possible.
• Check your mongo log. Is something vetoing? Do they have an inconsistent view of the cluster state?
44. secondaries crashing?
• Some rare mongo bugs will cause all secondaries to crash unrecoverably
• Never kill oplog tailers or other internal database operations; this can also trash secondaries
• Arbiters are more stable than secondaries; consider using them to form a quorum with your primary
45. replication stops?
• Other rare bugs will stop replication or cause secondaries to exit without a corrupt op
• The correct way to fix this is to re-snapshot off the primary and rebuild your secondaries.
• However, you can sometimes *dangerously* repair a secondary:
1. stop mongo
2. bring it back up in standalone mode
3. repair the offending collection
4. restart mongo again as part of the replica set
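The four steps, sketched as commands. Paths, ports, and names are examples, and this is the *dangerous* path — re-snapshotting from the primary is the safe fix:

```shell
# 1. stop mongo
service mongod stop
# 2. bring it back up in standalone mode (no --replSet):
mongod --dbpath /data/db --port 27018 --fork --logpath /data/db/repair.log
# 3. repair (repairDatabase rewrites every collection in the database):
mongo --port 27018 mydb --eval 'printjson(db.repairDatabase())'
mongod --shutdown --dbpath /data/db
# 4. restart mongo again as part of the replica set:
service mongod start
```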
46. • Everything is getting vaguely slower?
• check your padding factor, try compaction
• You rs.remove() a node and get weird driver errors?
• always shut down mongod after removing from replica set
• Huge background flush spike?
• probably an EBS or disk problem
• You run out of connection limits?
• possibly a driver bug
• hard-coded to 80% of soft ulimit until 20k is reached.
47. • It looks like all I/O stops for a while?
• check your mongodb.log for large newExtent warnings
• also make sure you aren’t reaching PIOPS limits
• You get weird driver errors after adding/removing/re-electing?
• some drivers have problems with this, you may have to restart