The document discusses various techniques for optimizing and scaling MongoDB deployments. It covers topics like schema design, indexing, monitoring workload, vertical scaling using resources like RAM and SSDs, and horizontal scaling using sharding. The key recommendations are to optimize the schema and indexes first before scaling, understand the workload, and ensure proper indexing when using sharding for horizontal scaling.
4. Premature Optimization
• "There is no doubt that the grail of efficiency leads to abuse.
Programmers waste enormous amounts of time thinking about,
or worrying about, the speed of noncritical parts of their
programs, and these attempts at efficiency actually have a strong
negative impact when debugging and maintenance are
considered. We should forget about small efficiencies, say about
97% of the time: premature optimization is the root of all evil.
Yet we should not pass up our opportunities in that critical 3%."
- Donald Knuth, 1974
8. The Importance of Schema Design
• MongoDB schemas are designed the opposite way from relational
schemas!
• Relational Schema:
– normalize data
– write complex queries to join the data
– let the query planner figure out how to make queries efficient
• MongoDB Schema:
– denormalize the data
– create a (potentially complex) schema with prior knowledge of your
actual (not just predicted) query patterns
– write simple queries
9. Real World Example: Optimizing Schema for Scale
Product catalog schema for retailer selling in 20 countries
{
_id: 375,
en_US: { name: …, description: …, <etc…> },
en_GB: { name: …, description: …, <etc…> },
fr_FR: { name: …, description: …, <etc…> },
fr_CA: { name: …, description: …, <etc…> },
de_DE: …,
de_CH: …,
<… and so on for other locales …>
}
10. What's good about this schema?
• Each document contains all the data about the
product across all possible locales.
• It is the most efficient way to retrieve all translations of
a product in a single query (English, French, German,
etc).
11. But that's not how the data was accessed
db.catalog.find( { _id: 375 }, { en_US: true } );
db.catalog.find( { _id: 375 }, { fr_FR: true } );
db.catalog.find( { _id: 375 }, { de_DE: true } );
… and so forth for other locales
The data model did not fit the access pattern.
12. Why is this inefficient?
Only the locale being requested is in use (shown in red on the
original slide). The other locales (shown in blue) take up
memory but are not in demand.
{
_id: 375,
en_US: { name: …, description: …, <etc…> },
en_GB: { name: …, description: …, <etc…> },
fr_FR: { name: …, description: …, <etc…> },
fr_CA: { name: …, description: …, <etc…> },
de_DE: …,
de_CH: …,
<… and so on for other locales …>
}
{
_id: 42,
en_US: { name: …, description: …, <etc…> },
en_GB: { name: …, description: …, <etc…> },
fr_FR: { name: …, description: …, <etc…> },
fr_CA: { name: …, description: …, <etc…> },
de_DE: …,
de_CH: …,
<… and so on for other locales …>
}
13. Consequences of the schema
• Each document contained 20x more data than the
common use case requires
• Disk IO was too high for the relatively modest query
load on the dataset
• MongoDB lets you request a subset of a document's
contents via projection…
• … but the entire document must be loaded into RAM
to service the request
14. Consequences of the schema redesign
• Queries induced minimal memory overhead
• 20x as many distinct products fit in RAM at once
• Disk IO utilization reduced
• Application latency reduced
{
_id: "375-en_GB",
name: …,
description: …,
<… the rest of the document …>
}
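The redesign above can be sketched as a small transformation from one multi-locale document to one document per product and locale. This is only an illustration: the function name `splitByLocale` and the sample field values are assumptions, and only the `"<productId>-<locale>"` form of `_id` comes from the slide.

```javascript
// Sketch: split one multi-locale product document into one document per locale.
// The "<productId>-<locale>" _id format follows the redesigned document above;
// the function name and the sample values are illustrative assumptions.
function splitByLocale(product) {
  const docs = [];
  for (const [key, value] of Object.entries(product)) {
    if (key === "_id") continue; // every other top-level key is treated as a locale
    docs.push({ _id: `${product._id}-${key}`, ...value });
  }
  return docs;
}

const multiLocaleProduct = {
  _id: 375,
  en_US: { name: "Widget", description: "A widget" },
  fr_FR: { name: "Gadget", description: "Un gadget" },
};

const perLocaleDocs = splitByLocale(multiLocaleProduct);
console.log(perLocaleDocs);
```

With documents shaped this way, a request for one locale would presumably become a lookup on the composite `_id`, so only the data that is actually needed has to be in RAM.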
15. Schema Design Patterns
• Pattern: pre-computing interesting quantities, ideally with each
write operation
• Pattern: putting unrelated items in different collections to take
advantage of indexing
• Anti-pattern: appending to arrays ad infinitum
• Anti-pattern: importing relational schemas directly into
MongoDB
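As a rough illustration of the pre-computing pattern, the sketch below maintains a running count and sum with every write instead of appending each value to an ever-growing array. The field names `ratingCount` and `ratingSum` and both helper functions are hypothetical; `$inc` is MongoDB's increment operator, and `applyInc` is only an in-memory stand-in so the example runs without a database.

```javascript
// Sketch: keep a running count and sum on the document so that reads never
// have to scan an ever-growing array. Field names are hypothetical.
function buildRatingUpdate(rating) {
  // Update document that could be passed to an update call.
  return { $inc: { ratingCount: 1, ratingSum: rating } };
}

// Minimal in-memory stand-in for applying a $inc update (illustration only).
function applyInc(doc, update) {
  const result = { ...doc };
  for (const [field, delta] of Object.entries(update.$inc)) {
    result[field] = (result[field] || 0) + delta;
  }
  return result;
}

let ratedProduct = { _id: "375-en_GB" };
for (const rating of [4, 5, 3]) {
  ratedProduct = applyInc(ratedProduct, buildRatingUpdate(rating));
}
const averageRating = ratedProduct.ratingSum / ratedProduct.ratingCount;
console.log(ratedProduct, averageRating);
```

The average is then a cheap division at read time, and the document size stays constant no matter how many writes arrive.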
16. Schema Design Tips
• Avoid inherently slow operations
– Updates of unindexed arrays of several thousand elements
– Updates of indexed arrays of several hundred elements
– Document moves
• Arrays are great, but know how to use them
17. Schema Design resources
• Blog series, "6 rules of thumb"
– Part 1: http://goo.gl/TFJ3dr
– Part 2: http://goo.gl/qTdGhP
– Part 3: http://goo.gl/JFO1pI
18. Indexing
• Indexes are tree-structured sets of references to your
documents
• Indexes are the single biggest tunable performance factor in
the database
• Indexing and schema design go hand in hand
19. Indexing Mistakes
• Failing to build necessary indexes
• Building unnecessary indexes
• Running ad-hoc queries in production
20. Indexing Fixes
• Failing to build necessary indexes
– Run .explain(), examine slow query log, mtools, system.profile
collection
• Building unnecessary indexes
– Talk to your application developers about usage
• Running ad-hoc queries in production
– Use a staging environment, use secondaries
24. mtools
• http://github.com/rueckstiess/mtools
• log file analysis for poorly performing queries
– Show me queries that took more than 1000 ms from 6 am to 6 pm:
– mlogfilter mongodb.log --from 06:00 --to 18:00 --slow 1000 > mongodb-filtered.log
29. But there's an index!?!
db.system.indexes.find().toArray()
[{
"v" : 1,
"key" : {
"company" : 1,
"employeeId" : 1
},
"ns" : "test.docs",
"name" : "company_1_employeeId_1"
}]
This isn't the index you're looking for.
30. Did you see the problem?
{
_id: ObjectId("53b9ab7e939f1e229b4f574c"),
firstName: "Alice",
lastName: "Smith",
parent: {
company: 22794,
employeeId: 83881
}
}
31. The index was created incorrectly
db.system.indexes.find().toArray()
[{
"v" : 1,
"key" : {
"parent.company" : 1,
"parent.employeeId" : 1
},
"ns" : "test.docs",
"name" : "parent.company_1_parent.employeeId_1"
}]
Subdocument needed
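To see why the first index did not help, it can be useful to resolve both key names against the sample document. The helper below is a simplified, hypothetical sketch of dotted-path lookup (it ignores arrays, which MongoDB also traverses); the document is the one from the slide above.

```javascript
// Sketch: resolve a dotted field path against a document, the way a path
// such as "parent.company" addresses a field inside a subdocument.
// Simplified: arrays are not handled.
function resolvePath(doc, path) {
  return path
    .split(".")
    .reduce((value, key) => (value == null ? undefined : value[key]), doc);
}

const employee = {
  firstName: "Alice",
  lastName: "Smith",
  parent: { company: 22794, employeeId: 83881 },
};

console.log(resolvePath(employee, "company"));        // undefined: no such top-level field
console.log(resolvePath(employee, "parent.company")); // 22794
```

An index keyed on `company` and `employeeId` has nothing to index in documents like this one, while keys written as `parent.company` and `parent.employeeId` reach the values the queries actually filter on.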
32. Indexing Strategies
• Create indexes that support your queries!
• Create highly selective indexes
• Eliminate duplicate indexes with a compound index, if possible
– db.collection.ensureIndex({A:1, B:1, C:1})
– allows queries using leftmost prefix
• Order compound index fields thusly: equality, sort, then range
– see http://emptysqua.re/blog/optimizing-mongodb-compound-indexes/
• Create indexes that support covered queries
• Prevent collection scans in pre-production environments
– mongod --notablescan
– db.getSiblingDB("admin").runCommand( { setParameter: 1, notablescan: 1 } )
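Two of the strategies above, field ordering and leftmost-prefix reuse, can be sketched in a few lines. Both helpers and the field names (`status`, `createdAt`, `price`) are hypothetical, all directions are simplified to ascending, and the prefix check is deliberately strict: it only reports queries whose fields are exactly the first N fields of the index.

```javascript
// Sketch: build a compound index key spec in equality, sort, range order.
// All directions are ascending here for simplicity.
function buildIndexSpec({ equality = [], sort = [], range = [] }) {
  const spec = {};
  for (const field of [...equality, ...sort, ...range]) {
    if (!(field in spec)) spec[field] = 1; // skip fields already placed
  }
  return spec;
}

// True when the queried fields are exactly the first N fields of the index.
function isLeftmostPrefix(indexSpec, queryFields) {
  const wanted = new Set(queryFields);
  const indexFields = Object.keys(indexSpec);
  if (wanted.size === 0 || wanted.size > indexFields.length) return false;
  return indexFields.slice(0, wanted.size).every((f) => wanted.has(f));
}

const indexSpec = buildIndexSpec({
  equality: ["status"],
  sort: ["createdAt"],
  range: ["price"],
});

console.log(indexSpec);
console.log(isLeftmostPrefix(indexSpec, ["status"]));
console.log(isLeftmostPrefix(indexSpec, ["status", "createdAt"]));
console.log(isLeftmostPrefix(indexSpec, ["createdAt"]));
```

A separate index on `status` alone would be a candidate for removal in this sketch, since the compound index already covers that prefix.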
33. Monitoring Your Workload
• Log files, iostat, mtools, mongotop are for debugging
• MongoDB Management Service (MMS) can do metrics
collection and reporting
38. Cloud Version of MMS
1. Go to http://mms.mongodb.com
2. Create an account
3. Install one agent in your datacenter
4. Add hosts from the web interface
5. Enjoy!
42. RAM - Measure your working set and index sizes
• db.serverStatus({workingSet:1}).workingSet
{ "computationTimeMicros": 2751,
"note": "thisIsAnEstimate",
"overSeconds": 1084,
"pagesInMemory": 2041
}
• db.stats().indexSize
2032880640
• In this example,
(2041 * 4096) + 2032880640 = 2041240576 bytes
= 1.9 GB
• Note: this is a subset of the virtual memory used by mongod
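The arithmetic in this example can be reproduced with a short script. The function name is an assumption, and the 4096-byte page size is taken from the slide's own calculation rather than queried from the system.

```javascript
// Sketch: reproduce the slide's estimate from its two measured inputs.
const PAGE_SIZE_BYTES = 4096; // page size used in the slide's arithmetic

function estimateRamBytes(pagesInMemory, indexSizeBytes) {
  return pagesInMemory * PAGE_SIZE_BYTES + indexSizeBytes;
}

const ramBytes = estimateRamBytes(2041, 2032880640);
const ramGiB = ramBytes / 1024 ** 3;
console.log(ramBytes, ramGiB.toFixed(1)); // 2041240576 "1.9"
```

Almost all of the estimate comes from the indexes; the working-set pages contribute only about 8 MB in this example.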
43. Real World Example: Vertical Scaling
• System that tracked status information for entities in the
business
• State changes happen in batches; sometimes 10% of entities
get updated, sometimes 100% get updated
45. Adding shards to scale horizontally
• Application was a success! Business entities grew by a factor of 5
• Cluster capacity multiplied by 5, but so did the TCO
[Diagram: Application / mongos in front of mongod shards, … 16 more shards …]
46. More success means more shards
• 10x growth means … 200 shards
• Horizontal scaling with sharding is linear scaling, but an order
of magnitude was needed
• Bulk updates of random documents approach the random IO speed
of the disks
47. Final architecture
• Scaling the random IOPS with SSDs was a vertical scaling
approach
[Diagram: Application / mongos in front of mongod on SSD]
48. Before you add hardware…
• Make sure you are solving the right scaling problem
• Remedy schema and index problems first
– schema and index problems can look like hardware problems
• Tune the Operating System
– ulimits, swap, NUMA, NOOP scheduler with hypervisors
• Tune the IO subsystem
– ext4 or XFS vs SAN, RAID10, readahead, noatime
• See MongoDB "production notes" page
• Heed logfile startup warnings
49. Today’s Webinar Agenda
Achieve Scale
1 Optimization Tips
2 Scale Vertically
3 Horizontal Scaling: The Basics of Sharding
53. Rule of Thumb
To make good decisions about MongoDB implementations, you
must understand MongoDB, your applications, the workload your
applications generate, and your business requirements.
54. Summary
• Don't throw hardware at the problem until you examine all
other possibilities (schema, indexes, OS, IO subsystem)
• Know what is considered "normal" performance by monitoring
• Horizontal scaling in MongoDB is implemented with sharding,
but you must understand schema design and indexing before
you shard
Sharding a sub-optimally designed database will not make it performant.
55. Today’s Webinar Agenda
Achieve Scale
1 Optimization Tips
– Schema Design
– Indexes
– Monitoring your Workload
2 Scale Vertically
3 Horizontal Scaling: The Basics of Sharding
58. Thank You
Jake Angerman
Sr. Solutions Architect, MongoDB
Editor's notes
trap: concern about correctness overrides optimization at scale
importing a relational schema directly into MongoDB is an anti-pattern!
different parts of the world are awake and shopping at a given time
Anti-pattern: embedding highly volatile data in an array
these may look like performance tips instead of schema design tips
sub-optimal query might be $unwind followed by $match instead of projection
100ms threshold by default
shard key aside
Indexes should be contained in working set.
In this case I had a 50GB database but only ~2GB were needed in RAM
this applies to both vertical and horizontal scaling
The order presented is the order you should analyze