3. Horizontal Scaling
• Vertical scaling is limited
• Hard to scale vertically in the cloud
• Can scale wider than higher
4. Replica Sets
• One master at any time
• Programmer determines if read hits master or a slave
• Easy to set up to scale reads
5. db.people.find( { state : "NY" } ).addOption( DBQuery.Option.slaveOk )
• routed to a secondary automatically
• will use master if no secondary is available
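The routing rule on this slide can be modeled in a few lines. This is a toy sketch in plain JavaScript, not driver or server code: `pickReadTarget`, the member objects, and the host names are hypothetical and exist only to illustrate the fallback behavior.

```javascript
// Toy model of slaveOk read routing. All names here are hypothetical.
function pickReadTarget(members, slaveOk) {
  const primary = members.find(m => m.role === "primary" && m.up);
  if (!slaveOk) return primary || null;
  const secondaries = members.filter(m => m.role === "secondary" && m.up);
  // Prefer a secondary when allowed; fall back to the primary otherwise.
  return secondaries.length > 0 ? secondaries[0] : (primary || null);
}

const replicaSet = [
  { host: "db1", role: "primary", up: true },
  { host: "db2", role: "secondary", up: true },
  { host: "db3", role: "secondary", up: false },
];

const withSecondary = pickReadTarget(replicaSet, true).host;     // "db2"
replicaSet[1].up = false;                                        // last secondary goes down
const withoutSecondary = pickReadTarget(replicaSet, true).host;  // "db1"
console.log(withSecondary, withoutSecondary);
```

A read that does not set slaveOk always goes to the master in this model, which is why the programmer, not the server, decides whether stale reads are acceptable.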
6. Not Enough
• Writes don’t scale
• Reads are out of date on slaves
• RAM/Data Size doesn’t scale
7. Why Shard?
• Distribute write load
• Keep working set in RAM
• Consistent reads
• Preserve functionality
8. Sharding Design Goals
• Scale linearly
• Increase capacity with no downtime
• Transparent to the application
• Low administration to add capacity
9. Sharding and Documents
• Rich documents reduce need for joins
• No joins makes sharding solvable
10. Basics
• Choose how you partition data
• Convert from single replica set to sharding with no downtime
• Full feature set
• Fully consistent by default
12. Typical Basic Setup
Data Center Primary      Data Center Secondary
S1 p=1      S1 p=1       S1 p=0
S2 p=1      S2 p=1       S2 p=0
S3 p=1      S3 p=1       S3 p=0
Config 1    Config 2     Config 2
mongos mongos mongos mongos
13. Range Based
• collection is broken into chunks by range
• chunks default to 64 MB or 100,000 objects
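As a rough illustration of range-based partitioning, the sketch below keeps a list of chunks, each owning a half-open key range on one shard, and splits a chunk in two. The chunk boundaries, shard names, and the `findChunk` / `splitChunk` helpers are invented for this example and are not MongoDB internals.

```javascript
// Toy model of range-based partitioning; the chunk layout is made up.
// Each chunk owns keys in [min, max) and lives on one shard.
const chunks = [
  { min: -Infinity, max: 1000, shard: "S1" },
  { min: 1000, max: 5000, shard: "S2" },
  { min: 5000, max: Infinity, shard: "S3" },
];

function findChunk(key) {
  return chunks.find(c => key >= c.min && key < c.max);
}

// Split a chunk at a key into two adjacent ranges on the same shard.
function splitChunk(chunk, at) {
  const i = chunks.indexOf(chunk);
  chunks.splice(i, 1,
    { min: chunk.min, max: at, shard: chunk.shard },
    { min: at, max: chunk.max, shard: chunk.shard });
}

console.log(findChunk(42).shard);    // S1
console.log(findChunk(1000).shard);  // S2
splitChunk(findChunk(2000), 3000);
console.log(chunks.length);          // 4
```

In this model a split only changes metadata; moving one of the resulting chunks to another shard would be a separate step.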
14. Choosing a Shard Key
• Shard key determines how data is partitioned
• Hard to change
• Most important performance decision
15. Use Case: Photos
{ photo_id : ???? , data : <binary> }
What’s the right key?
• auto increment
• MD5( data )
• month() + MD5(data)
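One way to compare the three candidate keys is to ask how many chunks receive new writes. The sketch below is a simulation under stated assumptions: 900 synthetic photos already exist, spread over three months and cut into 9 equal chunks, and 100 new photos arrive in the current month. The photo contents, month strings, and helper names are all invented; only Node's built-in `crypto` module is used for MD5.

```javascript
const crypto = require("crypto");
const md5 = s => crypto.createHash("md5").update(s).digest("hex");

// Three candidate keys for the i-th uploaded photo (all data is synthetic).
const keyFns = {
  autoIncrement: (i, month) => String(i).padStart(8, "0"),
  md5: (i, month) => md5("photo-" + i),
  monthPlusMd5: (i, month) => month + "-" + md5("photo-" + i),
};

// Photos 0..899 are spread over three months; later photos fall in the last month.
const monthOf = i => ["2011-01", "2011-02", "2011-03"][Math.min(2, Math.floor(i / 300))];

function chunksHitByNewWrites(keyFn) {
  const existing = [];
  for (let i = 0; i < 900; i++) existing.push(keyFn(i, monthOf(i)));
  existing.sort();
  // Cut the existing keys into 9 equal chunks; bounds[j] is the lower bound of chunk j + 1.
  const bounds = [];
  for (let j = 100; j < 900; j += 100) bounds.push(existing[j]);
  const hit = new Set();
  for (let i = 900; i < 1000; i++) {
    const key = keyFn(i, monthOf(i));
    hit.add(bounds.filter(b => key >= b).length); // index of the owning chunk
  }
  return hit.size;
}

const result = {};
for (const name of Object.keys(keyFns)) result[name] = chunksHitByNewWrites(keyFns[name]);
console.log(result);
```

In this model the auto-increment key sends every new write to the last chunk, plain MD5 scatters writes over the whole key space, and month plus MD5 confines them to the chunks holding the current month. That suggests the trade-off behind the slide's question: write distribution versus how much of the index stays hot.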
16. Initial Loading
• System starts with 1 chunk
• Writes will hit 1 shard and then move
• Pre-splitting for initial bulk loading can dramatically improve bulk load time
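Pre-splitting amounts to choosing split points before any data arrives and spreading the empty chunks across shards. A minimal sketch, assuming the shard key is a hex string such as an MD5 digest so that a two-character prefix divides the key space evenly; `splitPoints`, `assignChunks`, and the shard names are hypothetical.

```javascript
// Evenly spaced split points over a 2-hex-digit prefix space (00..ff).
function splitPoints(numChunks) {
  const points = [];
  for (let i = 1; i < numChunks; i++) {
    const v = Math.floor((i * 256) / numChunks);
    points.push(v.toString(16).padStart(2, "0"));
  }
  return points;
}

// Assign the resulting chunks to shards round-robin (shard names are made up).
function assignChunks(points, shards) {
  const lows = ["MinKey", ...points];
  return lows.map((low, i) => ({ low, shard: shards[i % shards.length] }));
}

const points = splitPoints(8);
const layout = assignChunks(points, ["S1", "S2", "S3"]);
console.log(points);  // [ '20', '40', '60', '80', 'a0', 'c0', 'e0' ]
console.log(layout);
```

With the chunks already in place, a bulk load writes to all shards from the start instead of filling one shard and waiting for chunks to move.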
17. Administering a Cluster
• Do not wait too long to add capacity
• Need capacity for normal workload + cost of moving data
• Stay below 70% of operational capacity
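The 70% guideline can be turned into a back-of-the-envelope check. The threshold comes from the slide; the load and per-shard capacity figures below are invented, and `shardsNeeded` and `utilization` are hypothetical helpers.

```javascript
// How many shards keep the cluster under the utilization ceiling?
function shardsNeeded(totalLoad, perShardCapacity, maxUtilization = 0.7) {
  return Math.ceil(totalLoad / (perShardCapacity * maxUtilization));
}

function utilization(totalLoad, perShardCapacity, shardCount) {
  return totalLoad / (perShardCapacity * shardCount);
}

const load = 24000;      // ops/sec across the cluster (assumed)
const capacity = 10000;  // ops/sec one shard can sustain (assumed)

console.log(utilization(load, capacity, 3));  // 0.8, already above the 70% line
console.log(shardsNeeded(load, capacity));    // 4
```

The headroom matters because adding a shard triggers chunk migrations, which consume capacity on the very shards that are already busy.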
18. Hardware Considerations
• Understand working set and make sure it can fit in RAM
• Choose appropriately sized boxes for shards
• Too small and admin/overhead goes up
• Too large, and you can’t add capacity smoothly
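The box-size trade-off can be sketched with simple arithmetic: smaller boxes mean more shards to administer, larger boxes mean each added box is a big jump in capacity. The working set size and RAM options are assumed values, and `planShards` is a hypothetical helper.

```javascript
// Shards needed to hold the working set in RAM, and how coarse each
// capacity increment is when one more box of the same size is added.
function planShards(workingSetGb, ramPerBoxGb) {
  const shards = Math.ceil(workingSetGb / ramPerBoxGb);
  return {
    shards,
    // Relative capacity gained by adding one more box of the same size.
    stepPercent: Math.round(100 / shards),
  };
}

const workingSetGb = 200; // assumed
for (const ram of [16, 64, 256]) {
  console.log(ram, planShards(workingSetGb, ram));
}
// 16  { shards: 13, stepPercent: 8 }
// 64  { shards: 4, stepPercent: 25 }
// 256 { shards: 1, stepPercent: 100 }
```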
19. Download MongoDB
http://www.mongodb.org
and let us know what you think
@eliothorowitz @mongodb
10gen is hiring!
http://www.10gen.com/jobs
20. Use Case: User Profiles
{ email : "eliot@10gen.com" ,
addresses : [ { state : "NY" } ]
}
• Shard by email
• Lookup by email hits 1 node
• Index on { "addresses.state" : 1 }
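The difference between a lookup on the shard key and a query on another field can be shown with a toy router. The chunk boundaries and shard names below are invented, and `shardsFor` is a hypothetical helper, not how mongos is implemented.

```javascript
// Toy router: chunks are ranges over the shard key (email). Layout is made up.
const emailChunks = [
  { min: "", max: "h", shard: "S1" },
  { min: "h", max: "p", shard: "S2" },
  { min: "p", max: "\uffff", shard: "S3" },
];

// Returns the shards a query must visit.
function shardsFor(query) {
  if ("email" in query) {
    const c = emailChunks.find(ch => query.email >= ch.min && query.email < ch.max);
    return [c.shard];
  }
  // No shard key in the query: every shard is asked, each using its own indexes.
  return [...new Set(emailChunks.map(ch => ch.shard))];
}

console.log(shardsFor({ email: "eliot@10gen.com" }));  // [ 'S1' ]
console.log(shardsFor({ "addresses.state": "NY" }));   // [ 'S1', 'S2', 'S3' ]
```

A query by state still benefits from the index on each shard, but it cannot be narrowed to one node because state is not part of the shard key.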
21. Use Case: Activity Stream
{ user_id : XXX, event_id : YYY , data : ZZZ }
• Shard by user_id
• Looking up an activity stream hits 1 node
• Writing events is distributed
• Index on { "event_id" : 1 } for deletes
Editor's notes
EC2 goes up to 64 GB; maybe mention a 256 GB box here? ($30-40k)
Maybe you can buy a 256 GB box, but I can spin up 10 EC2 64 GB boxes in 10 minutes.
Don’t pre-emptively shard - easy to add later