MongoDB has taken a clear lead in adoption among the new generation of databases, including the enormous variety of NoSQL offerings. A key reason for this lead has been a unique combination of agility and scalability. Agility provides business units with a quick start and flexibility to maintain development velocity, despite changing data and requirements. Scalability maintains that flexibility while providing fast, interactive performance as data volume and usage increase. We'll address the key organizational, operational, and engineering considerations to ensure that agility and scalability stay aligned at increasing scale, from small development instances to web-scale applications. We will also survey some key examples of highly-scaled customer applications of MongoDB.
2. Data Challenge
“I want my data...”
• Now
• Secure
• All varieties
• Fast and interactive
• Scalable to “Big”
• Agile to develop and deploy operationally
• Cloud and edge
3. Scalability with MongoDB
Metric | Meaning | Examples
Operations per Second | Concurrent reads and writes per second | > 1 million per second
Nodes per Cluster | Horizontal scale-out, distributed to multiple data centers worldwide, with high availability, using inexpensive cloud resources | > 1000 nodes
Records / Documents | Data objects in any number of schemas or structures | > 10 billion
Data Volume | Total amount of data: documents × size | > 1 Petabyte = 10^15 bytes = 1,000,000,000,000,000 bytes ≈ 2^50 bytes
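As a quick check on the Data Volume row (documents × size), the arithmetic can be sketched in Python. The 100 KB average document size is an assumed figure, chosen only so the numbers line up with the table; it is not from the slides.

```python
# Back-of-the-envelope check of the "Data Volume" row.
PETABYTE = 10**15          # decimal petabyte, in bytes
TWO_POW_50 = 2**50         # binary approximation quoted on the slide

documents = 10 * 10**9     # "> 10 billion" documents, from the table
avg_doc_bytes = 100_000    # ASSUMPTION: 100 KB average document size

total_bytes = documents * avg_doc_bytes
ratio = TWO_POW_50 / PETABYTE

print(f"total data volume: {total_bytes:,} bytes")
print(f"2^50 / 10^15 = {ratio:.3f}")
```

Under that assumption, 10 billion documents come to exactly 10^15 bytes, and 2^50 is about 12.6% larger than 10^15, which is why the slide marks it as approximate.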
7. Documents are Rich Data Structures
{
  first_name: 'Paul',
  surname: 'Miller',
  cell: '+447557505611',
  city: 'London',
  location: [45.123, 47.232],
  Profession: ['banking', 'finance', 'trader'],
  cars: [
    { model: 'Bentley',
      year: 1973,
      value: 100000, … },
    { model: 'Rolls Royce',
      year: 1965,
      value: 330000, … }
  ]
}
• Fields
• Typed field values
• Fields can contain arrays
• Fields can contain an array of sub-documents
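To make the structure concrete, here is a minimal sketch of the same document as a plain Python dict. Only the fields shown on the slide are included (the elided "…" fields are left out), and the variable names are illustrative.

```python
# The slide's document as a plain Python dict. Only fields shown on the
# slide are included; the elided fields ("…") are left out.
person = {
    "first_name": "Paul",
    "surname": "Miller",
    "cell": "+447557505611",
    "city": "London",
    "location": [45.123, 47.232],
    "Profession": ["banking", "finance", "trader"],
    "cars": [
        {"model": "Bentley", "year": 1973, "value": 100000},
        {"model": "Rolls Royce", "year": 1965, "value": 330000},
    ],
}

# Array field: iterate over the values directly.
professions = ", ".join(person["Profession"])

# Array of sub-documents: the embedded cars are reached without any join.
total_value = sum(car["value"] for car in person["cars"])
oldest = min(person["cars"], key=lambda car: car["year"])["model"]

print(professions)
print(total_value, oldest)
```

Because the cars travel with the person, a single read returns everything needed to compute the totals.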
8. Document Model Benefits
• Agility and flexibility
– Data model supports business change
– Rapidly iterate to meet new requirements
• Intuitive, natural data representation
– Eliminates ORM layer
– Developers are more productive
• Reduces the need for joins and disk seeks
– Programming is simpler
– Performance delivered at scale
25. Foursquare
• 50M users.
• 6B check-ins to date (6M per day growth).
• 55M points of interest / venues.
• 1.7M merchants using the platform for marketing
• Operations Per Second: 300,000
• Documents: 5.5B (~16.5B with replication).*
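A rough sanity check of these figures, sketched in Python. The averages are derived only from the numbers on this slide; reading the ×3 factor as three copies of each document is an inference, not something the slide states.

```python
# Figures taken from the slide.
check_ins_per_day = 6_000_000
documents = 5_500_000_000
documents_with_replication = 16_500_000_000

seconds_per_day = 24 * 60 * 60

# Average new check-ins per second (peaks will be higher than the average).
avg_check_ins_per_sec = check_ins_per_day / seconds_per_day

# INFERENCE: ratio of replicated to unreplicated document counts.
copies = documents_with_replication / documents

print(f"{avg_check_ins_per_sec:.1f} check-ins/sec on average")
print(f"{copies:.0f} copies of each document")
```

The average check-in rate is far below the 300,000 operations per second quoted above, which suggests that figure counts all reads and writes rather than check-in inserts alone.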
26. Foursquare clusters
• 11 MongoDB clusters
– 8 are sharded
• Largest cluster: check-ins
– 15 shards
– Shard key: user_id
27. Facebook / parse.com mobile apps
• Persistent database for 270,000 mobile applications
• 200 M end-user mobile devices
• 250% annual growth in client apps
• 500% growth in requests
• 1.5 M collections
• Key differentiators:
– Document data model
– High performance and availability
– Geospatial query and index
• Charity Majors on operations: j.mp/X3jVRC
– Understand your database and your data, and build for them.
36. Shard Key characteristics
• A good shard key has:
– sufficient cardinality
– distributed writes
– targeted reads ("query isolation")
• Shard key should be in every query if possible
– scatter gather otherwise
• Choosing a good shard key is important!
– affects performance and scalability
– changing it later is expensive
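The difference between a targeted read and scatter gather can be sketched with a toy router in Python. This is a simplified model, not MongoDB's routing code: the four shards, the range boundaries, and the `venue` field are all assumptions made for illustration.

```python
from bisect import bisect_right

# ASSUMPTION: four shards, each owning one contiguous range of user_id.
# Shard i holds user_ids in [BOUNDS[i], BOUNDS[i + 1]).
SHARDS = ["shard1", "shard2", "shard3", "shard4"]
BOUNDS = [0, 250, 500, 750, 1000]


def route(query):
    """Return the shards a router would have to contact for `query`.

    Assumes 0 <= user_id < 1000 when the shard key is present.
    """
    if "user_id" in query:
        # Shard key present: exactly one shard owns this key (targeted read).
        index = bisect_right(BOUNDS, query["user_id"]) - 1
        return [SHARDS[index]]
    # Shard key absent: every shard may hold matches (scatter gather).
    return list(SHARDS)


targeted = route({"user_id": 612})
scattered = route({"venue": "cafe"})
print(targeted)   # one shard
print(scattered)  # all shards
```

In this model, a query that includes the shard key touches one shard no matter how many shards exist, while a query without it grows more expensive with every shard added.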
37. Hashed shard key
• Pros:
– Evenly distributed writes
• Cons:
– Random data (and index) updates can be IO intensive
– Range-based queries turn into scatter gather
[Diagram: mongos routing to Shard 1, Shard 2, Shard 3, ... Shard N]
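Both the pro and the range-query con can be illustrated with a small Python simulation. This is a sketch under stated assumptions: md5 is only a stand-in hash, and taking the hash modulo the shard count simplifies how hashed values are actually assigned to chunks.

```python
import hashlib
from collections import Counter

N_SHARDS = 4  # ASSUMPTION: illustrative shard count


def hashed_shard(key):
    """Map a key to a shard by hashing it (md5 is only a stand-in here)."""
    digest = hashlib.md5(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % N_SHARDS


# Pro: monotonically increasing keys still spread evenly across shards.
writes = Counter(hashed_shard(k) for k in range(100_000))

# Con: a contiguous range of 100 keys is spread over the shards, so a
# range query over those keys has to contact all of them.
shards_for_range = {hashed_shard(k) for k in range(1000, 1100)}

print(sorted(writes.items()))
print(sorted(shards_for_range))
```

With a range-based key, the same 100 consecutive values would typically sit in one chunk on one shard, which is the trade-off the Pros and Cons above describe.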
38. Low cardinality shard key
• Induces "jumbo chunks"
• Example: boolean field
[Diagram: mongos routing to Shard 1, Shard 2, Shard 3, ... Shard N; one chunk covers the key range [a, b)]
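Why a low-cardinality key leads to jumbo chunks can be sketched in Python. The model is deliberately simplified: the split threshold is counted in documents rather than bytes, and its value is an assumption chosen for illustration.

```python
from collections import Counter

MAX_DOCS_PER_CHUNK = 1000   # ASSUMPTION: illustrative split threshold


def chunk_sizes(shard_key_values):
    """Smallest possible chunks: one per distinct shard key value.

    A chunk covers a key range [a, b), so documents that share the same
    key value can never be split into different chunks.
    """
    return Counter(shard_key_values)


# 10,000 documents sharded on a boolean field.
by_boolean = chunk_sizes(i % 2 == 0 for i in range(10_000))
# The same number of documents sharded on a high-cardinality field.
by_user_id = chunk_sizes(range(10_000))

jumbo_boolean = [k for k, n in by_boolean.items() if n > MAX_DOCS_PER_CHUNK]
jumbo_user_id = [k for k, n in by_user_id.items() if n > MAX_DOCS_PER_CHUNK]

print(len(by_boolean), len(jumbo_boolean))   # 2 chunks, both oversized
print(len(by_user_id), len(jumbo_user_id))   # 10000 possible chunks, none oversized
```

With only two distinct key values, the data can never occupy more than two chunks, so adding shards beyond two does not help.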
41. Success Factors
• Storage: random seeks (IOPS)
• RAM: working set based on query patterns
• Query: indexing
• Delete: most expensive operation
• Real-time vs. bulk operations
• Continuity: HA, DR, backup, restore
• Agile process: iterate by powers of 4
• Sharding: shard key and strategy
• Resources: don’t go it alone!