2. Who am I
Juan Roy
Twitter: @juanroycouto
Email: juanroycouto@gmail.com
MongoDB DBA at Grupo Undanet
3. Agenda
MongoDB Schema Design
● What is MongoDB
● What is a JSON Document
● What a Document Must Contain
● Relational Approach vs Document Model
● Normalization vs Denormalization
● Embedding Documents
● Things to Keep in Mind
● Goals
● Over Normalization
● Overloaded Documents
● Working Set
● Historic Information
● 1-1
● 1-Few (Embedding & Referencing)
● N-1
● 1-Many
● Many-Many
● Recap
4. What is MongoDB
● Non-Relational Database
● NoSQL Multipurpose Database
● Main Characteristics:
○ Scalability
○ High Availability
○ Automatic Failover
○ …
● Document-based (JSON-like documents, stored as BSON)
SQL          MongoDB
Database     Database
Table        Collection
Row          Document
5. What is a JSON Document
{
    "_id" : ObjectId("59400587962fe33db2194129"),
    "description" : "MICHELIN 285/30 ZR21 PILOT SUPER SPORT 2012",
    "date" : ISODate("2017-08-28T04:02:32Z"),
    "property" : {
        "tag" : {
            "noisebands" : "1",
            "rollingresistance" : "B",
            "noise" : "69",
            "wetgrip" : "A"
        },
        "ratio" : 30
    },
    "ecotasa" : [
        {
            "country" : "724",
            "price" : NumberDecimal("1.380000")
        },
        {
            "country" : "620",
            "price" : NumberDecimal("0.000000")
        }
    ],
    "location" : {
        "type" : "Point",
        "coordinates" : [ -5.724332, 40.959219 ]
    }
}
(Slide callouts label the field types: _id, string, array, date, subdocument, geo-location, number.)
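To make the type callouts concrete, here is a minimal Python sketch that labels the top-level fields of a document held as a plain `dict`. It rests on simplifying assumptions: `_id` is a plain string instead of an `ObjectId`, the `NumberDecimal` prices are omitted, and `describe_fields` is a hypothetical helper, not a driver API.

```python
from datetime import datetime, timezone


def describe_fields(doc: dict) -> dict:
    """Return a coarse type label for each top-level field of a document."""
    labels = {}
    for key, value in doc.items():
        if key == "_id":
            labels[key] = "_id"  # primary key, whatever its type
        elif isinstance(value, dict) and value.get("type") == "Point":
            labels[key] = "geo-location"  # GeoJSON point: {"type": "Point", "coordinates": [lng, lat]}
        elif isinstance(value, dict):
            labels[key] = "subdocument"
        elif isinstance(value, list):
            labels[key] = "array"
        elif isinstance(value, datetime):
            labels[key] = "date"
        elif isinstance(value, bool):
            labels[key] = "boolean"  # checked before int, because bool is a subclass of int
        elif isinstance(value, (int, float)):
            labels[key] = "number"
        elif isinstance(value, str):
            labels[key] = "string"
        else:
            labels[key] = type(value).__name__
    return labels


# Trimmed copy of the tyre document, using plain Python stand-ins for BSON types.
tyre = {
    "_id": "59400587962fe33db2194129",
    "description": "MICHELIN 285/30 ZR21 PILOT SUPER SPORT 2012",
    "date": datetime(2017, 8, 28, 4, 2, 32, tzinfo=timezone.utc),
    "property": {"tag": {"noise": "69", "wetgrip": "A"}, "ratio": 30},
    "ecotasa": [{"country": "724"}, {"country": "620"}],
    "location": {"type": "Point", "coordinates": [-5.724332, 40.959219]},
}

print(describe_fields(tyre))
print(describe_fields(tyre["property"]))  # nested level: "tag" is a subdocument, "ratio" a number
```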
6. What a Document must Contain
● Ideally
○ All data related to the application's principal item
○ 1 document per item
Application   Principal Item
Catalog       Article
Finance       Client
● In practice
○ The most frequently accessed data
8. Normalization vs Denormalization
Normalization
People
{
    _id : 1,
    name : 'Peter',
    city : 'Salamanca'
}
Motorbikes
{
    _id : 1,
    owner : 1,
    color : 'red',
    model : 'Suzuki'
}
{
    _id : 2,
    owner : 1,
    color : 'black',
    model : 'Harley Davidson'
}
Denormalization
People
{
    _id : 1,
    name : 'Peter',
    city : 'Salamanca',
    motorbikes : [
        {
            model : 'Suzuki',
            color : 'red'
        },
        {
            model : 'Harley Davidson',
            color : 'black'
        }
    ]
}
9. Embedding Documents
People
{
    _id : 1,
    name : 'Peter',
    city : 'Salamanca'
}
Motorbikes
{
    _id : 1,
    owner : 1,
    color : 'red',
    model : 'Suzuki'
}
{
    _id : 2,
    owner : 1,
    color : 'black',
    model : 'Harley Davidson'
}
People
{
    _id : 1,
    name : 'Peter',
    city : 'Salamanca',
    motorbikes : [
        {
            model : 'Suzuki',
            color : 'red'
        },
        {
            model : 'Harley Davidson',
            color : 'black'
        }
    ]
}
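Embedding the motorbikes can be pictured as a one-off transformation from the separate People and Motorbikes collections to the embedded form. This is a minimal Python sketch, assuming both collections are already loaded in memory as lists of dicts; `embed_motorbikes` is a hypothetical helper, not a MongoDB API.

```python
def embed_motorbikes(people: list[dict], motorbikes: list[dict]) -> list[dict]:
    """Denormalize: copy each person's motorbikes into an embedded `motorbikes` array."""
    by_owner: dict[int, list[dict]] = {}
    for bike in motorbikes:
        # The embedded copies drop `_id` and `owner`: the parent document already identifies the owner.
        by_owner.setdefault(bike["owner"], []).append(
            {"model": bike["model"], "color": bike["color"]}
        )
    # People without motorbikes get an empty array rather than a missing field.
    return [{**person, "motorbikes": by_owner.get(person["_id"], [])} for person in people]


people = [{"_id": 1, "name": "Peter", "city": "Salamanca"}]
motorbikes = [
    {"_id": 1, "owner": 1, "color": "red", "model": "Suzuki"},
    {"_id": 2, "owner": 1, "color": "black", "model": "Harley Davidson"},
]

for doc in embed_motorbikes(people, motorbikes):
    print(doc)
```

After the transformation, reading Peter and his motorbikes is a single document fetch.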
10. Things to Keep in Mind
● Avoid Relational Approach
● What will happen when we scale?
● Size of:
○ Data
○ Index
○ Document
● How will users access the data?
○ Normal users
○ Machine Learning
○ Business Intelligence
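Document size is easy to estimate before committing to a model. The sketch below is an approximation, assuming the compact JSON encoding is a reasonable proxy for BSON size (BSON adds type tags and length prefixes, so use a driver's BSON encoder for exact figures). `approx_size`, `max_array_items`, and the 1KB sample comment are hypothetical.

```python
import json

MAX_DOCUMENT_BYTES = 16 * 1024 * 1024  # MongoDB's hard per-document limit (16MB)


def approx_size(doc: dict) -> int:
    """Rough document size: bytes of its compact UTF-8 JSON encoding (a proxy, not BSON)."""
    return len(json.dumps(doc, separators=(",", ":"), default=str).encode("utf-8"))


def max_array_items(base_doc: dict, field: str, sample_item: dict) -> int:
    """Estimate how many items like `sample_item` fit in `base_doc[field]` before the limit."""
    base = approx_size({**base_doc, field: []})
    per_item = approx_size(sample_item) + 1  # +1 for the comma between array elements
    return (MAX_DOCUMENT_BYTES - base) // per_item


post = {"_id": 1, "title": "Schema design"}
comment = {"author": "Peter", "text": "x" * 1000}  # hypothetical 1KB comment
print(approx_size(post))
print(max_array_items(post, "comments", comment))
```

Running this against a realistic sample document tells you roughly how long an embedded array can keep growing before it reaches the limit.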
12. Over Normalization
● The relational model has been moved directly to the MongoDB model.
● In the relational world it is common to have one table per concept, and relational tables have no arrays.
● A single action requires multiple queries instead of querying the data just once.
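The cost of over-normalization shows up as round trips. This Python sketch uses a hypothetical in-memory `CountingStore` (not a real driver) that counts one round trip per `find` call, and compares a schema ported table-by-table with the same profile stored as one document. The `zip_codes` collection is an illustrative assumption of "one table per concept".

```python
class CountingStore:
    """Hypothetical in-memory stand-in for a database that counts round trips."""

    def __init__(self, collections: dict[str, list[dict]]):
        self.collections = collections
        self.queries = 0

    def find(self, name: str, **criteria) -> list[dict]:
        self.queries += 1  # every call models one round trip to the server
        return [doc for doc in self.collections[name]
                if all(doc.get(k) == v for k, v in criteria.items())]


# Over-normalized: one collection per concept, as ported from a relational schema.
normalized = CountingStore({
    "users": [{"_id": 2, "name": "Mike"}],
    "zip_codes": [{"user_id": 2, "zip_code": "30062"}],
    "phones": [{"_id": 2, "user_id": 2, "phone_number": "555-222-2345"},
               {"_id": 3, "user_id": 2, "phone_number": "555-333-3456"}],
})
user = normalized.find("users", _id=2)[0]
zip_code = normalized.find("zip_codes", user_id=2)[0]["zip_code"]
phones = [p["phone_number"] for p in normalized.find("phones", user_id=2)]

# Document model: the same profile stored as one document.
embedded = CountingStore({
    "users": [{"_id": 2, "name": "Mike", "zip_code": "30062",
               "phone_numbers": ["555-222-2345", "555-333-3456"]}],
})
profile = embedded.find("users", _id=2)[0]

print("over-normalized round trips:", normalized.queries)
print("embedded round trips:", embedded.queries)
```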
13. Overloaded Documents
● This problem can arise if the application is packing lots of rarely used data
into its frequently accessed documents.
● If your application packs rarely used data into a document that is touched frequently, every read of that document is more likely to evict other important data from the cache.
● Multiply this across a collection and the net result is that the server could be
paging a lot more data than necessary in order to service the application.
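One common remedy is to split an overloaded document into a frequently read part and a rarely read part stored in a second collection under the same `_id`. This is a minimal Python sketch; `split_document`, the product fields, and the price values are hypothetical.

```python
def split_document(doc: dict, hot_fields: set[str]) -> tuple[dict, dict]:
    """Split a document into a frequently read part and a rarely read part.

    Both parts keep the same `_id`, so the cold part can be fetched on demand
    from a second collection.
    """
    hot = {k: v for k, v in doc.items() if k == "_id" or k in hot_fields}
    cold = {k: v for k, v in doc.items() if k == "_id" or k not in hot_fields}
    return hot, cold


# Hypothetical product: only description and price are shown on the listing page.
product = {
    "_id": 7,
    "description": "MICHELIN 285/30 ZR21 PILOT SUPER SPORT 2012",
    "price": 250,
    "price_history": [240, 245, 250],
    "supplier_notes": "long free-text notes...",
}
hot, cold = split_document(product, {"description", "price"})
print(hot)
print(cold)
```

Only the small `hot` document competes for cache space on every page view.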
14. Working Set
The Working Set is the size of:
● Our Data *
plus
● Our Indexes
* But only the size of our most accessed data
The Working Set must fit in RAM!
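The working-set rule is simple arithmetic. In this Python sketch the data size, the share of data accessed regularly, the index size, and the RAM sizes are all hypothetical inputs you would measure on your own deployment; the helper names are illustrative.

```python
GiB = 1024 ** 3


def working_set_bytes(data_bytes: int, hot_percent: int, index_bytes: int) -> int:
    """Working set ~ the frequently accessed share of the data plus the indexes."""
    return data_bytes * hot_percent // 100 + index_bytes


def fits_in_ram(working_set: int, ram_bytes: int) -> bool:
    return working_set <= ram_bytes


# Hypothetical numbers: 200 GiB of data, 10% of it accessed regularly, 12 GiB of indexes.
ws = working_set_bytes(200 * GiB, hot_percent=10, index_bytes=12 * GiB)
print(ws // GiB, "GiB")           # 32 GiB
print(fits_in_ram(ws, 16 * GiB))  # False
print(fits_in_ram(ws, 64 * GiB))  # True
```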
15. Working Set
The Working Set does not fit in RAM. What should I do?
● Add more RAM to our machine
● Shard
● Reduce the size of our Working Set:
○ Limit our arrays
○ Limit our embedded documents
○ …
○ Benefits:
■ Fast data retrieval
■ One query brings all the information needed
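"Limit our arrays" can be enforced on every write. MongoDB's `$push` accepts `$each` together with a negative `$slice` to keep only the last N elements; the Python function below mimics that behavior locally so it can be tested without a server. The field name `last_logins` and the cap of 5 are assumptions for illustration.

```python
def push_capped(array: list, item, max_len: int) -> list:
    """Append `item` and keep only the newest `max_len` entries.

    Mirrors what `$push` with `$each` and a negative `$slice` does on the server.
    """
    if max_len <= 0:
        return []
    return (array + [item])[-max_len:]


# The equivalent update document, e.g. for PyMongo's update_one(filter, update);
# "last_logins" is a hypothetical field name.
capped_update = {
    "$push": {"last_logins": {"$each": ["2017-08-28T04:02:32Z"], "$slice": -5}}
}

logins = ["d1", "d2", "d3", "d4", "d5"]
print(push_capped(logins, "d6", 5))  # ['d2', 'd3', 'd4', 'd5', 'd6']
```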
16. Historic Information
● When our data grows continuously (historical data) and we embed it in our main collection, each document ends up holding a lot of information that is not usually needed. We may still want to store that data for analytics purposes, so we keep it apart from the user document.
● This does not apply to information with limited growth (addresses, phone numbers, etc.).
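Keeping history out of the user document can be done with a periodic archiving step. This Python sketch assumes a hypothetical, continuously growing `events` array and a separate `history` collection held as a list; the cutoff date and event data are illustrative.

```python
from datetime import date


def archive_old_events(user: dict, history: list[dict], keep_since: date) -> dict:
    """Keep recent events embedded; move older ones to a separate history collection."""
    recent, old = [], []
    for event in user.get("events", []):
        (recent if event["date"] >= keep_since else old).append(event)
    # Archived entries carry the owner's _id so analytics can still group them by user.
    history.extend({"user_id": user["_id"], **event} for event in old)
    return {**user, "events": recent}


history: list[dict] = []
user = {
    "_id": 1,
    "name": "Rick",
    "events": [  # hypothetical, continuously growing data
        {"date": date(2016, 3, 1), "action": "login"},
        {"date": date(2017, 8, 28), "action": "purchase"},
    ],
}
user = archive_old_events(user, history, keep_since=date(2017, 1, 1))
print(user["events"])
print(history)
```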
17. 1-1
id   name   phone_number   zip_code
1    Rick   555-111-1234   01209
2    Mike   555-222-2345   30062
Users
{
    _id : 1,
    name : 'Rick',
    phone_number : '555-111-1234',
    zip_code : '01209'
}
{
    _id : 2,
    name : 'Mike',
    phone_number : '555-222-2345',
    zip_code : '30062'
}
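A 1-1 table maps directly to one document per row. This is a minimal Python sketch of that mapping; `rows_to_documents` is a hypothetical helper, and renaming `id` to `_id` is the only transformation it applies.

```python
def rows_to_documents(columns: list[str], rows: list[tuple]) -> list[dict]:
    """1-1 mapping: each relational row becomes one document; `id` becomes `_id`."""
    documents = []
    for row in rows:
        doc = dict(zip(columns, row))
        documents.append({"_id": doc.pop("id"), **doc})
    return documents


columns = ["id", "name", "phone_number", "zip_code"]
rows = [(1, "Rick", "555-111-1234", "01209"), (2, "Mike", "555-222-2345", "30062")]
for user in rows_to_documents(columns, rows):
    print(user)
```

Keeping `zip_code` as a string preserves the leading zero of `01209`.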
18. 1-Few
● Referencing (or Normalization)
○ To show a user's information we need joins (or more than one query), which imply random seeks, a very slow operation!
● Embedding (or Denormalization)
○ We can avoid joins via denormalization. This implies redundant data and a more complex application, which must take care not to generate inconsistencies.
○ Arrays help us avoid redundancy, and this solution gives us performance benefits.
○ With denormalization we have many data model possibilities, which makes it more difficult to define our model.
19. 1-Few
id   name   zip_code
1    Rick   01209
2    Mike   30062

id   user_id   phone_number
1    1         555-111-1234
2    2         555-222-2345
3    2         555-333-3456
20. 1-Few (MongoDB-Embedding)
● The approach that gives us the best performance and data consistency guarantees.
● Locality: MongoDB stores each document contiguously on disk, so putting all the data you need into one document means that you are never more than one seek away from everything you need.
● Atomicity and Isolation: with embedding, a write to a single document is atomic, so we get transactional behavior for the whole entity.
{
    _id : 2,
    name : 'Mike',
    zip_code : '30062',
    phone_numbers : [ '555-222-2345', '555-333-3456' ]
}
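Because Mike and his phone numbers live in one document, adding a number is a single update on a single document, which MongoDB applies atomically. The Python function below only mimics the `$addToSet` operator locally so the behavior can be checked without a server; the helper name and the new number `555-444-4567` are hypothetical.

```python
def add_to_set(doc: dict, field: str, value) -> dict:
    """Mimic $addToSet: append `value` to doc[field] only if it is absent."""
    current = doc.get(field, [])
    return {**doc, field: current if value in current else current + [value]}


mike = {"_id": 2, "name": "Mike", "zip_code": "30062",
        "phone_numbers": ["555-222-2345", "555-333-3456"]}

# Server-side, the same change is one single-document update, e.g. with PyMongo:
# users.update_one({"_id": 2}, {"$addToSet": {"phone_numbers": "555-444-4567"}})
mike = add_to_set(mike, "phone_numbers", "555-444-4567")  # new number is appended
mike = add_to_set(mike, "phone_numbers", "555-222-2345")  # duplicate is ignored
print(mike["phone_numbers"])
```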
21. 1-Few (MongoDB-Referencing)
{
    _id : 2,
    name : 'Mike',
    zip_code : '30062',
    phone_numbers : [ 2, 3 ]
}
{
    _id : 2,
    user_id : 2,
    phone_number : '555-222-2345'
}
{
    _id : 3,
    user_id : 2,
    phone_number : '555-333-3456'
}
● With referencing we lose single-document atomicity (transactionality).
● We need:
○ More than one query, or
○ $lookup (joins)
● This approach performs worse than embedding.
● If we read this data frequently, it is better to embed it.
● On the plus side, referencing gives us flexibility to project only the desired fields.
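With referencing, the application has to resolve the phone `_id`s itself (a second query) or ask the server to do it with `$lookup`. The Python function below sketches the application-side join over already-fetched documents; the function name and the `phones` collection name used in the pipeline are assumptions.

```python
def resolve_phone_numbers(user: dict, phones: list[dict]) -> dict:
    """Application-side join: swap referenced phone _ids for the phone numbers."""
    by_id = {phone["_id"]: phone for phone in phones}
    numbers = [by_id[ref]["phone_number"] for ref in user["phone_numbers"] if ref in by_id]
    return {**user, "phone_numbers": numbers}


# Query 1 would return the user; query 2 the referenced phone documents.
user = {"_id": 2, "name": "Mike", "zip_code": "30062", "phone_numbers": [2, 3]}
phones = [
    {"_id": 2, "user_id": 2, "phone_number": "555-222-2345"},
    {"_id": 3, "user_id": 2, "phone_number": "555-333-3456"},
]
print(resolve_phone_numbers(user, phones))

# Alternatively, one aggregation with $lookup (the "phones" collection name is an assumption):
pipeline = [
    {"$match": {"_id": 2}},
    {"$lookup": {"from": "phones", "localField": "phone_numbers",
                 "foreignField": "_id", "as": "phones"}},
]
```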
22. N-1
{
    _id : 2,
    name : 'Mike',
    zip_code : '30062',
    phone_numbers : [ 2, 3 ],
    address : '13, Rue del Percebe'
}
{
    _id : 1,
    name : 'Rick',
    zip_code : '01209',
    phone_numbers : [ 1 ],
    address : '13, Rue del Percebe'
}
What if two people share an address?
● Does that mean that you have to store the address twice? Yes, you do have to store it twice, three times, etc.
● This is better than making unnecessary joins: the extra disk space you need will make your queries faster.
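The price of duplicating the address is paid on writes: changing it means rewriting every copy. This Python sketch updates all copies in memory; the new address is hypothetical, and the PyMongo `update_many` call in the comment is the server-side counterpart. Note that `update_many` is atomic per document, not across all matched documents.

```python
def update_shared_address(people: list[dict], old: str, new: str) -> int:
    """Rewrite every duplicated copy of an address; return how many documents changed."""
    changed = 0
    for person in people:
        if person.get("address") == old:
            person["address"] = new
            changed += 1
    return changed


people = [
    {"_id": 1, "name": "Rick", "address": "13, Rue del Percebe"},
    {"_id": 2, "name": "Mike", "address": "13, Rue del Percebe"},
]
# Server-side equivalent with PyMongo:
# users.update_many({"address": old}, {"$set": {"address": new}})
print(update_shared_address(people, "13, Rue del Percebe", "14, Rue del Percebe"))  # 2
```

This trade-off works well when the shared data is read often and changes rarely.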
23. 1-Many
Case: A blog with hundreds, or even thousands, of comments for a given post.
Embedding carries significant penalties:
● The larger a document is, the more RAM it uses. The fewer documents fit in RAM, the more likely the server is to page fault to retrieve documents, and page faults ultimately lead to random disk I/O.
● Growing documents must eventually be copied to larger spaces.
● The document never stops growing.
● MongoDB documents have a hard size limit of 16MB.
Referencing:
● The document does not grow, because we store one document per comment in a second collection.
● Suitable for very high or unpredictable one-to-many relationships.
Solution: when showing a blog entry we may only wish to display the first three comments; loading more simply wastes RAM.
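With one document per comment, showing a blog entry only needs the first three comments of that post. This Python sketch filters, sorts, and limits an in-memory list; the `comments` collection, its `post_id` and `date` fields, and the generated data are hypothetical. The commented PyMongo query is the server-side counterpart, ideally backed by an index on `(post_id, date)`.

```python
from datetime import datetime, timedelta


def first_comments(comments: list[dict], post_id, limit: int = 3) -> list[dict]:
    """Referencing: one document per comment; fetch only the first `limit` for a post."""
    for_post = [c for c in comments if c["post_id"] == post_id]
    return sorted(for_post, key=lambda c: c["date"])[:limit]


start = datetime(2017, 8, 28)
comments = [
    {"_id": i, "post_id": 1, "date": start + timedelta(minutes=i), "text": f"comment {i}"}
    for i in reversed(range(1000))  # stored newest first, to show that sorting matters
]
comments.append({"_id": 1000, "post_id": 2, "date": start, "text": "other post"})

# Server-side equivalent with PyMongo:
# db.comments.find({"post_id": 1}).sort("date", 1).limit(3)
print([c["_id"] for c in first_comments(comments, post_id=1)])  # [0, 1, 2]
```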
24. Many-Many
● We will embed a list of _id values in both directions
● We no longer have redundant information
Product
{
    _id : 'My product',
    category_ids : [ 'My category',... ]
}
Category
{
    _id : 'My category',
    product_ids : [ 'My product', … ]
}
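Storing `_id` lists in both directions means every new relationship touches two documents. This Python sketch, with a hypothetical `link` helper, keeps both sides in sync and avoids duplicate ids. On a server these are two separate writes: each is atomic on its own, but keeping both sides consistent together needs a multi-document transaction or tolerance for a brief mismatch.

```python
def link(product: dict, category: dict) -> None:
    """Record a many-to-many relationship on both sides as lists of _id values."""
    if category["_id"] not in product["category_ids"]:
        product["category_ids"].append(category["_id"])
    if product["_id"] not in category["product_ids"]:
        category["product_ids"].append(product["_id"])


product = {"_id": "My product", "category_ids": []}
category = {"_id": "My category", "product_ids": []}
link(product, category)
link(product, category)  # linking twice does not duplicate ids
print(product)
print(category)
```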
25. Recap
● Avoid round trips to the database.
● User events should only generate a small number of queries.
● Use arrays when needed, as long as they won't grow indefinitely.
● Don’t just migrate relational schemas.
● Data that is queried together should be in the same document whenever possible.
● Store the last login time and the shopping cart in the user document, since that is all we need for the landing page.
● Embedding for performance and atomicity (transactionality).
● Referencing for huge relationships.
Ultimately, the decision depends on the access patterns of your application.