MongoDB Schema Design Tips & Tricks

Grupo Undanet
August 2017, Salamanca

Who am I
Juan Roy
Twitter: @juanroycouto
Email: juanroycouto@gmail.com
MongoDB DBA at Grupo Undanet

Agenda
● What is MongoDB
● What is a JSON Document
● What a Document Must Contain
● Relational Approach vs Document Model
● Normalization vs Denormalization
● Embedding Documents
● Things to Keep in Mind
● Goals
● Over Normalization
● Overloaded Documents
● Working Set
● Historic Information
● 1-1
● 1-Few (Embedding & Referencing)
● N-1
● 1-Many
● Many-Many
● Recap

What is MongoDB
● Non-Relational Database
● NoSQL Multipurpose Database
● Main Characteristics:
○ Scalability
○ High Availability
○ Automatic Failover
○ …
● Document-based (JSON)
SQL         MongoDB
Database    Database
Table       Collection
Row         Document

What is a JSON Document
{
    "_id" : ObjectId("59400587962fe33db2194129"),
    "description" : "MICHELIN 285/30 ZR21 PILOT SUPER SPORT 2012",
    "date" : ISODate("2017-08-28T04:02:32Z"),
    "property" : {
        "tag" : {
            "noisebands" : "1",
            "rollingresistance" : "B",
            "noise" : "69",
            "wetgrip" : "A"
        },
        "ratio" : 30
    },
    "ecotasa" : [
        {
            "country" : "724",
            "price" : NumberDecimal("1.380000")
        },
        {
            "country" : "620"
        }
    ],
    "location" : {
        "type" : "Point",
        "coordinates" : [ -5.724332, 40.959219 ]
    }
}
Field types illustrated: _id (ObjectId), string (description), date (date), subdocument (property), number (ratio), array (ecotasa), geo-location (location)

What a Document must Contain
● Ideally
○ All data related to the application's principal item
○ One document per item

Application    Principal Item
Catalog        Article
Finance        Client

● In practice
○ The most frequently accessed data

Relational Approach vs Document Model
{
    "_id" : ObjectId("59400587962fe33db2194129"),
    "description" : "MICHELIN 285/30 ZR21 PILOT SUPER SPORT 2012",
    "date" : ISODate("2017-08-28T04:02:32Z"),
    "property" : {
        "tag" : {
            "noisebands" : "1",
            "rollingresistance" : "B",
            "noise" : "69",
            "wetgrip" : "A"
        },
        "ratio" : 30
    },
    "ecotasa" : [
        {
            "country" : "724"
        },
        {
            "country" : "620"
        }
    ],
    "location" : {
        "type" : "Point",
        "coordinates" : [ -5.724332, 40.959219 ]
    }
}

Normalization vs Denormalization
Normalization (two collections)
People
{
    _id : 1,
    name : 'Peter',
    city : 'Salamanca'
}
Motorbikes
{
    _id : 1,
    owner : 1,
    color : 'red',
    model : 'Suzuki'
}
{
    _id : 2,
    owner : 1,
    color : 'black',
    model : 'Harley Davidson'
}
Denormalization (one collection)
People
{
    _id : 1,
    name : 'Peter',
    city : 'Salamanca',
    motorbikes : [
        {
            model : 'Suzuki',
            color : 'red'
        },
        {
            model : 'Harley Davidson',
            color : 'black'
        }
    ]
}

Embedding Documents
Referenced (two collections)
People
{
    _id : 1,
    name : 'Peter',
    city : 'Salamanca'
}
Motorbikes
{
    _id : 1,
    owner : 1,
    color : 'red',
    model : 'Suzuki'
}
{
    _id : 2,
    owner : 1,
    color : 'black',
    model : 'Harley Davidson'
}
Embedded (one collection)
People
{
    _id : 1,
    name : 'Peter',
    city : 'Salamanca',
    motorbikes : [
        {
            model : 'Suzuki',
            color : 'red'
        },
        {
            model : 'Harley Davidson',
            color : 'black'
        }
    ]
}
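The step from two collections to one embedded document can be sketched in plain JavaScript. This is only an illustration: the arrays stand in for the People and Motorbikes collections, and embedMotorbikes is a hypothetical helper, not a MongoDB API.

```javascript
// Hypothetical helper: fold each motorbike into its owner's document.
// Plain arrays stand in for the People and Motorbikes collections.
function embedMotorbikes(people, motorbikes) {
  return people.map((person) => ({
    ...person,
    motorbikes: motorbikes
      .filter((bike) => bike.owner === person._id)
      // Keep only the descriptive fields; _id and owner are no longer needed.
      .map(({ model, color }) => ({ model, color })),
  }));
}

const people = [{ _id: 1, name: "Peter", city: "Salamanca" }];
const motorbikes = [
  { _id: 1, owner: 1, color: "red", model: "Suzuki" },
  { _id: 2, owner: 1, color: "black", model: "Harley Davidson" },
];

const embedded = embedMotorbikes(people, motorbikes);
console.log(JSON.stringify(embedded, null, 2));
```

The original people array is left untouched, so the sketch can be run repeatedly while experimenting with which fields to embed.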

Things to Keep in Mind
● Avoid Relational Approach
● What will happen if we scale
● Size of:
○ Data
○ Index
○ Document
● How will users access the data
○ Normal users
○ Machine Learning
○ Business Intelligence

Goals
● Performance
● Scalability
● Simplicity

Over Normalization
● Over-normalization happens when the relational model is moved directly into MongoDB.
● In the relational world it is common to have one table per concept. Relational tables do not have arrays.
● As a result, a single action requires multiple queries instead of querying the data just once.

Overloaded Documents
● This problem arises when the application packs lots of rarely used data into its frequently accessed documents.
● Every read of such a document pulls the rarely used data into the cache as well, which makes it more likely that other important data is evicted.
● Multiply this across a collection and the server may page far more data than necessary to serve the application.
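One way to lighten an overloaded document is to move the rarely used fields into a second document that shares the same _id. The sketch below is a plain JavaScript illustration under assumptions: HOT_FIELDS, splitDocument and the sample user fields are all hypothetical.

```javascript
// Hypothetical list of fields the application reads on every request.
const HOT_FIELDS = ["_id", "name", "last_login"];

// Split one overloaded document into a small "hot" document and a
// "cold" document holding everything else. Both share the same _id.
function splitDocument(doc, hotFields) {
  const hot = {};
  const cold = { _id: doc._id };
  for (const [key, value] of Object.entries(doc)) {
    if (hotFields.includes(key)) hot[key] = value;
    else cold[key] = value;
  }
  return { hot, cold };
}

const user = {
  _id: 1,
  name: "Peter",
  last_login: "2017-08-28T04:02:32Z",
  signup_survey: { answers: ["a", "c", "b"] }, // rarely read (hypothetical)
  old_invoices: [101, 102, 103], // rarely read (hypothetical)
};

const { hot, cold } = splitDocument(user, HOT_FIELDS);
console.log(hot, cold);
```

Only the hot document then needs to stay in the cache for everyday requests; the cold one is fetched on demand.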

Working Set
The Working Set is the size of:
● Our most frequently accessed data
plus
● Our indexes
The Working Set must fit in RAM!
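As a worked example of that sum, the sketch below estimates a working set and compares it with the available RAM. Every figure is hypothetical, and workingSetGb is an illustrative helper rather than anything MongoDB provides.

```javascript
// All sizes are hypothetical and expressed in gigabytes.
// hotFraction is the share of the data that is accessed frequently.
function workingSetGb({ dataGb, hotFraction, indexGb }) {
  return dataGb * hotFraction + indexGb;
}

const estimate = workingSetGb({ dataGb: 200, hotFraction: 0.25, indexGb: 12 });
const ramGb = 64;

console.log(`working set: ${estimate} GB, fits in RAM: ${estimate <= ramGb}`);
```

With these assumed numbers, 200 GB of data of which a quarter is hot gives 50 GB, plus 12 GB of indexes, for a 62 GB working set on a 64 GB machine.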

Working Set
What should we do if the Working Set does not fit in RAM?
● Add more RAM to the machine
● Shard
● Reduce the size of the Working Set:
○ Limit our arrays
○ Limit our embedded documents
○ …
○ Benefits:
■ Fast data retrieval
■ One query brings all the information needed
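Limiting an array can be sketched as "append, then keep only the newest items". The helper below is a plain JavaScript illustration (pushCapped is a hypothetical name); on the server, the $push operator with the $each and $slice modifiers gives a comparable result in a single update.

```javascript
// Hypothetical helper: append an item but keep only the newest `max` items.
function pushCapped(array, item, max) {
  if (max < 1) return [];
  return [...array, item].slice(-max);
}

let recentOrders = [];
for (const orderId of [1, 2, 3, 4, 5]) {
  recentOrders = pushCapped(recentOrders, orderId, 3);
}

console.log(recentOrders);
```

The array never holds more than three entries, so the document size stays bounded however many orders arrive.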

Historic Information
● When data grows continuously (historical data) and we embed it in our main collection, each document carries a lot of information that is rarely needed. We may still want to keep that information for analytics, so we store it outside the user document.
● This does not apply to information with limited growth (addresses, phone numbers, etc.).

1-1
id    name    phone_number    zip_code
1     Rick    555-111-1234    01209
2     Mike    555-222-2345    30062
Users
{
    _id : 1,
    name : 'Rick',
    phone_number : '555-111-1234',
    zip_code : '01209'
}
{
    _id : 2,
    name : 'Mike',
    phone_number : '555-222-2345',
    zip_code : '30062'
}

1-Few
● Referencing (or Normalization)
○ To show a user’s information we need joins (or more than one query). This implies random seeks, which are very slow.
● Embedding (or Denormalization)
○ Denormalization lets us avoid joins, at the cost of redundant data and more complex applications that must prevent inconsistencies.
○ Arrays let us embed without redundancy, which gives us a performance benefit.
○ Denormalization opens up many possible data models, which makes it more difficult to define ours.

1-Few
id    name    zip_code
1     Rick    01209
2     Mike    30062

id    user_id    phone_number
1     1          555-111-1234
2     2          555-222-2345
3     2          555-333-3456

1-Few (MongoDB-Embedding)
● The approach that gives us the best performance and data consistency guarantees.
● Locality: MongoDB stores each document contiguously on disk, so putting all the data you need into one document means that you are never more than one seek away from everything you need.
● Atomicity and Isolation: with embedding, an update to a single document is atomic (transactionality).
{
    _id : 2,
    name : 'Mike',
    zip_code : '30062',
    phone_numbers : [ '555-222-2345', '555-333-3456' ]
}

1-Few (MongoDB-Referencing)
{
    _id : 2,
    name : 'Mike',
    zip_code : '30062',
    phone_numbers : [ 2, 3 ]
}
{
    _id : 2,
    user_id : 2,
    phone_number : '555-222-2345'
}
{
    _id : 3,
    user_id : 2,
    phone_number : '555-333-3456'
}
● With referencing we lose transactionality.
● We need:
○ More than one query
○ To use $lookup (joins)
● This approach is worse than embedding for performance.
● If we have to read the data frequently, it is better to embed it.
● Advantage: flexibility to project only the desired fields.
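The extra round trip that referencing costs can be made visible with a small simulation. This is a sketch only: the arrays stand in for the two collections, and findUser, findPhones and queryCount are hypothetical names that count simulated queries.

```javascript
// Plain arrays stand in for the users and phone number collections.
const users = [
  { _id: 2, name: "Mike", zip_code: "30062", phone_numbers: [2, 3] },
];
const phones = [
  { _id: 2, user_id: 2, phone_number: "555-222-2345" },
  { _id: 3, user_id: 2, phone_number: "555-333-3456" },
];

let queryCount = 0; // counts simulated round trips to the database

function findUser(id) {
  queryCount += 1;
  return users.find((u) => u._id === id);
}

function findPhones(ids) {
  queryCount += 1;
  return phones.filter((p) => ids.includes(p._id));
}

// Showing one user takes two reads: the user, then the referenced phones.
const mike = findUser(2);
const numbers = findPhones(mike.phone_numbers).map((p) => p.phone_number);

console.log(numbers, `queries: ${queryCount}`);
```

With the embedded model from the previous slide, the same screen needs only the first read.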

N-1
{
    _id : 2,
    name : 'Mike',
    zip_code : '30062',
    phone_numbers : [ 2, 3 ],
    address : '13, Rue del Percebe'
}
{
    _id : 1,
    name : 'Rick',
    zip_code : '01209',
    phone_numbers : [ 2, 3 ],
    address : '13, Rue del Percebe'
}
What if two people share an address?
● Does that mean that you have to store the address twice? Yes, you do have to store it twice, three times, etc.
● This is better than making unnecessary joins. The extra disk space you need will make your queries faster.
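The price of storing the address in every document is paid on writes: a change has to reach every copy. A minimal sketch of that update, with a plain array standing in for the collection; changeAddress and the third user are hypothetical. In MongoDB the equivalent would be a single updateMany call with a filter on the old address.

```javascript
const users = [
  { _id: 1, name: "Rick", address: "13, Rue del Percebe" },
  { _id: 2, name: "Mike", address: "13, Rue del Percebe" },
  { _id: 3, name: "Ana", address: "1, Plaza Mayor" }, // hypothetical extra user
];

// Hypothetical helper: rewrite every copy of a duplicated address and
// report how many documents were modified.
function changeAddress(docs, oldAddress, newAddress) {
  let modified = 0;
  for (const doc of docs) {
    if (doc.address === oldAddress) {
      doc.address = newAddress;
      modified += 1;
    }
  }
  return modified;
}

const modified = changeAddress(users, "13, Rue del Percebe", "14, Rue del Percebe");
console.log(`documents modified: ${modified}`);
```

Reads stay join-free, while the application takes on the job of keeping the copies consistent.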

1-Many
Case: A blog with hundreds, or even thousands, of comments for a given post.
Embedding carries significant penalties:
● The larger a document is, the more RAM it uses. The fewer documents fit in RAM, the more likely the server is to page fault to retrieve documents, and page faults ultimately lead to random disk I/O.
● Growing documents must eventually be copied to larger spaces.
● The document never stops growing.
● MongoDB documents have a hard size limit of 16 MB.
Referencing:
● The post document does not grow, because each comment is its own document in a second collection.
● Suited to very large or unpredictable one-to-many relationships.
Solution: We may only wish to display the first three comments when showing a blog entry; loading more simply wastes RAM.
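With one document per comment, showing a blog entry only needs the first few comments for that post. The sketch below imitates that read in plain JavaScript; the sample comments, the created field and firstComments are hypothetical. Against a real collection the same idea is a find on post_id followed by sort and limit.

```javascript
// A plain array stands in for the comments collection (one document per comment).
const comments = [
  { _id: 1, post_id: 10, created: 3, text: "third" },
  { _id: 2, post_id: 10, created: 1, text: "first" },
  { _id: 3, post_id: 11, created: 2, text: "other post" },
  { _id: 4, post_id: 10, created: 2, text: "second" },
  { _id: 5, post_id: 10, created: 4, text: "fourth" },
];

// Hypothetical helper: the oldest `limit` comments of one post.
function firstComments(all, postId, limit) {
  return all
    .filter((c) => c.post_id === postId) // filter returns a new array
    .sort((a, b) => a.created - b.created)
    .slice(0, limit);
}

const preview = firstComments(comments, 10, 3);
console.log(preview.map((c) => c.text));
```

The post document itself stays small, and only three comment documents are loaded for the page.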

Many-Many
● We will embed a list of _id values in both directions
● We no longer have redundant information
Product
{
    _id : 'My product',
    category_ids : [ 'My category',... ]
}
Category
{
    _id : 'My category',
    product_ids : [ 'My product', … ]
}
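Keeping the two lists of _id values in step means every link is written on both sides. A minimal sketch in plain JavaScript, where link is a hypothetical helper; in MongoDB each side could be updated with the $addToSet operator. The two writes touch two different documents, so single-document atomicity does not cover the pair.

```javascript
const product = { _id: "My product", category_ids: [] };
const category = { _id: "My category", product_ids: [] };

// Hypothetical helper: record the relationship in both directions,
// skipping ids that are already present.
function link(prod, cat) {
  if (!prod.category_ids.includes(cat._id)) prod.category_ids.push(cat._id);
  if (!cat.product_ids.includes(prod._id)) cat.product_ids.push(prod._id);
}

link(product, category);
link(product, category); // calling it again adds nothing

console.log(product, category);
```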

Recap
● Avoid round trips to the database.
● User events should only generate a small number of queries.
● Use arrays when needed and of course when they won’t grow indefinitely.
● Don’t just migrate relational schemas.
● Data that is queried together should be in the same document whenever possible.
● For example, store the last login time and the shopping cart in the user document if that is all the landing page needs.
● Embedding for performance and atomicity (transactionality).
● Referencing for huge relationships.
Ultimately, the decision depends on the access patterns of your application.

Questions?

Thank you!
Thank you for your attention!
