6. Outline
I. Schema design
‣ Relational vs. Document-oriented
‣ Schema-less design
‣ Case study: Publishers & Subscribers
II. Lessons learned for schema design
III. Things to remember about MongoDB
Saturday, May 5, 12
8. Relational vs. Document-oriented
Relational

Users
id   name
1    Robert
2    Monica
3    Lucas
...  ...

Graph
from  to
1     5
1     20
2     1
2     5
...   ...

vs

Document-oriented

Users
{ id: 1,
  name: "Robert",
  from: [2],
  to: [5, 20] }
{ id: 2,
  name: "Monica",
  from: [23],
  to: [1, 5] }
...
9. Find all the "to" edges for user 5
Graph
from  to
1     5
1     20
2     1
2     5
3     4
3     23
3     12
4     5
...   ...
Potentially as many disk seeks as "to" edges!

vs

Users Blocks
{ id: 5,
  name: "Robert",
  from: [1, 2, 4],
  to: [1, 20, 3, 7, 2] }
1 disk seek guaranteed!
10. Advantages of doc-oriented schema
• Avoid joins
• Disk locality when fetching relations (everything is stored within a doc record)
Considerations for schema design
• N-to-many relations == lists
• Denormalization is more common
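The lookup contrasted on the previous slides can be sketched in plain JavaScript. This is an in-memory illustration only, not MongoDB code: `graphRows`, `userDocs`, `toEdgesRelational`, and `toEdgesDocument` are hypothetical names, and the data is the sample from the slides.

```javascript
// Relational layout: one row per edge, kept apart from the user rows.
const graphRows = [
  { from: 1, to: 5 },
  { from: 1, to: 20 },
  { from: 2, to: 1 },
  { from: 2, to: 5 },
];

// Document layout: each user record carries its own edge lists.
const userDocs = new Map([
  [1, { id: 1, name: "Robert", from: [2], to: [5, 20] }],
  [2, { id: 2, name: "Monica", from: [23], to: [1, 5] }],
]);

// Relational: every matching edge is a separate row to locate.
function toEdgesRelational(userId) {
  return graphRows.filter((row) => row.from === userId).map((row) => row.to);
}

// Document-oriented: one record fetch returns the whole list.
function toEdgesDocument(userId) {
  const doc = userDocs.get(userId);
  return doc ? doc.to : [];
}
```

Both functions return the same edges; the difference the slides point at is how many separate records must be touched to produce them.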
11. Outline
I. Schema design
‣ Relational vs. Document-oriented
‣ Schema-less design
‣ Case study: Publishers & Subscribers
II. Lessons learned for schema design
III. Things to remember about MongoDB
12. Schema-less design
{id: 1, network: "Twitter", name: "Robert",
 from: [2], to: [5, 20], screenName: "robertE"}
{id: 2, network: "Facebook", name: "Maria",
 from: [23], to: [1, 5], likes: ["biking", "hiking"]}
...
Leverage the schemaless nature of Mongo, but add type protection in your code!
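One way to add that protection is a small check that runs before a document is written. The sketch below is an assumption about what such a check could look like: `validateUser` and its rules are hypothetical, and only the field names come from the sample documents on the slide.

```javascript
// Hypothetical validator; the field rules are assumptions for illustration.
function validateUser(doc) {
  const errors = [];
  const isIntList = (v) => Array.isArray(v) && v.every((x) => Number.isInteger(x));
  const isStringList = (v) => Array.isArray(v) && v.every((x) => typeof x === "string");

  // Fields every user document is expected to carry.
  if (!Number.isInteger(doc.id)) errors.push("id must be an integer");
  if (typeof doc.network !== "string") errors.push("network must be a string");
  if (typeof doc.name !== "string") errors.push("name must be a string");
  if (!isIntList(doc.from)) errors.push("from must be a list of integers");
  if (!isIntList(doc.to)) errors.push("to must be a list of integers");

  // Optional, network-specific fields are checked only when present.
  if ("screenName" in doc && typeof doc.screenName !== "string") {
    errors.push("screenName must be a string");
  }
  if ("likes" in doc && !isStringList(doc.likes)) {
    errors.push("likes must be a list of strings");
  }
  return errors;
}
```

An empty result means the document can be saved; anything else is rejected in application code, since the database itself will accept either shape.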
13. Outline
I. Schema design
‣ Relational vs. Document-oriented
‣ Schema-less design
‣ Case study: Publishers & Subscribers
II. Lessons learned for schema design
III. Things to remember about MongoDB
14. Read-Friendly
Case Study: Publishers & Subscribers
22. Hybrid Approach
db.posts.find({recipients: uId})
Sharding key: random :)
Fast writes, slim storage, reasonable read speed
23. Random sharding is not random!
Minimize the number of disk seeks per shard!
[Figure labels: "Best (impossible for our data)", "Worse", "Optimal solution"]
24. Outline
I. Schema design
II. Lessons learned for schema design
‣ Indexes
‣ Concurrency
‣ Reducing collection size
III. Things to remember about MongoDB
26. Indexes
Primary Key
link: {
  _id: ObjectId(...),
  url: "www.jetlore.com",
  title: "Jetlore is a search platform for social content",
  description: "..."
}
If your data has a natural PK, use it instead of the default ObjectId
link: {
  _id: "www.jetlore.com",
  title: "Jetlore is a search platform for social content",
  description: "..."
}
27. Indexes
Augment your schema to enable the most selective index
Want all posts that a user can view sorted by the number of likes
post: {
  _id: ObjectId(...),
  recipients: [...],
  likes: [...],
  likesCount: ...,
  ...}
Add a new "likesCount" field!
db.posts.ensureIndex({recipients: 1, likesCount: -1})
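A denormalized counter only helps if it stays in step with the list it summarizes. The following in-memory sketch shows the bookkeeping under that assumption; `addLike` and `mostLikedFor` are hypothetical helpers, not code from the talk, and they stand in for work the database would do with the index above.

```javascript
// Record a like once per user, keeping the list and the counter in step.
// In MongoDB this would be a single update combining a list operator and $inc.
function addLike(post, userId) {
  if (post.likes.includes(userId)) return false; // already liked: nothing changes
  post.likes.push(userId);
  post.likesCount += 1;
  return true;
}

// The query the slide asks for: posts a user can view, most liked first.
// The stored counter means no list length has to be computed while sorting.
function mostLikedFor(posts, uId) {
  return posts
    .filter((post) => post.recipients.includes(uId))
    .sort((a, b) => b.likesCount - a.likesCount);
}
```

Sorting on a stored number is what makes the compound index on recipients and likesCount possible; the length of an array cannot be indexed that way.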
28. Indexes
Make sure to use the proper index
db.posts.find({recipients: uId}).sort({date: -1})
Always test with explain()
db.posts.ensureIndex({recipients: 1})
db.posts.ensureIndex({date: 1})
vs
db.posts.ensureIndex({recipients: 1, date: 1})   (annotated on the slide: date: -1)
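Why the compound index fits this query can be seen with a toy model. The sketch below is not how MongoDB stores indexes; it only mimics the ordering of a compound (recipients, date) index with a sorted array. `buildCompoundIndex`, `newestFor`, and the sample posts are hypothetical.

```javascript
// Sample posts; the values are made up for illustration.
const posts = [
  { _id: "p1", recipients: [7, 9], date: 3 },
  { _id: "p2", recipients: [7], date: 1 },
  { _id: "p3", recipients: [9], date: 2 },
  { _id: "p4", recipients: [7], date: 5 },
];

// Toy compound index: one entry per (recipient, date) pair,
// ordered by recipient first and by date second.
function buildCompoundIndex(docs) {
  const entries = [];
  for (const doc of docs) {
    for (const recipient of doc.recipients) {
      entries.push({ recipient, date: doc.date, id: doc._id });
    }
  }
  entries.sort((a, b) => a.recipient - b.recipient || a.date - b.date);
  return entries;
}

// Answers find({recipients: uId}).sort({date: -1}).limit(n) by walking the
// contiguous run of entries for uId backwards, so no separate sort is needed.
function newestFor(index, uId, n) {
  let pos = index.length - 1;
  // Locate the last entry for uId (a real B-tree would seek instead of scan).
  while (pos >= 0 && index[pos].recipient !== uId) pos--;
  const ids = [];
  while (pos >= 0 && index[pos].recipient === uId && ids.length < n) {
    ids.push(index[pos].id);
    pos--;
  }
  return ids;
}
```

With two single-field indexes, only one of them can drive the query, and the other half of the work (filtering or sorting) is done without index support; explain() is how to confirm which case applies.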
29. Outline
I. Schema design
II. Lessons learned for schema design
‣ Indexes
‣ Concurrency
‣ Reducing collection size
III. Things to remember about MongoDB
31. Concurrency
Atomic Commutative Operators
db.users.update({_id: u1}, {$pull: {to: u2}})
db.posts.update({_id: pId}, {$inc: {likesCount: 1}})
When updating lists and counters, instead of using $set, rely on $inc, $addToSet, $pull
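The reason to prefer these operators can be shown with a small simulation of two clients updating the same document. The helpers below (`applySet`, `applyInc`, `applyAddToSet`) are toy stand-ins written for this sketch, not MongoDB functions; they only mimic what the corresponding operators do to one document.

```javascript
// Toy stand-ins for the server-side operators, applied to a plain object.
function applySet(doc, field, value) {
  doc[field] = value;
}
function applyInc(doc, field, delta) {
  doc[field] = (doc[field] || 0) + delta;
}
function applyAddToSet(doc, field, value) {
  if (!doc[field].includes(value)) doc[field].push(value);
}

// Two clients read the counter, then each writes "what I read + 1" with $set.
function likeWithSet() {
  const post = { likesCount: 10 };
  const seenByA = post.likesCount;
  const seenByB = post.likesCount; // B reads before A has written
  applySet(post, "likesCount", seenByA + 1);
  applySet(post, "likesCount", seenByB + 1); // overwrites A's update
  return post.likesCount;
}

// The same two likes expressed as increments: order does not matter.
function likeWithInc() {
  const post = { likesCount: 10 };
  applyInc(post, "likesCount", 1);
  applyInc(post, "likesCount", 1);
  return post.likesCount;
}

// The same effect on a list: replacing the whole list loses one addition.
function followWithSet() {
  const user = { to: [1] };
  const seenByA = [...user.to];
  const seenByB = [...user.to];
  applySet(user, "to", [...seenByA, 2]);
  applySet(user, "to", [...seenByB, 3]); // the addition of 2 is lost
  return user.to;
}
function followWithAddToSet() {
  const user = { to: [1] };
  applyAddToSet(user, "to", 2);
  applyAddToSet(user, "to", 3);
  return user.to;
}
```

In the $set versions one update silently disappears; in the operator versions both survive regardless of the order in which they arrive.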
32. Concurrency
No Transactions
user1: { _id: u1,
         to: [u2, u3],
         from: [...], ...}
user2: { _id: u2,
         to: [...],
         from: [u1, ...], ...}
User1 wants to unsubscribe from user2.
Ideally we would update both users in one transaction.
Implement it in your code
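One possible way to implement it in code, offered as an assumption rather than the approach used in the talk, is to make each step idempotent so the whole operation can simply be retried after a partial failure. The sketch runs in memory; `pull` and `unsubscribe` are hypothetical helpers, and the `failAfterFirstStep` flag exists only to simulate a crash.

```javascript
// Remove a value from a list field; running it twice has the same effect as once.
function pull(doc, field, value) {
  doc[field] = doc[field].filter((item) => item !== value);
}

// Two separate updates stand in for the missing transaction.
function unsubscribe(users, followerId, publisherId, failAfterFirstStep = false) {
  pull(users[followerId], "to", publisherId);
  if (failAfterFirstStep) {
    throw new Error("simulated crash between the two updates");
  }
  pull(users[publisherId], "from", followerId);
}
```

If the first call fails halfway, the two documents disagree until the operation is retried; because both steps are idempotent, the retry converges to the intended state without undoing anything.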
33. Outline
I. Schema design
II. Lessons learned for schema design
‣ Indexes
‣ Concurrency
‣ Reducing collection size
III. Things to remember about MongoDB
34. Reducing collection size
Name your fields with short names!
post: {
owner: ObjectId,
messageText: “loving Jetlore”,
mediaUrl: “www.jetlore.com”,
mediaTitle: “Jetlore is a user analytics & search platform for social content”
}
vs
post: {
o: ObjectId,
t: “loving Jetlore”,
mu: “www.jetlore.com”,
mt: “Jetlore is a user analytics & search platform for social content”
}
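Short stored names need not leak into application code. A minimal sketch, assuming a translation layer at the storage boundary: the name pairs are the ones from the slide, while `toStored`, `fromStored`, and `renameKeys` are hypothetical helpers.

```javascript
// Long name used in code -> short name stored in the collection (from the slide).
const SHORT_NAMES = new Map([
  ["owner", "o"],
  ["messageText", "t"],
  ["mediaUrl", "mu"],
  ["mediaTitle", "mt"],
]);

// Reverse table for reading documents back.
const LONG_NAMES = new Map([...SHORT_NAMES].map(([long, short]) => [short, long]));

// Copy a document, renaming any key found in the table and keeping the rest.
function renameKeys(doc, table) {
  const out = {};
  for (const [key, value] of Object.entries(doc)) {
    out[table.has(key) ? table.get(key) : key] = value;
  }
  return out;
}

const toStored = (doc) => renameKeys(doc, SHORT_NAMES);
const fromStored = (doc) => renameKeys(doc, LONG_NAMES);
```

Field names are stored in every document, so the saving grows with the number of documents; the mapping keeps the readable names available where the code is written.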
35. Outline
I. Schema design
II. Lessons learned for schema design
III. Things to remember about MongoDB
‣ Single lock
‣ ($or + sort) query doesn’t use indexes properly
‣ Indexes with 2 list fields
‣ Record iterators + update
36. $or & sort query doesn’t use the proper index
db.posts.find({$or: [{recipients: uId}, {privacy: Public}]}).sort({date: -1})
db.posts.ensureIndex({recipients: 1, date: -1})
db.posts.ensureIndex({privacy: 1, date: -1})
Indexes with 2 list fields
post: { _id: ObjectId(...),
        recipients: [...],
        links: [...],
        ... }
db.posts.ensureIndex({recipients: 1, links: 1})
(MongoDB cannot index two array fields of the same document in one compound index.)
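The slides do not say how to work around the $or + sort limitation. One possible approach, stated here purely as an assumption, is to run the two branches as separate indexed queries (each already sorted by date, newest first) and merge the results in application code. `mergeNewestFirst` is a hypothetical helper operating on plain arrays.

```javascript
// Merge two lists that are each sorted by date descending, dropping duplicates
// (a post can match both branches) and stopping once `limit` posts are collected.
function mergeNewestFirst(a, b, limit) {
  const merged = [];
  const seen = new Set();
  let i = 0;
  let j = 0;
  while (merged.length < limit && (i < a.length || j < b.length)) {
    // Take from `a` when `b` is exhausted or a's next post is at least as new.
    const takeA = j >= b.length || (i < a.length && a[i].date >= b[j].date);
    const next = takeA ? a[i++] : b[j++];
    if (!seen.has(next._id)) {
      seen.add(next._id);
      merged.push(next);
    }
  }
  return merged;
}
```

Each input list would come from a query that can use one of the two compound indexes above, with the same limit applied to both.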
37. Record iterators + updating
var posts = db.posts.find().skip(n).limit(t)
while (posts.hasNext()) {
  var post = posts.next()
  db.posts.update({_id: post._id}, {$set: {text: NewText}})
}
Sort by a field that will not change, or rename the old collection
var posts = db.posts.find().sort({date: 1}).skip(n).limit(t)
db.posts.renameCollection("oldPosts")
var posts = db.oldPosts.find().skip(n).limit(t)
while (posts.hasNext()) {
  var post = posts.next()
  // After the rename db.posts starts out empty, so an update by _id would
  // match nothing; write the modified document into it instead.
  post.text = NewText
  db.posts.insert(post)
}
38. The takeaways
I. What is more important?
• Writes: Optimize for easy inserts/updates
• Reads: Optimize for easy querying
II. Denormalize to enable the most selective index
III. Concurrency: design to leverage commutative
operators
39. Thank you!
Try our tech
powered by