3. Topics
Introduction
• Working with documents
• Evolving a schema
• Queries and indexes
• Rich Documents
Common patterns
• Single table inheritance
• One-to-Many & Many-to-Many
• Trees
• Queues
4. Ways to model data:
http://www.flickr.com/photos/42304632@N00/493639870/
10. A simple start
post = {author: "Hergé",
date: new Date(),
text: "Destination Moon",
tags: ["comic", "adventure"]}
> db.blog.save(post)
Map the documents to your application.
11. Find the document
> db.blog.find()
{ _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
author: "Hergé",
date: ISODate("2012-01-23T14:01:00.117Z"),
text: "Destination Moon",
tags: [ "comic", "adventure" ]
}
Note:
• _id must be unique, but can be anything you'd like
• Defaults to a BSON ObjectId if one is not supplied
12. Add an index, find via index
> db.blog.ensureIndex({author: 1})
> db.blog.find({author: 'Hergé'})
{ _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
author: "Hergé",
date: ISODate("2012-01-23T14:01:00.117Z"),
...
}
Secondary index on "author"
14. Multi-key indexes
// Build an index on the 'tags' array
> db.blog.ensureIndex({tags: 1})
// find posts with a specific tag
// (This will use an index!)
> db.blog.find({tags: 'comic'})
{ _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
author: "Hergé",
date: ISODate("2012-01-23T14:01:00.117Z"),
...
}
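One way to picture why the tag query can use an index: a multi-key index holds one entry per array element, each pointing back to its document. The sketch below mimics that idea with a plain JavaScript Map and no database. `buildMultiKeyIndex` and `findByTag` are hypothetical helper names, the sample posts are made up, and the real index is a B-tree rather than a hash map.

```javascript
// Illustration only: an in-memory stand-in for a multi-key index on "tags".
// buildMultiKeyIndex and findByTag are hypothetical helpers, not MongoDB API.
const taggedPosts = [
  { _id: 1, author: "Hergé", tags: ["comic", "adventure"] },
  { _id: 2, author: "Hergé", tags: ["comic"] },
  { _id: 3, author: "Verne", tags: ["adventure", "novel"] },
];

// One index entry per array element, each pointing back to the document _id.
function buildMultiKeyIndex(docs, field) {
  const index = new Map();
  for (const doc of docs) {
    for (const value of doc[field] || []) {
      if (!index.has(value)) index.set(value, []);
      index.get(value).push(doc._id);
    }
  }
  return index;
}

// A lookup by a single tag touches only the matching index entry.
function findByTag(index, tag) {
  return index.get(tag) || [];
}

const tagIndex = buildMultiKeyIndex(taggedPosts, "tags");
console.log(findByTag(tagIndex, "comic")); // [ 1, 2 ]
```

The point of the sketch is that a document with two tags shows up under two keys, so a query on either tag finds it without scanning the collection.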
20. The 'dot' operator
// create index on nested documents:
> db.blog.ensureIndex({"comments.author": 1})
> db.blog.find({"comments.author":"Chris"})
{ _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
author: "Hergé",
date: ISODate("2012-01-23T14:01:00.117Z"),
...
}
21. The 'dot' operator
// create index on comment votes:
> db.blog.ensureIndex({"comments.votes": 1})
// find all posts with any comments with
// more than 50 votes
> db.blog.find({"comments.votes": {$gt: 50}})
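To make the "any comment" semantics concrete, here is a small sketch in plain JavaScript of how a dotted path reaches into an array of embedded documents. `resolvePath` is a hypothetical helper written for this illustration, not a MongoDB function, and it deliberately ignores positional paths such as comments.0.author.

```javascript
// Illustration only: how a dotted path such as "comments.votes" fans out
// over an array of embedded documents. resolvePath is a hypothetical helper.
function resolvePath(doc, path) {
  let current = [doc];
  for (const key of path.split(".")) {
    const next = [];
    for (const value of current) {
      // Arrays are fanned out so every embedded document is visited.
      const items = Array.isArray(value) ? value : [value];
      for (const item of items) {
        if (item !== null && typeof item === "object" && key in item) {
          next.push(item[key]);
        }
      }
    }
    current = next;
  }
  // A final array value (e.g. tags) is flattened into its elements.
  return current.flat();
}

// Made-up sample data.
const sampleBlogPosts = [
  { _id: 1, comments: [{ author: "Chris", votes: 5 }, { author: "Fred", votes: 72 }] },
  { _id: 2, comments: [{ author: "Kyle", votes: 12 }] },
  { _id: 3 },
];

// Similar in spirit to find({"comments.votes": {$gt: 50}}):
// a post matches if ANY of its comments has more than 50 votes.
const popularPosts = sampleBlogPosts.filter((post) =>
  resolvePath(post, "comments.votes").some((votes) => votes > 50)
);
console.log(popularPosts.map((post) => post._id)); // [ 1 ]
```

Note that the whole post is returned, not just the matching comment.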
22. The 'dot' operator
// find last 5 posts:
> db.blog.find().sort({"date":-1}).limit(5)
// find the top 10 commented posts:
> db.blog.find().sort({"comments_count":-1}).limit(10)
When sorting, check if you need an index...
36. One to Many
Embedded Array / Array Keys
• $slice operator returns a subset of the array
• Some queries are hard,
e.g. finding the latest comments across all documents
37. One to Many
Embedded Array / Array Keys
{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
author : "Hergé",
date: ISODate("2012-01-23T14:01:00.117Z"),
text : "Destination Moon",
tags : [ "comic", "adventure" ],
comments : [{
author : "Chris",
date : ISODate("2012-01-23T14:31:53.848Z"),
text : "great book",
votes : 5
}],
comments_count: 1
}
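The trade-off of embedding can be sketched in plain JavaScript over in-memory posts (no database involved). Taking the last few comments of one post is a simple slice, while "latest comments across all documents" means unpacking every embedded array and re-sorting. The helper names `lastComments` and `latestCommentsAcrossPosts` and the sample data are assumptions made for this illustration.

```javascript
// Illustration only: plain JavaScript over in-memory posts, made-up data.
const embeddedPosts = [
  { _id: 1, text: "Destination Moon", comments: [
    { author: "Chris", date: new Date("2012-01-23T14:31:53Z"), text: "great book" },
    { author: "Fred", date: new Date("2012-01-25T09:00:00Z"), text: "a classic" } ] },
  { _id: 2, text: "Another post", comments: [
    { author: "Kyle", date: new Date("2012-01-24T10:15:00Z"), text: "even better" } ] },
];

// Per-post subset is easy: roughly what $slice gives you for one document.
function lastComments(post, n) {
  return post.comments.slice(-n);
}

// Latest comments across ALL posts: every embedded array is unpacked and
// re-sorted, which is why this query is awkward with embedding.
function latestCommentsAcrossPosts(posts, n) {
  return posts
    .flatMap((post) => post.comments.map((c) => ({ postId: post._id, ...c })))
    .sort((a, b) => b.date - a.date)
    .slice(0, n);
}

console.log(latestCommentsAcrossPosts(embeddedPosts, 2).map((c) => c.author));
// [ 'Fred', 'Kyle' ]
```

If the cross-document query matters to the application, that is a hint to consider a separate comments collection instead.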
54. Trees
Embedded Tree
{ comments : [{
author : "Chris", text : "...",
replies : [{
author : "Fred", text : "...",
replies : []
}]
}]
}
Pros: Single Document, Performance, Intuitive
Cons: Hard to search, Partial Results, 16MB limit
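The "hard to search" point can be seen in a short sketch: finding every comment by one author in an embedded tree needs a recursive walk in application code. This is plain JavaScript over an in-memory document; `findByAuthor` and the sample thread are hypothetical and only illustrate the shape of the problem.

```javascript
// Illustration only: recursive search through embedded replies.
// findByAuthor is a hypothetical helper; the thread is made-up sample data.
const commentThread = {
  comments: [{
    author: "Chris", text: "first",
    replies: [{
      author: "Fred", text: "second",
      replies: [{ author: "Chris", text: "third", replies: [] }],
    }],
  }],
};

// Walks every level of the tree and records where each match was found.
function findByAuthor(comments, author, depth = 0, found = []) {
  for (const comment of comments) {
    if (comment.author === author) found.push({ text: comment.text, depth });
    findByAuthor(comment.replies || [], author, depth + 1, found);
  }
  return found;
}

console.log(findByAuthor(commentThread.comments, "Chris"));
// [ { text: 'first', depth: 0 }, { text: 'third', depth: 2 } ]
```

Because the nesting depth is unbounded, there is no single dotted path that covers every level.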
58. Array of Ancestors
// Tree: A has children B and E; B has children C and D; E has child F
// Store all ancestors of a node
{ _id: "a" }
{ _id: "b", thread: [ "a" ], replyTo: "a" }
{ _id: "c", thread: [ "a", "b" ], replyTo: "b" }
{ _id: "d", thread: [ "a", "b" ], replyTo: "b" }
{ _id: "e", thread: [ "a" ], replyTo: "a" }
{ _id: "f", thread: [ "a", "e" ], replyTo: "e" }
// find all messages whose thread includes "b" (descendants of b)
> db.msg_tree.find({"thread": "b"})
// find all direct replies to "b"
> db.msg_tree.find({"replyTo": "b"})
// find all ancestors of "f":
> threads = db.msg_tree.findOne({"_id": "f"}).thread
> db.msg_tree.find({"_id": {$in: threads}})
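The three lookups on this slide can be tried without a database. The sketch below runs them over an in-memory copy of the same six documents; `descendantsOf`, `repliesTo` and `ancestorsOf` are hypothetical helper names standing in for the shell queries.

```javascript
// Illustration only: the ancestor-array queries over an in-memory array.
const msgTree = [
  { _id: "a" },
  { _id: "b", thread: ["a"], replyTo: "a" },
  { _id: "c", thread: ["a", "b"], replyTo: "b" },
  { _id: "d", thread: ["a", "b"], replyTo: "b" },
  { _id: "e", thread: ["a"], replyTo: "a" },
  { _id: "f", thread: ["a", "e"], replyTo: "e" },
];

const idsOf = (docs) => docs.map((d) => d._id);

// Descendants of a node: its _id appears in their ancestor array.
const descendantsOf = (id) =>
  msgTree.filter((d) => (d.thread || []).includes(id));

// Direct replies to a node.
const repliesTo = (id) => msgTree.filter((d) => d.replyTo === id);

// Ancestors: read the node's own array, then fetch those documents.
function ancestorsOf(id) {
  const node = msgTree.find((d) => d._id === id);
  const ancestors = (node && node.thread) || [];
  return msgTree.filter((d) => ancestors.includes(d._id));
}

console.log(idsOf(descendantsOf("b"))); // [ 'c', 'd' ]
console.log(idsOf(repliesTo("a")));     // [ 'b', 'e' ]
console.log(idsOf(ancestorsOf("f")));   // [ 'a', 'e' ]
```

Each lookup is a single equality or membership test, which is what makes this layout index-friendly.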
59. Array of Ancestors
Store hierarchy as a path expression
• Separate each node by a delimiter, e.g. "/"
• Use text search to find parts of a tree
{ comments: [
{ author: "Kyle", text: "initial post",
path: "" },
{ author: "Jim", text: "jim’s comment",
path: "jim" },
{ author: "Kyle", text: "Kyle’s reply to Jim",
path : "jim/kyle"} ] }
// Find the conversations Jim was part of
// (path lives inside the comments array, so use the dotted field name)
> db.blogs.find({"comments.path": /^jim/i})
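A small sketch of prefix matching on path strings, in plain JavaScript with no database. It adds one refinement that is not on the slide: the pattern requires the prefix to end at a "/" or at the end of the string, so that "jim" does not also match a hypothetical "jimmy". `subtreeOf` and the extra "Jimmy" row are assumptions for illustration.

```javascript
// Illustration only: prefix search over materialized paths.
// subtreeOf is a hypothetical helper; the "Jimmy" row is made-up data.
const pathComments = [
  { author: "Kyle", text: "initial post", path: "" },
  { author: "Jim", text: "jim's comment", path: "jim" },
  { author: "Kyle", text: "Kyle's reply to Jim", path: "jim/kyle" },
  { author: "Jimmy", text: "unrelated", path: "jimmy" },
];

// Anchored, case-insensitive prefix match. The (/|$) guard stops the
// prefix "jim" from matching the sibling path "jimmy".
function subtreeOf(comments, root) {
  const escaped = root.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const pattern = new RegExp("^" + escaped + "(/|$)", "i");
  return comments.filter((c) => pattern.test(c.path));
}

console.log(subtreeOf(pathComments, "jim").map((c) => c.text));
// [ "jim's comment", "Kyle's reply to Jim" ]
```

Whether the stricter pattern is wanted depends on how node names are chosen; with unique fixed-width ids the plain prefix is enough.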
62. Queue
Requirements
• See jobs waiting, jobs in progress
• Ensure that each job is started once and only once
// Queue document
{ in_progress: false,
priority: 1,
message: "Rich documents FTW!"
...
}
// find highest priority job and mark as in-progress
job = db.jobs.findAndModify({
query: {in_progress: false},
sort: {priority: -1},
update: {$set: {in_progress: true,
started: new Date()}}})
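The "once and only once" requirement rests on selecting and updating the job in a single step. The sketch below imitates that step over an in-memory array so the behaviour can be tried without a server. `claimNextJob` is a hypothetical helper and the jobs are made-up data; unlike the shell default, it returns the job after the update (comparable to passing new: true).

```javascript
// Illustration only: an in-memory imitation of the findAndModify claim step.
const jobQueue = [
  { _id: 1, in_progress: false, priority: 1, message: "low" },
  { _id: 2, in_progress: false, priority: 5, message: "high" },
  { _id: 3, in_progress: true, priority: 9, message: "already running" },
];

// Select, sort and update happen as one step, so a job that has been
// claimed is never handed to a second worker.
function claimNextJob(queue, now = new Date()) {
  const candidate = queue
    .filter((job) => !job.in_progress)
    .sort((a, b) => b.priority - a.priority)[0];
  if (!candidate) return null;
  candidate.in_progress = true;
  candidate.started = now;
  return candidate;
}

const firstClaim = claimNextJob(jobQueue);  // job 2 (highest waiting priority)
const secondClaim = claimNextJob(jobQueue); // job 1
const thirdClaim = claimNextJob(jobQueue);  // null, nothing left to claim
console.log(firstClaim._id, secondClaim._id, thirdClaim);
```

In JavaScript the function body runs without interruption, which is what stands in here for the server-side atomicity of the real command.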
65. Anti patterns
• Careless indexing
• Large, deeply nested documents
• Multiple types for a key
• One size fits all collections
• One collection per user
66. Summary
• Schema design is different in MongoDB
• Basic data design principles stay the same
• Focus on how the app manipulates data
• Rapidly evolve schema to meet your requirements
• Enjoy your new freedom, use it wisely :-)
67. download at mongodb.org
conferences, appearances, and meetups
http://www.10gen.com/events
Facebook | Twitter | LinkedIn
http://bit.ly/mongofb @mongodb http://linkd.in/joinmongo
support, training, and this talk brought to you by
Editor's notes
* Explain why.
* 3rd Normal Form - determining a table's degree of vulnerability to logical inconsistencies
* The higher the normal form applicable to a table, the less vulnerable it is to inconsistencies and anomalies
* The scaling path for an RDBMS tends towards denormalization
* No joins for scalability - doing joins across shards in SQL is highly inefficient and difficult to perform.
* MongoDB is geared for easy scaling - going from a single node to a distributed cluster is easy.
* Little or no application code changes are needed to scale from a single node to a sharded cluster.
* Questions about database features inform our schema design
Access Patterns
* Less of an issue for normalized databases
* MongoDB document models can be rich and flexible
* To review simple schema design we'll use a simple blog example.
* Notice Hergé - UTF-8 support is native
* Can create indexes for arrays / objects
* In the relational world you'd have to do joins
* Objects modelled directly in MongoDB
* Rich query language
* Powerful - can do range queries with $lt and $gt
* Update - can update parts of documents
* upserts - $push, $inc
* Allows easy access to embedded documents / arrays
* Also can do positional: comments.0.author
* Range queries still use indexes
* Full collection scan
* scanAndOrder - reorders
* If a document is always presented as a whole, a single doc gives performance benefits
* A single doc is not a panacea, as we'll see
* As with nature, common patterns emerge when modeling data
* Leaves nulls in the table
* Not intuitive
* Single table inheritance is clean and intuitive in MongoDB
* One author, one blog entry
* Many authors for one blog entry
** Delete the blog - don't delete the author(s)
** Delete the blog - delete the author(s) - aka cascading delete
* Also a one-to-many pattern
* Update: will update in_progress and add started
* Limits on number of namespaces
* Schema is specific to application / data usage
* Think future - data change / how you are going to query