Schema Design

Christian Kvalheim - christkv@10gen.com

Topics
Introduction
• Working with documents
• Evolving a schema
• Queries and indexes
• Rich Documents

Common patterns
• Single table inheritance
• One-to-Many & Many-to-Many
• Trees
• Queues

Ways to model data:

http://www.flickr.com/photos/42304632@N00/493639870/

Terminology
RDBMS   | MongoDB
Table   | Collection
Row(s)  | JSON Document
Index   | Index
Join    | Embedding & Linking

Schema-design criteria

How can we manipulate this data?
• Dynamic Queries
• Secondary Indexes
• Atomic Updates
• Map Reduce
• Aggregation (coming soon)

Access Patterns?
• Read / Write Ratio
• Types of updates
• Types of queries
• Data life-cycle

Considerations
• No Joins
• Document writes are atomic

A simple start
post = {author: "Hergé",
date: new Date(),
text: "Destination Moon",
tags: ["comic", "adventure"]}

> db.blog.save(post)

Map the documents to your application.

Find the document
> db.blog.find()
{ _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
author: "Hergé",
date: ISODate("2012-01-23T14:01:00.117Z"),
text: "Destination Moon",
tags: [ "comic", "adventure" ]
}

Note:
• _id must be unique, but can be any value you'd like (other than an array)
• Default BSON ObjectId if one is not supplied
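The note says MongoDB supplies a BSON ObjectId when no _id is given. The first four bytes of an ObjectId are a big-endian Unix timestamp in seconds, so the id also records when it was generated. A plain JavaScript sketch (objectIdTimestamp is a hypothetical helper, not a driver API) that decodes the sample id from the find() output; the mongo shell offers the same through ObjectId(...).getTimestamp():

```javascript
// Sketch: read the creation time stored in a 24-hex-character ObjectId string.
// Assumption: the first 4 bytes (8 hex characters) are a big-endian Unix
// timestamp in seconds, as in the BSON ObjectId layout.
function objectIdTimestamp(hex) {
  if (!/^[0-9a-f]{24}$/i.test(hex)) {
    throw new Error("expected a 24-character hex ObjectId");
  }
  const seconds = parseInt(hex.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

const created = objectIdTimestamp("4c4ba5c0672c685e5e8aabf3");
console.log(created.toISOString()); // 2010-07-25T02:47:28.000Z
```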

Add an index, find via index
> db.blog.ensureIndex({author: 1})
> db.blog.find({author: 'Hergé'})
{ _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
author: "Hergé",
date: ISODate("2012-01-23T14:01:00.117Z"),
...
}

Secondary index on "author"

Examine the query plan
> db.blog.find({"author": 'Hergé'}).explain()
{
"cursor" : "BtreeCursor author_1",
"nscanned" : 1,
"nscannedObjects" : 1,
"n" : 1,
"millis" : 0,
"indexBounds" : {
"author" : [
[
"Hergé",
"Hergé"
]
]
}
}
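The explain() output above reports a BtreeCursor with indexBounds of ["Hergé", "Hergé"]: the server descends the index to the first matching key and reads only the entries inside those bounds. A rough in-memory sketch of that lookup, assuming a sorted array in place of the B-tree (buildIndex and indexEquals are hypothetical helpers):

```javascript
// Sketch: a single-field index modelled as a sorted array of [key, _id] pairs.
// Assumption: MongoDB's real index is a B-tree; binary search stands in for
// descending the tree to the start of the index bounds.
function buildIndex(docs, field) {
  return docs
    .filter((doc) => doc[field] !== undefined)
    .map((doc) => [doc[field], doc._id])
    .sort((a, b) => (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0));
}

// Equality lookup: find the first key >= value, then read entries while the key
// stays inside the bounds [value, value], like "indexBounds" in explain().
function indexEquals(index, value) {
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (index[mid][0] < value) lo = mid + 1;
    else hi = mid;
  }
  const ids = [];
  for (let i = lo; i < index.length && index[i][0] === value; i++) {
    ids.push(index[i][1]);
  }
  return ids;
}

const posts = [
  { _id: 1, author: "Hergé" },
  { _id: 2, author: "Chris" },
  { _id: 3, author: "Kyle" },
];
const authorIndex = buildIndex(posts, "author");
console.log(indexEquals(authorIndex, "Hergé")); // [ 1 ]
```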

Multi-key indexes
// Build an index on the 'tags' array
> db.blog.ensureIndex({tags: 1})

// find posts with a specific tag
// (This will use an index!)
> db.blog.find({tags: 'comic'})
{ _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
author: "Hergé",
date: ISODate("2012-01-23T14:01:00.117Z"),
...
}
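The tags index above is a multi-key index because tags holds an array: each element gets its own index entry. A minimal sketch, assuming a Map in place of the real B-tree and using the hypothetical helper buildMultikeyIndex:

```javascript
// Sketch: a multikey index stores one entry per array element, so a post tagged
// ["comic", "adventure"] is reachable under both keys.
// Assumption: a Map stands in for the B-tree and supports equality lookups only.
function buildMultikeyIndex(docs, field) {
  const index = new Map(); // key -> list of _ids
  for (const doc of docs) {
    const value = doc[field];
    if (value === undefined) continue;
    // De-duplicate so a document is listed once even if an element repeats.
    const keys = Array.isArray(value) ? new Set(value) : [value];
    for (const key of keys) {
      if (!index.has(key)) index.set(key, []);
      index.get(key).push(doc._id);
    }
  }
  return index;
}

const blog = [
  { _id: 1, tags: ["comic", "adventure"] },
  { _id: 2, tags: ["comic"] },
  { _id: 3, tags: ["travel"] },
];
const tagIndex = buildMultikeyIndex(blog, "tags");
console.log(tagIndex.get("comic")); // [ 1, 2 ]
console.log(tagIndex.get("adventure")); // [ 1 ]
```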

Query operators
Conditional operators:
$ne, $in, $nin, $mod, $all, $size, $exists, $type, ...
$lt, $lte, $gt, $gte

Update operators:
$set, $inc, $push, $pop, $pull, $pushAll, $pullAll

Extending the schema

http://nysi.org.uk/kids_stuff/rocket/rocket.htm

Extending the Schema
new_comment = {author: "Chris",
date: new Date(),
text: "great book",
votes: 5}

> db.blog.update(
{text: "Destination Moon" },

{"$push": {comments: new_comment},
"$inc": {comments_count: 1}
})

Extending the Schema
{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
author : "Hergé",
date: ISODate("2012-01-23T14:01:00.117Z"),
text : "Destination Moon",
tags : [ "comic", "adventure" ],
comments : [{
author : "Chris",
date : ISODate("2012-01-23T14:31:53.848Z"),
text : "great book",
votes : 5
}],
comments_count: 1
}
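The update above changes the post in place without rewriting it. A sketch of the effect of its two operators on a plain object (applyUpdate is a hypothetical helper; it handles only top-level $push and $inc):

```javascript
// Sketch of the effect of {$push: {comments: c}, $inc: {comments_count: 1}}
// on one document. Only top-level fields are handled, and a copy is returned.
// In MongoDB both operators are applied atomically to the single matched document.
function applyUpdate(doc, update) {
  const out = { ...doc };
  for (const [field, value] of Object.entries(update.$push ?? {})) {
    if (out[field] !== undefined && !Array.isArray(out[field])) {
      throw new Error(`${field} is not an array`);
    }
    out[field] = [...(out[field] ?? []), value]; // $push creates the array if missing
  }
  for (const [field, amount] of Object.entries(update.$inc ?? {})) {
    out[field] = (out[field] ?? 0) + amount; // a missing field counts as 0
  }
  return out;
}

const post = { _id: 1, text: "Destination Moon", tags: ["comic", "adventure"] };
const newComment = { author: "Chris", text: "great book", votes: 5 };
const update = { $push: { comments: newComment }, $inc: { comments_count: 1 } };

const once = applyUpdate(post, update);
const twice = applyUpdate(once, update);
console.log(once.comments.length, once.comments_count); // 1 1
console.log(twice.comments.length, twice.comments_count); // 2 2
```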

The 'dot' operator
// create index on nested documents:
> db.blog.ensureIndex({"comments.author": 1})

> db.blog.find({"comments.author":"Chris"})
{ _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
author: "Hergé",
date: ISODate("2012-01-23T14:01:00.117Z"),
...
}

The 'dot' operator

// create index on comment votes:
> db.blog.ensureIndex({"comments.votes": 1})

// find all posts with any comments with
// more than 50 votes
> db.blog.find({"comments.votes": {$gt: 50}})
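A dotted path into an array of sub-documents matches when any element satisfies the condition, which is why the query above finds posts with at least one comment over 50 votes. A simplified sketch of that resolution rule (valuesAtPath and hasVotesAbove are hypothetical helpers):

```javascript
// Sketch of how a dotted path such as "comments.votes" is resolved: when a step
// lands on an array, the rest of the path is applied to every element, and the
// query matches if ANY resulting value satisfies the condition.
// Simplification: MongoDB has further rules (e.g. numeric positions, matching
// whole arrays) that this sketch ignores.
function valuesAtPath(doc, path) {
  let current = [doc];
  for (const part of path.split(".")) {
    const next = [];
    for (const value of current) {
      if (value === null || typeof value !== "object") continue;
      const child = value[part];
      if (Array.isArray(child)) next.push(...child);
      else if (child !== undefined) next.push(child);
    }
    current = next;
  }
  return current;
}

// {"comments.votes": {$gt: threshold}}
const hasVotesAbove = (doc, threshold) =>
  valuesAtPath(doc, "comments.votes").some((v) => typeof v === "number" && v > threshold);

const posts = [
  { _id: 1, comments: [{ author: "Chris", votes: 5 }, { author: "Fred", votes: 72 }] },
  { _id: 2, comments: [{ author: "Kyle", votes: 12 }] },
  { _id: 3 },
];
console.log(posts.filter((p) => hasVotesAbove(p, 50)).map((p) => p._id)); // [ 1 ]
```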

The 'dot' operator

// find last 5 posts:
> db.blog.find().sort({"date":-1}).limit(5)

// find the top 10 commented posts:
> db.blog.find().sort({"comments_count":-1}).limit(10)

When sorting, check if you need an index...

Watch for full table scans
{
"cursor" : "BasicCursor",
"nscanned" : 250003,
"nscannedObjects" : 250003,
"n" : 10,
"scanAndOrder" : true,
"millis" : 335,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {

}
}
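In this explain() output, BasicCursor and scanAndOrder: true mean every document was read and the results were sorted in memory to return 10. A sketch contrasting that with walking an index on comments_count (for example one created with db.blog.ensureIndex({comments_count: -1})); the data and helper names are hypothetical:

```javascript
// Hypothetical data: 1000 posts with distinct comments_count values.
const posts = Array.from({ length: 1000 }, (_, i) => ({
  _id: i,
  comments_count: (i * 7919) % 1000, // 7919 is coprime to 1000, so counts are distinct
}));

// No index: read every document, sort in memory, keep the first `limit`.
function topWithoutIndex(docs, limit) {
  const sorted = [...docs].sort((a, b) => b.comments_count - a.comments_count);
  return { ids: sorted.slice(0, limit).map((d) => d._id), examined: docs.length };
}

// With an index on {comments_count: -1}, built ahead of time: entries are already
// in the requested order, so the first `limit` entries are the answer.
const countIndex = posts
  .map((d) => [d.comments_count, d._id])
  .sort((a, b) => b[0] - a[0]);

function topWithIndex(index, limit) {
  const entries = index.slice(0, limit);
  return { ids: entries.map(([, id]) => id), examined: entries.length };
}

const scan = topWithoutIndex(posts, 10);
const indexed = topWithIndex(countIndex, 10);
console.log(scan.examined, indexed.examined); // 1000 10
```

Both paths return the same ten posts; the difference is how many documents have to be read to get there.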

Rich Documents

http://www.flickr.com/photos/diorama_sky/2975796332

Rich Documents

• Intuitive
• Developer friendly
• Encapsulates whole objects
• Performant
• Scalable

Common Patterns

http://www.flickr.com/photos/colinwarren/158628063

Inheritance

http://www.flickr.com/photos/dysonstarr/5098228295

Single Table Inheritance - RDBMS
• Shapes table
id | type   | area | radius | d | length | width
1  | circle | 3.14 | 1      |   |        |
2  | square | 4    |        | 2 |        |
3  | rect   | 10   |        |   | 5      | 2

Single Table Inheritance - MongoDB
> db.shapes.find()
{ _id: "1", type: "circle", area: 3.14, radius: 1}
{ _id: "2", type: "square", area: 4, d: 2}
{ _id: "3", type: "rect", area: 10, length: 5,
width: 2}

// find shapes where radius > 0
> db.shapes.find({radius: {$gt: 0}})

// create a sparse index (only documents that have a radius field are indexed)
> db.shapes.ensureIndex({radius: 1}, {sparse: true})
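The sparse index only contains entries for documents that have a radius field; a regular index would also store a null entry for the square and the rectangle. A minimal sketch on the shapes from this slide, assuming numeric keys (buildSparseIndex is a hypothetical helper):

```javascript
// Sketch: a sparse index holds entries only for documents that contain the field,
// so on the shapes collection only the circle is indexed under "radius".
function buildSparseIndex(docs, field) {
  return docs
    .filter((d) => Object.prototype.hasOwnProperty.call(d, field))
    .map((d) => [d[field], d._id])
    .sort((a, b) => a[0] - b[0]); // numeric keys assumed in this sketch
}

const shapes = [
  { _id: "1", type: "circle", area: 3.14, radius: 1 },
  { _id: "2", type: "square", area: 4, d: 2 },
  { _id: "3", type: "rect", area: 10, length: 5, width: 2 },
];
const radiusIdx = buildSparseIndex(shapes, "radius");
console.log(radiusIdx.length); // 1
console.log(radiusIdx.filter(([r]) => r > 0).map(([, id]) => id)); // [ '1' ]
```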

One to Many

http://www.flickr.com/photos/j-fish/6502708899/

One to Many
Embedded Array / Array Keys

• $slice operator returns a subset of the array
• Some queries are hard, e.g. finding the latest comments across all documents
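The $slice projection returns part of an embedded array, for example db.blog.find({}, {comments: {$slice: -5}}) for the last five comments of each post. A sketch of the rules for its argument forms, written as a hypothetical slice helper over a plain array:

```javascript
// Sketch of $slice projection semantics:
//   n >= 0        -> first n elements
//   n < 0         -> last |n| elements
//   [skip, limit] -> skip elements (from the end if skip < 0), then up to limit
function slice(array, spec) {
  if (Array.isArray(spec)) {
    const [skip, limit] = spec;
    // A negative skip counts from the end; past the start it clamps to 0.
    const start = skip < 0 ? Math.max(array.length + skip, 0) : skip;
    return array.slice(start, start + limit);
  }
  return spec >= 0 ? array.slice(0, spec) : array.slice(spec);
}

const comments = ["c1", "c2", "c3", "c4", "c5", "c6"];
console.log(slice(comments, -2)); // [ 'c5', 'c6' ]
console.log(slice(comments, [1, 3])); // [ 'c2', 'c3', 'c4' ]
```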

One to Many
Embedded Array / Array Keys
{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
author : "Hergé",
date: ISODate("2012-01-23T14:01:00.117Z"),
tags : [ "comic", "adventure" ],
comments : [{
author : "Chris",
date : ISODate("2012-01-23T14:31:53.848Z"),
text : "great book",
votes : 5
}],
comments_count: 1
}

One to Many
Normalized (2 collections)

• Most flexible
• More queries

One to Many - Normalized
// Posts collection
{ _id : 1000,
author : "Hergé",
date: ISODate("2012-01-23T14:01:00.117Z"),
}
// Comments collection
{ _id : 1,
blog : 1000,
author : "Chris",
date : ISODate("2012-01-23T14:31:53.848Z"),
...
}
> blog = db.blog.findOne({text: "Destination Moon"});
> db.comments.find({blog: blog._id});
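With two collections, reading a post with its comments takes two queries: one for the post, then one for the comments that reference its _id. A sketch of that application-side join, with plain arrays standing in for the collections (the second and third comments and the findOne helper are hypothetical):

```javascript
// Sketch: posts and comments live in separate "collections" (plain arrays here),
// so the application joins them itself with two lookups.
const posts = [{ _id: 1000, author: "Hergé", text: "Destination Moon" }];
const comments = [
  { _id: 1, blog: 1000, author: "Chris", text: "great book" },
  { _id: 2, blog: 1000, author: "Fred", text: "agreed" },      // hypothetical
  { _id: 3, blog: 2000, author: "Kyle", text: "other post" },  // hypothetical
];

// Like findOne: the first match, or null when nothing matches.
function findOne(collection, predicate) {
  return collection.find(predicate) ?? null;
}

// Query 1: fetch the post. Query 2: fetch comments by the stored reference.
const post = findOne(posts, (p) => p.text === "Destination Moon");
const postComments = post ? comments.filter((c) => c.blog === post._id) : [];
console.log(postComments.map((c) => c.author)); // [ 'Chris', 'Fred' ]
```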

One to Many - patterns

• Embedded Array / Array Keys
• Normalized

Embedding vs. Referencing

• Embed when the 'many' objects always appear
with their parent.

• Reference when you need more flexibility.

Many to Many

http://www.flickr.com/photos/pats0n/6013379192

Many - Many
Example:

• Product can be in many categories
• Category can have many products

Many to Many
// Products
{ _id: 10,
name: "Destination Moon",
category_ids: [20, 30]}

// Categories
{ _id: 20,
name: "comic",
product_ids:[10, 11, 12]}
{ _id: 30,
name: "adventure",
product_ids:[10]}

//All categories for a given product
> db.categories.find({"product_ids": 10})

Alternative
// Products
{ _id: 10,
name: "Destination Moon",
category_ids: [20, 30]}
// Categories
{ _id: 20,
name: "comic"}

//All products for a given category
> db.products.find({"category_ids": 20})

// All categories for a given product
> product = db.products.findOne({_id: some_id})
> db.categories.find({_id: {$in: product.category_ids}})
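The alternative layout stores category_ids only on products: one query answers "products in a category", and "categories of a product" takes two, the second using $in. A sketch with arrays in place of collections (product 11 and the helper names are hypothetical):

```javascript
// Sketch of the "Alternative" layout: only products store category_ids.
const products = [
  { _id: 10, name: "Destination Moon", category_ids: [20, 30] },
  { _id: 11, name: "Explorers on the Moon", category_ids: [20] }, // hypothetical
];
const categories = [
  { _id: 20, name: "comic" },
  { _id: 30, name: "adventure" },
];

// Mirrors db.products.find({category_ids: id}): matches if any element equals id.
const inCategory = (catId) => products.filter((p) => p.category_ids.includes(catId));

// Mirrors findOne on products, then db.categories.find({_id: {$in: ...}}).
function categoriesOf(productId) {
  const product = products.find((p) => p._id === productId);
  if (!product) return [];
  const wanted = new Set(product.category_ids);
  return categories.filter((c) => wanted.has(c._id));
}

console.log(inCategory(20).map((p) => p._id)); // [ 10, 11 ]
console.log(categoriesOf(10).map((c) => c.name)); // [ 'comic', 'adventure' ]
```

Storing the links on one side means adding a product to a category is a single-document write, and single-document writes are atomic; the two-sided layout needs two updates that are not atomic together.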

Trees

http://www.flickr.com/photos/cubagallery/5949819558

Trees
Hierarchical information

Trees
Embedded Tree
{ comments : [{
author : "Chris", text : "...",
replies : [{
author : "Fred", text : "...",
replies : [],
}]
}]
}

Pros: Single Document, Performance, Intuitive

Cons: Hard to search, hard to retrieve partial results, 16MB document size limit
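The embedded tree is hard to search because each nesting level needs its own dotted path (comments.author, comments.replies.author, and so on), so a search of arbitrary depth ends up in application code. A sketch of that recursive walk (findByAuthor and the sample texts are hypothetical):

```javascript
// Sketch: collect every comment or reply by one author, at any depth,
// by walking the nested replies arrays recursively.
function findByAuthor(comments, author, depth = 0, out = []) {
  for (const c of comments) {
    if (c.author === author) out.push({ text: c.text, depth });
    findByAuthor(c.replies ?? [], author, depth + 1, out);
  }
  return out;
}

const post = {
  comments: [
    {
      author: "Chris", text: "first",
      replies: [
        {
          author: "Fred", text: "reply",
          replies: [{ author: "Chris", text: "reply to Fred", replies: [] }],
        },
      ],
    },
  ],
};
console.log(findByAuthor(post.comments, "Chris").length); // 2
```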

Array of Ancestors
(Diagram: message tree; b and e reply to a, c and d reply to b, f replies to e)
// Store all ancestors of a node
{ _id: "a" }
{ _id: "b", thread: [ "a" ], replyTo: "a" }
{ _id: "c", thread: [ "a", "b" ], replyTo: "b" }
{ _id: "d", thread: [ "a", "b" ], replyTo: "b" }
{ _id: "e", thread: [ "a" ], replyTo: "a" }
{ _id: "f", thread: [ "a", "e" ], replyTo: "e" }
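Keeping the thread array correct is the application's job: when a reply is inserted, its thread is the parent's thread plus the parent's _id, and replyTo is the parent's _id. A minimal sketch, assuming the hypothetical helper makeReply, that rebuilds the six documents above:

```javascript
// Sketch: derive a reply's ancestor array from its parent at insert time.
function makeReply(parent, id) {
  return {
    _id: id,
    thread: [...(parent.thread ?? []), parent._id], // parent's ancestors + parent
    replyTo: parent._id,                            // direct parent only
  };
}

const a = { _id: "a" };
const b = makeReply(a, "b");
const c = makeReply(b, "c");
const d = makeReply(b, "d");
const e = makeReply(a, "e");
const f = makeReply(e, "f");
console.log(f); // { _id: 'f', thread: [ 'a', 'e' ], replyTo: 'e' }
```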

Array of Ancestors
// find all messages that have "b" among their ancestors
> db.msg_tree.find({"thread": "b"})

Array of Ancestors
// find all direct replies to "b"
> db.msg_tree.find({"replyTo": "b"})
// find all ancestors of "f"
> threads = db.msg_tree.findOne({"_id": "f"}).thread
> db.msg_tree.find({"_id": {$in: threads}})
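The three shell queries can be tried without a server on plain objects with the same shape as the msg_tree documents. A sketch (helper names are hypothetical) that also shows the difference between all descendants and direct replies:

```javascript
const msgTree = [
  { _id: "a" },
  { _id: "b", thread: ["a"], replyTo: "a" },
  { _id: "c", thread: ["a", "b"], replyTo: "b" },
  { _id: "d", thread: ["a", "b"], replyTo: "b" },
  { _id: "e", thread: ["a"], replyTo: "a" },
  { _id: "f", thread: ["a", "e"], replyTo: "e" },
];

// Like {"thread": id}: every message with id among its ancestors (the subtree).
const descendants = (id) =>
  msgTree.filter((m) => (m.thread ?? []).includes(id)).map((m) => m._id);

// Like {"replyTo": id}: direct replies only.
const directReplies = (id) => msgTree.filter((m) => m.replyTo === id).map((m) => m._id);

// Like findOne({_id: id}).thread followed by {_id: {$in: thread}}.
function ancestors(id) {
  const node = msgTree.find((m) => m._id === id);
  const wanted = new Set(node?.thread ?? []);
  return msgTree.filter((m) => wanted.has(m._id)).map((m) => m._id);
}

console.log(descendants("a")); // [ 'b', 'c', 'd', 'e', 'f' ]
console.log(directReplies("a")); // [ 'b', 'e' ]
console.log(ancestors("f")); // [ 'a', 'e' ]
```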

Trees - Materialized Paths
Store hierarchy as a path expression

• Separate each node by a delimiter, e.g. "/"
• Use regular expressions to find parts of a tree
{ comments: [
{ author: "Kyle", text: "initial post",
path: "" },
{ author: "Jim", text: "jim’s comment",
path: "jim" },
{ author: "Kyle", text: "Kyle’s reply to Jim",
path : "jim/kyle"} ] }

// Find the conversations Jim was part of
// (an anchored, case-sensitive regex such as /^jim/ can use an index on comments.path)
> db.blog.find({"comments.path": /^jim/})
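Each path above is its parent's path plus one segment, so an anchored regex on the path selects Jim's comment and every reply beneath it. A sketch of building paths and matching a branch (childPath and the "jimmy" comment are hypothetical), which also shows why a delimiter-aware pattern is safer than a bare prefix:

```javascript
// Sketch of materialized paths: a reply's path is its parent's path plus one
// segment, joined with "/", so a prefix match selects a whole branch.
function childPath(parentPath, segment) {
  return parentPath === "" ? segment : `${parentPath}/${segment}`;
}

const comments = [
  { author: "Kyle", text: "initial post", path: "" },
  { author: "Jim", text: "Jim's comment", path: childPath("", "jim") },
  { author: "Kyle", text: "Kyle's reply to Jim", path: childPath("jim", "kyle") },
  { author: "Jimmy", text: "unrelated", path: childPath("", "jimmy") }, // hypothetical sibling
];

// /^jim/ also matches "jimmy"; requiring "/" or end-of-string after the
// segment restricts the match to Jim's branch.
const loose = comments.filter((c) => /^jim/.test(c.path)).map((c) => c.author);
const strict = comments.filter((c) => /^jim(\/|$)/.test(c.path)).map((c) => c.author);
console.log(loose.length, strict.length); // 3 2
```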

Queues

http://www.flickr.com/photos/deanspic/4960440218

Queue
Requirements
• See jobs waiting, jobs in progress
• Ensure that each job is started once and only once
// Queue document
{ in_progress: false,
priority: 1,
message: "Rich documents FTW!",
...
}
// find highest priority job and mark as in-progress
job = db.jobs.findAndModify({
query: {in_progress: false},
sort: {priority: -1},
update: {$set: {in_progress: true,
started: new Date()}}})
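findAndModify finds the highest-priority waiting job and marks it in progress in one atomic server-side step, so two workers cannot claim the same job. A single-process sketch of the same logic (claimNextJob and the second job are hypothetical; plain JavaScript gives atomicity here only because it runs on one thread):

```javascript
// Sketch of the claim step done by findAndModify: pick the highest-priority
// job that is not in progress, mark it, and return it.
// Assumption: jobs is an in-memory array standing in for the jobs collection.
function claimNextJob(jobs, now = new Date()) {
  let best = null;
  for (const job of jobs) {
    if (!job.in_progress && (best === null || job.priority > best.priority)) {
      best = job;
    }
  }
  if (best === null) return null; // nothing waiting: the shell's findAndModify also returns null
  const before = { ...best }; // findAndModify returns the pre-update document by default
  best.in_progress = true;
  best.started = now;
  return before;
}

const jobs = [
  { _id: 1, in_progress: false, priority: 1, message: "Rich documents FTW!" },
  { _id: 2, in_progress: false, priority: 5, message: "hypothetical urgent job" },
];
console.log(claimNextJob(jobs)._id); // 2
console.log(claimNextJob(jobs)._id); // 1
console.log(claimNextJob(jobs)); // null
```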

Anti Patterns

http://www.flickr.com/photos/51838104@N02/5841690990

Anti patterns
• Careless indexing
• Large, deeply nested documents
• Multiple types for a key
• One size fits all collections
• One collection per user

Summary
• Schema design is different in MongoDB
• Basic data design principles stay the same
• Focus on how the application manipulates data
• Rapidly evolve schema to meet your requirements
• Enjoy your new freedom, use it wisely :-)

download at mongodb.org

conferences, appearances, and meetups
http://www.10gen.com/events

Facebook | Twitter | LinkedIn
http://bit.ly/mongofb @mongodb http://linkd.in/joinmongo

support, training, and this talk brought to you by 10gen
