Schema Design
 with MongoDB

     Antoine Girbal

   antoine@10gen.com
      @antoinegirbal
So why model data?




  http://www.flickr.com/photos/42304632@N00/493639870/
Normalization
• Goals
  • Avoid anomalies when inserting, updating or deleting
  • Minimize redesign when extending the schema
  • Avoid bias toward a particular query
  • Make use of all SQL features
• In MongoDB
  • Similar goals apply, but the rules are different
  • Denormalization for optimization is an option: unlike data packed
    into BLOBs, denormalized documents still support most features
Terminology

 RDBMS           MongoDB
 Table           Collection
 Row(s)          JSON Document
 Index           Index
 Join            Embedding & Linking
 Partition       Shard
 Partition Key   Shard Key
Collections Basics
• Equivalent to a Table in SQL
• Cheap to create (max 24000)
• Collections don’t have a fixed schema
• Common for documents in a collection
  to share a schema
• Document schema can evolve
• Consider using multiple related
  collections tied together by a naming
  convention:
 •  e.g. LogData-2011-02-08
Document basics
• Elements are name/value pairs,
  equivalent to a column value in SQL
• Elements can be nested
• Rich data types for values
• JSON for the human eye
• BSON for all internals
• 16MB maximum size (enough for the text of many books)
• What you see is what is stored
Schema Design - Relational
Schema Design - MongoDB
  [Diagram labels: embedding, linking]
Design Session

Design documents that simply map to your application

> post = { author: "Hergé",
       date: ISODate("2011-09-18T09:56:06.298Z"),
       text: "Destination Moon",
       tags: ["comic", "adventure"]
     }

> db.blogs.save(post)
Find the document
> db.blogs.find()

 { _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
   author: "Hergé",
   date: ISODate("2011-09-18T09:56:06.298Z"),
   text: "Destination Moon",
   tags: [ "comic", "adventure" ]
 }

Notes:
• ID must be unique, but can be anything you’d like
• MongoDB will generate a default ID if one is not supplied
Add an index, find via index

Secondary index for “author”

// 1 means ascending, -1 means descending
// (ensureIndex is the older shell name; later releases call it createIndex)
> db.blogs.ensureIndex( { author: 1 } )

> db.blogs.find( { author: 'Hergé' } )

 { _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
   date: ISODate("2011-09-18T09:56:06.298Z"),
   author: "Hergé",
 ... }
Examine the query plan

> db.blogs.find( { author: "Hergé" } ).explain()
{
      "cursor" : "BtreeCursor author_1",
      "nscanned" : 1,
      "nscannedObjects" : 1,
      "n" : 1,
      "millis" : 5,
      "indexBounds" : {
             "author" : [
                    [
                          "Hergé",
                          "Hergé"
                    ]
             ]
      }
}
Query operators
Conditional operators:
 $ne, $in, $nin, $mod, $all, $size, $exists, $type, ...
 $lt, $lte, $gt, $gte, ...

// find posts with any tags
> db.blogs.find( { tags: { $exists: true } } )

Regular expressions:
// posts where author starts with H (the match is case-sensitive)
> db.blogs.find( { author: /^H/ } )

Counting:
// number of posts written by Hergé
> db.blogs.find( { author: "Hergé" } ).count()
Extending the Schema
> new_comment =
  { author: "Kyle",
    date: new Date(),
    text: "great book" }


> db.blogs.update(
      { text: "Destination Moon" },
      { "$push": { comments: new_comment },
        "$inc": { comments_count: 1 }
      })
Extending the Schema
> db.blogs.find( { author: "Hergé"} )

{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
  author : "Hergé",
  date : ISODate("2011-09-18T09:56:06.298Z"),
  text : "Destination Moon",
  tags : [ "comic", "adventure" ],
  comments : [
     {
             author : "Kyle",
             date : ISODate("2011-09-19T09:56:06.298Z"),
             text : "great book"
     }
  ],
  comments_count: 1
}
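
To make the effect of the update concrete, here is a minimal sketch in plain JavaScript (no database needed). The applyUpdate helper is hypothetical and only mimics the two operators used above on an in-memory object; it is not how MongoDB implements them.

```javascript
// Hypothetical helper: mimics the effect of $push and $inc on a plain
// JavaScript object. This is NOT MongoDB's implementation.
function applyUpdate(doc, update) {
  const result = JSON.parse(JSON.stringify(doc)); // work on a deep copy
  for (const [field, value] of Object.entries(update.$push || {})) {
    if (result[field] === undefined) result[field] = []; // create the array if missing
    result[field].push(value);
  }
  for (const [field, amount] of Object.entries(update.$inc || {})) {
    result[field] = (result[field] || 0) + amount; // a missing counter starts at 0
  }
  return result;
}

const post = { author: "Hergé", text: "Destination Moon", tags: ["comic", "adventure"] };
const newComment = { author: "Kyle", text: "great book" };

const updated = applyUpdate(post, {
  $push: { comments: newComment },
  $inc: { comments_count: 1 },
});

console.log(updated.comments.length, updated.comments_count); // 1 1
```

Neither comments nor comments_count existed before the update, so the schema was extended without any migration step.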
Extending the Schema
// create index on nested documents:
> db.blogs.ensureIndex( { "comments.author": 1 } )

> db.blogs.find( { "comments.author": "Kyle" } )

// find last 5 posts:
> db.blogs.find().sort( { date: -1 } ).limit(5)

// most commented post:
> db.blogs.find().sort( { comments_count: -1 } ).limit(1)


When sorting, check if you need an index
Common Patterns

 Patterns:
 • Inheritance
 • one to one
 • one to many
 • many to many
Inheritance
Single Table Inheritance -
MongoDB
 shapes table
    id   type     area   radius   length   width
    1    circle   3.14   1
    2    square   4               2
    3    rect     10              5        2
Single Table Inheritance -
MongoDB
> db.shapes.find()
{ _id: "1", type: "c", area: 3.14, radius: 1}
{ _id: "2", type: "s", area: 4, length: 2}
{ _id: "3", type: "r", area: 10, length: 5, width: 2}



Note: missing values are not stored!
Single Table Inheritance -
MongoDB
> db.shapes.find()
{ _id: "1", type: "c", area: 3.14, radius: 1}
{ _id: "2", type: "s", area: 4, length: 2}
{ _id: "3", type: "r", area: 10, length: 5, width: 2}

// find shapes where radius > 0
> db.shapes.find( { radius: { $gt: 0 } } )

// create index
> db.shapes.ensureIndex( { radius: 1 }, { sparse:true } )


Note: a sparse index only contains documents where the field is present!
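
The difference a sparse index makes can be sketched in plain JavaScript. The indexEntries helper below is hypothetical and only counts the entries an index on one field would hold; it assumes the documented behaviour that a non-sparse index stores a null entry for documents missing the field.

```javascript
const shapes = [
  { _id: "1", type: "c", area: 3.14, radius: 1 },
  { _id: "2", type: "s", area: 4, length: 2 },
  { _id: "3", type: "r", area: 10, length: 5, width: 2 },
];

// Hypothetical helper: list the [value, _id] entries an index on `field`
// would hold. A non-sparse index keeps a null entry for documents that
// lack the field; a sparse index skips those documents.
function indexEntries(docs, field, { sparse = false } = {}) {
  const entries = [];
  for (const doc of docs) {
    if (field in doc) entries.push([doc[field], doc._id]);
    else if (!sparse) entries.push([null, doc._id]);
  }
  return entries;
}

console.log(indexEntries(shapes, "radius").length);                   // 3
console.log(indexEntries(shapes, "radius", { sparse: true }).length); // 1
```

With single-collection inheritance most fields are absent from most documents, which is where the smaller sparse index pays off.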
One to Many
  Either:

  •Embedded Array / Document:
    • improves read speed
    • simplifies schema
  •Normalize:
    • if list grows significantly
    • if sub items are updated often
    • if sub items are more than 1 level
       deep and need updating
One to Many
Embedded Array:
• $slice operator to return a subset of comments
• some queries become harder (e.g. find the latest comments across all blogs)
blogs: {
   author : "Hergé",
   date : ISODate("2011-09-18T09:56:06.298Z"),
   comments : [
        {
             author : "Kyle",
             date : ISODate("2011-09-19T09:56:06.298Z"),
             text : "great book"
        }
   ]
}
One to Many
Normalized (2 collections)
•most flexible
•more queries
blogs: { _id: 1000,
     author: "Hergé",
     date: ISODate("2011-09-18T09:56:06.298Z") }

comments: { _id: 1,
      blogId: 1000,
      author: "Kyle",
      date: ISODate("2011-09-19T09:56:06.298Z") }

// findOne returns a document; find would return a cursor
> blog = db.blogs.findOne( { text: "Destination Moon" } )

> db.comments.ensureIndex( { blogId: 1 } ) // important!
> db.comments.find( { blogId: blog._id } )
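
The "more queries" cost of the normalized layout can be sketched with in-memory arrays standing in for the two collections. The second and third comments, and the blog with id 2000 they hint at, are made-up sample data.

```javascript
const blogs = [{ _id: 1000, author: "Hergé", text: "Destination Moon" }];
const comments = [
  { _id: 1, blogId: 1000, author: "Kyle" },
  { _id: 2, blogId: 1000, author: "James" }, // made-up sample data
  { _id: 3, blogId: 2000, author: "Kyle" },  // belongs to another blog
];

// Query 1: fetch the parent document (stands in for findOne on blogs).
const blog = blogs.find((b) => b.text === "Destination Moon");

// Query 2: fetch its children through the linking field (stands in for
// the find on comments that the blogId index would serve).
const blogComments = comments.filter((c) => c.blogId === blog._id);

// The application, not the database, stitches the two results together.
const page = { ...blog, comments: blogComments };
console.log(page.comments.map((c) => c.author)); // [ 'Kyle', 'James' ]
```

With the embedded layout the same page would come back from a single read, which is the trade-off between the two designs.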
Many - Many
Example:

• Product can be in many categories
• Category can have many products
Many - Many
// Each product lists the IDs of its categories
products:
   { _id: 10, name: "Destination Moon",
     category_ids: [ 20, 21 ] }

// Each category lists the IDs of the products
categories:
   { _id: 20, name: "adventure",
     product_ids: [ 10, 11, 12 ] }

categories:
  { _id: 21, name: "movie",
    product_ids: [ 10 ] }

Cuts mapping table and 2 indexes, but:
• potential consistency issue
• lists can grow too large
Alternative
// Each product lists the IDs of its categories
products:
   { _id: 10, name: "Destination Moon",
     category_ids: [ 20, 30 ] }

// Association not stored on the categories
categories:
   { _id: 20,
     name: "adventure"}

// All products for a given category
> db.products.ensureIndex( { category_ids: 1} ) // yes!
> db.products.find( { category_ids: 20 } )
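
Storing the association on one side only still answers both directions of the relationship. A minimal in-memory sketch, where products 11 and 12 and the name of category 30 are made up for illustration:

```javascript
const products = [
  { _id: 10, name: "Destination Moon", category_ids: [20, 30] },
  { _id: 11, name: "Sample product B", category_ids: [20] }, // made-up
  { _id: 12, name: "Sample product C", category_ids: [30] }, // made-up
];
const categories = [
  { _id: 20, name: "adventure" },
  { _id: 30, name: "comic" }, // made-up name
];

// Equality against an array field matches when ANY element equals the
// value, which is what find( { category_ids: 20 } ) relies on.
function productsInCategory(categoryId) {
  return products.filter((p) => p.category_ids.includes(categoryId));
}

// The other direction reads the product, then looks up its categories.
function categoriesOfProduct(productId) {
  const product = products.find((p) => p._id === productId);
  return categories.filter((c) => product.category_ids.includes(c._id));
}

console.log(productsInCategory(20).map((p) => p._id));   // [ 10, 11 ]
console.log(categoriesOfProduct(10).map((c) => c.name)); // [ 'adventure', 'comic' ]
```

Because the link lives in one place, adding a product to a category touches a single document and the two sides cannot drift apart.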
Common Use Cases

 Use cases:
 • Trees
 • Time Series
Trees

Hierarchical information
Trees

Full Tree in Document

{ retweet: [
   { who: "Kyle", text: "...",
     retweet: [
        {who: "James", text: "...",
          retweet: []}
     ]}
  ]
}

Pros: Single Document, Performance, Intuitive

Cons: Hard to search or update, document can easily get
too large
Array of Ancestors
// Store all ancestors of a node
// (tree: a -> b, e;  b -> c, d;  e -> f)
  { _id: "a" }
  { _id: "b", tree: [ "a" ],     retweet: "a" }
  { _id: "c", tree: [ "a", "b" ], retweet: "b" }
  { _id: "d", tree: [ "a", "b" ], retweet: "b" }
  { _id: "e", tree: [ "a" ],     retweet: "a" }
  { _id: "f", tree: [ "a", "e" ], retweet: "e" }

// find all direct retweets of "b"
> db.tweets.find( { retweet: "b" } )

// find all retweets of "e" anywhere in tree
> db.tweets.find( { tree: "e" } )

// find tweet history of f:
> tweets = db.tweets.findOne( { _id: "f" } ).tree
> db.tweets.find( { _id: { $in : tweets } } )
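
The ancestor array has to be maintained when a node is inserted. A minimal sketch of that step in plain JavaScript, with a Map standing in for the tweets collection; addRetweet is a hypothetical helper, not part of any MongoDB API.

```javascript
const tweets = new Map([["a", { _id: "a" }]]); // the root has no ancestors

// Hypothetical helper: a new retweet inherits its parent's ancestors
// plus the parent itself.
function addRetweet(id, parentId) {
  const parent = tweets.get(parentId);
  const doc = { _id: id, tree: [...(parent.tree || []), parentId], retweet: parentId };
  tweets.set(id, doc);
  return doc;
}

addRetweet("b", "a");
addRetweet("c", "b");
addRetweet("d", "b");
addRetweet("e", "a");
addRetweet("f", "e");

const all = [...tweets.values()];

// Direct retweets of "b" (like find( { retweet: "b" } ))
const direct = all.filter((t) => t.retweet === "b").map((t) => t._id);

// Every retweet below "a" (like find( { tree: "a" } ))
const descendants = all.filter((t) => (t.tree || []).includes("a")).map((t) => t._id);

// The history of "f" is simply its stored ancestor array
const history = tweets.get("f").tree;

console.log(direct);      // [ 'c', 'd' ]
console.log(descendants); // [ 'b', 'c', 'd', 'e', 'f' ]
console.log(history);     // [ 'a', 'e' ]
```

Reads stay simple because each insert does the bookkeeping once; moving a subtree, on the other hand, would mean rewriting the array on every descendant.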
Trees as Paths
Store hierarchy as a path expression
• Separate each node by a delimiter, e.g. ","
• Use a regular expression to find parts of a tree
• The search must be left-rooted (anchored with ^) to use an index!
{ retweets: [
    { _id: "a", text: "initial tweet",
      path: "a" },
    { _id: "b", text: "retweet with comment",
      path: "a,b" },
    { _id: "c", text: "reply to retweet",
      path : "a,b,c"} ] }

// Find the conversations "a" started
> db.tweets.find( { path: /^a/ } )
// Find the conversations under a branch
> db.tweets.find( { path: /^a,b/ } )
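
One caveat with prefix patterns: /^a,b/ also matches a sibling whose id merely starts with "b". The sketch below, in plain JavaScript, shows the difference when the pattern also requires a delimiter or the end of the string. The ids "bc" and "ab" and the subtreePattern helper are made up for illustration; whether a server uses the index as efficiently for the stricter pattern is worth checking with explain().

```javascript
const paths = ["a", "a,b", "a,b,c", "a,bc", "ab"]; // "a,bc" and "ab" are made-up

// Hypothetical helper: build a left-rooted pattern that matches a node
// and everything beneath it, but not siblings sharing a name prefix.
function subtreePattern(path) {
  const escaped = path.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"); // escape regex metacharacters
  return new RegExp("^" + escaped + "(,|$)");
}

const loose = paths.filter((p) => /^a,b/.test(p));
const strict = paths.filter((p) => subtreePattern("a,b").test(p));

console.log(loose);  // [ 'a,b', 'a,b,c', 'a,bc' ]
console.log(strict); // [ 'a,b', 'a,b,c' ]
```

Fixed-width ids would avoid the problem without changing the pattern.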
Time Series

• Records stats by
 • Day, Hour, Minute

• Show time series
Time Series

// Time series buckets, hour and minute sub-docs
{ _id: "20111209-1231",
  ts: ISODate("2011-12-09T00:00:00.000Z"),
  daily: 67,
  hourly: { 0: 3, 1: 14, 2: 19 ... 23: 72 },
  minute: { 0: 0, 1: 4, 2: 6 ... 1439: 0 }
}

// Add one to the last minute before midnight
// (both counters go under a single $inc)
> db.votes.update(
   { _id: "20111209-1231",
     ts: ISODate("2011-12-09T00:00:00.000Z") },
   { $inc: { "hourly.23": 1, "minute.1439": 1 } })
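
A pitfall when writing update documents like this in the shell: a JavaScript object literal keeps only the last value for a repeated key, so two separate $inc entries silently collapse into one. A short sketch in plain JavaScript; the incFor helper is hypothetical and simply derives the two counter names from a timestamp.

```javascript
// Repeating a key in an object literal: the last value wins.
const repeated = {
  $inc: { "hourly.23": 1 },
  $inc: { "minute.1439": 1 },
};
console.log(Object.keys(repeated.$inc)); // [ 'minute.1439' ]

// Putting every field under one $inc keeps both increments.
const merged = { $inc: { "hourly.23": 1, "minute.1439": 1 } };
console.log(Object.keys(merged.$inc)); // [ 'hourly.23', 'minute.1439' ]

// Hypothetical helper: build the update document for a given moment (UTC).
function incFor(date) {
  const hour = date.getUTCHours();
  const minuteOfDay = hour * 60 + date.getUTCMinutes();
  return { $inc: { ["hourly." + hour]: 1, ["minute." + minuteOfDay]: 1 } };
}

console.log(incFor(new Date("2011-12-09T23:59:30Z")));
```

The last call yields the same two field names as the hand-written update for the minute before midnight.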
BSON Storage

• Sequence of key/value pairs
• NOT a hash map
• Optimized to scan quickly


     0 1 2 3 ... 1439
What is the cost of updating the minute before midnight?
BSON Storage

• Can skip sub-documents

             0             ...          23
   0     1    ... 59             1380    ...   1439



How could this change the schema?
Time Series
Use more of a Tree structure by nesting!

// Time series buckets, each hour a sub-document
{ _id: "20111209-1231",
  ts: ISODate("2011-12-09T00:00:00.000Z"),
  daily: 67,
  minute: { 0: { 0: 0, 1: 7, ... 59: 2 },
          ...
          23: { 0: 15, ... 59: 6 }
         }
}

// Add one to the last minute before midnight
> db.votes.update(
   { _id: "20111209-1231",
     ts: ISODate("2011-12-09T00:00:00.000Z") },
   { $inc: { "minute.23.59": 1 } })
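
Why nesting helps can be estimated with a toy model: count how many keys are examined when fields are scanned in order until the wanted one is found, treating a skipped sub-document as a single step. This is a rough sketch in plain JavaScript under that assumption, not a measurement of a real server; keysScanned is a hypothetical helper.

```javascript
// Toy model: walk the keys of each level in order until the wanted key
// is found, counting every key examined along the way.
function keysScanned(doc, path) {
  let scanned = 0;
  let current = doc;
  for (const wanted of path.split(".")) {
    for (const key of Object.keys(current)) {
      scanned += 1;
      if (key === wanted) break;
    }
    current = current[wanted];
  }
  return scanned;
}

// Flat layout: one key per minute of the day (0..1439).
const flat = {};
for (let m = 0; m < 1440; m++) flat[m] = 0;

// Nested layout: 24 hour sub-documents with 60 minutes each.
const nested = {};
for (let h = 0; h < 24; h++) {
  nested[h] = {};
  for (let m = 0; m < 60; m++) nested[h][m] = 0;
}

console.log(keysScanned(flat, "1439"));    // 1440
console.log(keysScanned(nested, "23.59")); // 84
```

In this model the worst case drops from 1440 steps to 24 + 60 = 84, which is the motivation for the tree-shaped bucket.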
Duplicate data
Document to represent a shopping order:

{ _id: 1234,
  ts: ISODate("2011-12-09T00:00:00.000Z"),
  customerId: 67,
  total_price: 1050,
  items: [{ sku: 123, quantity: 2, price: 50,
        name: "macbook", thumbnail: "macbook.png" },
        { sku: 234, quantity: 1, price: 20,
        name: "iphone", thumbnail: "iphone.png" },
        ...
        ]
}

The item information is duplicated in every order that references it.
MongoDB's flexible schema makes this easy!
Duplicate data
• Pros:
   • only 1 query to get all the information needed to display
     the order
   • processing on the db is as fast as a BLOB
   • can achieve much higher performance

• Cons:
   • more storage used ... but storage is cheap enough
   • updates are much more complicated ... so treat the duplicated
     fields as immutable
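
Treating duplicated fields as immutable amounts to taking a snapshot at purchase time. A minimal sketch in plain JavaScript; the catalog Map and the buildOrder helper are hypothetical and stand in for a separate products collection.

```javascript
// Hypothetical product catalog, standing in for a products collection.
const catalog = new Map([
  [123, { sku: 123, name: "macbook", price: 50, thumbnail: "macbook.png" }],
  [234, { sku: 234, name: "iphone", price: 20, thumbnail: "iphone.png" }],
]);

// Hypothetical helper: copy the display fields into the order when it is placed.
function buildOrder(id, customerId, lines) {
  const items = lines.map(({ sku, quantity }) => {
    const { name, price, thumbnail } = catalog.get(sku);
    return { sku, quantity, price, name, thumbnail };
  });
  const total_price = items.reduce((sum, item) => sum + item.price * item.quantity, 0);
  return { _id: id, customerId, total_price, items };
}

const order = buildOrder(1234, 67, [
  { sku: 123, quantity: 2 },
  { sku: 234, quantity: 1 },
]);

// A later catalog change does not touch the stored order.
catalog.get(123).price = 60;
console.log(order.total_price, order.items[0].price); // 120 50
```

For an order this is usually the desired behaviour anyway: the document records what the customer saw and paid at the time.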
Summary
• Basic data design principles stay the same ...
• But MongoDB is more flexible and opens up new possibilities
• Embed or duplicate data to speed up operations and cut down
  the number of collections and indexes

• Watch for documents growing too large
• Make sure to use the proper indexes for querying and sorting
• The schema should feel natural to your application!
download at mongodb.org

conferences, appearances, and meetups
http://www.10gen.com/events

Facebook: http://bit.ly/mongofb | Twitter: @mongodb | LinkedIn: http://linkd.in/joinmongo

10gen Presents Schema Design and Data Modeling

  • 1. Schema Design with MongoDB Antoine Girbal antoine@10gen.com @antoinegirbal
  • 2. So why model data? http://www.flickr.com/photos/42304632@N00/493639870/
  • 3. Normalization • Goals • Avoid anomalies when inserting, updating or deleting • Minimize redesign when extending the schema • Avoid bias toward a particular query • Make use of all SQL features • In MongoDB • Similar goals apply but rules are different • Denormalization for optimization is an option: most features still exist, contrary to BLOBS
  • 4. Terminology RDBMS MongoDB Table Collection Row(s) JSON Document Index Index Join Embedding & Linking Partition Shard Partition Key Shard Key
  • 5. Collections Basics • Equivalent to a Table in SQL • Cheap to create (max 24000) • Collections don’t have a fixed schema • Common for documents in a collection to share a schema • Document schema can evolve • Consider using multiple related collections tied together by a naming convention: • e.g. LogData-2011-02-08
  • 6. Document basics • Elements are name/value pairs, equivalent to column value in SQL • elements can be nested • Rich data types for values • JSON for the human eye • BSON for all internals • 16MB maximum size (many books..) • What you see is what is stored
  • 7. Schema Design - Relational
  • 8. Schema Design - MongoDB
  • 9. Schema Design - MongoDB embedding
  • 10. Schema Design - MongoDB embedding linking
  • 11. Design Session Design documents that simply map to your application > post = { author: "Hergé", date: ISODate("2011-09-18T09:56:06.298Z"), text: "Destination Moon", tags: ["comic", "adventure"] } > db.blogs.save(post)
  • 12. Find the document > db.blogs.find() { _id: ObjectId("4c4ba5c0672c685e5e8aabf3"), author: "Hergé", date: ISODate("2011-09-18T09:56:06.298Z"), text: "Destination Moon", tags: [ "comic", "adventure" ] } Notes: • ID must be unique, but can be anything you’d like • MongoDB will generate a default ID if one is not supplied
  • 13. Add an index, find via index Secondary index for “author” // 1 means ascending, -1 means descending > db.blogs.ensureIndex( { author: 1 } ) > db.blogs.find( { author: 'Hergé' } ) { _id: ObjectId("4c4ba5c0672c685e5e8aabf3"), date: ISODate("2011-09-18T09:56:06.298Z"), author: "Hergé", ... }
  • 14. Examine the query plan > db.blogs.find( { author: "Hergé" } ).explain() { "cursor" : "BtreeCursor author_1", "nscanned" : 1, "nscannedObjects" : 1, "n" : 1, "millis" : 5, "indexBounds" : { "author" : [ [ "Hergé", "Hergé" ] ] } }
  • 18. Query operators Conditional operators: $ne, $in, $nin, $mod, $all, $size, $exists, $type, .. $lt, $lte, $gt, $gte... // find posts with any tags > db.blogs.find( { tags: { $exists: true } } ) Regular expressions: // posts where author starts with H (the match is case-sensitive) > db.blogs.find( { author: /^H/ } ) Counting: // number of posts written by Hergé > db.blogs.find( { author: "Hergé" } ).count()
  • 19. Extending the Schema > new_comment = { author: "Kyle", date: new Date(), text: "great book" } > db.blogs.update( { text: "Destination Moon" }, { "$push": { comments: new_comment }, "$inc": { comments_count: 1 } })
  • 20. Extending the Schema > db.blogs.find( { author: "Hergé"} ) { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"), author : "Hergé", date : ISODate("2011-09-18T09:56:06.298Z"), text : "Destination Moon", tags : [ "comic", "adventure" ], comments : [ { author : "Kyle", date : ISODate("2011-09-19T09:56:06.298Z"), text : "great book" } ], comments_count: 1 }
  • 23. Extending the Schema // create index on nested documents: > db.blogs.ensureIndex( { "comments.author": 1 } ) > db.blogs.find( { "comments.author": "Kyle" } ) // find last 5 posts: > db.blogs.find().sort( { date: -1 } ).limit(5) // most commented post: > db.blogs.find().sort( { comments_count: -1 } ).limit(1) When sorting, check if you need an index
  • 24. Common Patterns Patterns: • Inheritance • one to one • one to many • many to many
  • 26. Single Table Inheritance - MongoDB
       shapes table:
       id  type    area  radius  length  width
       1   circle  3.14  1
       2   square  4             2
       3   rect    10            5       2
  • 27. Single Table Inheritance - MongoDB > db.shapes.find() { _id: "1", type: "c", area: 3.14, radius: 1} { _id: "2", type: "s", area: 4, length: 2} { _id: "3", type: "r", area: 10, length: 5, width: 2} missing values not stored!
  • 29. Single Table Inheritance - MongoDB > db.shapes.find() { _id: "1", type: "c", area: 3.14, radius: 1} { _id: "2", type: "s", area: 4, length: 2} { _id: "3", type: "r", area: 10, length: 5, width: 2} // find shapes where radius > 0 > db.shapes.find( { radius: { $gt: 0 } } ) // create index > db.shapes.ensureIndex( { radius: 1 }, { sparse:true } ) index only values present!
  • 30. One to Many Either: •Embedded Array / Document: • improves read speed • simplifies schema •Normalize: • if list grows significantly • if sub items are updated often • if sub items are more than 1 level deep and need updating
  • 31. One to Many Embedded Array: •$slice operator to return subset of comments •some queries become harder (e.g. find the latest comments across all blogs) blogs: { author : "Hergé", date : ISODate("2011-09-18T09:56:06.298Z"), comments : [ { author : "Kyle", date : ISODate("2011-09-19T09:56:06.298Z"), text : "great book" } ] }
  • 32. One to Many Normalized (2 collections) •most flexible •more queries blogs: { _id: 1000, author: "Hergé", date: ISODate("2011-09-18T09:56:06.298Z") } comments : { _id : 1, blogId: 1000, author : "Kyle", date : ISODate("2011-09-19T09:56:06.298Z") } > blog = db.blogs.findOne( { text: "Destination Moon" } ); > db.comments.ensureIndex( { blogId: 1 } ) // important! > db.comments.find( { blogId: blog._id } );
  • 33. Many - Many Example: • Product can be in many categories • Category can have many products
  • 36. Many - Many // Each product lists the IDs of its categories products: { _id: 10, name: "Destination Moon", category_ids: [ 20, 30 ] } // Each category lists the IDs of its products categories: { _id: 20, name: "adventure", product_ids: [ 10, 11, 12 ] } categories: { _id: 21, name: "movie", product_ids: [ 10 ] } Cuts mapping table and 2 indexes, but: • potential consistency issue • lists can grow too large
  • 38. Alternative // Each product lists the IDs of its categories products: { _id: 10, name: "Destination Moon", category_ids: [ 20, 30 ] } // Association not stored on the categories categories: { _id: 20, name: "adventure"} // All products for a given category > db.products.ensureIndex( { category_ids: 1} ) // yes! > db.products.find( { category_ids: 20 } )
  • 39. Common Use Cases Use cases: • Trees • Time Series
  • 41. Trees Full Tree in Document { retweet: [ { who: "Kyle", text: "...", retweet: [ { who: "James", text: "...", retweet: [] } ] } ] } Pros: Single Document, Performance, Intuitive Cons: Hard to search or update, document can easily get too large
  • 44. Array of Ancestors [Tree diagram: A is the root; B and E are children of A; C and D are children of B; F is a child of E] // Store all Ancestors of a node { _id: "a" } { _id: "b", tree: [ "a" ], retweet: "a" } { _id: "c", tree: [ "a", "b" ], retweet: "b" } { _id: "d", tree: [ "a", "b" ], retweet: "b" } { _id: "e", tree: [ "a" ], retweet: "a" } { _id: "f", tree: [ "a", "e" ], retweet: "e" } // find all direct retweets of "b" > db.tweets.find( { retweet: "b" } ) // find all retweets of "e" anywhere in tree > db.tweets.find( { tree: "e" } ) // find tweet history of f: > tweets = db.tweets.findOne( { _id: "f" } ).tree > db.tweets.find( { _id: { $in : tweets } } )
  • 45. Trees as Paths [Tree diagram with nodes A to F] Store hierarchy as a path expression • Separate each node by a delimiter, e.g. “,” • Use text search to find parts of a tree • search must be left-rooted and use an index! { retweets: [ { _id: "a", text: "initial tweet", path: "a" }, { _id: "b", text: "retweet with comment", path: "a,b" }, { _id: "c", text: "reply to retweet", path : "a,b,c"} ] } // Find the conversations "a" started > db.tweets.find( { path: /^a/ } ) // Find the conversations under a branch > db.tweets.find( { path: /^a,b/ } )
  • 46. Time Series • Records stats by • Day, Hour, Minute • Show time series
  • 47. Time Series // Time series buckets, hour and minute sub-docs { _id: "20111209-1231", ts: ISODate("2011-12-09T00:00:00.000Z"), daily: 67, hourly: { 0: 3, 1: 14, 2: 19 ... 23: 72 }, minute: { 0: 0, 1: 4, 2: 6 ... 1439: 0 } } // Add one to the last minute before midnight (both counters go in a single $inc; a repeated $inc key would overwrite the first) > db.votes.update( { _id: "20111209-1231", ts: ISODate("2011-12-09T00:00:00.000Z") }, { $inc: { "hourly.23": 1, "minute.1439": 1 } })
  • 48. BSON Storage • Sequence of key/value pairs • NOT a hash map • Optimized to scan quickly 0 1 2 3 ... 1439 What is the cost of updating the minute before midnight?
  • 49. BSON Storage • Can skip sub-documents 0 ... 23 0 1 ... 59 1380 ... 1439 How could this change the schema?
  • 50. Time Series Use more of a Tree structure by nesting! // Time series buckets, each hour a sub-document { _id: "20111209-1231", ts: ISODate("2011-12-09T00:00:00.000Z"), daily: 67, minute: { 0: { 0: 0, 1: 7, ... 59: 2 }, ... 23: { 0: 15, ... 59: 6 } } } // Add one to the last minute before midnight > db.votes.update( { _id: "20111209-1231", ts: ISODate("2011-12-09T00:00:00.000Z") }, { $inc: { "minute.23.59": 1 } })
  • 51. Duplicate data Document to represent a shopping order: { _id: 1234, ts: ISODate("2011-12-09T00:00:00.000Z"), customerId: 67, total_price: 1050, items: [ { sku: 123, quantity: 2, price: 50, name: "macbook", thumbnail: "macbook.png" }, { sku: 234, quantity: 1, price: 20, name: "iphone", thumbnail: "iphone.png" }, ... ] } The item information is duplicated in every order that references it. Mongo’s flexible schema makes it easy!
  • 52. Duplicate data • Pros: • only 1 query to get all information needed to display the order • processing on the db is as fast as a BLOB • can achieve much higher performance • Cons: • more storage used ... cheap enough • updates are much more complicated ... just consider fields immutable
  • 53. Summary • Basic data design principles stay the same ... • But MongoDB is more flexible and brings possibilities • embed or duplicate data to speed up operations, cut down the number of collections and indexes • watch for documents growing too large • make sure to use the proper indexes for querying and sorting • schema should feel natural to your application!
  • 54. download at mongodb.org conferences, appearances, and meetups http://www.10gen.com/events Facebook | Twitter | LinkedIn http://bit.ly/mongofb @mongodb http://linkd.in/joinmongo
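
Worked example for the time-series slides (47 to 50): the sketch below puts rough numbers on the question "What is the cost of updating the minute before midnight?". It assumes a deliberately simplified cost model in which reaching a key in a BSON document means stepping over every key stored before it at the same level, and whole sub-documents can be skipped in one step. The function names (flatKeysSkipped, nestedPath, nestedKeysSkipped) are hypothetical helpers for illustration, not MongoDB APIs, and real server behavior may differ.

```javascript
// Simplified cost model (an assumption, not a measurement): to reach a key,
// the server steps over every key stored before it at the same level.

// Flat layout: one "minute" sub-document with keys 0..1439.
function flatKeysSkipped(minuteOfDay) {
  return minuteOfDay;
}

// Nested layout: "minute" holds 24 hour sub-documents, each with keys 0..59.
// Returns the dotted field path that a $inc update would target.
function nestedPath(minuteOfDay) {
  const hour = Math.floor(minuteOfDay / 60);
  const minute = minuteOfDay % 60;
  return { hour, minute, field: `minute.${hour}.${minute}` };
}

// Skip the earlier hour sub-documents, then the earlier minutes in that hour.
function nestedKeysSkipped(minuteOfDay) {
  const { hour, minute } = nestedPath(minuteOfDay);
  return hour + minute;
}

const lastMinute = 23 * 60 + 59; // minute 1439 of the day
console.log(nestedPath(lastMinute).field);  // minute.23.59
console.log(flatKeysSkipped(lastMinute));   // 1439
console.log(nestedKeysSkipped(lastMinute)); // 82
```

Under this model the worst-case update drops from 1439 skipped keys to 23 + 59 = 82, which is the motivation the slides give for nesting each hour as its own sub-document.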