One of the challenges that comes with moving to MongoDB is figuring out how best to model your data. While most developers have internalized the rules of thumb for designing schemas for RDBMSs, these rules don't always apply to MongoDB. The simple fact that documents can represent rich, schema-free data structures means that we have many viable alternatives to the standard, normalized, relational model. Moreover, MongoDB has several unique features, such as atomic updates and indexed array keys, that greatly influence the kinds of schemas that make sense. Understandably, this raises good questions: Are foreign keys permissible, or is it better to represent one-to-many relations within a single document? Are join tables necessary, or is there another technique for building out many-to-many relationships? What level of denormalization is appropriate? How do my data modeling decisions affect the efficiency of updates and queries? In this session, we'll answer these questions and more, provide a number of data modeling rules of thumb, and discuss the tradeoffs of various data modeling strategies.
Schema Design Considerations
• How do we manipulate the data?
– Dynamic Ad-Hoc Queries
– Atomic Updates
– Map Reduce
• What are the access patterns of the application?
– Read/Write Ratio
– Types of Queries / Updates
– Data life-cycle and growth rate
One-to-One Relations
• Mostly the same as the relational approach
• Generally a good idea to embed “contains” relationships
• Document model provides a holistic representation of objects
Book
MongoDB: The Definitive Guide,
By Kristina Chodorow and Mike Dirolf
Published: 9/24/2010
Pages: 216
Language: English
Publisher: O’Reilly Media, CA
Where Do You Put the Foreign Key?
• Array of books inside of publisher
– Makes sense when “many” means a handful of items
– Useful when the number of items has a bound on potential growth
• Reference to a single publisher on books
– Useful when items have unbounded growth (an unlimited number of books)
• SQL doesn’t give you a choice: no arrays
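To make the two options concrete, here is a minimal sketch in Python. It assumes plain lists of dicts stand in for MongoDB collections (no driver calls are made); the field names `books` and `publisher_id`, the ids, and the `find_one` helper are hypothetical, while the book details come from the Book slide.

```python
# Sketch only: lists of dicts stand in for MongoDB collections.
# Field names (books, publisher_id), ids, and helpers are hypothetical.

# Option 1: array of book ids inside the publisher (fine when "many" is a handful).
publishers_with_books = [
    {"_id": "oreilly", "name": "O'Reilly Media", "state": "CA",
     "books": ["mongodb-definitive-guide"]},
]

# Option 2: each book references its single publisher (fine for unbounded growth).
publishers = [{"_id": "oreilly", "name": "O'Reilly Media", "state": "CA"}]
books = [
    {"_id": "mongodb-definitive-guide",
     "title": "MongoDB: The Definitive Guide",
     "authors": ["Kristina Chodorow", "Mike Dirolf"],
     "published_date": "2010-09-24",
     "pages": 216,
     "language": "English",
     "publisher_id": "oreilly"},
]


def find_one(collection, **criteria):
    """Return the first document whose fields equal all criteria, else None."""
    for doc in collection:
        if all(doc.get(key) == value for key, value in criteria.items()):
            return doc
    return None


def book_ids_for_publisher_embedded(publisher_id):
    # One lookup: the book ids live on the publisher document itself.
    pub = find_one(publishers_with_books, _id=publisher_id)
    return list(pub["books"]) if pub else []


def book_ids_for_publisher_referenced(publisher_id):
    # Scan the books "collection" for the foreign key; in a real deployment
    # an index on publisher_id would serve this query.
    return [b["_id"] for b in books if b.get("publisher_id") == publisher_id]


def publisher_of(book_id):
    # Following the reference from a book costs a second lookup.
    book = find_one(books, _id=book_id)
    return find_one(publishers, _id=book["publisher_id"]) if book else None
```

With option 1, the publisher document must be updated every time a book is added, which is why it suits a small, bounded list; option 2 leaves the publisher untouched as books accumulate.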
Referencing vs. Embedding
• Embedding is a bit like pre-joined data
• Document-level operations are easy for the server to handle
• Embed when the “many” objects always appear with (are viewed in the context of) their parents
• Reference when you need more flexibility
Modeling Trees
• Parent Links
– Each node is stored as a document
– Contains the id of the parent
• Child Links
– Each node contains the ids of the children
– Can support graphs (multiple parents per child)
In the filing cabinet model, the patient’s x-rays, checkups, and allergies are stored in separate drawers and pulled together (like an RDBMS).
In the file folder model, we store all of the patient information in a single folder (like MongoDB).
Flexibility – Ability to represent rich data structures
Performance – Benefit from data locality
Concrete example of a typical blog in relational, normalized form
Concrete example of a typical blog using a document-oriented, denormalized approach
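The blog comparison can be sketched in Python, again assuming in-memory lists of dicts stand in for tables and collections. All post, comment, and tag values and the field names are hypothetical; the point is that the normalized form needs three lookups to assemble one post, while the embedded document is effectively pre-joined.

```python
# Sketch only: in-memory lists stand in for tables/collections; data is hypothetical.

# Normalized (relational-style): three separate "tables".
posts = [{"id": 1, "author": "alice", "title": "Schema design"}]
comments = [
    {"id": 10, "post_id": 1, "author": "bob", "text": "Nice post"},
    {"id": 11, "post_id": 1, "author": "carol", "text": "Thanks"},
]
post_tags = [{"post_id": 1, "tag": "mongodb"}, {"post_id": 1, "tag": "schema"}]


def load_post_normalized(post_id):
    """Assemble a post by 'joining' three tables (assumes the post exists)."""
    post = next(p for p in posts if p["id"] == post_id)
    return {
        **post,
        "comments": [
            {"author": c["author"], "text": c["text"]}
            for c in comments if c["post_id"] == post_id
        ],
        "tags": [t["tag"] for t in post_tags if t["post_id"] == post_id],
    }


# Denormalized (document-oriented): comments and tags are embedded,
# so the stored document is already "pre-joined".
post_documents = [
    {
        "_id": 1,
        "author": "alice",
        "title": "Schema design",
        "tags": ["mongodb", "schema"],
        "comments": [
            {"author": "bob", "text": "Nice post"},
            {"author": "carol", "text": "Thanks"},
        ],
    }
]


def load_post_document(post_id):
    """A single lookup returns the post with its comments and tags."""
    return next(d for d in post_documents if d["_id"] == post_id)
```

Embedding the comments fits when they are always viewed with their post; if comments could grow without bound, referencing them from a separate collection may be the safer choice.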
Tools for data manipulation
Tools for data access
Slow to get address data every time you query for a user. Requires an extra operation.
A patron may have two addresses; in that case, you would need a separate table in a relational database.
With MongoDB, you simply start storing the address field as an array.
Only patrons that have multiple addresses would have this schema!
No migration necessary! But caution: additional application logic is required!
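A minimal sketch of the patron example, assuming in-memory dicts stand in for documents; the names, addresses, and the `addresses_of` helper are hypothetical. It illustrates the additional application logic the note warns about: code that reads `address` must accept either a single embedded document or an array of them.

```python
# Sketch only: dicts stand in for MongoDB documents; all values are hypothetical.

# Existing patrons keep a single embedded address document.
patron_one = {
    "_id": "joe",
    "name": "Joe Bookreader",
    "address": {"street": "123 Fake St", "city": "Faketon", "zip": "12345"},
}

# A patron with two addresses stores the same field as an array;
# existing documents are not migrated.
patron_two = {
    "_id": "ann",
    "name": "Ann Reader",
    "address": [
        {"street": "1 Some Other St", "city": "Boston", "zip": "02101"},
        {"street": "2 Example Rd", "city": "Springfield", "zip": "01101"},
    ],
}


def addresses_of(patron):
    """Return the patron's addresses as a list, whichever shape is stored."""
    address = patron.get("address")
    if address is None:
        return []
    if isinstance(address, list):
        return address
    return [address]
```

Normalizing in one helper keeps the two coexisting shapes from leaking into the rest of the application.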
Publisher is repeated for every book, data duplication!
The publisher is better off as a separate entity with its own collection.
OR: because we are using MongoDB, and documents can have arrays, you can choose to model the relation by creating and maintaining an array of books within each publisher entity.
Be careful with mutable, growing arrays. See next slide.
Now, to create a relation between the two entities, you can choose to reference the publisher from the book document.
This is similar to the relational approach for this very same problem.
Costly for a small number of books, because getting the publisher requires an extra query
And data locality provides speed
A book’s kind attribute could be local or loanable.
Note that we have locations for loanable books but not for local ones.
Note that these two separate schemas can coexist (loanable books and local books are both books).
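A small sketch of the two coexisting book schemas, assuming an in-memory list stands in for the books collection. The titles, branch names, and the helpers `books_of_kind` and `loan_locations` are hypothetical; only the `kind` values and the idea that loanable books carry locations come from the note.

```python
# Sketch only: one "collection" holding two book shapes; values are hypothetical.
books = [
    # Local book: no "locations" field at all.
    {"_id": 1, "title": "Local reference book", "kind": "local"},
    # Loanable book: carries the extra "locations" field.
    {"_id": 2, "title": "Loanable novel", "kind": "loanable",
     "locations": ["Main branch", "East branch"]},
]


def books_of_kind(kind):
    """Filter on the discriminating 'kind' field."""
    return [b for b in books if b["kind"] == kind]


def loan_locations(book):
    # Only loanable books have locations; local books simply lack the field.
    if book["kind"] != "loanable":
        return []
    return book.get("locations", [])
```

Both shapes live in one collection, so a query over all books needs no union, while code that cares about loans branches on `kind`.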
Note that we partially denormalize here.
To get books by a particular author:
- get the author
- get the books that have that author’s id in their array
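The two-step lookup can be sketched as follows, assuming in-memory lists stand in for the authors and books collections. The ids, the second book, the fields `author_ids` and `author_names`, and the `books_by_author` helper are hypothetical; copying author names onto each book is one possible reading of “partially denormalize”, not necessarily the presenter’s exact schema. In MongoDB itself, a query such as `{"author_ids": some_id}` matches documents whose array contains that value.

```python
# Sketch only: in-memory lists stand in for collections; names are hypothetical.
authors = [
    {"_id": "kchodorow", "name": "Kristina Chodorow"},
    {"_id": "mdirolf", "name": "Mike Dirolf"},
]

books = [
    {"_id": "mongodb-definitive-guide",
     "title": "MongoDB: The Definitive Guide",
     # Author ids are the reference; the copied names are the partial
     # denormalization (hypothetical field names).
     "author_ids": ["kchodorow", "mdirolf"],
     "author_names": ["Kristina Chodorow", "Mike Dirolf"]},
    {"_id": "hypothetical-book",
     "title": "A Hypothetical Book",
     "author_ids": ["kchodorow"],
     "author_names": ["Kristina Chodorow"]},
]


def books_by_author(name):
    # Step 1: get the author.
    author = next((a for a in authors if a["name"] == name), None)
    if author is None:
        return []
    # Step 2: get books whose author_ids array contains that id
    # (an index on an array field indexes each element in MongoDB).
    return [b["title"] for b in books if author["_id"] in b["author_ids"]]
```

No join table is needed: the array on each book plays that role, and the copied names let a book listing render without a second lookup.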
Simple solution. The biggest problem with this approach is that getting an entire subtree requires several round trips to the database. There is no intrinsic ordering of children.
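A sketch of parent links, assuming an in-memory list stands in for the collection; the category names and the `find_children` and `subtree` helpers are hypothetical. It counts one “query” per tree level (comparable to one `$in` query per level in MongoDB) to show why fetching a whole subtree takes several round trips.

```python
# Sketch only: each node document stores its parent's id (None for the root).
categories = [
    {"_id": "Books", "parent": None},
    {"_id": "Programming", "parent": "Books"},
    {"_id": "Databases", "parent": "Programming"},
    {"_id": "Languages", "parent": "Programming"},
    {"_id": "MongoDB", "parent": "Databases"},
]


def find_children(parent_id):
    """One lookup: nodes whose parent equals parent_id. The order returned
    is just storage order here; the schema itself imposes no child order."""
    return [n["_id"] for n in categories if n["parent"] == parent_id]


def subtree(root_id):
    """Breadth-first walk, one lookup per level.
    Returns (node ids in visit order, number of lookups issued)."""
    result, lookups = [root_id], 0
    level = [root_id]
    while level:
        lookups += 1
        level = [n["_id"] for n in categories if n["parent"] in level]
        result.extend(level)
    return result, lookups
```

Finding a node’s parent is a single lookup, but the number of round trips for a subtree grows with its depth.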
It may also be good for storing graphs in which a node has multiple parents. This approach also gives an intrinsic ordering of children.
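A sketch of child links, assuming a dict keyed by node id stands in for the collection; the category names and helpers are hypothetical. The stored array order gives the children an ordering, and “MongoDB” deliberately has two parents to show the graph case, so the traversal keeps a visited set.

```python
# Sketch only: each node stores an ordered array of its children's ids.
nodes = {
    "Books": {"children": ["Programming", "Fiction"]},
    "Programming": {"children": ["Databases", "NoSQL"]},
    "Fiction": {"children": []},
    "Databases": {"children": ["MongoDB", "PostgreSQL"]},
    "NoSQL": {"children": ["MongoDB"]},  # second parent for MongoDB (a graph)
    "MongoDB": {"children": []},
    "PostgreSQL": {"children": []},
}


def ordered_children(node_id):
    # The array order is the stored order of the children.
    return list(nodes[node_id]["children"])


def parents_of(node_id):
    # Comparable to querying {"children": node_id}: every node whose
    # children array contains node_id; there may be several.
    return [nid for nid, doc in nodes.items() if node_id in doc["children"]]


def descendants(node_id):
    """Depth-first in stored child order, visiting each node once (graph-safe)."""
    seen, order = set(), []

    def visit(nid):
        for child in nodes[nid]["children"]:
            if child not in seen:
                seen.add(child)
                order.append(child)
                visit(child)

    visit(node_id)
    return order
```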