7. OpenSky is
a new way to shop
OpenSky connects you with innovators,
trendsetters and tastemakers. You choose
the ones you like and each week they invite
you to their private online sales.
20. Let’s Use an Example
How about we start with books
21. Book Product Schema
Product {
// General product attributes
id:
sku:
product dimensions:
shipping weight:
MSRP:
price:
description:
...
// Book-specific attributes
author: Orson Scott Card
title: Ender's Game
binding: Hardcover
publication date: July 15, 1994
publisher name: Tor Science Fiction
number of pages: 352
ISBN: 0812550706
language: English
...
}
24. Album Product Schema
Product {
// General product attributes stay the same
id:
sku:
product dimensions:
shipping weight:
MSRP:
price:
description:
...
// Album-specific attributes are different
artist: MxPx
title: Panic
release date: June 7, 2005
label: Side One Dummy
track listing: [ The Darkest ...
language: English
format: CD
...
}
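To make the contrast between the two schemas concrete, here is a minimal sketch that writes both products as plain JavaScript objects and compares their keys. Only values shown on the slides are used; the helper name `splitKeys` and the choice to leave `sku` and `price` as `null` are assumptions for illustration.

```javascript
// Two products from the slides, written as plain objects.
// sku and price are left null because the slides leave them blank.
const book = {
  sku: null,
  price: null,
  author: "Orson Scott Card",
  title: "Ender's Game",
  binding: "Hardcover",
  language: "English",
};

const album = {
  sku: null,
  price: null,
  artist: "MxPx",
  title: "Panic",
  format: "CD",
  language: "English",
};

// Hypothetical helper: which keys do both products have,
// and which keys belong to only one of them?
function splitKeys(a, b) {
  const shared = Object.keys(a).filter((k) => k in b);
  const onlyA = Object.keys(a).filter((k) => !(k in b));
  const onlyB = Object.keys(b).filter((k) => !(k in a));
  return { shared, onlyA, onlyB };
}

const keys = splitKeys(book, album);
console.log(keys);
```

The keys that land in `onlyA` and `onlyB` are the ones a relational schema has to find a home for, which is the problem the following slides work through.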
33. EAV
as popularized by Magento
“For purposes of flexibility, the Magento database heavily utilizes
an Entity-Attribute-Value (EAV) data model.
As is often the case, the cost of flexibility is complexity -
Magento is no exception.
The process of manipulating data in Magento is often more
“involved” than that typically experienced using traditional
relational tables.”
- Varien
34. EAV
• Crazy SQL queries
• Hundreds of joins in a query...
or
• Hundreds of queries joined in
the application
• No database enforced integrity
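The cost described above can be sketched in a few lines. In an EAV layout every attribute is its own row, so the application (or a pile of joins) has to pivot rows back into products. The row shape and the `pivot` helper below are assumptions for illustration, not Magento's actual schema.

```javascript
// Hypothetical EAV rows: one row per (entity, attribute, value) triple.
const rows = [
  { entityId: 1, attribute: "title", value: "Ender's Game" },
  { entityId: 1, attribute: "author", value: "Orson Scott Card" },
  { entityId: 1, attribute: "binding", value: "Hardcover" },
  { entityId: 2, attribute: "title", value: "Panic" },
  { entityId: 2, attribute: "artist", value: "MxPx" },
];

// Reassemble one object per entity. In SQL, this pivot is what costs
// a join (or a separate query) per attribute.
function pivot(eavRows) {
  const entities = new Map();
  for (const { entityId, attribute, value } of eavRows) {
    if (!entities.has(entityId)) entities.set(entityId, { id: entityId });
    entities.get(entityId)[attribute] = value;
  }
  return [...entities.values()];
}

const products = pivot(rows);
console.log(products);
```

Note that nothing in this layout stops someone from adding a row such as `{ entityId: 2, attribute: "binding", value: 42 }`, which is the missing integrity the slide complains about.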
38. Single Table Inheritance
(insanely wide tables)
• No data integrity enforcement
• Can only use FKs for common
elements
• Very wasteful (but disk is cheap!)
• Can’t effectively index
39. Generic Columns
• No data integrity enforcement
• No data type enforcement
• Can only use FKs for common
elements
• Wasteful (but disk is cheap!)
• Can’t index
40. Serialized in Blob
• Not searchable
• No integrity
• All the disadvantages of a document
store, but none of the advantages
• Never should be used
• One exception is Oracle XML
which operates similarly to a
document store
41. Concrete Table Inheritance
(a table for each product attribute set)
• Allows for data integrity
• Querying across attribute
sets is quite hard to do (lots of
joins, OR statements and full
table scanning)
• New table needs to be
created for each new
attribute set
42. Class table inheritance
(single product table,
each attribute set in own table)
• Likely best solution within the
constraint of SQL
• Supports data type enforcement
• No data integrity enforcement
• Easy querying across categories (for
browse pages) since common data
in single table
• Every set needs a new table
• Requires a ton of foresight, as
changes are very complicated
60. Wanna Play?
• grab products.js from
http://github.com/spf13/mongoProducts
• mongo --shell products.js
• > use mongoProducts
61. Embedded documents
are great for orders
• Ordered items need to be fixed at the time
of purchase
• Embed them right in the order
db.order.find( { 'items.sku': '00e8da9f' } );
db.order.find( {
'items.details.actor': 'James Stewart'
} ).count();
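A rough sketch of what such an order might look like, and why the dotted queries above match it. The document shape is hypothetical (only the sku `00e8da9f` and the actor name come from the slide), and `valuesAt` is a simplified in-memory imitation of dotted-path equality matching, not MongoDB's query engine.

```javascript
// Hypothetical orders: each item embeds a snapshot of the product as it
// was at purchase time, rather than a reference to the live product.
const orders = [
  {
    _id: 1,
    items: [
      { sku: "00e8da9f", qty: 1, details: { actor: "James Stewart" } },
      { sku: "1b2c3d4e", qty: 2, details: { author: "Orson Scott Card" } },
    ],
  },
  { _id: 2, items: [{ sku: "5f6a7b8c", qty: 1, details: { artist: "MxPx" } }] },
];

// Collect every value found at a dotted path, descending through arrays.
function valuesAt(doc, path) {
  let current = [doc];
  for (const key of path.split(".")) {
    const next = [];
    for (const node of current) {
      const candidates = Array.isArray(node) ? node : [node];
      for (const item of candidates) {
        if (item !== null && typeof item === "object" && key in item) {
          next.push(item[key]);
        }
      }
    }
    current = next;
  }
  return current.flat();
}

// Equality match on a dotted path, similar in spirit to find({ path: value }).
function find(docs, path, value) {
  return docs.filter((doc) => valuesAt(doc, path).includes(value));
}

const bySku = find(orders, "items.sku", "00e8da9f");
const byActorCount = find(orders, "items.details.actor", "James Stewart").length;
console.log(bySku.map((o) => o._id), byActorCount);
```

Because the product details are copied into the order, later edits to the product do not rewrite purchase history.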
77. Consistency
• MongoDB can enforce unique keys
... but only on keys shared by every
document in the collection
• MongoDB can't enforce referential integrity
83. Isolation
// Pseudo-isolated updates
db.foo.update( { x : 1 } , { $inc : { y : 1 } } , false , true );
// Isolated updates
db.foo.update( { x : 1 , $atomic : 1 } , { $inc : { y : 1 } } , false , true );
• But there are caveats...
• Despite the $atomic keyword, this is not an atomic update,
since atomicity implies “all or nothing”
• An isolated update can only act on a single collection.
Multi-collection updates are not transactional, thus not isolatable.
91. • Atomic single document writes
• If you need atomic writes across multi-document
transactions, don't use Mongo
• Many e-commerce transactions could be
accomplished within a single document write
• Unique indexes
• This only works on keys used by the entire collection
• Isolated (not atomic) single collection updates.
• Mongo does not support locking
• There are ways to work around this
• It’s durable
93. There are ways to
guarantee ACID properties
in inconsistent databases
(or, as we call them, consistency impaired databases)
96. Optimistic concurrency
• Read the current state of a product
• Make your changes with the assertion that
your product has the same state as it did
when you last read it
101. Optimistic concurrency
in MongoDB
We’ll use an update-if-current strategy.
This example is straight from the documentation:
> t = db.inventory
> p = t.findOne({sku:'abc'})
> t.update({_id:p._id, qty:p.qty}, {'$inc': {qty: -1}});
> db.$cmd.findOne({getlasterror:1});
{"err" : null , "updatedExisting" : true , "n" : 1 , "ok" : 1}
// it worked
... If that didn't work, try again until it does.
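The "try again until it does" step can be sketched as a retry loop. This is an in-memory simulation under stated assumptions: a `Map` stands in for the `inventory` collection, `updateIfCurrent` imitates the conditional update and returns the equivalent of `n`, and `maxAttempts` and the `beforeWrite` hook are hypothetical additions used to simulate a competing writer.

```javascript
// In-memory stand-in for a collection holding one inventory document.
const inventory = new Map([["abc", { sku: "abc", qty: 2 }]]);

// Update-if-current: decrement only if qty still equals the value the
// caller read earlier. Returns the number of documents updated (0 or 1).
function updateIfCurrent(sku, expectedQty) {
  const doc = inventory.get(sku);
  if (!doc || doc.qty !== expectedQty) return 0;
  doc.qty -= 1;
  return 1;
}

// Read, attempt the conditional update, and retry on conflict.
function purchase(sku, maxAttempts = 5, beforeWrite = () => {}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const snapshot = { ...inventory.get(sku) }; // read the current state
    if (snapshot.qty <= 0) return { ok: false, reason: "out of stock", attempt };
    beforeWrite(attempt); // hook: lets a competing writer sneak in
    if (updateIfCurrent(sku, snapshot.qty) === 1) return { ok: true, attempt };
  }
  return { ok: false, reason: "too much contention", attempt: maxAttempts };
}

// Simulate one competing purchase landing between our read and our write.
const result = purchase("abc", 5, (attempt) => {
  if (attempt === 1) updateIfCurrent("abc", 2);
});
console.log(result, inventory.get("abc"));
```

The first attempt fails because the quantity changed underneath it; the second attempt re-reads and succeeds. Capping the retries is a design choice, since an unbounded loop under heavy contention just spins.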
104. Optimistic concurrency
• Read the current state of a product.
• Make your changes with the assertion that
your product has the same state as it did
when you last read it.
• It's possible to use OCC to bootstrap
pessimistic concurrency and fake row level
locking
... ask me about this some time
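The slides do not show how the bootstrapping works. One common pattern, assumed here purely for illustration, is a lock field claimed with a compare-and-set. The `lockedBy` field and the `acquire`/`release` helpers are hypothetical, and the plain object stands in for a product document.

```javascript
// In-memory stand-in for a product document with a lock field.
const product = { sku: "abc", qty: 3, lockedBy: null };

// Compare-and-set on the lock field: succeeds only if nobody holds it.
// Against a real database this would be one conditional update whose
// query requires lockedBy to be null.
function acquire(doc, owner) {
  if (doc.lockedBy !== null) return false;
  doc.lockedBy = owner;
  return true;
}

// Release only if the caller is still the owner.
function release(doc, owner) {
  if (doc.lockedBy !== owner) return false;
  doc.lockedBy = null;
  return true;
}

const first = acquire(product, "worker-1");    // takes the lock
const second = acquire(product, "worker-2");   // refused while held
const released = release(product, "worker-1"); // owner gives it back
const third = acquire(product, "worker-2");    // now succeeds
console.log({ first, second, released, third });
```

A production version would also need an expiry on the lock, so that a crashed holder does not block everyone else forever.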
106. OCC works great for
companies like Amazon
• Amazon has a long-tail catalog
• A long-tail catalog lends itself well to
optimistic concurrency, because it has low
data contention
133. 1. I go to Barneys and see a pair of shoes I just have to
buy.
2. I call “dibs” (by grabbing them off the shelf).
3. I take them up to the cash register and purchase them:
• Store inventory has been manually decremented.
• I pay for them with my trusty AmEx.
4. If all goes according to plan, I walk out of the store.
5. If my card was declined, the shoes are “rolled back”
... out onto the shelves and sold to the next customer
who wants them.
142. 1. Select a product.
2. Lock the row or table and confirm inventory.
3. Purchase the product:
• Decrement product inventory
• Process payment
4. Commit the transaction.
5. Roll back if anything went wrong.
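The five steps above can be sketched as follows. This is an in-memory imitation, not real SQL: the `table` object, the `locked` flag, and the `chargeCard` callback are assumptions standing in for a row lock, a transaction, and a payment gateway.

```javascript
// In-memory stand-in for an inventory table; "locked" models a row lock.
const table = { shoes: { qty: 1, locked: false } };

function purchaseWithLock(sku, chargeCard) {
  const row = table[sku];                // 1. select a product
  if (row.locked) throw new Error("row is locked");
  row.locked = true;                     // 2. lock the row...
  const before = row.qty;                //    (remembered for rollback)
  try {
    if (row.qty < 1) throw new Error("out of stock"); // ...and confirm inventory
    row.qty -= 1;                        // 3a. decrement product inventory
    chargeCard();                        // 3b. process payment (may throw)
    return true;                         // 4. commit: keep the new quantity
  } catch (err) {
    row.qty = before;                    // 5. roll back if anything went wrong
    return false;
  } finally {
    row.locked = false;                  // the lock is released either way
  }
}

const declined = purchaseWithLock("shoes", () => {
  throw new Error("card declined");
});
const qtyAfterDecline = table.shoes.qty;
const approved = purchaseWithLock("shoes", () => {});
const qtyAfterApproval = table.shoes.qty;
console.log({ declined, qtyAfterDecline, approved, qtyAfterApproval });
```

As with the shoes, a declined card puts the item back on the shelf for the next customer.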
147. Data we store in
MongoDB
• User
• Product
• Product/Sellable
• Address
• Cart
• CreditCard
• Event
• TaxRate
• ... and then I got tired of typing them in
• Just imagine this list has 40 more classes
• ...
153. Inventory is transient
• Product::$inventory is effectively a
transient property
• Note how I said “effectively”? ... we cheat
and persist our transient property to
MongoDB as well
• We can do this because we never really
trust the value stored in Mongo
156. Accuracy is only important
when there’s contention
• For display, sorting and alerts, we can use
the value stashed in MongoDB
• It’s faster
• It’s accurate enough
• For financial transactions, we want the
security and comfort of our RDBMS.
160. We keep inventory in
sync with listeners
• Every time a new product is created, its
inventory is inserted in SQL
• Every time an order is placed, inventory is
verified and decremented
• Whenever the SQL inventory changes, it is
saved to MongoDB as well
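The three rules above can be sketched with Node's built-in `EventEmitter`. This is not OpenSky's Doctrine code: the event names, the two `Map` objects standing in for the SQL table and the MongoDB collection, and the error message are all assumptions for illustration.

```javascript
const { EventEmitter } = require("events");

const sqlInventory = new Map();  // stand-in for the SQL inventory table (source of truth)
const mongoProducts = new Map(); // stand-in for product documents in MongoDB
const events = new EventEmitter();

// Whenever the SQL inventory changes, copy the value onto the document.
events.on("inventoryChanged", (productId) => {
  const doc = mongoProducts.get(productId) ?? { _id: productId };
  doc.inventory = sqlInventory.get(productId);
  mongoProducts.set(productId, doc);
});

// New product: insert its inventory in SQL.
events.on("productCreated", (productId, qty) => {
  sqlInventory.set(productId, qty);
  events.emit("inventoryChanged", productId);
});

// Order placed: verify and decrement inventory in SQL.
events.on("orderPlaced", (productId, qty) => {
  const available = sqlInventory.get(productId) ?? 0;
  if (available < qty) throw new Error("insufficient inventory");
  sqlInventory.set(productId, available - qty);
  events.emit("inventoryChanged", productId);
});

events.emit("productCreated", "p1", 5);
events.emit("orderPlaced", "p1", 2);
console.log(sqlInventory.get("p1"), mongoProducts.get("p1"));
```

The copy in the document store is only ever written from the SQL value, never the other way around, which is why it is safe to treat it as a fast, approximately accurate cache.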
162. Be careful what you lock
1. Acquire inventory row lock and begin transaction
2. Check current product inventory
3. Decrement product inventory
4. Write the Order to SQL
5. Update affected MongoDB documents
6. Commit the transaction
7. Release product inventory lock
168. So how does an
RDBMS have a
reference to something
outside the database?
169. Setting the Product
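class Order {
    // ...
    public function setProduct(Product $product)
    {
        // Persisted: the id is the only thing SQL ever sees
        $this->productId = $product->getId();
        // Not persisted: the in-memory Product instance
        $this->product = $product;
    }
}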
170. • $productId is mapped and persisted
• $product which stores the Product
instance is not a persistent entity property
172. OrderPostLoadListener
use Doctrine\ODM\MongoDB\DocumentManager;
use Doctrine\ORM\Event\LifecycleEventArgs;

class OrderPostLoadListener
{
    private $dm;

    // Assumption: the slide omits how $this->dm is set, so the ODM
    // DocumentManager is injected through the constructor here.
    public function __construct(DocumentManager $dm)
    {
        $this->dm = $dm;
    }

    public function postLoad(LifecycleEventArgs $eventArgs)
    {
        // get the order entity
        $order = $eventArgs->getEntity();
        // get odm reference to order.product_id
        $productId = $order->getProductId();
        $product = $this->dm->getReference('MyBundle:Document\Product', $productId);
        // set the product on the order
        $em = $eventArgs->getEntityManager();
        $productReflProp = $em->getClassMetadata('MyBundle:Entity\Order')
            ->reflClass->getProperty('product');
        $productReflProp->setAccessible(true);
        $productReflProp->setValue($order, $product);
    }
}
173. All Together Now
// Create a new product and order
$product = new Product();
$product->setTitle('Test Product');
$dm->persist($product);
$dm->flush();
$order = new Order();
$order->setProduct($product);
$em->persist($order);
$em->flush();
// Find the order later
$order = $em->find('Order', $order->getId());
// Instance of an uninitialized product proxy
$product = $order->getProduct();
// Initializes proxy and queries the MongoDB database
echo "Order Title: " . $product->getTitle();
print_r($order);
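The lazy-loading behaviour in this example can be imitated in a few lines of JavaScript. This is a sketch of the idea only, not Doctrine's proxy implementation: `productStore`, the `queries` counter, and this `getReference` function are hypothetical stand-ins.

```javascript
// Stand-in for the MongoDB product collection, plus a query counter.
const productStore = new Map([["p1", { id: "p1", title: "Test Product" }]]);
let queries = 0;

// Returns a proxy that knows its id up front but only hits the store
// the first time a real property is needed.
function getReference(id) {
  let loaded = null;
  return {
    getId: () => id, // answered without a query
    getTitle() {
      if (loaded === null) {
        queries += 1; // the one and only lookup
        loaded = productStore.get(id);
      }
      return loaded.title;
    },
  };
}

// What the postLoad listener does: attach a proxy built from the stored id.
const order = { productId: "p1" };
order.product = getReference(order.productId);

const queriesBeforeAccess = queries;      // nothing loaded yet
const title = order.product.getTitle();   // triggers the lookup
order.product.getTitle();                 // served from the loaded copy
console.log(queriesBeforeAccess, title, queries);
```

Loading an order therefore costs nothing on the MongoDB side unless the code actually reads something from the product.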
174. Read more about
this technique
Jon Wage, one of OpenSky’s engineers, first
wrote about this technique on his personal
blog: http://jwage.com
You can read the full article here:
http://jwage.com/2010/08/25/blending-the-doctrine-orm-and-mongodb-odm/
175. Questions?
http://spf13.com
@spf13
http://justinhileman.com
@bobthecow
http://opensky.com
PS: We’re hiring!! Contact us at jobs@opensky.com
Editor's Notes
Actually, just the first 1/3 of it.
Ironically, this is how Magento solves the performance problems associated with EAV: by caching the data into insanely wide tables.
Can’t create an FK, as each set references a different table. The “key” is really made of the attribute table name id and the attribute table name.
Whenever you use inter-system coordination, you need to implement your own atomic checks in the application... But SOAP does have transactions, so that's not quite accurate.
Kyle idea... but we are fairly atomic with Authorize.net.
Mongo has a grip of atomic operations: set, unset, etc.
update( { where }, { values }, upsert?, multiple? )
Lemme show you an example.
Imagine what would happen if everyone tried to access the same record at the same time. Just think of all those spinning while loops :)
And I’ll show you how OpenSky does it.
Since we really like MongoDB, we want to keep as much of our data in Mongo as possible.
Mind if I tell you a story?
But this sounds like it could get COMPLICATED...
Given that split, we just happen to have the most boring SQL schema ever.
This is pretty much it.
It goes on for a few more lines, with a few other properties flattened onto the order table.
Back to the schema for a second.
- Product ID here is a fake foreign key.
- Inventory is a real integer.
That’s all there is to this table.
And here’s why we like Doctrine so much.
This will look a bit like when I bought those shoes.
The interesting parts here are the annotations.
If you don’t speak PHP annotation, this stores a document with two properties (ID and title) in the `products` collection of a Mongo database.
Did you notice the property named `product`? That’s not just a reference to another document, that’s a reference to an entirely different database paradigm.
Check out the setter:
This is key: we set both the product id and a reference to the product itself.
When this order is saved, the productId will end up in the database, but the product reference will disappear.
This is one of those listeners I was telling you about. At a high level:
1. Every time an Order is loaded from the database, this listener is called.
2. The listener gets the Order’s product id, and creates a Doctrine proxy object.
3. It uses magick (e.g. reflection) to set the product property of the order to this new proxy.
Here’s our inter-db relationship in action.
Note that the product is lazily loaded from MongoDB. Because $product is a proxy, we don’t actually query Mongo until we try to access a property of $product (in this case the title).