3. NoSQL: Various Shapes and Sizes
• Document Databases
• Column-family Oriented Stores
• Key/value Data stores
• XML Databases
• Object Databases
• Graph Databases
4. Key Questions
• How do I model data for my application?
• How do I determine which one is right for me?
• Can I easily shift from one database to the other?
• Is there a standard way of storing, accessing, and querying data?
5. Agenda for this session
• Explore some of the main NoSQL products
• Understand how they are similar and different
• How best to use these products in the stack
7. What is a document db?
• One that stores documents
• Popular options:
• MongoDB -- C++
• CouchDB -- Erlang
• Also Amazon’s SimpleDB
• ...what exactly is a document?
8. In the real world
• (Source: http://guide.couchdb.org/draft/why.html)
9. In terms of JSON
• {"name": "John Doe", "zip": 10001}
10. What about db schema?
• Schema-less
• Different documents could be stored in a single collection
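A minimal sketch of what "schema-less" means in practice, using plain Python dicts as stand-ins for JSON documents. The `people` collection and the `find` helper are hypothetical illustrations, not the API of MongoDB or CouchDB.

```python
import json

# A hypothetical in-memory "collection": just a list of documents (dicts).
# The two documents share a "name" field but otherwise differ in shape.
people = [
    {"name": "John Doe", "zip": 10001},
    {"name": "Jane Roe", "email": "jane@example.com", "tags": ["admin"]},
]

def find(collection, **criteria):
    """Return documents that contain every given field with the given value.

    Documents lacking a field simply do not match; no schema is enforced.
    """
    return [
        doc for doc in collection
        if all(key in doc and doc[key] == value for key, value in criteria.items())
    ]

matches = find(people, zip=10001)
print(json.dumps(matches))
```

Because no schema is declared, queries have to tolerate missing fields, which is why `find` checks `key in doc` before comparing values.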
21. CouchDB: RESTful
• Supports REST verbs: GET, HEAD, PUT, POST, DELETE
• Supports Replication
• Supports the notion of attachments
• Could work in offline modes and supports small footprint profiles
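The REST verbs above map onto ordinary HTTP requests. The sketch below only builds the requests with Python's standard library and does not send them, so it runs without a server. The database name `people` and document id `john` are assumptions made for illustration; port 5984 is CouchDB's default.

```python
import json
from urllib.request import Request

BASE = "http://localhost:5984"  # CouchDB's default port; adjust for your setup

def couch_request(method, path, body=None):
    """Build (but do not send) an HTTP request against a CouchDB-style URL."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data else {}
    return Request(BASE + path, data=data, headers=headers, method=method)

# PUT /people        creates a database (hypothetical name)
create_db = couch_request("PUT", "/people")
# PUT /people/john   stores a document under an id chosen by the client
create_doc = couch_request("PUT", "/people/john", {"name": "John Doe", "zip": 10001})
# GET /people/john   reads the document back
read_doc = couch_request("GET", "/people/john")

for req in (create_db, create_doc, read_doc):
    print(req.get_method(), req.full_url)
```

To actually execute a request against a running server, pass it to `urllib.request.urlopen`.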
39. RWN Math
• R – Number of nodes that are read from.
• W – Number of nodes that are written to.
• N – Total number of nodes in the cluster.
• In general: R < N and W < N for higher availability
40. R+W>N
• Easy to determine consistent state
• R + W = 2N (R = W = N): absolutely consistent, can provide ACID guarantee
• In all cases when R + W > N there is some overlap between read and write nodes.
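The overlap claim can be checked by brute force for small clusters. The sketch below enumerates every possible read set and write set; the function name `always_overlaps` is an illustrative choice, not part of any product's API.

```python
from itertools import combinations

def always_overlaps(n, r, w):
    """True if every set of r read nodes shares a node with every set of w write nodes."""
    nodes = range(n)
    return all(
        set(read_set) & set(write_set)
        for read_set in combinations(nodes, r)
        for write_set in combinations(nodes, w)
    )

# Compare the brute-force answer with the R + W > N rule for N = 3.
for r in range(1, 4):
    for w in range(1, 4):
        print(f"N=3 R={r} W={w} overlap={always_overlaps(3, r, w)} rule={r + w > 3}")
```

When R + W > N, a read always touches at least one node that holds the latest write, which is what makes the consistent state easy to determine.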
41. R = 1, W = N
• Suits workloads with more reads than writes
• W = N
• 1 node failure = entire system unavailable for writes
42. R = N, W =1
• W = 1
• Chance of data inconsistency quite high
• R = N
• Read only possible when all nodes in the cluster are available
43. R = W = ceiling((N + 1)/2)
• Effective quorum for eventual consistency
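A short worked example of the quorum formula, as a sketch. The helper name `quorum_size` is hypothetical; it evaluates ceiling((N + 1)/2) and shows how many node failures still leave a quorum reachable.

```python
import math

def quorum_size(n):
    """Majority quorum: ceiling((N + 1) / 2)."""
    return math.ceil((n + 1) / 2)

for n in (3, 4, 5):
    q = quorum_size(n)
    # With R = W = q, R + W exceeds N, and up to N - q nodes may be down.
    print(f"N={n}: R=W={q}, R+W={2 * q}, tolerated failures={n - q}")
```

For N = 3 this gives R = W = 2, so one node can fail while reads and writes still overlap.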
44. Eventual consistency variants
• Causal consistency -- if A writes and informs B, then B always sees the updated value
• Read-your-writes consistency -- after A writes a new value, A never sees the old one
• Session consistency -- read-your-writes consistency within a client session
• Monotonic read consistency -- once a process has seen a new value, it never sees a previous value
• Monotonic write consistency -- writes by the same process are serialized
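One way to picture monotonic read consistency is a client session that remembers the newest version it has seen and refuses older answers. This is a simplified sketch: the `Session` class and the `(version, value)` replica responses are assumptions for illustration, not how any particular database implements it.

```python
class Session:
    """Client session enforcing monotonic reads over versioned replica responses."""

    def __init__(self):
        self.last_seen = 0  # highest version this session has observed

    def read(self, response):
        """Accept a (version, value) response unless it is older than one already seen."""
        version, value = response
        if version < self.last_seen:
            raise LookupError(f"stale read: version {version} < {self.last_seen}")
        self.last_seen = version
        return value

session = Session()
print(session.read((2, "new")))      # an up-to-date replica answers first
try:
    session.read((1, "old"))         # a lagging replica answers next
except LookupError as err:
    print("rejected:", err)
```

A real client would typically retry against another replica rather than fail, but the invariant is the same: the observed version never goes backwards.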
45. Dynamo Techniques
• Consistent Hashing (Incremental scalability)
• Vector clocks (high availability for writes)
• Sloppy quorum and hinted handoff (recover from temporary failure)
• Gossip-based membership protocol (periodic, pairwise, inter-process interactions, low reliability, random peer selection)
• Anti-entropy using Merkle trees
• (source: http://s3.amazonaws.com/AllThingsDistributed/sosp/amazon-dynamo-sosp2007.pdf)
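To illustrate why consistent hashing gives incremental scalability, here is a minimal sketch of a hash ring with virtual nodes. It is not Dynamo's implementation: the `HashRing` class, the SHA-256 based hash, and the choice of 64 virtual nodes per server are all assumptions made for the example.

```python
import bisect
import hashlib

class HashRing:
    """Minimal consistent-hash ring; each node owns several points (virtual nodes)."""

    def __init__(self, nodes=(), vnodes=64):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (hash position, node name)
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(key):
        # Any stable hash works; the first 8 bytes of SHA-256 are used here.
        return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")

    def add(self, node):
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def lookup(self, key):
        """Return the node owning the first ring position at or after the key's hash."""
        if not self._ring:
            raise LookupError("ring is empty")
        idx = bisect.bisect(self._ring, (self._hash(key),))
        return self._ring[idx % len(self._ring)][1]  # wrap around the ring

keys = [f"key{i}" for i in range(1000)]
ring = HashRing(["A", "B", "C"])
before = {k: ring.lookup(k) for k in keys}
ring.add("D")
after = {k: ring.lookup(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved, all to the new node:",
      all(after[k] == "D" for k in moved))
```

Adding node D only takes keys over from existing nodes; no key moves between A, B, and C. That is the property that lets a cluster grow one node at a time.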