2. This talk is essentially the first couple
chapters of “NoSQL Distilled” (Sadalage,
Fowler)
Highly recommend this book!
3. App development productivity
Fixes “impedance mismatch”
Large scale
Happily handles the “three Vs” of “big data”
▪ Volume
▪ Velocity
▪ Variety
4. You’ve always needed a “backing store”
…could be files
great for a single user or application
…could be databases
great for multiple users/applications
…and on the DB side, could be:
Application Database (used by single app)
Integration Database (used by several apps)
5. Concurrency
Simple problem, very tough to solve
Application Datastores
One app, many users
Integration Datastores
One set of data, many apps, lots of potential for
headbanging
6. {
"id": "1001",
"firstName": "Ann",
"lastName": "Williams",
"age": 55,
"purchasedItems":
{
0321290533 {qty, price… }
0321601912 {qty, price… }
0131495054 {qty, price… }
},
"paymentDetails":
{ cc info… },
"address":
{
"street": "1234 Park",
"city": "San Francisco",
"state": "CA",
"zip": "94102"
}
}
1 object = 10, 20, 100? tables. Ugh…
Your code has one structure, but your RDBMS stores in another…
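A minimal sketch of that mismatch, assuming a trimmed-down version of the customer object above. The `qty` values, the table names, and the `to_rows` helper are all made up for illustration; a real schema would have many more tables.

```python
# Hypothetical customer aggregate, loosely modeled on the slide's example.
customer = {
    "id": "1001",
    "firstName": "Ann",
    "lastName": "Williams",
    "purchasedItems": {
        "0321290533": {"qty": 1},  # qty values are invented for this sketch
        "0321601912": {"qty": 2},
    },
    "address": {
        "street": "1234 Park",
        "city": "San Francisco",
        "state": "CA",
        "zip": "94102",
    },
}


def to_rows(c):
    """Split one in-memory aggregate into per-table rows (table names are assumptions)."""
    return {
        "customer": [(c["id"], c["firstName"], c["lastName"])],
        "address": [(c["id"], *c["address"].values())],
        "purchased_item": [
            (c["id"], isbn, item["qty"])
            for isbn, item in c["purchasedItems"].items()
        ],
    }


rows = to_rows(customer)
for table_name, table_rows in rows.items():
    print(table_name, table_rows)
```

One object in code already fans out into three tables here, and every read has to join them back together.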
7. A great "all purpose" storage + query tool
ACID compliant
Supports many users
Supports many apps
3NF stores data efficiently
Disk wasn't always cheap
Fast and tunable
Introduced a common interface (SQL)
Which every vendor quickly then “broke”
8. Impedance mismatch
Many teams build (then have to maintain) custom
ORM or SOA proxies
Weren't built to be distributed
Google, Amazon, et al. hit hard walls on RDBMS
capabilities
Often required expensive, proprietary hardware
Ooops, I sharded myself!
Additional complexity
Cross shard joins now extremely expensive
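A rough sketch of why cross-shard joins hurt, assuming simple hash sharding on the primary key. The shard count, the key names, and the use of plain dicts as stand-ins for database servers are all assumptions for illustration.

```python
import zlib

NUM_SHARDS = 4  # hypothetical shard count


def shard_for(key: str) -> int:
    """Pick a shard from a stable hash of the key."""
    return zlib.crc32(key.encode("utf-8")) % NUM_SHARDS


# Each dict stands in for a separate database server.
customer_shards = [dict() for _ in range(NUM_SHARDS)]
order_shards = [dict() for _ in range(NUM_SHARDS)]


def put(shards, key, value):
    shards[shard_for(key)][key] = value


put(customer_shards, "cust-1001", {"name": "Ann"})
put(order_shards, "order-1", {"customer": "cust-1001", "total": 30})
put(order_shards, "order-2", {"customer": "cust-1001", "total": 12})


def orders_for(customer_id):
    """Orders are placed by order id, not customer id,
    so this 'join' has to visit every shard."""
    return sorted(
        order_id
        for shard in order_shards
        for order_id, order in shard.items()
        if order["customer"] == customer_id
    )


print(orders_for("cust-1001"))
```

A lookup by key touches one shard; the join touches all of them, and on real hardware each visit is a network round trip.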
9. Velocity
Faster responses required
Volume
100s of TB, PB now common
“Web Scale” can mean 100s of thousands of
concurrent transactions
Both of those increasing rapidly
Variety
Mixed structure, semi-structured, unstructured
10. Bigtable paper (by Google)
Heavily influenced the “Columnar” branch of NoSQL
Dynamo paper (by Amazon)
Heavily influenced the “Key-Value” branch of NoSQL
This is NOT DynamoDB!!!
Design considerations:
Distributed from the start
Clusters of inexpensive commodity hardware are cheaper &
more fault tolerant at scale
Relaxed and/or tunable C&A (from CAP theorem)
Deal with unheard of volume & velocity
Schemaless (bye bye impedance mismatch)
11. Consistency
How consistent the data looks to 2 or more
viewers
“Eventual” consistency possible (and common)!
Availability
Responsiveness of the system
Partition Tolerance
How well does the system respond to partition
failures?
This is normally “untunable”, unlike the C&A
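One common way the C&A trade-off is made tunable in Dynamo-style stores is with quorums: N replicas, W acknowledgements per write, R replicas consulted per read. The slides do not name these parameters, so treat N, W, and R and the function names below as assumptions. The sketch checks the usual rule of thumb that reads see the latest write when R + W > N.

```python
from itertools import combinations


def quorums_overlap(n: int, w: int, r: int) -> bool:
    """Rule of thumb: read and write quorums must overlap when r + w > n."""
    return r + w > n


def always_intersect(n: int, w: int, r: int) -> bool:
    """Brute force: does every write set of size w share a replica
    with every read set of size r?"""
    replicas = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )


# N=3, W=2, R=2: slower, but a read always hits a replica with the latest write.
print(quorums_overlap(3, 2, 2), always_intersect(3, 2, 2))
# N=3, W=1, R=1: faster and more available, but reads may be stale ("eventual").
print(quorums_overlap(3, 1, 1), always_intersect(3, 1, 1))
```

Lowering W and R buys availability and speed at the cost of consistency; raising them does the reverse.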
12. Because “Cloud” and “Big Data” were just not
confusing enough people in IT
"Not ONLY SQL" - incredibly unfortunate
"little o"
Name born out of a Bay Area meetup in 2009
…and regretted / derided ever since
13. Fancy term for “multiple datastores”
...you're already doing it
Browser side cache
Memcache
Query cache
OLAP systems
...just add NoSQL
Tell your RDBMS not to worry – it will (probably)
still live a long, happy life
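A minimal sketch of the "you're already doing it" point, assuming a cache sitting in front of a database. The `CachedCustomers` class and its dict-based stores are hypothetical stand-ins, not a real Memcache or RDBMS client.

```python
class CachedCustomers:
    """Cache-aside read path over two stores: a cache and a database."""

    def __init__(self, database):
        self.database = database  # stand-in for the RDBMS
        self.cache = {}           # stand-in for Memcache
        self.db_reads = 0         # counts how often we fall through to the database

    def load(self, customer_id):
        if customer_id in self.cache:
            return self.cache[customer_id]
        self.db_reads += 1
        record = self.database.get(customer_id)
        if record is not None:
            self.cache[customer_id] = record  # populate the cache for next time
        return record


customers = CachedCustomers({"1001": {"firstName": "Ann"}})
first = customers.load("1001")   # misses the cache, reads the database
second = customers.load("1001")  # served from the cache
print(customers.db_reads)
```

Two datastores, each doing what it is best at; adding a NoSQL store is one more of the same.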
14. Generally Open Source
Schemaless
Easily change schema or do 'schema on read'
Cluster-oriented
With the exception of Graph DBs
Generally favor "Web Scale" over ACID
Generally better for APPLICATION Databases
Aggregate data models
Let you treat a group of data as a unit
Again, graph DBs are an exception here…
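A small sketch of 'schema on read', assuming two generations of customer records stored side by side. The field names and the `read_customer` helper are invented for illustration; the point is only that the reader, not the datastore, decides the shape.

```python
# Records written at different times, with different shapes (both hypothetical).
records = [
    {"name": "Ann Williams"},                   # older shape
    {"firstName": "Bob", "lastName": "Jones"},  # newer shape
]


def read_customer(raw):
    """Normalize whatever shape was stored into the shape the app wants today."""
    if "firstName" in raw:
        return {"firstName": raw["firstName"], "lastName": raw.get("lastName", "")}
    first, _, last = raw.get("name", "").partition(" ")
    return {"firstName": first, "lastName": last}


normalized = [read_customer(r) for r in records]
print(normalized)
```

No migration ran against the stored data; the schema change lives in the reading code.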
15. Key-Value
Fast lookup on a single “hashed” key
Document
Each “Document” self-defines its own structure
Columnar (or Column-Family)
Great for “sparse” data (millions of columns)
Graph [bit of a black sheep in the NoSQL family]
Specialized to crawl graph relations like social
networks, resource flows, etc
Less popular at the moment, but gaining steam fast
16. Can only look up by (normally a single) Key
Extremely fast for that key
Value can be anything
Example: DynamoDB, Riak
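The whole model fits in a few lines. A toy sketch, assuming an in-memory dict as the store; the class and key names are hypothetical and this is not any product's API.

```python
class KeyValueStore:
    """Toy key-value store: the only access path is the key."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)


store = KeyValueStore()
store.put("session:42", b"\x00\x01opaque-bytes")  # the store never looks inside the value
store.put("cart:1001", {"0321290533": 1})

print(store.get("cart:1001"))
print(store.get("cart:9999"))
```

There is no "find all carts containing this book" here; anything other than a lookup by key means reading every value yourself.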
17. Document can contain anything
JSON extremely popular
But can also be XML, CSV, semi-structured,
unstructured, custom… literally anything
Can query on aggregates inside of document
Can even index on aggregates
Can retrieve part of the document
Extremely memory intensive
Example: MongoDB, CouchDB
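A rough sketch of querying inside documents and retrieving part of one, using plain Python dicts. The `get_path`, `find`, and `project` helpers and the sample data are assumptions for illustration and do not mirror MongoDB's or CouchDB's actual query APIs.

```python
def get_path(doc, path):
    """Follow a dotted path such as 'address.city'; return None if any step is missing."""
    current = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def find(collection, path, value):
    """Return documents whose nested field at `path` equals `value`."""
    return [doc for doc in collection if get_path(doc, path) == value]


def project(doc, fields):
    """Retrieve only part of a document."""
    return {field: doc[field] for field in fields if field in doc}


collection = [
    {"id": "1001", "firstName": "Ann",
     "address": {"city": "San Francisco", "state": "CA"}},
    {"id": "1002", "firstName": "Bob", "tags": ["vip"]},  # different shape, no address
]

matches = find(collection, "address.city", "San Francisco")
print([project(doc, ["id", "firstName"]) for doc in matches])
```

Unlike the key-value case, the store can see inside the value, so it can filter (and, in a real product, index) on nested fields.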
18. Great for “sparse” data (populated columns vary
greatly between rows)
Group columns into families
Think of it as a “two level” aggregate
First level “key” is rowID or aggregate of interest
2nd level values are the columns
You can visualize the data as row or column-
oriented
Example: HBase, Cassandra
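The "two level" aggregate can be pictured as nested maps: row key, then column family, then column. A minimal sketch with made-up rows and family names; it models the data shape only, not how HBase or Cassandra store or query it.

```python
# row key -> column family -> column name -> value
table = {}


def put(row, family, column, value):
    table.setdefault(row, {}).setdefault(family, {})[column] = value


put("1001", "profile", "firstName", "Ann")
put("1001", "profile", "lastName", "Williams")
put("1001", "orders", "0321290533", 1)
put("1002", "profile", "firstName", "Bob")  # sparse: this row has no "orders" family


def read_family(row, family):
    """Row-oriented view: all columns of one family for one row."""
    return dict(table.get(row, {}).get(family, {}))


def read_column(family, column):
    """Column-oriented view: one column across every row that has it."""
    return {
        row: families[family][column]
        for row, families in table.items()
        if column in families.get(family, {})
    }


print(read_family("1001", "profile"))
print(read_column("profile", "firstName"))
```

Rows only pay for the columns they actually populate, which is why sparse data fits this model well.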
19. Built to efficiently crawl & search graph trees
Social Networks
Resource flows
“people of interest”
Don’t run well on clusters
Example: Neo4j (and not much else right now)
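The kind of question a graph DB is built for, sketched as a breadth-first search over an in-memory adjacency list. The `follows` data and the `within_hops` helper are hypothetical; a real graph database would express this in its own query language.

```python
from collections import deque

# Hypothetical "who follows whom" social graph.
follows = {
    "ann": ["bob", "cara"],
    "bob": ["dan"],
    "cara": ["dan", "eve"],
    "dan": [],
    "eve": ["ann"],
}


def within_hops(graph, start, max_hops):
    """Breadth-first search: everyone reachable from `start` in at most
    `max_hops` steps, mapped to their hop distance."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in graph.get(node, []):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    del distance[start]
    return distance


print(within_hops(follows, "ann", 2))
```

In an RDBMS each extra hop is typically another self-join; here it is just one more step along the edges.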
20. RDBMS were not designed with many of today’s
problems in mind
NoSQL DBs were built from the ground up to deal
with these “three Vs” issues
NoSQL can either replace or (more commonly)
supplement existing RDBMS functions
Move hot tables out to DynamoDB
Write a greenfield app from ground up with only a NoSQL
datastore
Consistency & Availability are often tunable
Many flavors exist & each has its own best use
cases
Research heavily before deciding upon a platform