NoSQL Distilled-Distilled

 This talk is essentially the first couple of
chapters of “NoSQL Distilled” (Sadalage,
Fowler)
 Highly recommend this book!

 App development productivity
 Fixes “impedance mismatch”
 Large scale
 Happily handles the “three Vs” of “big data”
▪ Volume
▪ Velocity
▪ Variety

You’ve always needed a “backing store”
 …could be files
 great for a single user or application
 …could be databases
 great for multiple users/applications
 …and on the DB side, could be:
 Application Database (used by single app)
 Integration Database (used by several apps)

 Concurrency
 Simple problem, very tough to solve
 Application Datastores
 One app, many users
 Integration Datastores
 One set of data, many apps, lots of potential for
headbanging

{
  "id": "1001",
  "firstName": "Ann",
  "lastName": "Williams",
  "age": 55,
  "purchasedItems":
  {
    "0321290533": { qty, price… },
    "0321601912": { qty, price… },
    "0131495054": { qty, price… }
  },
  "paymentDetails":
  { cc info… },
  "address":
  {
    "street": "1234 Park",
    "city": "San Francisco",
    "state": "CA",
    "zip": "94102"
  }
}
1 object = 10, 20, 100? tables. Ugh…
Your code has one structure, but your RDBMS stores in another…
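To make the mismatch concrete, here is a minimal sketch using Python's built-in sqlite3 and json modules. The table names, column names, and the trimmed-down customer object are assumptions made for illustration, not something from the book. The same object is first shredded across three tables, then stored and read back as one aggregate.

```python
import json
import sqlite3

# Hypothetical customer aggregate, loosely modeled on the slide above.
customer = {
    "id": "1001",
    "firstName": "Ann",
    "lastName": "Williams",
    "purchasedItems": {
        "0321290533": {"qty": 1},
        "0321601912": {"qty": 2},
    },
    "address": {"city": "San Francisco", "state": "CA"},
}

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE customer (id TEXT PRIMARY KEY, first_name TEXT, last_name TEXT);
    CREATE TABLE address (customer_id TEXT, city TEXT, state TEXT);
    CREATE TABLE purchased_item (customer_id TEXT, isbn TEXT, qty INTEGER);
    CREATE TABLE customer_doc (id TEXT PRIMARY KEY, body TEXT);
""")

# Relational style: one in-memory object is spread across three tables.
db.execute("INSERT INTO customer VALUES (?, ?, ?)",
           (customer["id"], customer["firstName"], customer["lastName"]))
db.execute("INSERT INTO address VALUES (?, ?, ?)",
           (customer["id"], customer["address"]["city"],
            customer["address"]["state"]))
for isbn, item in customer["purchasedItems"].items():
    db.execute("INSERT INTO purchased_item VALUES (?, ?, ?)",
               (customer["id"], isbn, item["qty"]))

# Count the rows it took to represent that single object.
relational_rows = sum(
    db.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    for table in ("customer", "address", "purchased_item")
)

# Aggregate style: the same object is stored and read back as one unit.
db.execute("INSERT INTO customer_doc VALUES (?, ?)",
           (customer["id"], json.dumps(customer)))
row = db.execute("SELECT body FROM customer_doc WHERE id = ?",
                 (customer["id"],)).fetchone()
restored = json.loads(row[0])
```

Even this tiny object needs rows in three tables plus mapping code in both directions, while the aggregate version is one write and one read.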

A great "all purpose" storage + query tool
 ACID compliant
 Supports many users
 Supports many apps
 3NF stores data efficiently
 Disk wasn't always cheap
 Fast and tunable
 Introduced a common interface (SQL)
 Which every vendor then quickly “broke”

 Impedance mismatch
 Many teams build (then have to maintain) custom
ORM or SOA proxies
 Weren't built to be distributed
 Google, Amazon, et al. hit hard walls on RDBMS
capabilities
 Often required expensive, proprietary hardware
 Ooops, I sharded myself!
 Additional complexity
 Cross shard joins now extremely expensive
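A rough sketch of why sharding adds complexity, assuming a hypothetical three-shard setup where each shard is just an in-memory dict and routing is done by hashing the key. Lookups by shard key touch one shard, but any query on another attribute has to visit every shard.

```python
import hashlib

NUM_SHARDS = 3  # assumed cluster size, for illustration only
shards = [dict() for _ in range(NUM_SHARDS)]


def shard_for(key: str) -> int:
    """Route a key to a shard with a stable hash."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_SHARDS


def put(key: str, value: dict) -> None:
    shards[shard_for(key)][key] = value


def get(key: str):
    # Cheap: exactly one shard is consulted.
    return shards[shard_for(key)].get(key)


def find_by_city(city: str) -> list:
    # Expensive: "city" is not the shard key, so this is a
    # scatter-gather across every shard in the cluster.
    return sorted(
        key
        for shard in shards
        for key, value in shard.items()
        if value["city"] == city
    )


put("ann", {"city": "San Francisco"})
put("bob", {"city": "New York"})
put("cat", {"city": "San Francisco"})
```

A join between two sharded tables has the same shape as `find_by_city`, only worse, because rows that need to meet may live on different machines.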

 Velocity
 Faster responses required
 Volume
 100s of TB, PB now common
 “Web Scale” can mean 100s of thousands of
concurrent transactions
 Both of those increasing rapidly
 Variety
 Mixed structure, semi-structured, unstructured

 Bigtable paper (by Google)
 Heavily influenced the “Columnar” branch of NoSQL
 Dynamo paper (by Amazon)
 Heavily influenced the “KeyValue” branch of NoSQL
 This is NOT DynamoDB!!!
Design considerations:
 Distributed from the start
 Clusters of inexpensive commodity hardware are cheaper &
more fault tolerant at scale
 Relaxed and/or tunable C&A (from CAP theorem)
 Deal with unheard of volume & velocity
 Schemaless (bye bye impedance mismatch)

 Consistency
 How consistent the data looks to 2 or more
viewers
 “Eventual” consistency possible (and common)!
 Availability
 Responsiveness of the system
 Partition Tolerance
 How well does the system respond to partition
failures?
 This is normally “untunable”, unlike the C&A
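One common way Dynamo-style stores expose tunable consistency is a quorum: with N replicas, W write acknowledgements and R read responses, a read is guaranteed to overlap the latest write when R + W > N. The quorum rule is not on the slides; the toy simulation below is a sketch under that assumption. It deliberately picks the worst case (writes land on the first W replicas, reads ask the last R), which real systems do not do.

```python
class ReplicaSet:
    """Toy model of N replicas, each holding a (version, value) pair."""

    def __init__(self, n: int):
        self.replicas = [(0, None)] * n
        self.version = 0

    def write(self, value, w: int) -> None:
        # Only the first w replicas acknowledge the write.
        self.version += 1
        for i in range(w):
            self.replicas[i] = (self.version, value)

    def read(self, r: int):
        # Worst case: ask the last r replicas, return the newest value seen.
        contacted = self.replicas[len(self.replicas) - r:]
        return max(contacted, key=lambda pair: pair[0])[1]


def overlap_guaranteed(n: int, r: int, w: int) -> bool:
    """True when every read quorum must intersect every write quorum."""
    return r + w > n


fast = ReplicaSet(3)
fast.write("v1", w=1)
stale_read = fast.read(r=1)    # R + W = 2 <= 3, so the read can miss the write

safe = ReplicaSet(3)
safe.write("v1", w=2)
fresh_read = safe.read(r=2)    # R + W = 4 > 3, so the read must see the write
```

Lower R and W buy speed and availability; higher values buy consistency. That is the "tunable C&A" trade-off in miniature.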

 Because “Cloud” and “Big Data” were just not
confusing enough people in IT
 "Not ONLY SQL" - incredibly unfortunate
"little o"
 Name born out of a Bay Area meetup in 2009
 …and regretted / derided ever since

Fancy term for “multiple datastores”
 ...you're already doing it
 Browser side cache
 Memcache
 Query cache
 OLAP systems
 ...just add NoSQL
 Tell your RDBMS not to worry – it will (probably)
still live a long, happy life
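A small sketch of the "you're already doing it" point, assuming a plain dict stands in for Memcache and an in-memory SQLite table stands in for the RDBMS. The table and function names are hypothetical. Reads check the cache first and only fall through to the database on a miss (often called cache-aside).

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE product (sku TEXT PRIMARY KEY, name TEXT)")
db.execute("INSERT INTO product VALUES ('A1', 'Widget')")

cache = {}                 # stand-in for Memcache or another key-value store
stats = {"db_reads": 0}    # counts how often we fall through to the RDBMS


def get_product_name(sku: str):
    """Cache-aside read: try the cache, then the database."""
    if sku in cache:
        return cache[sku]
    stats["db_reads"] += 1
    row = db.execute(
        "SELECT name FROM product WHERE sku = ?", (sku,)
    ).fetchone()
    name = row[0] if row else None
    if name is not None:
        cache[sku] = name  # populate the cache for the next caller
    return name


first = get_product_name("A1")   # miss: served by the RDBMS
second = get_product_name("A1")  # hit: served by the cache
```

Two datastores, each doing the job it is best at: that is all "multiple datastores" means here.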

 Generally Open Source
 Schemaless
 Easily change schema or do 'schema on read'
 Cluster-oriented
 With the exception of Graph DBs
 Generally favor "Web Scale" over ACID
 Generally better for APPLICATION Databases
 Aggregate data models
 Let you treat a group of data as a unit
 Again, graph DBs are an exception here…
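"Schema on read" can be sketched in a few lines. The two record shapes below are invented for illustration: an older one with a single `name` field and a newer one with separate name fields plus `age`. Nothing is migrated in storage; the reader normalizes each record when it loads it.

```python
# Records written at different times, with different shapes (hypothetical).
records = [
    {"id": 1, "name": "Ann Williams"},
    {"id": 2, "firstName": "Bob", "lastName": "Lee", "age": 41},
]


def read_customer(raw: dict) -> dict:
    """Apply the schema at read time instead of at write time."""
    if "name" in raw:
        # Old shape: split the single name field on the first space.
        first, _, last = raw["name"].partition(" ")
    else:
        first, last = raw["firstName"], raw["lastName"]
    return {
        "id": raw["id"],
        "firstName": first,
        "lastName": last,
        "age": raw.get("age"),  # missing in old records, so None
    }


customers = [read_customer(r) for r in records]
```

The schema has not gone away; it has moved out of the database and into the application code that reads the data.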

 KeyValue
 Fast lookup on a single “hashed” key
 Document
 Each “Document” self-defines its own structure
 Columnar (or Column-Family)
 Great for “sparse” data (millions of columns)
 Graph [bit of a black sheep in the NoSQL family]
 Specialized to crawl graph relations like social
networks, resource flows, etc
 Less popular at the moment, but gaining steam fast

 Can only look up by (normally a single) Key
 Extremely fast for that key
 Value can be anything
 Example: DynamoDB, Riak
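The key-value contract is small enough to sketch in full. This is a toy in-memory class, not the API of DynamoDB or Riak; the class and method names are assumptions. Values are pickled to stress that the store treats them as opaque blobs and cannot query inside them.

```python
import pickle


class KeyValueStore:
    """Toy key-value store: lookup by key only, values are opaque bytes."""

    def __init__(self):
        self._data = {}

    def put(self, key: str, value) -> None:
        # The store never looks inside the value; it just keeps the bytes.
        self._data[key] = pickle.dumps(value)

    def get(self, key: str, default=None):
        blob = self._data.get(key)
        return default if blob is None else pickle.loads(blob)

    def delete(self, key: str) -> None:
        self._data.pop(key, None)


store = KeyValueStore()
store.put("session:1001", {"user": "ann", "cart": ["0321290533"]})
session = store.get("session:1001")
```

Notice what is missing: there is no way to ask "which sessions have this book in the cart" without already knowing the keys.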

 Document can contain anything
 JSON extremely popular
 But can also be XML, CSV, semi-structured,
unstructured, custom… literally anything
 Can query on aggregates inside of document
 Can even index on aggregates
 Can retrieve part of the document
 Extremely memory intensive
 Example: MongoDB, CouchDB
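What "query inside the document" and "retrieve part of the document" mean can be sketched with plain dicts. The dotted-path helper and `find` function below are hypothetical and are not MongoDB's or CouchDB's API; they only imitate the idea of filtering on a nested field and projecting selected fields.

```python
def get_path(doc: dict, path: str):
    """Follow a dotted path such as 'address.city' into nested dicts."""
    current = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def find(collection: list, path: str, value, fields=None) -> list:
    """Return documents whose nested field equals value.

    If fields is given, return only those paths (a projection).
    """
    results = []
    for doc in collection:
        if get_path(doc, path) == value:
            if fields is None:
                results.append(doc)
            else:
                results.append({f: get_path(doc, f) for f in fields})
    return results


customers = [
    {"id": "1001", "firstName": "Ann",
     "address": {"city": "San Francisco", "zip": "94102"}},
    {"id": "1002", "firstName": "Bob",
     "address": {"city": "Portland"}},   # no zip: documents may differ
]

in_sf = find(customers, "address.city", "San Francisco",
             fields=["firstName", "address.zip"])
```

A real document store would back the `address.city` lookup with an index rather than scanning every document.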

 Great for “sparse” data (populated columns vary
greatly between rows)
 Group columns into families
 Think of it as a “two level” aggregate
 First level “key” is rowID or aggregate of interest
 2nd level values are the columns
 You can visualize the data as row or column-
oriented
 Example: HBase, Cassandra
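The "two level" aggregate can be pictured as nested maps: row key, then column family, then column. The sketch below uses that picture with made-up row keys, families, and values; it is not how HBase or Cassandra lay out data on disk. Rows only store the columns they actually have, which is what makes sparse data cheap.

```python
from collections import defaultdict

# row key -> column family -> column name -> value
table = defaultdict(lambda: defaultdict(dict))


def put(row_key: str, family: str, column: str, value) -> None:
    table[row_key][family][column] = value


def get_family(row_key: str, family: str) -> dict:
    """Row-oriented view: all columns of one family for one row."""
    return dict(table.get(row_key, {}).get(family, {}))


def column_slice(family: str, column: str) -> dict:
    """Column-oriented view: one column across every row that has it."""
    return {
        row_key: families[family][column]
        for row_key, families in table.items()
        if column in families.get(family, {})
    }


put("user:1", "profile", "name", "Ann")
put("user:1", "profile", "city", "San Francisco")
put("user:2", "profile", "name", "Bob")          # no city column at all
put("user:2", "activity", "last_login", "2013-05-01")
```

`get_family` and `column_slice` are the two ways of visualizing the same data that the slide mentions.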

 Built to efficiently crawl & search graph trees
 Social Networks
 Resource flows
 “people of interest”
 Don’t run well on clusters
 Example: Neo4j (and not much else right now)
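The kind of question a graph database is built for can be sketched as a breadth-first traversal over an adjacency list. The tiny "follows" network below is invented for illustration, and a real graph DB would use its own storage and query language rather than a Python dict.

```python
from collections import deque

# Hypothetical social graph: who follows whom.
follows = {
    "ann": ["bob", "cat"],
    "bob": ["dan"],
    "cat": ["dan", "eve"],
    "dan": [],
    "eve": ["ann"],
}


def within_hops(graph: dict, start: str, max_hops: int) -> dict:
    """Breadth-first search: map each reachable node to its hop count."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen[neighbor] = seen[node] + 1
                queue.append(neighbor)
    del seen[start]
    return seen


friends_of_friends = within_hops(follows, "ann", 2)
```

In an RDBMS each extra hop is typically another self-join; here it is just one more level of the traversal. It also hints at why graphs are awkward to cluster: a traversal can wander anywhere, so there is no obvious key to shard on.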

 RDBMS were not designed with many of today’s
problems in mind
 NoSQL DBs were built from the ground up to deal
with these “three Vs” issues
 NoSQL can either replace or (more commonly)
supplement existing RDBMS functions
 Move hot tables out to DynamoDB
 Write a greenfield app from ground up with only a NoSQL
datastore
 Consistency & Availability are often tunable
 Many flavors exist & each has its own best use
cases
 Research heavily before deciding upon a platform
