Eberhard Wolff discusses NoSQL databases and architectures. He explains that NoSQL databases sacrifice joins for scalability by storing related data together rather than spreading it across tables. In terms of the CAP theorem, this lets them favor high availability and accept eventual rather than immediate consistency. Architects taking a polyglot approach must choose the right database for each use case, weighing scalability, flexibility, and operational trade-offs.
4. NoSQL Is All About the Persistence Question
Eberhard Wolff - @ewolff
5. Key-Value Stores
► Maps keys to values
► Just a large, globally available map
► i.e. not a very powerful data model
► No complex queries or indices
► Just access by key
► Might add e.g. a full-text engine
► Redis: Cache + Persistence
► Riak: Massive scale + Solr queries
[Diagram: Key → Value, e.g. 42 → "Some data"]
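A key-value store exposes little more than the contract of a map: put a value under a key, get it back by key. The minimal Python sketch below shows that contract with an in-memory `dict`; the `KeyValueStore` class and its methods are hypothetical and not the API of Redis, Riak, or any other product.

```python
from typing import Dict, Optional


class KeyValueStore:
    """Hypothetical in-memory stand-in for a key-value store.

    Values are opaque bytes: the store cannot look inside them,
    so there are no queries or secondary indices, only access by key.
    """

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)

    def delete(self, key: str) -> None:
        self._data.pop(key, None)


store = KeyValueStore()
store.put("42", b"Some data")
print(store.get("42"))       # b'Some data'
print(store.get("missing"))  # None
```

Because values are opaque, a question like "which entries mention customer X?" cannot be answered without reading every value, which is why products add extra engines such as full-text search on top.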
6. Wide Column
► Add any "column" you like to a row
► key → (column → value)
► Column families like tables
► E.g. in the "Users" column family:
> "someuser" → ("username" → "someuser"), ("email" → "someuser@example.com")
► Columns named: indexing possible
► So fast queries possible
► Apache Cassandra
► Amazon SimpleDB
► Apache HBase
► All tuned for large data sets
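The "Users" example above can be modeled, roughly, as a map from row key to a map of named columns. This Python sketch is a simplification under that assumption (real wide-column stores such as Cassandra or HBase add sorting, versioning, and distribution); the `users` data and the `build_index` helper are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical model of one column family: row key -> {column name: value}.
# Each row may carry a different set of columns; there is no fixed schema.
ColumnFamily = Dict[str, Dict[str, str]]

users: ColumnFamily = defaultdict(dict)

# Add any "column" you like to a row.
users["someuser"]["username"] = "someuser"
users["someuser"]["email"] = "someuser@example.com"
users["otheruser"]["username"] = "otheruser"  # this row has no email column
users["otheruser"]["twitter"] = "@other"


def build_index(cf: ColumnFamily, column: str) -> Dict[str, List[str]]:
    """Columns are named, so one column can be indexed: value -> row keys."""
    index: Dict[str, List[str]] = defaultdict(list)
    for row_key, columns in cf.items():
        if column in columns:
            index[columns[column]].append(row_key)
    return dict(index)


email_index = build_index(users, "email")
print(email_index)  # {'someuser@example.com': ['someuser']}
```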
7. Document Stores
► Aggregates are typically stored as "documents" (key-value collection)
► JSON quite common
► No fixed schema
► Indexes possible
► Queries possible
> E.g. "find all baskets that contain the product 123"
► Still great horizontal scalability
► Relations might be modeled as links
► MongoDB, CouchDB
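As an illustration of querying inside aggregates, here is a Python sketch with hypothetical basket documents held in a list; it answers "find all baskets that contain the product 123" by looking into each document's nested `items`. A real document store would evaluate such a query server-side, ideally backed by an index.

```python
from typing import Any, Dict, List

# Hypothetical basket documents: each aggregate is one JSON-like document,
# and documents need not share a fixed schema (b3 has an extra "coupon").
baskets: List[Dict[str, Any]] = [
    {"_id": "b1", "customer": "alice", "items": [{"product": 123, "qty": 2}]},
    {"_id": "b2", "customer": "bob", "items": [{"product": 456, "qty": 1}]},
    {"_id": "b3", "customer": "carol", "coupon": "SPRING",
     "items": [{"product": 123, "qty": 1}, {"product": 789, "qty": 3}]},
]


def find_baskets_containing(docs: List[Dict[str, Any]], product: int) -> List[str]:
    """Query inside the document structure: ids of baskets holding the product."""
    return [
        doc["_id"]
        for doc in docs
        if any(item["product"] == product for item in doc.get("items", []))
    ]


print(find_baskets_containing(baskets, 123))  # ['b1', 'b3']
```

In MongoDB, for example, the equivalent filter can be written with dot notation as `{"items.product": 123}`, which matches a document if any element of its `items` array has that product.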
8. Graph
► Nodes with properties
► Typed relationships with properties
► Ideal e.g. to model relations in a social network
► Easy to find number of followers, degree of relation etc.
► Hard to scale out
► Neo4j
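A minimal property-graph sketch in Python, assuming a hypothetical social network with `FOLLOWS` relationships: counting followers is a filter over incoming relationships, and the degree of relation is a breadth-first search. Neo4j would express such questions as Cypher queries; the data structures here are only illustrative.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# Nodes with properties.
nodes: Dict[str, Dict[str, str]] = {
    "alice": {"name": "Alice"},
    "bob": {"name": "Bob"},
    "carol": {"name": "Carol"},
    "dave": {"name": "Dave"},
}
# Typed relationships with properties: (from, type, to, properties).
relationships: List[Tuple[str, str, str, Dict[str, str]]] = [
    ("bob", "FOLLOWS", "alice", {"since": "2012"}),
    ("carol", "FOLLOWS", "alice", {"since": "2013"}),
    ("carol", "FOLLOWS", "bob", {}),
    ("dave", "FOLLOWS", "carol", {}),
]


def follower_count(node: str) -> int:
    """Number of incoming FOLLOWS relationships."""
    return sum(1 for _, typ, dst, _ in relationships if typ == "FOLLOWS" and dst == node)


def degree_of_relation(start: str, goal: str) -> Optional[int]:
    """Breadth-first search over relationships, ignoring their direction."""
    neighbours: Dict[str, List[str]] = {n: [] for n in nodes}
    for src, _, dst, _ in relationships:
        neighbours[src].append(dst)
        neighbours[dst].append(src)
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        current, depth = queue.popleft()
        if current == goal:
            return depth
        for nxt in neighbours[current]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return None  # not connected


print(follower_count("alice"))              # 2
print(degree_of_relation("dave", "alice"))  # 2
```

Each hop in such a traversal follows a relationship to a neighbouring node; once the graph is split across machines, many of those hops become network calls, which is one reason graph databases are hard to scale out.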
9. NoSQL Benefits
Costs
• Scale out instead of scale up
• Cheap hardware
• Usually open source
Flexibility
• Schema in code, not in the database
• Easier to upgrade schema
• Easier to handle heterogeneous data
No object/relational impedance mismatch
• NoSQL databases are more OO-like
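One way "schema in code" plays out in practice: documents written by different application versions coexist in the database, and the code upgrades old shapes when it reads them. The Python sketch below assumes a hypothetical `schema_version` field and a lazy read-time migration; this is one common technique, not something prescribed by the talk.

```python
from typing import Any, Dict, List

# Hypothetical documents written by two versions of the application.
# Version 1 stored a single "name"; version 2 splits it into first/last name.
stored: List[Dict[str, Any]] = [
    {"schema_version": 1, "name": "Ada Lovelace"},
    {"schema_version": 2, "first_name": "Alan", "last_name": "Turing"},
]


def upgrade(doc: Dict[str, Any]) -> Dict[str, Any]:
    """The schema lives in code: old documents are upgraded when read."""
    if doc.get("schema_version", 1) == 1:
        first, _, last = doc["name"].partition(" ")
        return {"schema_version": 2, "first_name": first, "last_name": last}
    return doc


customers = [upgrade(d) for d in stored]
print([c["last_name"] for c in customers])  # ['Lovelace', 'Turing']
```

No database-wide migration has to run before the new code is deployed; the trade-off is that the upgrade logic must stay in the code as long as old documents may exist.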
12. Document-oriented databases
► Offer scale out (Cost)
> Unless you need huge amounts of data
► Offer a rich and flexible data model (Flexibility)
> …and queries
► Other databases have other sweet spots
> Huge data sets
> Graph structures
> Analyzing data
► Niches or mainstream?
17. Polyglot Persistence in Ecommerce
Application data and the store chosen for each kind:
► Financial Data → RDBMS
> Needs transactions & reports. Data fit well in tables.
► Product Catalog → Document Store
> Complex document-like data structures and complex queries
► Shopping Cart → Key / Value
> High performance & scalability. No complex queries.
► Recommendation → Graph
> Based on friends, their purchases and reviews
18. The NoSQL Game
The same ecommerce diagram, now with points for each store:
► Financial Data → RDBMS: 0
► Product Catalog → Document Store: 1000
► Shopping Cart → Key / Value: 900
► Recommendation → Graph: 800
► High Score! 2700 (0 + 1000 + 900 + 800)
19. Just Like the Patterns Game!
► Points for each pattern used
► Extra points if one class implements multiple patterns
20. This is not how Software Architecture works.
21. Why not?
More is worse!
• More hardware
• More developer skills
  – Not necessarily bad
• More Ops trouble
  • Installation
  • Backup
  • Disaster recovery
  • Monitoring
  • Optimizations
22. But: Polyglot Persistence Has a Point
► Object-oriented databases did it wrong
► Strategy: replace RDBMS
► Enterprises will stick to RDBMS
► Pure technology migration basically never happens
► …only vendors think differently
24. Archives for Insurances
► Legacy migration
► Querying and visualizing non-migrated data
► i.e. old contracts
► Legacy hardware and software can be switched off
► Flexibility: host data formats
► Cost: inexpensively handling large data volumes
25. Complex Document Processing System
• MongoDB (document-oriented): documents
• Redis (key/value, in memory): metadata for quick access
• Elasticsearch (search engine): search index
26. Alternative: Only Elasticsearch
• Stores original documents as well (like a key/value store)
• Support for complex queries
• Very powerful features also for data mining / analytics
• Not well suited for update-heavy operations
• Backup / disaster recovery?
• Written in Java
28. Alternative: Only MongoDB
• Now with (limited beta) full-text search
• Excellent support for updates
• Quite fast: memory-mapped files
• Also fast for updates
• Disaster recovery possible
• Map/Reduce support
• Written in C++
31. What about Redis?
• MongoDB uses memory-mapped files
  – Why Redis?
• Like a Swiss Army knife
  • Cache
  • Messaging
  • Central coordination in a distributed environment
• Written in C
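To illustrate the cache role, here is a cache-aside sketch in Python. A plain `dict` with expiry timestamps stands in for Redis so the code runs without a server; with real Redis the expiry would typically be set via a command such as `SET key value EX seconds`. `TtlCache`, `get_with_cache_aside`, and `load_from_database` are hypothetical names.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple


class TtlCache:
    """In-memory stand-in for a Redis-style cache with per-key expiry."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: behave like a cache miss
            return None
        return value

    def set(self, key: str, value: str, ttl_seconds: float) -> None:
        self._entries[key] = (value, self._clock() + ttl_seconds)


def get_with_cache_aside(cache: TtlCache, key: str,
                         load: Callable[[str], str], ttl: float = 60.0) -> str:
    """Cache-aside: try the cache, fall back to the slow store, then fill the cache."""
    cached = cache.get(key)
    if cached is not None:
        return cached
    value = load(key)
    cache.set(key, value, ttl)
    return value


calls: List[str] = []


def load_from_database(key: str) -> str:
    """Hypothetical slow lookup in the primary store."""
    calls.append(key)
    return f"metadata for {key}"


cache = TtlCache()
get_with_cache_aside(cache, "doc-1", load_from_database)
get_with_cache_aside(cache, "doc-1", load_from_database)
print(len(calls))  # 1: the second read was served from the cache
```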
33. Alternative: Riak
• Key / value store
• But includes Solr for full-text search
• What is the difference from a document store then?
• Map/reduce possible
• Written in Erlang
• Smart scaling
36. Scaling Riak
[Diagram: Shard1 to Shard4, each stored on several of Server A, Server B and Server C; a new Server D joins the cluster and takes over copies of Shard2 and Shard4.]
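Riak places data on a ring using consistent hashing, which is what lets a new server take over part of the data without reshuffling everything. The Python sketch below is a generic consistent-hashing ring with virtual nodes, assumed here for illustration; Riak's actual ring (a fixed number of partitions claimed by vnodes) differs in detail. `HashRing` and the server names are hypothetical, and MD5 is used only to spread keys, not for security.

```python
import bisect
import hashlib
from typing import Dict, List


def _hash(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16)


class HashRing:
    """Generic consistent-hashing ring with virtual nodes (not Riak's code)."""

    def __init__(self, servers: List[str], vnodes: int = 64) -> None:
        self._vnodes = vnodes
        self._ring: List[int] = []        # sorted ring positions
        self._owner: Dict[int, str] = {}  # ring position -> server
        for server in servers:
            self.add_server(server)

    def add_server(self, server: str) -> None:
        for i in range(self._vnodes):
            point = _hash(f"{server}#{i}")
            bisect.insort(self._ring, point)
            self._owner[point] = server

    def server_for(self, key: str) -> str:
        # First ring position clockwise from the key's hash, wrapping around.
        idx = bisect.bisect(self._ring, _hash(key)) % len(self._ring)
        return self._owner[self._ring[idx]]


keys = [f"key-{n}" for n in range(1000)]
ring = HashRing(["Server A", "Server B", "Server C"])
before = {k: ring.server_for(k) for k in keys}

ring.add_server("Server D")  # a new server joins
after = {k: ring.server_for(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
# Every key that moved now belongs to the new server; all others stay put.
print(all(after[k] == "Server D" for k in moved))  # True
print(f"{len(moved)} of {len(keys)} keys moved")
```

Only keys that fall on the new server's ring positions move, on average about a quarter here; with a naive `hash(key) % number_of_servers` scheme, going from three to four servers would move most keys.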
39. Data Access: RDBMS
DBA
• Optimizations
  • Indices
  • Tablespaces
  • …
  • No need to change code
• Data Model
  • Schema
  • Stored Procedures
Architect/Developer
• Data Access
  • Queries
  • Other code
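The claim that DBA optimizations need no code change can be tried with Python's built-in `sqlite3` module: the application's query stays the same, and only the query plan changes once an index exists. Table, column, and index names are hypothetical, and since SQLite's plan wording varies between versions, the sketch checks only for the index name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO customer (name, email) VALUES (?, ?)",
    [(f"user{i}", f"user{i}@example.com") for i in range(1000)],
)

# The application's query: data access code owned by the developer.
QUERY = "SELECT name FROM customer WHERE email = ?"
params = ("user42@example.com",)


def plan() -> str:
    """Return SQLite's query plan text (last column of each plan row)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, params).fetchall()
    return " | ".join(str(row[-1]) for row in rows)


result_before = conn.execute(QUERY, params).fetchall()
plan_before = plan()  # no index yet: a full table scan

# The DBA adds an index; the query text above is not touched.
conn.execute("CREATE INDEX idx_customer_email ON customer (email)")

result_after = conn.execute(QUERY, params).fetchall()
plan_after = plan()

print(result_before == result_after)       # True
print("idx_customer_email" in plan_after)  # True
```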
40. RDBMS separate data from data access
► Indices
► Joins and normalization allow flexible data access patterns
Eberhard Wolff - @ewolff
41. Sacrifice Joins for Scalability
► Join: combine tables to retrieve results
► Needs transactions spanning multiple tables
► Example: customer table + addresses
► Inserts need locks and consistency across both tables
► Limits scalability
► Global and distributed locks are nasty
► Consistency limits either availability or partition tolerance
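A small `sqlite3` sketch of the customer/address example, with a hypothetical two-table schema: one transaction spans both tables, a failed insert rolls both back, and a join recombines the data at read time. On one machine this is cheap; the argument above is that the same guarantee across nodes needs distributed locks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE address (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(id),
        city TEXT NOT NULL
    );
    """
)

# One transaction spanning both tables: commit on success, rollback on error.
with conn:
    cur = conn.execute("INSERT INTO customer (name) VALUES (?)", ("Alice",))
    conn.execute("INSERT INTO address (customer_id, city) VALUES (?, ?)",
                 (cur.lastrowid, "Hamburg"))

# A failing address insert also undoes the customer insert.
try:
    with conn:
        cur = conn.execute("INSERT INTO customer (name) VALUES (?)", ("Bob",))
        conn.execute("INSERT INTO address (customer_id, city) VALUES (?, NULL)",
                     (cur.lastrowid,))  # violates NOT NULL on city
except sqlite3.IntegrityError:
    pass

# A join combines the normalized tables again at read time.
rows = conn.execute(
    "SELECT c.name, a.city FROM customer c JOIN address a ON a.customer_id = c.id"
).fetchall()
print(rows)  # [('Alice', 'Hamburg')]
print(conn.execute("SELECT COUNT(*) FROM customer").fetchone())  # (1,)
```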
42. CAP Theorem
► Consistency
> All nodes see the same data
> Not the ACID consistency
► Availability
> Node failures do not prevent survivors from operating
► Partition tolerance
> System continues to operate despite arbitrary message loss
► Can at most have two of C, A, P
► Or rather: if the network fails, choose A or C
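The "if the network fails, choose A or C" rule can be made concrete with a toy model of two replicas. The Python sketch below is a didactic simplification with hypothetical classes and synchronous in-process "replication", not how any particular database works: in `CP` mode a partitioned replica refuses writes, in `AP` mode it accepts them and the replicas diverge.

```python
from typing import Dict, Optional, Tuple


class Unavailable(Exception):
    """Raised when a consistency-first (CP) replica refuses a write during a partition."""


class Replica:
    """Toy replica; `mode` decides its behaviour when it cannot reach its peer."""

    def __init__(self, mode: str) -> None:
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.data: Dict[str, str] = {}
        self.peer: Optional["Replica"] = None
        self.partitioned = False  # True while the link to the peer is down

    def write(self, key: str, value: str) -> None:
        if self.partitioned:
            if self.mode == "CP":
                # Choose C: refuse rather than let the replicas diverge.
                raise Unavailable(key)
            # Choose A: accept locally; the peer will not see this write.
            self.data[key] = value
            return
        # Network is fine: apply locally and on the peer.
        self.data[key] = value
        if self.peer is not None:
            self.peer.data[key] = value


def make_pair(mode: str) -> Tuple[Replica, Replica]:
    a, b = Replica(mode), Replica(mode)
    a.peer, b.peer = b, a
    return a, b


# AP: both replicas stay available, but they now disagree.
ap1, ap2 = make_pair("AP")
ap1.partitioned = ap2.partitioned = True
ap1.write("x", "1")
print(ap1.data.get("x"), ap2.data.get("x"))  # 1 None

# CP: the replicas stay consistent, but the write is rejected.
cp1, cp2 = make_pair("CP")
cp1.partitioned = cp2.partitioned = True
try:
    cp1.write("x", "1")
except Unavailable:
    print("write rejected")
print(cp1.data.get("x"), cp2.data.get("x"))  # None None
```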
44. BASE
► Basically Available, Soft state, Eventually consistent
► I.e. trade consistency for availability
► Pun concerning ACID…
► Not the same C, however!
45. BASE
► Eventually consistent
► If no updates are sent for a while, all previous updates will eventually propagate through the system
► Then all replicas are consistent
► Can deal with network partitioning: messages will be transferred later
► All replicas are always available
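A toy illustration of eventual consistency: each replica accepts writes locally and queues them for its peers, and once the queued messages are delivered the replicas converge. Conflicts are resolved here with last-writer-wins on a hypothetical integer timestamp, which is only one of several strategies real systems use; all names are illustrative.

```python
from typing import Dict, List, Optional, Tuple

Update = Tuple[str, str, int]  # (key, value, timestamp)


class EventualReplica:
    """Toy replica: accepts writes locally and ships them to peers later."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data: Dict[str, Tuple[str, int]] = {}
        self.outbox: List[Update] = []

    def write(self, key: str, value: str, ts: int) -> None:
        self.apply((key, value, ts))
        self.outbox.append((key, value, ts))  # sent when the network allows

    def apply(self, update: Update) -> None:
        key, value, ts = update
        current = self.data.get(key)
        # Last-writer-wins: keep the update with the higher timestamp.
        if current is None or ts > current[1]:
            self.data[key] = (value, ts)

    def read(self, key: str) -> Optional[str]:
        entry = self.data.get(key)
        return None if entry is None else entry[0]


def deliver(replicas: List[EventualReplica]) -> None:
    """The partition heals: every queued update reaches every other replica."""
    for sender in replicas:
        for update in sender.outbox:
            for receiver in replicas:
                if receiver is not sender:
                    receiver.apply(update)
        sender.outbox.clear()


a, b = EventualReplica("a"), EventualReplica("b")
a.write("balance", "100", ts=1)  # during a partition: b does not see this yet
b.write("balance", "80", ts=2)
print(a.read("balance"), b.read("balance"))  # 100 80
deliver([a, b])
print(a.read("balance"), b.read("balance"))  # 80 80
```

Both replicas accepted writes throughout (available), disagreed for a while (soft state), and agreed once the messages were transferred (eventually consistent).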
46. Banking is BASE
► ATMs relax rules on providing cash if the network is partitioned
► Your account is only guaranteed to be consistent by the end of the year
47. No Joins - What now?
► Customer and addresses must be consistent!
► Solution: store both as one entity
► Atomic changes easily possible
► Queries might be distributed across multiple nodes
► "NoSQL does not support transactions / ACID" is wrong
> "NoSQL does not support joins" is better
> Atomic changes still possible
> Schema design different
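Storing customer and addresses as one aggregate means one write covers both. The Python sketch below uses a hypothetical in-memory `AggregateStore` in which replacing a whole document is atomic (guarded by a lock); in MongoDB, for example, a write to a single document is likewise atomic.

```python
import copy
import threading
from typing import Any, Dict


class AggregateStore:
    """Hypothetical document store: each key holds one whole aggregate.

    Replacing a single document is atomic, so no cross-table transaction is needed.
    """

    def __init__(self) -> None:
        self._docs: Dict[str, Dict[str, Any]] = {}
        self._lock = threading.Lock()

    def put(self, key: str, doc: Dict[str, Any]) -> None:
        with self._lock:  # the whole aggregate is swapped in one step
            self._docs[key] = copy.deepcopy(doc)

    def get(self, key: str) -> Dict[str, Any]:
        with self._lock:
            return copy.deepcopy(self._docs[key])


store = AggregateStore()
# Customer and addresses are stored together as one entity.
store.put("customer:1", {
    "name": "Alice",
    "addresses": [{"type": "billing", "city": "Hamburg"}],
})

# Changing the customer and her addresses is a single atomic write.
customer = store.get("customer:1")
customer["name"] = "Alice Smith"
customer["addresses"].append({"type": "shipping", "city": "Berlin"})
store.put("customer:1", customer)
print(len(store.get("customer:1")["addresses"]))  # 2
```

The read-modify-write sequence as a whole is not isolated in this sketch: two concurrent writers could overwrite each other's changes, which real stores address with mechanisms such as conditional or in-place atomic updates.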
48. Data Access: MongoDB
DBA
• Optimizations
  • Only basic indices
  • Other optimizations must be done in code
Architect/Developer
• Data Model
  • Influences access patterns
• Data Access
  • WriteConcerns: how much do you love your data?
  • Shard key
  • Consistency
49. Cluster: RDBMS
► Transparent to developers
► How many nodes?
► A special setup of hardware and RDBMS software
► Responsibility: DBA
50. Cluster: MongoDB
► CAP theorem
> If the network is down, choose
> Consistency xor Availability
► Deals with replication
► MongoDB has master / slave replication
► Write Concerns:
> Unacknowledged
> Acknowledged
> Journaled
> Some nodes in the replica set
► Queries might go to the master only or also to slaves
► Influences consistency
► Responsibility: Architect/Developer
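The write concern levels and the choice of where queries go correspond to MongoDB settings. The Python sketch below records that mapping as plain dictionaries, based on MongoDB's documented `w` / `j` write concern fields and read preference mode names; the dictionary names are hypothetical, and `w: 2` is only an example value for "some nodes".

```python
from typing import Dict, Union

# Write concern documents for the levels named above.
# w: how many replica set members must acknowledge; j: wait for the on-disk journal.
WRITE_CONCERNS: Dict[str, Dict[str, Union[int, bool]]] = {
    "Unacknowledged": {"w": 0},                 # fire and forget
    "Acknowledged": {"w": 1},                   # the master (primary) confirms
    "Journaled": {"w": 1, "j": True},           # ...after writing its journal
    "Some nodes in the replica set": {"w": 2},  # example; "majority" also exists
}

# Read preference modes: "master only" vs. "also slaves".
READ_PREFERENCES: Dict[str, str] = {
    "master only": "primary",             # reads go to the primary only
    "also slaves": "secondaryPreferred",  # reads may hit secondaries, possibly stale
}

for level, concern in WRITE_CONCERNS.items():
    print(f"{level}: {concern}")
```

With the official Python driver, such a document could be passed as e.g. `WriteConcern(**WRITE_CONCERNS["Journaled"])` from `pymongo.write_concern`, and the read preference set through the `readPreference` connection-string option; check the driver documentation for the options available in your version.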
51. More Power and more Responsibility
Architect
DB Admin
52. Architects
► Architecture has always been a multidimensional problem
► Need to choose persistence technology
► Need to think about operations
► Need to do DBA work
53. NoSQL Is All About the Persistence Question