Scale Your Database And Be Happy

Sergio Bossa
@sbtourist
Spring Framework Italian Meeting 2009

Sergio Bossa - http://www.linkedin.com/in/sergiob
About Me
➔ Software architect and engineer
➔ Gioco Digitale (online gambling and casinos)
➔ Open Source enthusiast
➔ Terracotta Messaging (http://forge.terracotta.org)
➔ Actorom (http://code.google.com/p/actorom/)
➔ Terrastore (coming soon…)
➔ (Micro-)Blogger
➔ http://twitter.com/sbtourist
➔ http://sbtourist.blogspot.com

Premise #1
Database
≠
Relational Database

Premise #2
Relational Databases
Are Not
Dead

Premise #3
You'll never hear the word
NoSQL
Here

Scaling Your Database … what?
● Scaling used as a loose term here.
● Scale to handle heterogeneous data.
● Scale to handle more data.
● Scale to handle more load.
● Scale to handle topology changes due to:
● Unplanned growth.
● Unpredictable failures.

Scaling Your Database … why?
● Scaling the way you handle your data is going to
be more and more important.
● Business is moving toward data-centric
applications.
● Let's call them “social”.
● Interest is toward efficient ways of:
● Storing …
● Serving …
● Analyzing …
● Data!

Scaling Your Relational Database

Replication
● Master - Slave replication.
● One (and only one)
master database.
● One or more slaves.
● All writes go to the master.
● Replicated to slaves.
● Reads are balanced
among master and slaves.
● Major issues:
● Single point of failure.
● Single bottleneck (all writes hit the master).
● Static topology.
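
The routing rule on this slide (writes to the master, reads balanced among master and slaves) can be sketched in a few lines. This is only an illustration: the class name, the node names and the round-robin policy are assumptions, not something the talk prescribes.

```python
import itertools

class MasterSlaveRouter:
    """Hypothetical router: writes go to the master, reads are round-robined."""

    def __init__(self, master, slaves):
        self.master = master
        # Reads are balanced among the master and all slaves.
        self._readers = itertools.cycle([master, *slaves])

    def node_for_write(self):
        # Every write hits the single master: the bottleneck and SPOF.
        return self.master

    def node_for_read(self):
        return next(self._readers)
```

Note that the node list is fixed at construction time, which mirrors the "static topology" issue listed above.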

Replication
● Master - Master replication.
● One or more masters.
● Writes and reads can go
to any master node.
● Writes are replicated
among masters.
● Major issues:
● Limited performance and
scalability (due to
quorum).
● Complexity.
● Static topology.

Partitioning
● Vertical partitioning.
● Put tables belonging to
different functional areas
on different database
nodes.
● Scale your data and load
by function.
● Move joins to the
application level.
● Major issues:
● No more truly relational.
● Limited scalability (what if
a functional area grows
too much?).

Partitioning
● Horizontal partitioning.
● Split tables by key and put
partitions (shards) on
different nodes.
● Scale your data and load
by key.
● Move joins to the
application level.
● Needs some kind of
routing.
● Major issues:
● No more truly relational.
● Limited scalability (what if
you need to rebalance?).
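
A minimal sketch of the "routing" mentioned above, assuming the simplest possible scheme: a stable hash of the key modulo the number of shards. The node names and helper functions are hypothetical. The second helper shows why rebalancing hurts with this naive scheme: changing the shard count moves many keys.

```python
import zlib

SHARDS = ["db-node-0", "db-node-1", "db-node-2"]  # hypothetical node names

def shard_for(key, shards=SHARDS):
    """Route a key to a shard using a stable hash modulo the shard count."""
    # zlib.crc32 is stable across processes, unlike the built-in hash() for str.
    return shards[zlib.crc32(key.encode("utf-8")) % len(shards)]

def moved_keys(keys, old_shards, new_shards):
    """Return the keys whose shard changes when the shard list changes."""
    return [k for k in keys
            if shard_for(k, old_shards) != shard_for(k, new_shards)]
```

Calling `moved_keys` with a fourth node appended to the list typically reports that most keys change shard, which is the rebalancing problem consistent hashing is meant to address.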

Caching
● Put a cache in front of your
database.
● Distribute.
● Write-through for scaling
reads.
● Write-behind for scaling
reads and writes.
● Saves you a lot of pain, but
...
● “Only” scales read/write
load.
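
The difference between write-through and write-behind can be sketched as follows. This is a single-process toy with a dict standing in for the database; the class names are assumptions, and a real distributed cache would also need eviction, failure handling and a background flusher.

```python
from collections import deque

class WriteThroughCache:
    """Writes go to the database synchronously; reads are served from cache."""

    def __init__(self, db):
        self.db = db      # any dict-like backing store (stand-in for the RDBMS)
        self.cache = {}

    def put(self, key, value):
        self.db[key] = value       # the database write is on the critical path
        self.cache[key] = value

    def get(self, key):
        if key not in self.cache:  # cache miss: load from the database
            self.cache[key] = self.db[key]
        return self.cache[key]

class WriteBehindCache(WriteThroughCache):
    """Writes hit the cache only; the database is updated later by flush()."""

    def __init__(self, db):
        super().__init__(db)
        self.pending = deque()     # queued writes not yet in the database

    def put(self, key, value):
        self.cache[key] = value
        self.pending.append((key, value))

    def flush(self):
        # In a real system this would run on a background thread or timer.
        while self.pending:
            key, value = self.pending.popleft()
            self.db[key] = value
```

Write-behind takes the database off the write path, which is why it scales writes too, at the cost of a window in which the database lags behind the cache.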

Still left out ...
● We didn't scale our data model.
● Still bound to the relational data model.
● We didn't scale our topology.
● Still static.
● Hard to add nodes for handling growth.
● Hard to tolerate nodes leaving due to failures.

Non Relational Databases, coming...

Friends or Foes?
We come in peace.
To help our old friend: the relational database.

Requirements
● Flexible data model.
● Extreme reliability.
● Scale as you need.
● Scale at unplanned change in the data model.
● Scale at unplanned growth in data size.
● Scale at unplanned growth in load.

Data Model
● Column oriented (hybrid).
● Group by columns.
● Hybrid: group by keys and column families.
● Dynamically add columns.
● Different key-identified values may have a different number of columns.
● Efficiently access the same group of columns
(column family).

Data Model
● Document oriented.
● Group by named collections.
● Identify by key.
● Store a schema-less document.
● JSON.
● XML.
● Whatever ...
● Dynamically update your data model by simply
changing your documents.
● Efficiently access whole documents.

Data Model
● Key/Value oriented.
● Group by named collections.
● Identify by key.
● Store an opaque value (whatever).
● Arguably the ancestor of modern non-relational databases.

Data Partitioning
● Consistent Hashing.
● Nodes mapped on a ring space of integers.
● Each node mapped on multiple locations.
● Each node owns a range of integers.
● Keys assigned to integers in the ring space.
● Stored on the owner node.
● Joining/Leaving nodes only affect the partition
they're mapped to.
● Hence, keys re-balancing is limited to that
specific range (efficient).
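
The ring described above can be sketched as follows. This is a minimal illustration, not the implementation of any particular product: the class name, the use of MD5 as the ring hash and the default of 100 positions per node are all assumptions.

```python
import bisect
import hashlib

class HashRing:
    """Consistent hash ring where each node is mapped on multiple locations."""

    def __init__(self, nodes=(), replicas=100):
        self.replicas = replicas   # ring positions per node (assumed default)
        self._points = []          # sorted ring positions
        self._owner = {}           # ring position -> node name
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(text):
        # Map a string to an integer in the ring space.
        return int(hashlib.md5(text.encode("utf-8")).hexdigest(), 16)

    def add(self, node):
        for i in range(self.replicas):
            point = self._hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def remove(self, node):
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def node_for(self, key):
        # The owner is the first node position at or after the key's position,
        # wrapping around at the end of the ring.
        i = bisect.bisect_right(self._points, self._hash(key)) % len(self._points)
        return self._owner[self._points[i]]
```

When a node joins, a key either keeps its previous owner or moves to the new node; no key moves between two pre-existing nodes. That is the limited re-balancing the slide refers to.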

Data Consistency
● Strict (ACID) Consistency.
● All nodes ...
● At every point in time ...
● Hold a consistent view of the stored data.
● Reads and writes can be executed on every node.
● Results will always be consistent and up-to-date.
● Due to the CAP Theorem you will sacrifice one
of:
● Availability.
● Partition tolerance.

Data Consistency
● Eventual (BASE) Consistency.
● N: number of nodes you want to replicate to.
● W: number of required writes to succeed.
● R: number of required reads to succeed.
● W < N
● Nodes not receiving the write may eventually
get that value later.
● R < N
● Nodes not holding the read value are ignored.
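
The N/W/R parameters can be illustrated with a toy in-memory model. The `Replica` class and both functions are hypothetical; a real system would send requests over the network, in parallel, and attach versions to the values.

```python
class Replica:
    """Toy replica: an in-memory dict plus an availability flag."""

    def __init__(self, up=True):
        self.up = up
        self.data = {}

def quorum_write(replicas, key, value, w):
    """Send the write to all N replicas; succeed if at least W acknowledge."""
    acks = 0
    for replica in replicas:
        if replica.up:
            replica.data[key] = value
            acks += 1
    # Replicas that were down did not get the value; they may get it later.
    return acks >= w

def quorum_read(replicas, key, r):
    """Ask all replicas; succeed if at least R of them return a value."""
    # Replicas that are down or do not hold the value are simply ignored.
    answers = [rep.data[key] for rep in replicas if rep.up and key in rep.data]
    if len(answers) < r:
        raise RuntimeError(f"only {len(answers)} replicas answered, need {r}")
    return answers
```

With N = 3 and one replica down, a write with W = 2 succeeds while the same write with W = 3 does not, which is the availability trade-off behind choosing W < N.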

Data Consistency
● Eventual (BASE) Consistency.
● High read/write availability.
● Work even when some nodes fail to read and
write values.
● Partition tolerance.
● Work even when some nodes cannot be
reached anymore.
● Due to the CAP Theorem you are sacrificing
consistency.

Data Versioning
● Vector Clocks.
● List of (node, counter) values associated with each object version.
● Every time a given object is read by a node, all
its vector clocks are transferred.
● Every time a given object is written back by a
node, counter for that node is incremented.
● A vector clock can express causal ordering.
● A vector clock can express branching.
● Read-time reconciliation (read repair).
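
A minimal sketch of vector clocks as plain dictionaries mapping node name to counter. The function names are assumptions; the point is to show how causal ordering and branching (concurrent versions) fall out of a per-node comparison.

```python
def increment(clock, node):
    """Return a copy of the clock with the writing node's counter bumped."""
    new = dict(clock)
    new[node] = new.get(node, 0) + 1
    return new

def descends(a, b):
    """True if clock a has seen everything that clock b has seen."""
    return all(a.get(node, 0) >= count for node, count in b.items())

def compare(a, b):
    """Classify a relative to b: 'equal', 'after', 'before' or 'concurrent'."""
    a_ge_b, b_ge_a = descends(a, b), descends(b, a)
    if a_ge_b and b_ge_a:
        return "equal"
    if a_ge_b:
        return "after"       # a causally follows b
    if b_ge_a:
        return "before"      # a causally precedes b
    return "concurrent"      # a branch: needs reconciliation at read time

def merge(a, b):
    """Clock for a reconciled version: the per-node maximum of both clocks."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}
```

For example, two clients that both start from the version written by node A and write back through nodes B and C produce clocks that compare as "concurrent"; a later read has to reconcile the two values and can store the result under the merged clock.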

Data Versioning
● Other...
● Multi-Version Concurrency Control.
● Each read/write operation works on a
consistent snapshot.
● Optimistic concurrency.
● Write operations succeed only if their version
is the current one.
● Last Wins (optionally with timestamps).
● Last write operation wins.
● Optionally, with the highest timestamp.
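
Optimistic concurrency, as described above, can be sketched with a version number stored next to each value. The store and exception names are hypothetical, and version 0 is assumed to mean "key not present yet".

```python
class VersionConflict(Exception):
    """Raised when a write is based on a version that is no longer current."""

class VersionedStore:
    def __init__(self):
        self._data = {}  # key -> (version, value)

    def get(self, key):
        # Version 0 with value None means the key has never been written.
        return self._data.get(key, (0, None))

    def put(self, key, value, expected_version):
        current, _ = self.get(key)
        if expected_version != current:
            # Someone else wrote in between: the caller must re-read and retry.
            raise VersionConflict(f"expected {expected_version}, found {current}")
        self._data[key] = (current + 1, value)
        return current + 1
```

A "last wins" policy would simply drop the version check and overwrite unconditionally.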

Data Recovery
● Hinted Handoff.
● Writes to unavailable nodes get directed to
“secondary” nodes.
● Secondary nodes get a hint about the
original destination node.
● When the node is available again, the
secondary node sends back the value.
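
The hand-off steps above can be sketched with a toy in-memory model. The `Node` class and both functions are hypothetical; real systems choose the secondary node from the ring and retry the hand-off periodically.

```python
class Node:
    """Toy node: stored data plus the hints it keeps for other nodes."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}
        self.hints = []  # (original destination name, key, value)

def write(primary, secondary, key, value):
    """Write to the primary, or leave a hinted write on the secondary."""
    if primary.up:
        primary.data[key] = value
    else:
        secondary.hints.append((primary.name, key, value))

def hand_off(secondary, nodes_by_name):
    """Send hinted writes back to destinations that are available again."""
    remaining = []
    for dest, key, value in secondary.hints:
        target = nodes_by_name[dest]
        if target.up:
            target.data[key] = value
        else:
            remaining.append((dest, key, value))  # keep the hint for later
    secondary.hints = remaining
```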

Data Recovery
● Merkle Trees.
● For nodes missing a large number of values (e.g.
after disaster recovery).
● Nodes exchange a tree composed of:
● Leaves containing each the hash of a value
hosted by the node.
● Parents containing each the hash of the
children.
● Updated values are recovered by comparing
hashes and reading back from healthy nodes.
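
A minimal sketch of the comparison, assuming both nodes hold the same number of values in the same key order (real systems build one tree per key range). SHA-256 and the function names are illustrative choices.

```python
import hashlib

def _h(text):
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def build_tree(values):
    """Return the tree as a list of levels: leaf hashes first, root last."""
    level = [_h(v) for v in values]
    levels = [level]
    while len(level) > 1:
        # Each parent holds the hash of its (one or two) children.
        level = [_h(level[i] + (level[i + 1] if i + 1 < len(level) else ""))
                 for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def differing_leaves(a, b):
    """Walk down from the root, descending only where the hashes differ."""
    if a[-1] == b[-1]:
        return []          # equal roots: nothing to repair
    suspects = [0]
    for depth in range(len(a) - 1, 0, -1):
        below_a, below_b = a[depth - 1], b[depth - 1]
        suspects = [child
                    for i in suspects
                    for child in (2 * i, 2 * i + 1)
                    if child < len(below_a) and below_a[child] != below_b[child]]
    return suspects        # indexes of the values to read back
```

Only the subtrees whose hashes differ are visited, so two nearly identical nodes exchange a small part of the tree instead of every value.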

Membership
● Master-based.
● Registry-like.
● Membership
information maintained
and broadcast by
one or more master
nodes.
● Consistent.
● No SPOF with
active/passive master.
● Prone to partitioning
failures.

Membership
● Gossip-based.
● Peer-to-Peer.
● Membership information
is randomly spread
among nodes.
● Each node picks one
or more nodes and
sends them its
own topology view.
● All nodes will
eventually reach a
consistent view of the
cluster topology.
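
A toy simulation of gossip-based membership, under simplifying assumptions: rounds are synchronous, a view is just a set of node names, and every node initially knows only itself and one seed node. The function names and the seed-node bootstrap are illustrative.

```python
import random

def gossip_round(views, rng, fanout=1):
    """Each node sends its view to `fanout` random nodes it already knows."""
    for sender in list(views):
        known_peers = sorted(views[sender] - {sender})
        for peer in rng.sample(known_peers, min(fanout, len(known_peers))):
            views[peer] |= views[sender]   # the receiver merges the view

def rounds_to_converge(views, rng, max_rounds=100):
    """Gossip until every node holds the full membership, or give up."""
    full = set().union(*views.values())
    for round_no in range(1, max_rounds + 1):
        gossip_round(views, rng)
        if all(view == full for view in views.values()):
            return round_no
    return None

# Hypothetical 5-node cluster bootstrapped through the seed node "n0".
names = [f"n{i}" for i in range(5)]
views = {name: {name, "n0"} for name in names}
rounds = rounds_to_converge(views, random.Random(42))
```

The number of rounds depends on the random choices, but all views end up identical, which is the eventual consistency of the topology the slide describes.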

Data Analysis
● The importance of data locality.
● A distributed system is built by:
● Moving data toward its behavior.
● ... or ...
● Moving behavior toward its data.
● An efficient distributed system is built by:
● Moving behavior toward its data.

Data Analysis
● Map-Reduce.
● Map data analysis and computation tasks toward the data itself.
● Reduce results.
● No need to move data around.
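
A single-process sketch of the idea, using a word count as the analysis task. The partition layout and node names are made up; in a real cluster `map_on_node` would run on the node that stores the documents, and only the small partial results would cross the network.

```python
from collections import Counter
from functools import reduce

# Hypothetical data layout: node name -> documents stored on that node.
partitions = {
    "node-a": ["scale your database", "be happy"],
    "node-b": ["scale your topology"],
}

def map_on_node(docs):
    """Map phase: runs where the data lives and returns a partial result."""
    counts = Counter()
    for doc in docs:
        counts.update(doc.lower().split())
    return counts

def map_reduce(partitions):
    """Reduce phase: combine the small partial results from every node."""
    partials = [map_on_node(docs) for docs in partitions.values()]
    return reduce(lambda left, right: left + right, partials, Counter())

totals = map_reduce(partitions)
```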

Use Cases (1)
● Runtime data.
● “Runtime” vs. “Transactional”.
● Not all data need complex relations.
● Not all data need to be persisted forever.
● That is, everything regarding the current
“runtime” state.
● User session and everything related.
● Put the “runtime” state into your N-RDBMS.
● When the “runtime” state turns into
“transactional”, put it into your RDBMS.

Use Cases (2)
● Hot spots.
● For read-intensive data:
● Use your N-RDBMS as a primary database
for reads.
● Use your RDBMS as a primary database for
writes and load data into the N-RDBMS from
a background thread.
● For read/write-intensive data:
● Use your N-RDBMS as a primary database
for writes and reads.
● Put your data in your RDBMS from a
background thread (if needed).

Use Cases (3)
● Intense data computations.
● When the relational model doesn't efficiently
represent your data ...
● And join operations are just too expensive ...
● N-RDBMS come to the rescue!
● Providing more efficient data
representation/storage.
● Providing grid-style computations (e.g. Map-Reduce).

Products (1)
● MongoDB
● http://www.mongodb.org
● Document-based.
● (Binary) JSON.
● Support for indexes and object queries.
● Full support for master-slave replication.
● Alpha support for sharding.
● ACID (except in failure scenarios during
replication).

Products (2)
● Cassandra
● http://incubator.apache.org/cassandra/
● Column-based (hybrid).
● Keys.
● Column Families.
● Columns.
● Super-Columns.
● Support for ordered range queries.
● Fully distributed.
● Peer-to-Peer.
● Eventually consistent.

Products (3)
● Voldemort
● http://project-voldemort.com
● Key/Value.
● Pluggable data serialization.
● No support for queries.
● Peer-to-Peer.

Products (4)
● Riak
● http://riak.basho.com/
● Document-based.
● JSON.
● Links.
● Support for Map-Reduce.
● Peer-to-Peer.
● With runtime dynamic tuning.

Final words
● Know how to scale your relational database.
● Don't dismiss it just to follow the hype.
● Know how non-relational databases scale.
● There are many choices around.
● Know your use cases.
● Make sensible decisions.
● Enjoy!
● And be happy!

Thank you!
Q&A
