NoSQL matters; on that much I'm sure we can all agree. But if we take a closer look, what really matters when it comes to choosing a data store and/or a data processing platform? What really matters when it comes to getting the most out of that platform? And what is really going to matter as we take things to the next level?
2. What really matters...
   1. when choosing a data store / processing platform
   2. when it comes to getting the most out of that platform
   3. when we take things to the next level
4. The 13 horsemen of the apocalypse...
Your application(s)
Anomaly                       Prevented By              Tolerable?  Mitigation (M,G,A…)
Dirty Writes                  Read Uncommitted
Dirty Reads                   Read Committed
Fuzzy Reads (non-repeatable)  Item-Cut Isolation
Phantoms                      Predicate-Cut Isolation
...
5. Your application(s)
Anomaly                       Prevented By              Tolerable?  Mitigation
Read Skew                     MAV Isolation + item-cut
Lost Update                   Repeatable Read
Cursor Lost Update            Cursor Stability
Write Skew                    Repeatable Read
Stale Reads                   Partition-intolerance
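Write skew is the anomaly in this table that is easiest to miss, because each transaction on its own preserves the application's invariant. The toy simulation below is plain Python, not any database's API; the "at least one doctor stays on call" rule and the names alice and bob are illustrative assumptions. Two transactions each read from their own snapshot, check the invariant, and then write disjoint keys. Because the write sets do not overlap, a write-write conflict check such as snapshot isolation's first-committer-wins rule has nothing to abort, yet the invariant ends up broken.

```python
from copy import deepcopy
from typing import Dict

State = Dict[str, bool]  # doctor name -> currently on call?


def go_off_call(snapshot: State, doctor: str) -> State:
    """Hypothetical transaction: leave the rota only if someone else remains.

    Reads come from `snapshot` (the state when the transaction began);
    the returned dict is the transaction's write set.
    """
    on_call = [d for d, status in snapshot.items() if status]
    if len(on_call) >= 2:  # invariant check passes against the snapshot
        return {doctor: False}
    return {}


def serial(db: State) -> State:
    """Run the two transactions one after the other."""
    db.update(go_off_call(deepcopy(db), "alice"))
    db.update(go_off_call(deepcopy(db), "bob"))
    return db


def interleaved(db: State) -> State:
    """Both transactions take their snapshot before either commits."""
    writes_a = go_off_call(deepcopy(db), "alice")
    writes_b = go_off_call(deepcopy(db), "bob")
    # Disjoint write sets: a write-write conflict check finds nothing to abort.
    assert not set(writes_a) & set(writes_b)
    db.update(writes_a)
    db.update(writes_b)
    return db


if __name__ == "__main__":
    print("serial:     ", serial({"alice": True, "bob": True}))
    print("interleaved:", interleaved({"alice": True, "bob": True}))
```

Run serially, the second transaction sees that only one doctor is left and writes nothing; interleaved, both doctors go off call even though neither transaction individually did anything wrong.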
7. Your Developers
“we believe there is considerable work to be done to improve the programmability of highly-available systems” - Bailis et al. 2014 (HAT)
9. Consistency and all that...
If you accept a weaker consistency model, make sure it’s a genuine trade-off and that you’re getting something you need in return.
You can have causal consistency while staying available and convergent: (C)AC, Mahajan et al. 2014.
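Causal consistency only asks that updates become visible in an order that respects happens-before. A common way to track that relation is with vector clocks; this is an assumption made for illustration, since systems such as COPS track explicit per-write dependencies instead. The minimal sketch below shows how a replica could decide whether one update causally precedes another, or whether the two are concurrent and so need no ordering between them (a convergent store still needs some rule to reconcile concurrent updates). The node names are hypothetical.

```python
from typing import Dict

VectorClock = Dict[str, int]  # node id -> number of events seen from that node


def tick(clock: VectorClock, node: str) -> VectorClock:
    """Return a copy of `clock` with `node`'s entry incremented (a local update)."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def merge(a: VectorClock, b: VectorClock) -> VectorClock:
    """Pointwise maximum: the clock of a replica that has seen both histories."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}


def happens_before(a: VectorClock, b: VectorClock) -> bool:
    """True if `a` causally precedes `b`: a <= b pointwise and a != b."""
    nodes = set(a) | set(b)
    return all(a.get(n, 0) <= b.get(n, 0) for n in nodes) and any(
        a.get(n, 0) < b.get(n, 0) for n in nodes
    )


def concurrent(a: VectorClock, b: VectorClock) -> bool:
    """Neither update precedes the other, and they are not the same event."""
    nodes = set(a) | set(b)
    differs = any(a.get(n, 0) != b.get(n, 0) for n in nodes)
    return differs and not happens_before(a, b) and not happens_before(b, a)


if __name__ == "__main__":
    post = tick({}, "alice")          # Alice posts a photo
    seen = merge({}, post)            # Bob's replica receives the post
    comment = tick(seen, "bob")       # Bob comments after seeing it
    edit = tick(post, "alice")        # Alice edits without seeing the comment
    print(happens_before(post, comment))  # True: never show the comment first
    print(concurrent(comment, edit))      # True: either order is causally fine
```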
11. Operations & all the other use cases
“…it is important to consider the data accesses that don’t use the API. These include back-ups, bulk import and deletion of data, bulk migrations from one data format to another, replica creation, asynchronous replication, consistency monitoring tools, and operational debugging. An alternate store would also have to provide atomic write transactions, efficient granular writes, and few latency outliers.” - Facebook 2013 (TAO)
14. Why is it so hard?
“We have found that the standard verification techniques in industry are necessary but not sufficient. We use deep design reviews, code reviews, static code analysis, stress testing, fault-injection testing, and many other techniques, but we still find that subtle bugs can hide in complex concurrent fault-tolerant systems.” - Amazon 2014
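Part of what makes these bugs so elusive is that they live in rare interleavings that testing seldom reaches; the Amazon paper describes turning to exhaustive model checking with TLA+ for that reason. As a toy illustration of the idea (plain Python, not TLA+), the sketch below enumerates every interleaving of N clients doing a non-atomic read-then-write increment on a shared counter, keeping each client's own read before its write, and collects the final counter values. The two-step client model is an assumption chosen to expose the lost update from the table above.

```python
from itertools import permutations
from typing import Set, Tuple

Step = Tuple[int, str]  # (client id, "read" | "write")


def interleavings(num_clients: int) -> Set[Tuple[Step, ...]]:
    """All orderings of the steps that respect each client's program order."""
    steps = [(c, op) for c in range(num_clients) for op in ("read", "write")]
    valid = set()
    for order in permutations(steps):
        # A client must read the counter before it writes back.
        if all(order.index((c, "read")) < order.index((c, "write"))
               for c in range(num_clients)):
            valid.add(order)
    return valid


def run(order: Tuple[Step, ...]) -> int:
    """Execute one interleaving and return the final counter value."""
    counter = 0
    local = {}  # each client's private register
    for client, op in order:
        if op == "read":
            local[client] = counter
        else:
            counter = local[client] + 1
    return counter


def explore(num_clients: int = 2) -> Set[int]:
    """Every final counter value reachable under some interleaving."""
    return {run(order) for order in interleavings(num_clients)}


if __name__ == "__main__":
    print(sorted(explore(2)))  # [1, 2]: a final value of 1 means an update was lost
    all_orders = interleavings(2)
    lost = [o for o in all_orders if run(o) != 2]
    print(len(lost), "of", len(all_orders), "interleavings lose an update")  # 4 of 6
```

Brute-force enumeration blows up factorially, which is why real model checkers work on state spaces with symmetry and state deduplication rather than raw permutations.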
20. How Big?
“Working sets are Zipf-distributed. We can therefore store in memory all but the very largest datasets, which we avoid storing in memory altogether. For example, the distribution of input sizes of MapReduce jobs at Facebook is heavy-tailed. Furthermore, 96% of active jobs can have their entire data simultaneously fit in the corresponding clusters’ memory” - Tachyon, Li et al. 2014
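To see why a Zipf-distributed working set favours keeping data in memory, assume (purely for illustration) N items whose access probability is proportional to 1/rank^s. Caching only the k most popular items then serves H(k,s) / H(N,s) of all accesses, where H(n,s) is the generalized harmonic number, the sum of 1/i^s for i = 1..n. The sketch below computes that fraction; N, k and s are illustrative parameters, not figures from the Tachyon paper, and it models access skew only, not the job-size distribution in the quote.

```python
def harmonic(n: int, s: float = 1.0) -> float:
    """Generalized harmonic number H(n, s) = sum of 1 / i**s for i in 1..n."""
    return sum(1.0 / i ** s for i in range(1, n + 1))


def zipf_hit_ratio(num_items: int, cached: int, s: float = 1.0) -> float:
    """Fraction of accesses served when the `cached` most popular of
    `num_items` Zipf(s)-distributed items are kept in memory."""
    if not 0 <= cached <= num_items:
        raise ValueError("cached must be between 0 and num_items")
    return harmonic(cached, s) / harmonic(num_items, s)


if __name__ == "__main__":
    n = 100_000  # hypothetical number of items
    for pct in (1, 10, 20):
        k = n * pct // 100
        print(f"top {pct:>2}% in memory -> {zipf_hit_ratio(n, k):.1%} of accesses")
```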
22. Performance
40-80% of all MR jobs would perform better on a single machine! (and cost less, and be easier to operate, and have many fewer failures…)
23. COST: The Configuration that Outperforms a Single Thread
“You can have a second computer once you’ve shown you know how to use the first one.” - Paul Barham
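The COST paper compares distributed graph systems against simple single-threaded implementations, and for graph connectivity one of those baselines is union-find. A minimal Python sketch of union-find connected components over an in-memory edge list follows; it illustrates the technique only, and a serious baseline would stream edges from disk in a compiled language.

```python
from typing import Iterable, List, Tuple


def connected_components(num_nodes: int,
                         edges: Iterable[Tuple[int, int]]) -> List[int]:
    """Label every node with the root of its connected component."""
    parent = list(range(num_nodes))
    rank = [0] * num_nodes

    def find(x: int) -> int:
        # Path halving: point each visited node at its grandparent.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra == rb:
            continue
        # Union by rank: attach the shallower tree under the deeper one.
        if rank[ra] < rank[rb]:
            ra, rb = rb, ra
        parent[rb] = ra
        if rank[ra] == rank[rb]:
            rank[ra] += 1

    return [find(x) for x in range(num_nodes)]


if __name__ == "__main__":
    labels = connected_components(6, [(0, 1), (1, 2), (3, 4)])
    print(len(set(labels)), "components")  # 3: {0, 1, 2}, {3, 4}, {5}
```

A single pass over the edges with near-constant work per edge is the kind of baseline a cluster has to beat before its extra cores are paying for themselves.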
38. Some closing thoughts
● Do you need eventual?
● Have you planned for anomalies?
● Does it actually work?
● Are you distributing for the right reasons? (AL…)
● Do you need exact?
● Do you need it ASAP?
● Can you keep CALM?
● Do you understand your application’s invariants?
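On keeping CALM: monotonic programs can be made eventually consistent without coordination, and convergent replicated data types (Shapiro et al.) are one practical way to get there. The sketch below is a grow-only counter (G-Counter), one of the simplest state-based CRDTs: each replica increments only its own slot, and merge takes the pointwise maximum, which is commutative, associative and idempotent, so replicas converge whatever order (or however many times) merges arrive. The class and replica names are illustrative, not a particular library's API.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class GCounter:
    """State-based grow-only counter CRDT."""
    replica_id: str
    counts: Dict[str, int] = field(default_factory=dict)

    def increment(self, amount: int = 1) -> None:
        # Only ever grow this replica's own slot: the state stays monotonic.
        if amount < 0:
            raise ValueError("a G-Counter can only grow")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def value(self) -> int:
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        # Pointwise max: commutative, associative and idempotent.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)


if __name__ == "__main__":
    a, b = GCounter("a"), GCounter("b")
    a.increment(3)  # each replica accepts updates with no coordination
    b.increment(2)
    a.merge(b)
    b.merge(a)
    b.merge(a)      # re-delivering a merge changes nothing
    print(a.value(), b.value())  # 5 5
```

Supporting decrements breaks monotonicity of a single counter; the usual workaround is a pair of G-Counters (a PN-Counter), which is where the application's invariants start to matter again.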
40. References
● Highly Available Transactions, Virtues & Limitations - Bailis et al. 2014 http://blog.acolyer.org/2014/11/07/highly-available-transactions-virtues-and-limitations/
● Building on Quicksand - Helland 2009 http://blog.acolyer.org/2015/03/23/building-on-quicksand/
● F1: A Distributed SQL Database that Scales - Google 2012 http://blog.acolyer.org/2015/01/06/f1-a-distributed-sql-database-that-scales/
● Scalability! But at what COST? - McSherry et al. 2015 http://blog.acolyer.org/?p=941 (to appear, June 5th 2015)
● Applying the Universal Scalability Law to Organisations - Colyer 2015 http://blog.acolyer.org/2015/04/29/applying-the-universal-scalability-law-to-organisations/
● Don’t Settle for Eventual: Scalable Causal Consistency for Wide-Area Storage with COPS - Lloyd et al. 2011 http://blog.acolyer.org/2015/03/17/consistency-availability-and-convergence-cops/
● Consistency, Availability, and Convergence - Mahajan et al. 2014 http://blog.acolyer.org/2015/03/17/consistency-availability-and-convergence-cops/
● Tachyon: Reliable, Memory-Speed Storage for Cluster Computing - Li et al. 2014 http://blog.acolyer.org/2014/12/04/tachyon-reliable-memory-speed-storage-for-cluster-computing/
● Musketeer: all for one, one for all in data processing systems - Gog et al. 2015 http://blog.acolyer.org/2015/04/27/musketeer-part-i-whats-the-best-data-processing-system/ and http://blog.acolyer.org/2015/04/28/musketeer-part-ii-one-for-all-and-all-for-one/
● Pregel: A System for Large-Scale Graph Processing - Google 2010 http://blog.acolyer.org/2015/05/26/pregel-a-system-for-large-scale-graph-processing/
● FlashGraph: Processing Billion Node Graphs on an array of commodity SSDs - Zheng et al. 2015 http://blog.acolyer.org/?p=935
● ApproxHadoop: Bringing Approximations to MapReduce Frameworks - Goiri et al. 2015 http://blog.acolyer.org/2015/04/16/approxhadoop-bringing-approximations-to-mapreduce-frameworks/
● BlinkDB: http://blinkdb.org/
● Making Sense of Performance in Data Analytics Frameworks - Ousterhout et al. 2015 http://blog.acolyer.org/2015/04/20/making-sense-of-performance-in-data-analytics-frameworks/
● A Comprehensive Study of Convergent and Commutative Replicated Data Types - Shapiro et al. 2011 http://blog.acolyer.org/2015/03/18/a-comprehensive-study-of-convergent-and-commutative-replicated-data-types/
● The Declarative Imperative: Experiences and Conjectures in Distributed Logic - Hellerstein 2010 http://blog.acolyer.org/2014/11/13/the-declarative-imperative-experiences-and-conjectures-in-distributed-logic/
● Fast Remote Memory - Dragojevic et al. 2014 http://blog.acolyer.org/2015/05/20/farm-fast-remote-memory/
● Mojim: A Reliable and Highly-Available Non-Volatile Memory System - Zhang et al. 2015 http://blog.acolyer.org/2015/04/14/mojim-a-reliable-and-highly-available-non-volatile-memory-system/
● Consistency Analysis in Bloom: A CALM and Collected Approach - Alvaro et al. 2011 http://blog.acolyer.org/2015/03/16/consistency-analysis-in-bloom-a-calm-and-collected-approach/
● Edelweiss: Automatic Storage Reclamation for Distributed Programming - Conway et al. 2014 http://blog.acolyer.org/2015/02/20/edelweiss-automatic-storage-reclamation-for-distributed-programming/
● Scalable Atomic Visibility with RAMP Transactions - Bailis et al. 2014 http://blog.acolyer.org/2015/03/27/scalable-atomic-visibility-with-ramp-transactions/
● Coordination Avoidance in Database Systems - Bailis et al. 2014 http://blog.acolyer.org/2015/03/19/coordination-avoidance-in-database-systems/
● Putting Consistency Back into Eventual Consistency - Balegas et al. 2015 http://blog.acolyer.org/2015/05/04/putting-consistency-back-into-eventual-consistency/
● Use of Formal Methods at Amazon Web Services - Newcombe et al. 2014 http://blog.acolyer.org/2014/11/24/use-of-formal-methods-at-amazon-web-services/
● Consistency Trade-offs in Modern Distributed Database Systems Design - Abadi 2012 http://cs-www.cs.yale.edu/homes/dna/papers/abadi-pacelc.pdf