CAP Theorem - Theory, Implications and Practices
1. CAP Theorem
Theory, Implications and Practices
Tomer Cagan
Yoav Francis
June 2012
Seminar in Concurrent and Distributed Computing - Prof. Gadi Taubenfeld 2012/2
Interdisciplinary Center, Herzliya, Israel
2. Agenda
• Survey
• CAP - Quick Glance
• Background and Motivation
• Who needs it?
• Why?
• A little about consistency model
• CAP Theorem
• Brewer's conjecture and proof
• Context (FLP – Consensus)
• Implications
• CAP in Practice - Living with CAP
3. Quick Survey
• Do you know/use any of the following:
• SQL?
• ACID?
• Database replication?
• NoSQL?
• NoSQL Types/Implementations?
• Distributed Development?
• Ever heard of CAP Theorem?
4. Purpose/Goals
• CAP Theorem is at the base of
developing distributed systems
• Still, not everyone is aware of it.
• We want to (goals)
o introduce the theorem
o understand what it means to us as
developers (implications and criticism)
o learn (CAP in practice):
what others are doing
what can be done
5. Brewer's CAP Theorem
• Presented as a conjecture at PODC 2000
(Brewer's conjecture)
• Formalized and proved in 2002 by Nancy
Lynch and Seth Gilbert (MIT)
• Consistency, Availability and Partition-
Tolerance cannot be achieved all at the
same time in a distributed system
• There is a tradeoff between these 3
properties
6. CAP - Definition
In simple terms:
in an asynchronous network where messages
may be lost (partition tolerance), it is
impossible to implement a service that
provides consistent data and responds
eventually to every request (availability)
under every pattern of message loss.
7. CAP - Definitions
• Consistency
• Data is consistent and the same for all nodes.
• All the nodes in the system see the same state of the
data
• Availability
• Partition-tolerance
8. CAP - Definitions
• Availability
• Every request to a non-failing node must be
processed and receive a response, whether it
succeeded or failed.
• Consistency
• Partition-tolerance
9. CAP - Definitions
• Partition-Tolerance
• If some nodes crash or communication fails,
the service still performs as expected
• Consistency
• Availability
20. Little Background
• RDBMS
• Scalability
o Vertical Scaling
o Horizontal Scaling
o Big data challenge
• Consistency Model
21. RDBMS
• Emerged in 1970 (initially a mess)
• Standardized with SQL
• Ubiquitous – widely used and understood
• Supports transactions
• High availability is achieved via Replication
• Master – Master
• Master – Slave
• Synchronous/Asynchronous
• But, in general, it scales vertically…
22. Scalability
• Vertical (scale up)
• Few (10s max) nodes
• Grow by add/replace hardware on nodes
• “Simple” to work with (less concurrency)
• But, expensive to scale (huge nodes,
expensive equipment, expensive dedicated
storage solutions)
23. Scalability – cont.
• Horizontal (scale out)
• Many nodes (100s, 1000s)
• Grow by adding nodes
• Easy to grow – adding commodity servers is
not expensive
• Especially with Virtualization and the cloud.
• But
• More complex management
• Harder to understand state and develop…
24. Large Scale RDBMS
• We saw that it is expensive to scale
• We know what it gives us:
• Guarantees Atomicity, Consistency, Isolation
and Durability
• E.g. supports transactions
• “You know what you will get”
• But how well does it work in the case of:
• a largely distributed environment?
• very large volumes of data?
25. Reminder – Consistency Model
• We've talked about it before:
• Linearizability, quasi-linearizability
• Transactional memory
• Etc.
• When does change take effect
• What is the "contract" between the system
and the developer
• Will discuss more later...
26. Consistency Model in RDBMS
• We are very used to RDBMS
• Clear contract
o Transactions
• Ever thought about the consistency
model?
• Consistency Model in RDBMS is
ACID
27. Definition – ACID
• Atomicity
• of an operation (transaction) - "All or nothing" – if part fails, the
entire transaction fails.
• Consistency*
• Database will remain in a valid state after the transaction.
• Means adhering to the database rules (keys, uniqueness, etc.)
• Different from CAP's consistency definition.
• Isolation
• 2 simultaneous transactions cannot interfere with one another.
(Executed as if executed sequentially)
• Durability
• Once a transaction is committed, it remains so indefinitely, even
after power loss or crash. (no caching)
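The "all or nothing" behavior of atomicity can be seen in miniature with Python's built-in sqlite3 module. This is a minimal sketch; the table, account names, and amounts are invented for illustration.

```python
# Atomicity sketch: a failed transfer rolls back entirely, never partially.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
conn.commit()

try:
    with conn:  # opens a transaction; commits on success, rolls back on error
        conn.execute("UPDATE accounts SET balance = balance - 100 "
                     "WHERE name = 'alice'")
        raise RuntimeError("crash before crediting bob")  # simulated failure
except RuntimeError:
    pass

# "All or nothing": the debit above was rolled back with the transaction.
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(balances)  # {'alice': 100, 'bob': 0}
```

The `with conn:` block is what provides the guarantee: either every statement inside it commits, or none do.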
28. ACID in Dist. Systems
• Works well in many (most?) large sites
• Proved problematic in very big sites
• How to guarantee ACID properties?
• Atomicity requires more thought - e.g. two-phase commit
(and 3-phase commit, Paxos…)
• Isolation requires a transaction to hold all of its locks for its
entire duration - high lock contention!
• Complex
• Prone to failure - the algorithm must handle it
• Failure = outage during write.
• Commits come with high overhead.
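The two-phase commit mentioned above can be sketched as follows. This is a toy model of the protocol's control flow only (no networking, timeouts, or recovery); the class and variable names are illustrative, not from any real library.

```python
# Toy two-phase commit: phase 1 collects votes, phase 2 commits or aborts.
class Participant:
    def __init__(self, name, will_vote_yes=True):
        self.name = name
        self.will_vote_yes = will_vote_yes
        self.state = "init"

    def prepare(self):  # phase 1: vote yes/no on the transaction
        self.state = "prepared" if self.will_vote_yes else "aborted"
        return self.will_vote_yes

    def commit(self):   # phase 2: make the transaction durable
        self.state = "committed"

    def abort(self):    # phase 2: undo any prepared work
        self.state = "aborted"

def two_phase_commit(participants):
    # A single "no" vote (or lost vote, in a real system) aborts everyone:
    # this is exactly the availability cost CAP talks about.
    if all(p.prepare() for p in participants):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.abort()
    return "aborted"

nodes = [Participant("n1"), Participant("n2", will_vote_yes=False)]
outcome = two_phase_commit(nodes)
print(outcome)                      # aborted
print([p.state for p in nodes])     # ['aborted', 'aborted']
```

Note the blocking nature: no participant can commit until every vote arrives, which is why 2PC suffers during partitions.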
29. ACID in Dist. Systems
• Ensuring ACID properties comes with
Performance Overhead
• Sometimes it's mandatory, so as not to
sacrifice data integrity.
• In very large scale sites – this adds up:
• Google: +0.5 sec in response time = 20%
decrease in traffic
• Amazon: +100 ms in response time = 1%
drop in sales
30. Back to CAP
• Vendors therefore came up with their own storage
solutions, e.g.
• Google BigTable (over Google File System)
• Amazon DynamoDB
• Facebook – hybrid (Cassandra, Hadoop)
• Twitter – move from MySQL to Cassandra
• These solutions, as a group, are dubbed "NoSQL"; more
on this later...
• Common approach is to relax the consistency
requirements for higher availability (or latency)
• Sacrifice ACID compliance in order to achieve higher
performance – in line with CAP.
31. Relaxed Consistency?
Why do we need to give up consistency for
availability?
Can't we have both (at the same time)?
Let's look more deeply into the CAP Theorem…
33. CAP - Model
• Atomic Data Object
• There must exist a total order of operations s.t. each operation
looks as if it were completed at a single instant – equivalent
to executing them on a single node.
• Available Data Object
• Every request received by a non-failing node must get a
response (any algorithm used in the service must eventually terminate)
• Partition Tolerance
• Both above should tolerate partitions
• Model partitions as messages delayed/lost in network
34. CAP – Theorem 1
It is impossible in the asynchronous network model to
implement a read/write data object that guarantees the
following properties:
• Availability
• Atomic Consistency
in all fair executions (including those in which messages
are lost)
Asynchronous, i.e. there is no clock, nodes make decisions based
only on the messages received and local computation.
45. CAP – Corollary 1.1
It is impossible in the asynchronous network model to
implement a read/write data object that guarantees the
following properties:
• Availability, in all fair executions
• Atomic Consistency, in fair executions in which no
messages are lost.
Intuition
• In the asynchronous model the algorithm doesn't know if messages are lost
• Thus there is no difference between this setting and Theorem 1, and
if such an algorithm existed it would contradict Theorem 1.
46. CAP – Theorem 2
It is impossible in the partially synchronous network
model to implement a read/write data object that
guarantees the following properties:
• Availability
• Atomic Consistency
in all fair executions (including those in which messages
are lost)
Partially synchronous, i.e. every node has a clock, and all clocks
increase at the same rate. However, they are not synchronized.
49. Side note - context
Does this look somewhat familiar?
• Several Processes
• Asynchronous communication
• Some processes may fail (partitioned)
Impossibility of Distributed Consensus
with One Faulty Process (FLP, 1985)
50. Side note - context
CAP ~ Impossibility of guaranteeing both
safety and liveness in an unreliable
distributed system:
Consistency ~ safety - every response sent
to a client is correct
Availability ~ liveness - every request
eventually receives a response
51. Side note - context
Actually - this is similar to (a case of)
consensus in an asynchronous system with
faulty processes (impossible).
Consensus is harder to achieve than the
requirements of CAP - reaching agreement
is (provably) harder than achieving CAP's
consistency requirement.
CAP also implies that it is impossible to
achieve consensus in a system subject to
partitions.
52. Side note - context
Criticism: the reduction is not one-to-one
The availability requirement/definition in CAP is
slightly different from the fail-stop assumption
in FLP - failed nodes still participate.
Read more here:
http://the-paper-trail.org/blog/flp-and-cap-arent-the-same-thing/
(It is important and practical to know these "theoretical" problems - Take the distributed algorithms
course next year!)
57. Eventual Consistency - BASE
Along with the CAP conjecture, Brewer suggested a new
consistency model - BASE (Basically Available, Soft state, Eventual consistency)
• BASE model gives up on Consistency from the CAP
Theorem.
• This model is optimistic and accepts eventual
consistency, in contrast to ACID.
o Given enough time, all nodes will be consistent and
every request will return the same response.
• Brewer points out that ACID and BASE are two
extremes and one can have a range of options in
choosing the balance between consistency and
availability. (consistency models).
58. Eventual Consistency & BASE
• Basically Available - the system does guarantee
availability, in terms of the CAP theorem. It is always
available, but subsets of data may become unavailable
for short periods of time.
• Soft state - State of system may change over time, even
without input. Data does not have to be consistent.
• Eventual Consistency - System will become consistent
eventually in the future. ACID, on the contrary, enforces
consistency immediately after any operation.
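One common way replicas converge "eventually in the future" is an anti-entropy exchange with last-writer-wins merging. The sketch below is illustrative only (it is not any particular store's protocol), using per-key timestamps as invented example data.

```python
# Eventual consistency sketch: replicas accept writes independently, then an
# anti-entropy pass merges state with last-writer-wins (highest timestamp).
def merge(a, b):
    # keep, per key, the (timestamp, value) pair with the highest timestamp
    merged = dict(a)
    for key, (ts, val) in b.items():
        if key not in merged or ts > merged[key][0]:
            merged[key] = (ts, val)
    return merged

replica1 = {"cart": (1, ["book"])}
replica2 = {"cart": (2, ["book", "pen"])}   # a later write seen only here

# Before anti-entropy the replicas disagree (soft state, no immediate
# consistency)...
assert replica1["cart"] != replica2["cart"]

# ...but after exchanging state, both converge on the latest write.
replica1 = merge(replica1, replica2)
replica2 = merge(replica2, replica1)
print(replica1 == replica2)  # True
```

Last-writer-wins is the simplest merge rule; real systems may instead use vector clocks or application-level reconciliation to avoid silently dropping concurrent writes.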
60. CAP Implications/Perspectives
CAP is very prominent in discussions of
the development of large, distributed
systems.
A new "ecosystem" of "CAP-aware"
solutions has become common with the
rise of massive web services.
In the years since its introduction, some
criticism of the theorem has emerged.
62. Cannot omit Partition-Tolerance
You can't really choose 2 out of 3:
For a distributed (i.e., multi-node) system to
not require partition-tolerance it would
have to run on a network which is
guaranteed to never drop messages (or
even deliver them late) and whose nodes
are guaranteed to never die. We do not
work with these types of systems - simply
because they don't exist.
63. CAP - Revisited
CAP was devised and proved relatively early, before the
prevalence of the systems it affects
Brewer - "CAP Twelve Years Later: How the “Rules” Have
Changed" (2012)(reference)
o Discusses misconceptions, suggests different models
of consistency [3]
Gilbert, Lynch - "Perspectives on the CAP Theorem"
(2012)(reference)
o Revisit proof concepts and discuss practical
implications [4]
64. CAP - Revisited
• Formal model too restrictive
o in the proof, compared to the conjecture.
o e.g. relax time constraints (here)
• Partitions are guaranteed to happen:
o Maybe call it PAC (here)
o Discussion (here)
• Ignoring latency
o Maybe call it PACELC: if Partitioned, Availability
or Consistency; Else, Latency or Consistency (here)
65. CAP in Practice
• CAP's implications change the way we
think about and develop distributed systems
• When designing/developing such a
system one should be aware of CAP's
considerations
• We will now explore practical examples
and techniques
66. Give up Scale
Develop as usual with ACID
Restricts the growth options
This is more a business/design decision
But some really don't need it (Small
businesses that with relatively small data
or limited number of users)
Not so interesting for this discussion so we
will continue.
68. NoSQL - Give up Consistency
• Coined in 1998 by Carlo Strozzi for his RDBMS that
does not use the standard SQL interface
• Usually gives up consistency to achieve availability
o Does not support joins (they are expensive)
o No constraints (PK-FK) (related to joins)
o Denormalization
• Re-coined by a Rackspace employee in 2009 to label
all data stores that do not provide ACID.
• Started by internet giants (Google, Amazon etc) and
later released as open source
• Many, many variants...
o see http://nosql-database.org/ for a list.
69. NoSQL
• Not Only SQL (initially strictly NO but...)
• Sacrifices ACID-Compliance in order to achieve
higher performance (Use BASE)
o Maintains eventual consistency instead.
• Distributed and fault tolerant
• Scalable, redundancy on other servers (If one fails
we can recover)
• Usually scales horizontally and manages big
amounts of data.
• Used when performance and real-time-ness is
more important than consistency of data.
70. NoSQL
• Does not use SQL - Data does not necessarily
follow a schema - it is partitioned among many
nodes. We cannot do join operations.
• Optimized for retrieval and appending. Usually
works in key-value record storage.
• Useful when working with a lot of data, that does
not require following the ACID model.
o Maybe not the best idea for your next banking
application.
71. NoSQL - Taxonomy
• Key-value: store key-value pairs. Values can be list etc.
• Column-oriented (or column family): key value with
subgroups of "columns" within a value that can be
retrieved as a block
• Document-oriented: store structured documents
(JSON, XML)
• Graph Database: model around nodes and edges with
attributes on both.
See more/comparison:
http://www.infoq.com/presentations/Polyglot-Persistence-
for-Java-Developers
72. NoSQL – Key-Value
• Support Simple Operations
o get
o put
o delete
• Operations based on (access by) a primary key.
• Due to consistency model (eventual) you may have
duplicates etc.
• Very fast
• Examples:
o DynamoDB (Amazon)
o Berkeley DB
o Voldemort
o Many others...
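The three operations above can be sketched over an in-memory dict. Real key-value stores (DynamoDB, Voldemort, ...) add replication and persistence behind this same narrow interface; the class below is a minimal illustration, not any product's API.

```python
# Key-value store sketch: the entire API is get / put / delete by primary key.
class KeyValueStore:
    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        self._data.pop(key, None)  # deleting a missing key is a no-op

store = KeyValueStore()
store.put("user:42", {"name": "Ada"})
print(store.get("user:42"))   # {'name': 'Ada'}
store.delete("user:42")
print(store.get("user:42"))   # None
```

The narrowness of the interface is the point: with no joins, scans, or multi-key transactions, each key can live on any node, which is what makes these stores easy to partition and very fast.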
73. NoSQL - Column Oriented
• Column-oriented systems still use tables but have no
joins (joins must be handled within your application).
Obviously, they store data by column as opposed to
traditional row-oriented databases. This makes
aggregations much easier.
• Examples:
o Hadoop/HBase (Apache)
o Cassandra (Apache)
o SimpleDB
o BigTable (Google)
• MapReduce used for retrieving (map) and aggregating
(reduce) data.
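The map (retrieve) and reduce (aggregate) pattern mentioned above can be sketched in a few lines. This mimics the programming model only, not a real Hadoop job, and the order data is invented for illustration.

```python
# MapReduce sketch: map emits (key, value) pairs, a shuffle groups them by
# key, and reduce aggregates each group.
from collections import defaultdict

def map_phase(record):
    product, qty = record
    yield product, qty          # emit (product, quantity) per order line

def reduce_phase(key, values):
    return key, sum(values)     # total quantity per product

orders = [("book", 2), ("pen", 5), ("book", 1)]

grouped = defaultdict(list)     # the "shuffle" step: group values by key
for record in orders:
    for key, value in map_phase(record):
        grouped[key].append(value)

totals = dict(reduce_phase(k, vs) for k, vs in grouped.items())
print(totals)  # {'book': 3, 'pen': 5}
```

In a real cluster the map and reduce calls run in parallel on many nodes, with the shuffle moving data between them; the per-key independence is what lets column stores aggregate at scale.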
74. NoSQL - Document Oriented
• Document-oriented systems store structured
"documents" such as JSON or XML but have no joins
(joins must be handled within your application). It's very
easy to map data from object-oriented software to these
systems.
• Query the document with relatively familiar syntax (same
as the document syntax)
• Isolation at document level but not between documents
o Can have a transaction on a document
o Not so easy for many documents
• Examples:
o CouchDB
o MongoDB
o RavenDB (.NET)
75. NoSQL - Graph
• Model nodes and relations
o A node may have associated attributes
o A node has relations to other nodes
o Relations may have attributes as well
• Replace joins with relationships
o No more many-to-many tables = performance
• Very useful and natural for networks: social,
communication, biology
• Often natively supports common graph operations
o Shortest path
o Diameter, etc.
• Examples: Neo4J...
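The graph model above can be sketched with plain dicts: nodes and relations each carry attributes, and shortest path (counting hops) falls out of a breadth-first search. The people and attributes are invented for illustration; real graph databases run such traversals natively.

```python
# Tiny graph model: attributed nodes, attributed edges, BFS shortest path.
from collections import deque

nodes = {"ann": {"city": "TLV"}, "bob": {"city": "NYC"},
         "cat": {"city": "LON"}}
edges = {("ann", "bob"): {"since": 2010}, ("bob", "cat"): {"since": 2012}}

def neighbors(node):
    # edges are undirected here: look for the node on either end
    return ([b for (a, b) in edges if a == node] +
            [a for (a, b) in edges if b == node])

def shortest_path(start, goal):
    queue, seen = deque([[start]]), {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in neighbors(path[-1]):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])

print(shortest_path("ann", "cat"))  # ['ann', 'bob', 'cat']
```

Walking relations like this replaces the join through a many-to-many table that a relational schema would need.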
77. NoSQL - Future
Work on UnQL (unstructured) has begun - allows
querying NoSQL DB's.
• Queries collections instead of tables.
• Queries documents instead of rows.
• Superset of SQL (SQL queries return the same
results)
• Does not support Data Definition (DDL) : CREATE
TABLE etc. (but many times these stores are
schema-less)
78. NoSQL - Summary
• New classes of databases that are based on example
work from internet giants
• Usually relaxes consistency
o but not always - read your manual
• Not as trivial as using SQL
o Transaction support may be limited and restricted in
scope
o No joins - need to handle joins yourself
o Design around questions and not "model"
E.g. aggregated keys
• Implement various consistency models (coming up)
• And be careful with it !
79. Consistency Models
• Give up (some) consistency in response to CAP
• Several models that may fit different usage
scenarios
• Combine models within an application
o Catalog with eventual consistency
o Checkout/register with strong consistency
• Next, look at some of the variants
• We look from the client side (programmer)
based on discussion in http://www.allthingsdistributed.com/2008/12/eventually_consistent.html
80. Consistency Models - Examples
Causal consistency.
• cause (and effect, not casual)
• If process A has communicated to process B that it has
updated a data item, a subsequent access by process B
will return the updated value, and a write is guaranteed
to supersede the earlier write.
o Processes A and B are causally related
• Access by process C that has no causal relationship to
process A is subject to the normal eventual consistency
rules.
o A and C have no relation
81. Consistency Models - Examples
Read-your-writes consistency.
This is an important model where process A, after it has
updated a data item, always accesses the updated value
and will never see an older value. This is a special case
of the causal consistency model. (A is causal to itself)
Session consistency.
A practical version of the previous model
Within a session the system guarantees read-your-writes
consistency (using a cache on the server)
When the session terminates, the data is stored.
Guarantees do not carry across sessions
82. Consistency Models - Examples
Monotonic read consistency.
If a process has seen a particular value for the object, any subsequent
accesses will never return any previous values.
Good when data is relatively static; can even use a local cache
Monotonic write consistency.
In this case the system guarantees to serialize the writes by the same
process. Systems that do not guarantee this level of consistency are
notoriously hard to program.
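Monotonic reads can be enforced on the client side by remembering the highest version seen and rejecting older replica state. The sketch below is one illustrative way to do it; the class name and version scheme are invented.

```python
# Monotonic-read sketch: a client never accepts data older than what it
# has already seen, even when replicas lag behind each other.
class MonotonicReader:
    def __init__(self):
        self.min_version = 0   # highest version this client has observed

    def read(self, replica_state):
        version, value = replica_state
        if version < self.min_version:
            # a lagging replica would violate monotonic reads: reject it
            raise RuntimeError("stale replica, retry elsewhere")
        self.min_version = version
        return value

reader = MonotonicReader()
print(reader.read((3, "v3")))   # v3, from an up-to-date replica
try:
    reader.read((1, "v1"))      # a lagging replica: the read is rejected
except RuntimeError as err:
    print(err)                  # stale replica, retry elsewhere
```

Read-your-writes can be built the same way, with the client bumping `min_version` after each of its own writes.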
83. Segmentation - Smart
Partitioning
• No single uniform requirement
o some aspects require strong consistency
o others high availability.
• Segmentation into components is an approach to
circumventing CAP
o each component provides different types of guarantees.
• Overall the system guarantees neither consistency nor availability, yet
ultimately each part of the service provides exactly what is needed.
• Can be partitioned along various dimensions.
o guarantees not always clear
o specific to the given application and the particular partitioning scheme.
o Thus, difficult but maybe necessary
84. Partitioning - Examples
• Data partitioning
• Operational partitioning
• Functional partitioning
• User partitioning
• Hierarchical partitioning
85. Partitioning - Examples
Data partitioning
• Different data may require different consistency
and availability
• Example:
o Shopping cart - high availability, responsive, can
sometimes suffer anomalies
o Product information needs to be available; a slight
variation in inventory is tolerable
o Checkout, billing, shopping records must be
consistent...
86. Partitioning - Examples
Operational partitioning
• Each operation may require different balance
between consistency and availability
• Example
o Reads - high availability
o Writes - high consistency - lock when writing
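The read/write split above can be sketched as follows: reads are served from any replica without coordination (highly available, possibly stale), while writes take a lock and update every replica. This is purely illustrative; the class and method names are invented.

```python
# Operational-partitioning sketch: uncoordinated reads, locked writes.
import random
import threading

class OperationallyPartitionedStore:
    def __init__(self, n_replicas=3):
        self.replicas = [{} for _ in range(n_replicas)]
        self.write_lock = threading.Lock()

    def read(self, key):
        # any replica will do: no coordination, always answers, may be stale
        return random.choice(self.replicas).get(key)

    def write(self, key, value):
        # writes coordinate through a lock and hit every replica
        with self.write_lock:
            for replica in self.replicas:
                replica[key] = value

store = OperationallyPartitionedStore()
store.write("price", 10)
print(store.read("price"))  # 10 (every replica saw the locked write)
```

The asymmetry mirrors real workloads: most traffic is reads, so making only writes pay the consistency cost keeps the common path fast.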
87. Partitioning - Examples
Hierarchical partitioning
• Large global services with local "extensions"
• Different location in hierarchy may use different
consistency
• Example
o Local servers (better connected) guarantee more
consistency and availability
o Global servers see more partitions and thus relax one
of the requirements
88. Partitioning - Examples
User partitioning
• Try to keep related data close together to
ensure better performance
• Minimize partitioning and thus get better
consistency and availability
• Less consistency between non-related data
• E.g. keep clusters of Facebook users together.
89. Partitioning - Examples
Functional partitioning
• System consists of sub-services
• Different sub-services provide balance
according to requirements
• The composition (whole system) is not always
available and consistent but each part is
assured to work well.
90. Best-effort availability
• This means sacrificing availability for consistency
• Still optimize to give as much availability as
possible
• Makes more sense when the network is more
reliable
91. Summary
• Introduced CAP
• What it is, what it is made of and to where it applies
• Explore its properties and Implications
• You can't have it all
• You can't really give up on partition tolerance
• Saw some ways systems are designed around
this concept in order to achieve their (business)
goals
92. Summary
• The last point is important – we have to understand
the limitations we face and see how to achieve the
requirements of the system while taking these
limitations into account
• Must be another consideration in the design phase
• Technical
• But also, functional/business decision
93. So, what to use?
“And remember also, most people are not building
facebook, they are building reservation systems, tracking
systems, HR systems, finance systems, order entry
systems, banking systems, etc - things where transactions
are sort of important (lose my status update - no big deal,
lose my $100 transfer and I'm sort of mad). There is room
for a lot of things out there.”
(from AskTom answer)
95. Bibliography
1) Eric A Brewer, Toward Robust Distributed Systems, PODC 2000
2) Seth Gilbert, Nancy Lynch, Brewer’s Conjecture and the
Feasibility of Consistent, Available, Partition-Tolerant Web
Services, 2002
3) Eric A Brewer, CAP Twelve Years Later - How the rules have
changed, IEEE Computer Society Computer Magazine, Feb. 2012
4) Seth Gilbert, Nancy Lynch, Perspectives on the CAP
Theorem, IEEE Computer Society Computer Magazine, Feb. 2012
5) Werner Vogels (CTO, Amazon), Eventually Consistent –
Revisited, All Things Distributed blog, December 2008
6) Ivan Giangreco, CAP Theorem Talk, University Of Basel, Fall
2010 (part of Distributed Information System course)
7) Kaushik Sathupadi, A plain English introduction to CAP Theorem
8) Google : CAP Theorem
Editor's notes
Atomic Data Object = think of the system as a single node the user works against – to the user it is transparent, and they expect the system to behave accordingly, as if it were a single node.
including those in which messages are lost = Partition Tolerance
The idea is that in an asynchronous network, a process/algorithm cannot know whether a message reached the other side (there is no bound) and must continue without that knowledge. If an algorithm works correctly in such an environment, there is no real difference between this situation and one in which messages can be lost – so if such an algorithm existed, it would also work in the second case, contradicting Theorem 1. We do not know whether messages are lost, because we are asynchronous – if we knew, we would not be asynchronous.
Skip. There is a synchronized internal clock – timeouts etc. are possible – here, if messages are not lost, such a system can indeed be implemented.