You write with QUORUM, you read with QUORUM. You're safe, right?
It may seem that way, but you could read a different value than the one you wrote, even if nobody else wrote after you. One way this can happen is when the clocks on the machines in your cluster are not synchronized closely enough. This is called clock skew, and it is just one of the ways this anomaly can occur.
In this talk we'll dive into how Cassandra handles conflicting data, walk through several weird and seemingly impossible situations that can happen (both with and without clock skew), and see what we can do to work around them.
About the Speaker
Donny Nadolny, Senior Developer, PagerDuty
Donny Nadolny is a Scala developer at PagerDuty, working on improving the reliability of their backend systems. He spends a large amount of time investigating problems experienced with distributed systems like Cassandra and ZooKeeper.
Clock Skew and Other Annoying Realities in Distributed Systems (Donny Nadolny, PagerDuty) | Cassandra Summit 2016
1. 2016−09−08
Clock Skew, and other annoying realities in distributed systems
Donny Nadolny
donny@pagerduty.com
#CassandraSummit
Should I Care?
Probably not (individual data is low impact):
• user tracking / metrics
• hit counter / impressions
• log data
Yes (individual data is high impact):
• incident management (PagerDuty)
• financial info / banking / stocks
• online store
Cassandra Write
• Cluster: 5 nodes
• Replication factor: 3
• Consistency: QUORUM
INSERT INTO table1 …
[Diagram: "write foo" is sent to the three replicas. Once two of them hold "value: foo", the client receives "Success".]
Cassandra Read
SELECT * FROM table1 WHERE …
[Diagram: "read" is sent to two replicas that hold "value: foo". The client receives "Success, value: foo".]
Cassandra Update
UPDATE table1 …
value: foo, t=5
value: foo, t=5
19. 2016−09−08CLOCK SKEW AND OTHER ANNOYING REALITIES IN DISTRIBUTED SYSTEMS
Cassandra Update
UPDATE table1 …
value: foo, t=5
write
bar, t=7
write
bar, t=7
write bar, t=7
value: foo, t=5
20. 2016−09−08CLOCK SKEW AND OTHER ANNOYING REALITIES IN DISTRIBUTED SYSTEMS
Cassandra Update
UPDATE table1 …
value: foo, t=5
value: bar,
t=7
write
bar, t=7
write
bar, t=7
write bar, t=7
value: foo, t=5
value: bar, t=7
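The update slides suggest that each replica compares timestamps and the higher one wins. A minimal Python sketch of that last-write-wins rule follows. The `Cell` type and the `reconcile` helper are hypothetical names used only for illustration, not Cassandra's API, and the tie-breaking rule is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """A stored value plus the write timestamp attached to it (hypothetical type)."""
    value: str
    timestamp: int


def reconcile(current: Cell, incoming: Cell) -> Cell:
    """Last-write-wins: keep the cell with the higher timestamp.

    Ties are broken by comparing values here; that is an assumption made
    only so that the merge is deterministic in this sketch.
    """
    if incoming.timestamp != current.timestamp:
        return incoming if incoming.timestamp > current.timestamp else current
    return incoming if incoming.value > current.value else current


foo = Cell("foo", timestamp=5)
bar = Cell("bar", timestamp=7)

# The order in which the two writes arrive does not matter.
print(reconcile(foo, bar).value)  # bar
print(reconcile(bar, foo).value)  # bar
```

Note that the rule only looks at the timestamps carried by the writes, never at the order in which they were issued.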
Bank Example
INSERT INTO balances …
[Diagram: two clocks are shown, reading t=5 and t=2. The insert is written to the replicas as "savings: 10000, t=5" and the client receives "Success".]
• Withdraw 8,000 from ATM:
• Read current balance: 10,000
• Update to 2,000
• Dispense 8,000 cash
[Diagram: the clocks advance to t=6 and t=3 for the read, then t=7 and t=4 for the update. The update is written as "savings: 2000, t=4" and the client receives "Success", but the replicas still hold "savings: 10000, t=5" alongside "savings: 2000, t=4".]
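The bank scenario can be replayed with a toy model, sketched here in Python. The `write` and `read` helpers and the dictionary-based replicas are assumptions made for illustration; only the values and timestamps come from the slides.

```python
QUORUM = 2  # replication factor 3, so a quorum is 2 replicas


def write(replicas, key, value, timestamp):
    """Apply a write on every replica, keeping the higher-timestamped value."""
    for replica in replicas:
        stored = replica.get(key)
        if stored is None or timestamp > stored[1]:
            replica[key] = (value, timestamp)


def read(replicas, key):
    """Read from a quorum of replicas and return the newest value by timestamp."""
    answers = [r[key] for r in replicas[:QUORUM] if key in r]
    return max(answers, key=lambda cell: cell[1])[0]


replicas = [{}, {}, {}]

# Initial deposit, stamped by the clock that reads t=5.
write(replicas, "savings", 10000, timestamp=5)

# ATM withdrawal, stamped by the clock that is behind and reads t=4.
balance = read(replicas, "savings")
write(replicas, "savings", balance - 8000, timestamp=4)

# The update was acknowledged, but its older timestamp loses on every replica.
print(read(replicas, "savings"))  # 10000
```

The cash has been dispensed, yet every later read still returns the old balance.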
Clock Skew
• A successful write can really fail
• Your clocks are not perfectly synchronized
• “I’m running NTP, I’m good” - oh really?
Failed Write?
INSERT INTO stock_trades …
[Diagram: "write trade 123 …" (trade 123: buy 100 BRKA) is sent toward the three replicas, but the client sees "Connection error".]
Connection Error
Write Timeout
INSERT INTO stock_trades …
[Diagram: the replicas store "trade 245: buy 100 BRKA".]
hints:
tell nodeA trade 123 …
tell nodeB trade 123 …
tell nodeC trade 123 …
[Diagram: "write trade 123 …" is later delivered to the replicas, which now hold both trade 245 and trade 123.]
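In the slides the retry is stored as trade 245 while trade 123 arrives later through hints, so the order ends up recorded twice. One possible mitigation, sketched in Python, is to retry with the same id so that the late write overwrites the same row. `TradeStore`, `place_order`, and the `fail` flag are hypothetical and exist only to simulate the failure.

```python
class TradeStore:
    """Toy stand-in for the stock_trades table, keyed by trade id."""

    def __init__(self):
        self.rows = {}
        self.hints = []  # writes that were accepted but not yet delivered

    def insert(self, trade_id, order, fail=False):
        if fail:
            # The client sees an error, but the write is kept as a hint.
            self.hints.append((trade_id, order))
            raise ConnectionError("connection error")
        self.rows[trade_id] = order

    def replay_hints(self):
        """Deliver hinted writes later, as hinted handoff would."""
        for trade_id, order in self.hints:
            self.rows[trade_id] = order
        self.hints.clear()


def place_order(store, trade_id, order):
    """Retry with the SAME trade id, so a late first attempt is harmless."""
    try:
        store.insert(trade_id, order, fail=True)  # simulated failed attempt
    except ConnectionError:
        store.insert(trade_id, order)  # the retry reuses trade_id


store = TradeStore()
place_order(store, 123, "buy 100 BRKA")
store.replay_hints()
print(len(store.rows))  # 1
```

Had the retry generated a fresh id, the replayed hint would have added a second row instead of rewriting the first.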
Eventual Consistency
• Full repair
• Read repair chance
• Hinted handoff
Another Bank Example
• Rule: minimum $10,000 end of day balance, monthly fee otherwise
Balance checker
for each user:
    s = read savings
    c = read checking
    if s + c < 10000
        mark user for monthly fee
Transfer money
amount = …
s = read savings
c = read checking
write_savings(s - amount)
write_checking(c + amount)
Example with amount = 5000, where the balance checker runs between the two writes of the transfer:
Transfer money
amount = 5000
s = read savings //7000
c = read checking //6000
write_savings(2000)
Balance checker
for each user:
    s = read savings //2000
    c = read checking //6000
    if s + c < 10000 //true
        mark user for monthly fee
Transfer money (continued)
write_checking(11000)
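The interleaving above can be reproduced with a short Python sketch. The in-memory `accounts` dictionary and the `between_writes` hook are assumptions standing in for the database and for a second client that runs in the gap between the two writes.

```python
accounts = {"savings": 7000, "checking": 6000}
MINIMUM_BALANCE = 10000


def balance_checker(accounts):
    """Return True if the user would be marked for the monthly fee."""
    s = accounts["savings"]
    c = accounts["checking"]
    return s + c < MINIMUM_BALANCE


def transfer(accounts, amount, between_writes=None):
    """Move money from savings to checking as two separate writes."""
    s = accounts["savings"]
    c = accounts["checking"]
    accounts["savings"] = s - amount
    if between_writes is not None:
        between_writes()  # another client runs in the gap between the writes
    accounts["checking"] = c + amount


observed = []
transfer(accounts, 5000,
         between_writes=lambda: observed.append(balance_checker(accounts)))

print(observed[0])             # True: the checker saw 2000 + 6000
print(sum(accounts.values()))  # 13000: the real total never dropped
```

The user is marked for the fee even though the combined balance was 13,000 before and after the transfer.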
Solutions?
1. “Window of vulnerability is small, hope it doesn’t happen”
• The client (your application) can crash
2. “Do the writes in reverse order”
• Works for balance checker, but allows overdrawing your account
3. “Use a lock!”
• The write can propagate out anyway
• How long will you hold the lock for a failed write?
Isolation Guarantees in Cassandra
• Writes to multiple columns in the same row (when issued at the same time)
• Writes to multiple rows in one table that have the same partition key (when issued at the same time)
Partition key: the primary key of a table, or the first part of the primary key if it is a compound key
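As a rough illustration of the rule above, the following Python sketch checks whether a group of writes issued together targets a single table and a single partition key. The `Write` tuple, the `covered_by_isolation` helper, and the example schema (a `balances` table partitioned by user) are all hypothetical.

```python
from typing import NamedTuple


class Write(NamedTuple):
    """One write in a group issued together (hypothetical shape)."""
    table: str
    partition_key: str
    clustering_key: str
    column: str


def covered_by_isolation(writes):
    """True if all writes target one table and one partition key."""
    targets = {(w.table, w.partition_key) for w in writes}
    return len(targets) == 1


same_partition = [
    Write("balances", "user42", "savings", "amount"),
    Write("balances", "user42", "checking", "amount"),
]
different_partitions = [
    Write("balances", "user42", "savings", "amount"),
    Write("balances", "user99", "savings", "amount"),
]

print(covered_by_isolation(same_partition))        # True
print(covered_by_isolation(different_partitions))  # False
```

Under this assumed schema, keeping both accounts of a user in one partition is what would let the two balance writes fall under the guarantee.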
Atomicity
“An atomic transaction is an indivisible and irreducible series of database operations such that either all occur, or nothing occurs… the transaction cannot be observed to be in progress by another database client”
“An example of an atomic transaction is a monetary transfer from bank account A to account B.”
https://en.wikipedia.org/wiki/Atomicity_(database_systems)
Atomic Batch Write
BEGIN BATCH
INSERT INTO table1 …
INSERT INTO table2 …
INSERT INTO table1 …
APPLY BATCH;
[Diagram, normal case: "write batch" is sent twice, then "write table1", "write table2" and "write table1" go out, the client receives "Success", and "delete batch" is sent twice.]
[Diagram, failure case: only the two "write table1" messages go out and the client sees "Connection error". Later, "write table2" and both "write table1" messages are sent.]
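The batch slides can be approximated with a toy model, sketched in Python: the batch is logged first, its writes are applied one by one, and a leftover log entry is replayed later. `ToyCluster` and its methods are hypothetical and greatly simplified; this is not how Cassandra implements its batch log.

```python
class ToyCluster:
    """Very small model of a logged batch; not Cassandra's real implementation."""

    def __init__(self):
        self.tables = {"table1": [], "table2": []}
        self.batch_log = []

    def apply_batch(self, mutations, crash_after=None):
        # 1. Record the whole batch before applying any of it.
        self.batch_log.append(list(mutations))
        # 2. Apply mutations one at a time; readers can see the gaps.
        for i, (table, row) in enumerate(mutations):
            if crash_after is not None and i == crash_after:
                raise ConnectionError("connection error")
            self.tables[table].append(row)
        # 3. Only a fully applied batch is removed from the log.
        self.batch_log.pop()

    def replay(self):
        """Re-apply batches still in the log, skipping rows already present."""
        while self.batch_log:
            for table, row in self.batch_log.pop():
                if row not in self.tables[table]:
                    self.tables[table].append(row)


cluster = ToyCluster()
batch = [("table1", "a"), ("table2", "b"), ("table1", "c")]
try:
    cluster.apply_batch(batch, crash_after=1)
except ConnectionError:
    pass  # the client believes the batch failed

partial = {t: list(rows) for t, rows in cluster.tables.items()}
cluster.replay()
print(partial)         # {'table1': ['a'], 'table2': []}
print(cluster.tables)  # {'table1': ['a', 'c'], 'table2': ['b']}
```

The batch eventually applies in full, but a reader in between sees only part of it, and the client was told it failed.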
Summary
• No isolation - you can read partial results
• … even without any failures
• Atomic batches aren't really atomic
• also, you give up sequential ordering
• A write can say it failed but really it succeeded
• or it didn’t yet, but will hours later
• A write can say it succeeded but really it failed
• :(
How do you deal with it?
• Idempotency - useful overall in distributed systems
• Avoid modifying data
• Critical deletes get a new delete column written + row delete
• Truly mutable data can be written to a new column (incrementing a version number in the column name)
• Monitor NTP
• Distributed locks with ZooKeeper and a sleep(100) before release
• Think hard about ordering & partial failure
• Test by adding “if (rng < …) exit or sleep” in between various writes
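The last bullet, injecting random exits between writes, might look like the following Python sketch. The `probability` parameter is a hypothetical knob (the slide leaves the threshold unspecified), and `InjectedCrash` stands in for the process exiting; the transfer itself reuses the earlier savings and checking example.

```python
import random


class InjectedCrash(Exception):
    """Stands in for the process exiting between two writes."""


def maybe_crash(rng, probability):
    # `probability` is a hypothetical knob; the slide leaves the threshold open.
    if rng.random() < probability:
        raise InjectedCrash()


def transfer(accounts, amount, rng, probability):
    s = accounts["savings"]
    c = accounts["checking"]
    accounts["savings"] = s - amount
    maybe_crash(rng, probability)  # fault injected between the two writes
    accounts["checking"] = c + amount


rng = random.Random(0)  # fixed seed so a failing run can be reproduced
lost_money_runs = 0
for _ in range(1000):
    accounts = {"savings": 7000, "checking": 6000}
    try:
        transfer(accounts, 5000, rng, probability=0.1)
    except InjectedCrash:
        pass
    if sum(accounts.values()) != 13000:
        lost_money_runs += 1

print(f"{lost_money_runs} of 1000 runs left the accounts inconsistent")
```

Running such a loop in tests makes the partial-failure window visible long before it shows up in production.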