2016−09−08
Clock Skew, and other annoying realities in
distributed systems
Donny Nadolny
donny@pagerduty.com
#CassandraSummit
CLOCK SKEW AND OTHER ANNOYING REALITIES IN DISTRIBUTED SYSTEMS 2016−09−08
Should I Care?
Probably not:
• user tracking / metrics
• hit counter / impressions
• log data
Individual data is low impact
Yes:
• incident management (PagerDuty)
• financial info / banking / stocks
• online store
Individual data is high impact
9/16/16MAKING PAGERDUTY MORE RELIABLE USING PXC
Introduction to Reads & Writes
Cassandra Write
• Cluster: 5 nodes
• Replication factor: 3
• Consistency: QUORUM
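With a replication factor of 3, QUORUM refers to a majority of the 3 replicas that own a key, not of the 5 nodes in the cluster. A minimal Python sketch of that arithmetic follows; the function name `quorum` is ours for illustration and is not a Cassandra API.

```python
def quorum(replication_factor: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return replication_factor // 2 + 1


rf = 3  # replication factor from the slide
print(quorum(rf))        # replicas that must acknowledge a QUORUM request
print(rf - quorum(rf))   # replicas that can be slow or down without failing it
```

So in the diagrams that follow, a write or read is reported as successful once 2 of the 3 replicas have answered.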
Cassandra Write
INSERT INTO table1 …
[Diagram, built up over several slides: "write foo" is sent to all 3 replicas. Two replicas store "value: foo", and the client receives "Success".]
Cassandra Read
SELECT * FROM table1 WHERE …
[Diagram, built up over several slides: two replicas hold "value: foo". "read" is sent to 2 replicas, and the client receives "Success, value: foo".]
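Why a QUORUM read finds a value written at QUORUM: with 3 replicas, any 2 that acknowledged the write and any 2 that answer the read must have at least one replica in common. A small Python check of that claim, with replica names made up for illustration:

```python
from itertools import combinations

replicas = {"A", "B", "C"}   # hypothetical names for the 3 replicas of one key
quorum_size = 2              # QUORUM with replication factor 3

# Every set of replicas that could acknowledge a QUORUM write or read.
quorums = [set(c) for c in combinations(sorted(replicas), quorum_size)]

# Any write quorum and any read quorum share at least one replica.
always_overlap = all(w & r for w in quorums for r in quorums)
print(always_overlap)
```

This only shows that the read reaches a replica holding the write. Which value wins when replicas disagree is decided by timestamps.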
Cassandra Update
UPDATE table1 …
[Diagram, built up over several slides: two replicas hold "value: foo, t=5". "write bar, t=7" is sent to all 3 replicas, and the replicas that store it now show "value: bar, t=7" next to the older "value: foo, t=5".]
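The update relies on timestamps: when replicas hold different versions, the one with the highest timestamp is kept. A minimal Python sketch of that last-write-wins rule, assuming integer timestamps like the t=… values on the slide; `Cell` and `resolve` are illustrative names, and tie-breaking for equal timestamps is left out.

```python
from typing import NamedTuple


class Cell(NamedTuple):
    value: str
    timestamp: int  # the t=… shown on the slide


def resolve(responses: list[Cell]) -> Cell:
    """Last write wins: keep the cell with the highest timestamp."""
    return max(responses, key=lambda cell: cell.timestamp)


# One replica has already applied the update, another has not.
print(resolve([Cell("foo", 5), Cell("bar", 7)]))
```

Note that nothing here checks which write actually happened later in real time; only the attached timestamp counts.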
Successful Write?
Bank Example
INSERT INTO balances …
• Withdraw 8,000 from ATM:
• Read current balance: 10,000
• Update to 2,000
• Dispense 8,000 cash
[Diagram, built up over several slides: two clocks are shown, one 3 ticks behind the other (t=5 and t=2 at the start). The INSERT is stamped t=5; the replicas store "savings: 10000, t=5" and the client receives "Success". The read happens when the clocks show t=6 and t=3. The update happens when they show t=7 and t=4 and is stamped by the slow clock: "write savings: 2000, t=4". It also returns "Success". The replicas now hold both "savings: 10000, t=5" and "savings: 2000, t=4", so the newer-stamped 10000 still wins.]
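The withdrawal can be replayed in a few lines. This is a sketch under simplifying assumptions: a single hypothetical `Replica` class stands in for the replica set, and timestamps are the small integers from the slide.

```python
class Replica:
    """Hypothetical single-value replica that applies last-write-wins."""

    def __init__(self):
        self.value = None
        self.timestamp = -1

    def write(self, value, timestamp):
        # The write is acknowledged either way; it only takes effect
        # if its timestamp is newer than what is already stored.
        if timestamp > self.timestamp:
            self.value, self.timestamp = value, timestamp
        return "Success"


skew = -3       # the second clock runs 3 ticks behind, as on the slide
replica = Replica()

true_time = 5
print(replica.write(10000, true_time))                  # INSERT stamped t=5

true_time = 7
balance = replica.value                                 # ATM reads 10000
ack = replica.write(balance - 8000, true_time + skew)   # UPDATE stamped t=4
print(ack, replica.value)
```

The update is acknowledged, the cash is dispensed, and the stored balance is still 10000.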
Clock Skew
• A successful write can really fail
• Your clocks are not perfectly synchronized
• “I’m running NTP, I’m good” - oh really?
Failed Write?
INSERT INTO stock_trades …
[Diagram, built up over several slides: the client inserts "trade 123: buy 100 BRKA" and the write is sent towards the replicas, but the client gets back an error (Connection Error or Write Timeout). The client issues the INSERT again, and it is stored as "trade 245: buy 100 BRKA". Hints are still queued for the first attempt: "tell nodeA trade 123 …", "tell nodeB trade 123 …", "tell nodeC trade 123 …". When the hints are delivered, "write trade 123 …" reaches the replicas, which end up holding both trade 245 and trade 123.]
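The sequence in this diagram can be modelled with a toy cluster that queues a hint whenever a write times out. Everything here (`Cluster`, `replicas_reachable`, `deliver_hints`) is a hypothetical model for illustration, not Cassandra's implementation.

```python
class Cluster:
    """Toy model: one replica set plus a queue of hints for delayed writes."""

    def __init__(self):
        self.rows = {}    # trade_id -> description
        self.hints = []   # writes that will be delivered later

    def insert(self, trade_id, description, replicas_reachable=True):
        if not replicas_reachable:
            # The client sees a failure, but the write is kept as a hint.
            self.hints.append((trade_id, description))
            raise TimeoutError("write timeout")
        self.rows[trade_id] = description

    def deliver_hints(self):
        for trade_id, description in self.hints:
            self.rows[trade_id] = description
        self.hints.clear()


cluster = Cluster()
try:
    cluster.insert(123, "buy 100 BRKA", replicas_reachable=False)
except TimeoutError:
    # The application assumes the trade was not recorded and tries again,
    # generating a fresh trade ID for the retry.
    cluster.insert(245, "buy 100 BRKA")

print(sorted(cluster.rows))   # only the retry is visible so far
cluster.deliver_hints()
print(sorted(cluster.rows))   # the "failed" trade has now appeared as well
```

The client asked for one purchase and the table ends up with two, because the error only said the outcome was unknown, not that the write was discarded.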
Eventual Consistency
• Full repair
• Read repair chance
• Hinted handoff
Multiple Writes
aka “I wish I had transactions”
Another Bank Example
• Rule: minimum $10,000 end of day balance, monthly fee otherwise
Balance checker
for each user:
    s = read savings          // 2000
    c = read checking         // 6000
    if s + c < 10000          // true
        mark user for monthly fee
Transfer money
amount = 5000
s = read savings              // 7000
c = read checking             // 6000
write_savings(s - amount)     // 2000
write_checking(c + amount)    // 11000
[The balance checker runs between the transfer's two writes, so it sees 2000 + 6000 = 8000 and marks the user for a fee, although the real total is 13000 before and after the transfer.]
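The interleaving can be reproduced deterministically. In this Python sketch a generator pauses the transfer between its two writes so the checker can run in the gap; the dictionary `accounts` and the helper names are illustrative stand-ins for the reads and writes on the slide.

```python
accounts = {"savings": 7000, "checking": 6000}
MINIMUM = 10000


def transfer_steps(amount):
    """The transfer from the slide, pausing between its two writes."""
    s = accounts["savings"]
    c = accounts["checking"]
    accounts["savings"] = s - amount
    yield  # another client may run here
    accounts["checking"] = c + amount


def below_minimum():
    """The balance checker's test for one user."""
    return accounts["savings"] + accounts["checking"] < MINIMUM


transfer = transfer_steps(5000)
next(transfer)                   # runs up to and including write_savings
fee_during = below_minimum()     # checker reads 2000 + 6000
next(transfer, None)             # write_checking completes the transfer
fee_after = below_minimum()      # checker reads 2000 + 11000
print(fee_during, fee_after)
```

No failure is involved: both writes succeed, and the checker still observes a state that never should have been visible.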
Solutions?
1. “Window of vulnerability is small, hope it doesn’t happen”
   • The client (your application) can crash
2. “Do the writes in reverse order”
   • Works for balance checker, but allows overdrawing your account
3. “Use a lock!”
   • The write can propagate out anyway
   • How long will you hold the lock for a failed write?
Isolation Guarantees in Cassandra
• Writes to multiple columns in the same row (when issued at the same time)
• Writes to multiple rows in one table that have the same partition key (when issued at the same time)
Partition key: the primary key of a table, or the first part of the primary key if it is a compound key
Atomic Batches
Atomicity
“An atomic transaction is an indivisible and irreducible series of database operations such that either all occur, or nothing occurs… the transaction cannot be observed to be in progress by another database client”
“An example of an atomic transaction is a monetary transfer from bank account A to account B.”
https://en.wikipedia.org/wiki/Atomicity_(database_systems)
Atomic Batch Write
BEGIN BATCH
INSERT INTO table1 …
INSERT INTO table2 …
INSERT INTO table1 …
APPLY BATCH;
[Diagram, built up over several slides. Normal case: "write batch" is sent to two nodes, then "write table2", "write table1" and "write table1" go out, the client receives "Success", and "delete batch" is sent to the two nodes holding the batch. Failure case: only two "write table1" arrows go out before the client receives "Connection error"; afterwards "write table2", "write table1" and "write table1" are sent.]
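The batch flow in these diagrams can be approximated in a single process. The sketch below is a simplified model under our own assumptions: `batchlog`, `tables`, `apply_batch` and `replay_batchlog` are hypothetical names, and one dictionary stands in for the nodes that hold the logged batch.

```python
class CoordinatorCrash(Exception):
    pass


batchlog = {}                       # batch_id -> list of (table, row) mutations
tables = {"table1": [], "table2": []}


def apply_batch(batch_id, mutations, crash_after=None):
    """Log the batch, apply each mutation, then delete the log entry.

    crash_after=N simulates the coordinator dying after N mutations.
    """
    batchlog[batch_id] = list(mutations)          # "write batch"
    for applied, (table, row) in enumerate(mutations):
        if crash_after is not None and applied == crash_after:
            raise CoordinatorCrash("connection error")
        if row not in tables[table]:              # re-applying a row is harmless
            tables[table].append(row)
    del batchlog[batch_id]                        # "delete batch"


def replay_batchlog():
    """Re-apply every batch that was logged but never deleted."""
    for batch_id, mutations in list(batchlog.items()):
        apply_batch(batch_id, mutations)


batch = [("table1", "row-a"), ("table2", "row-b"), ("table1", "row-c")]
try:
    apply_batch("batch-1", batch, crash_after=1)
except CoordinatorCrash:
    pass

partial = {name: list(rows) for name, rows in tables.items()}
print(partial)        # a reader at this point sees only part of the batch
replay_batchlog()
print(tables)         # after replay, every mutation has been applied
```

In this model the batch is all-or-eventually-all: the log guarantees the remaining writes arrive later, but nothing stops a reader from seeing the half-applied state in between.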
Summary
• No isolation - you can read partial results
  • … even without any failures
• Atomic batches aren't really atomic
  • also, you give up sequential ordering
• A write can say it failed but really it succeeded
  • or it didn’t yet, but will hours later
• A write can say it succeeded but really it failed
  • :(
Questions?
donny@pagerduty.com
How do you deal with it?
• Idempotency - useful overall in distributed systems
• Avoid modifying data
  • Critical deletes get a new delete column written + row delete
  • Truly mutable data can be written to a new column (incrementing a version number in the column name)
• Monitor NTP
• Distributed locks with ZooKeeper and a sleep(100) before release
• Think hard about ordering & partial failure
  • Test by adding “if (rng < …) exit or sleep” in between various writes
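Two of these points, idempotency and failure injection between writes, combine naturally in a test. The Python sketch below is one possible shape, not PagerDuty's code: `maybe_fail`, `record_incident`, the table names and the 0.5 probability are all assumptions made for illustration.

```python
import random


class InjectedFailure(Exception):
    pass


def maybe_fail(rng, probability):
    """Test hook: abort here with the given probability."""
    if rng.random() < probability:
        raise InjectedFailure()


def record_incident(store, incident_id, rng, failure_probability):
    """Two writes keyed by a caller-supplied ID, so re-running is harmless."""
    store["incidents"][incident_id] = "triggered"
    maybe_fail(rng, failure_probability)          # crash between the writes
    store["notifications"][incident_id] = "queued"


store = {"incidents": {}, "notifications": {}}
rng = random.Random(1)         # fixed seed so a failing run can be repeated
attempts = 0
while True:
    attempts += 1
    try:
        record_incident(store, "incident-1", rng, failure_probability=0.5)
        break
    except InjectedFailure:
        pass                   # retry the whole operation with the same ID

print(attempts, store)
```

Because every write is keyed by the same ID, repeating the operation after a crash between the writes overwrites rather than duplicates, and the final state is the same however many attempts it took.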

Más contenido relacionado

La actualidad más candente

PlayStation and Cassandra Streams (Alexander Filipchik & Dustin Pham, Sony) |...
PlayStation and Cassandra Streams (Alexander Filipchik & Dustin Pham, Sony) |...PlayStation and Cassandra Streams (Alexander Filipchik & Dustin Pham, Sony) |...
PlayStation and Cassandra Streams (Alexander Filipchik & Dustin Pham, Sony) |...
DataStax
 
M6d cassandrapresentation
M6d cassandrapresentationM6d cassandrapresentation
M6d cassandrapresentation
Edward Capriolo
 
Cassandra summit 2013 how not to use cassandra
Cassandra summit 2013  how not to use cassandraCassandra summit 2013  how not to use cassandra
Cassandra summit 2013 how not to use cassandra
Axel Liljencrantz
 

La actualidad más candente (20)

Lessons Learned From Running 1800 Clusters (Brooke Jensen, Instaclustr) | Cas...
Lessons Learned From Running 1800 Clusters (Brooke Jensen, Instaclustr) | Cas...Lessons Learned From Running 1800 Clusters (Brooke Jensen, Instaclustr) | Cas...
Lessons Learned From Running 1800 Clusters (Brooke Jensen, Instaclustr) | Cas...
 
Cassandra CLuster Management by Japan Cassandra Community
Cassandra CLuster Management by Japan Cassandra CommunityCassandra CLuster Management by Japan Cassandra Community
Cassandra CLuster Management by Japan Cassandra Community
 
Instaclustr webinar 2017 feb 08 japan
Instaclustr webinar 2017 feb 08   japanInstaclustr webinar 2017 feb 08   japan
Instaclustr webinar 2017 feb 08 japan
 
PlayStation and Cassandra Streams (Alexander Filipchik & Dustin Pham, Sony) |...
PlayStation and Cassandra Streams (Alexander Filipchik & Dustin Pham, Sony) |...PlayStation and Cassandra Streams (Alexander Filipchik & Dustin Pham, Sony) |...
PlayStation and Cassandra Streams (Alexander Filipchik & Dustin Pham, Sony) |...
 
Building Highly Available Apps on Cassandra (Robbie Strickland, Weather Compa...
Building Highly Available Apps on Cassandra (Robbie Strickland, Weather Compa...Building Highly Available Apps on Cassandra (Robbie Strickland, Weather Compa...
Building Highly Available Apps on Cassandra (Robbie Strickland, Weather Compa...
 
DataStax | DSE Search 5.0 and Beyond (Nick Panahi & Ariel Weisberg) | Cassand...
DataStax | DSE Search 5.0 and Beyond (Nick Panahi & Ariel Weisberg) | Cassand...DataStax | DSE Search 5.0 and Beyond (Nick Panahi & Ariel Weisberg) | Cassand...
DataStax | DSE Search 5.0 and Beyond (Nick Panahi & Ariel Weisberg) | Cassand...
 
M6d cassandrapresentation
M6d cassandrapresentationM6d cassandrapresentation
M6d cassandrapresentation
 
Everyday I’m scaling... Cassandra
Everyday I’m scaling... CassandraEveryday I’m scaling... Cassandra
Everyday I’m scaling... Cassandra
 
Webinar: Getting Started with Apache Cassandra
Webinar: Getting Started with Apache CassandraWebinar: Getting Started with Apache Cassandra
Webinar: Getting Started with Apache Cassandra
 
Load testing Cassandra applications
Load testing Cassandra applicationsLoad testing Cassandra applications
Load testing Cassandra applications
 
Beginning Operations: 7 Deadly Sins for Apache Cassandra Ops
Beginning Operations: 7 Deadly Sins for Apache Cassandra OpsBeginning Operations: 7 Deadly Sins for Apache Cassandra Ops
Beginning Operations: 7 Deadly Sins for Apache Cassandra Ops
 
Cassandra summit 2013 how not to use cassandra
Cassandra summit 2013  how not to use cassandraCassandra summit 2013  how not to use cassandra
Cassandra summit 2013 how not to use cassandra
 
Optimizing Your Cluster with Coordinator Nodes (Eric Lubow, SimpleReach) | Ca...
Optimizing Your Cluster with Coordinator Nodes (Eric Lubow, SimpleReach) | Ca...Optimizing Your Cluster with Coordinator Nodes (Eric Lubow, SimpleReach) | Ca...
Optimizing Your Cluster with Coordinator Nodes (Eric Lubow, SimpleReach) | Ca...
 
Micro-batching: High-performance writes
Micro-batching: High-performance writesMicro-batching: High-performance writes
Micro-batching: High-performance writes
 
Monitoring Cassandra: Don't Miss a Thing (Alain Rodriguez, The Last Pickle) |...
Monitoring Cassandra: Don't Miss a Thing (Alain Rodriguez, The Last Pickle) |...Monitoring Cassandra: Don't Miss a Thing (Alain Rodriguez, The Last Pickle) |...
Monitoring Cassandra: Don't Miss a Thing (Alain Rodriguez, The Last Pickle) |...
 
Advanced Operations
Advanced OperationsAdvanced Operations
Advanced Operations
 
Cassandra Tuning - above and beyond
Cassandra Tuning - above and beyondCassandra Tuning - above and beyond
Cassandra Tuning - above and beyond
 
Azure + DataStax Enterprise Powers Office 365 Per User Store
Azure + DataStax Enterprise Powers Office 365 Per User StoreAzure + DataStax Enterprise Powers Office 365 Per User Store
Azure + DataStax Enterprise Powers Office 365 Per User Store
 
Webinar: Diagnosing Apache Cassandra Problems in Production
Webinar: Diagnosing Apache Cassandra Problems in ProductionWebinar: Diagnosing Apache Cassandra Problems in Production
Webinar: Diagnosing Apache Cassandra Problems in Production
 
Processing 50,000 events per second with Cassandra and Spark
Processing 50,000 events per second with Cassandra and SparkProcessing 50,000 events per second with Cassandra and Spark
Processing 50,000 events per second with Cassandra and Spark
 

Destacado

The Promise and Perils of Encrypting Cassandra Data (Ameesh Divatia, Baffle, ...
The Promise and Perils of Encrypting Cassandra Data (Ameesh Divatia, Baffle, ...The Promise and Perils of Encrypting Cassandra Data (Ameesh Divatia, Baffle, ...
The Promise and Perils of Encrypting Cassandra Data (Ameesh Divatia, Baffle, ...
DataStax
 
Building a Distributed Reservation System with Cassandra (Andrew Baker & Jeff...
Building a Distributed Reservation System with Cassandra (Andrew Baker & Jeff...Building a Distributed Reservation System with Cassandra (Andrew Baker & Jeff...
Building a Distributed Reservation System with Cassandra (Andrew Baker & Jeff...
DataStax
 
Tales From the Field: The Wrong Way of Using Cassandra (Carlos Rolo, Pythian)...
Tales From the Field: The Wrong Way of Using Cassandra (Carlos Rolo, Pythian)...Tales From the Field: The Wrong Way of Using Cassandra (Carlos Rolo, Pythian)...
Tales From the Field: The Wrong Way of Using Cassandra (Carlos Rolo, Pythian)...
DataStax
 
Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...
Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...
Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...
DataStax
 
Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...
Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...
Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...
DataStax
 
Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...
Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...
Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...
DataStax
 

Destacado (8)

The Promise and Perils of Encrypting Cassandra Data (Ameesh Divatia, Baffle, ...
The Promise and Perils of Encrypting Cassandra Data (Ameesh Divatia, Baffle, ...The Promise and Perils of Encrypting Cassandra Data (Ameesh Divatia, Baffle, ...
The Promise and Perils of Encrypting Cassandra Data (Ameesh Divatia, Baffle, ...
 
Building a Distributed Reservation System with Cassandra (Andrew Baker & Jeff...
Building a Distributed Reservation System with Cassandra (Andrew Baker & Jeff...Building a Distributed Reservation System with Cassandra (Andrew Baker & Jeff...
Building a Distributed Reservation System with Cassandra (Andrew Baker & Jeff...
 
Tales From the Field: The Wrong Way of Using Cassandra (Carlos Rolo, Pythian)...
Tales From the Field: The Wrong Way of Using Cassandra (Carlos Rolo, Pythian)...Tales From the Field: The Wrong Way of Using Cassandra (Carlos Rolo, Pythian)...
Tales From the Field: The Wrong Way of Using Cassandra (Carlos Rolo, Pythian)...
 
Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...
Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...
Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...
 
PagerDuty: One Year of Cassandra Failures
PagerDuty: One Year of Cassandra FailuresPagerDuty: One Year of Cassandra Failures
PagerDuty: One Year of Cassandra Failures
 
Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...
Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...
Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...
 
Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...
Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...
Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...
 
Always On: Building Highly Available Applications on Cassandra
Always On: Building Highly Available Applications on CassandraAlways On: Building Highly Available Applications on Cassandra
Always On: Building Highly Available Applications on Cassandra
 

Similar a Clock Skew and Other Annoying Realities in Distributed Systems (Donny Nadolny, PagerDuty) | Cassandra Summit 2016

conf2015_TLaGatta_CHarris_Splunk_BusinessAnalytics_DeliveringHighLevelAnalytics
conf2015_TLaGatta_CHarris_Splunk_BusinessAnalytics_DeliveringHighLevelAnalyticsconf2015_TLaGatta_CHarris_Splunk_BusinessAnalytics_DeliveringHighLevelAnalytics
conf2015_TLaGatta_CHarris_Splunk_BusinessAnalytics_DeliveringHighLevelAnalytics
Tom LaGatta
 

Similar a Clock Skew and Other Annoying Realities in Distributed Systems (Donny Nadolny, PagerDuty) | Cassandra Summit 2016 (20)

How blockchain technology could revolutionise the insurance industry
How blockchain technology could revolutionise the insurance industryHow blockchain technology could revolutionise the insurance industry
How blockchain technology could revolutionise the insurance industry
 
Mining AWR V2 - Trend Analysis
Mining AWR V2 - Trend AnalysisMining AWR V2 - Trend Analysis
Mining AWR V2 - Trend Analysis
 
Managing Large Scale Financial Time-Series Data with Graphs
Managing Large Scale Financial Time-Series Data with Graphs Managing Large Scale Financial Time-Series Data with Graphs
Managing Large Scale Financial Time-Series Data with Graphs
 
Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Su...
Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Su...Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Su...
Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Su...
 
Big Data and OpenStack, a Love Story: Michael Still, Rackspace
Big Data and OpenStack, a Love Story: Michael Still, RackspaceBig Data and OpenStack, a Love Story: Michael Still, Rackspace
Big Data and OpenStack, a Love Story: Michael Still, Rackspace
 
World of Oracle Eloqua Reporting
World of Oracle Eloqua ReportingWorld of Oracle Eloqua Reporting
World of Oracle Eloqua Reporting
 
What's Missing? Microservices Meetup at Cisco
What's Missing? Microservices Meetup at CiscoWhat's Missing? Microservices Meetup at Cisco
What's Missing? Microservices Meetup at Cisco
 
conf2015_TLaGatta_CHarris_Splunk_BusinessAnalytics_DeliveringHighLevelAnalytics
conf2015_TLaGatta_CHarris_Splunk_BusinessAnalytics_DeliveringHighLevelAnalyticsconf2015_TLaGatta_CHarris_Splunk_BusinessAnalytics_DeliveringHighLevelAnalytics
conf2015_TLaGatta_CHarris_Splunk_BusinessAnalytics_DeliveringHighLevelAnalytics
 
Benchmark Showdown: Which Relational Database is the Fastest on AWS?
Benchmark Showdown: Which Relational Database is the Fastest on AWS?Benchmark Showdown: Which Relational Database is the Fastest on AWS?
Benchmark Showdown: Which Relational Database is the Fastest on AWS?
 
Open Source 1010 and Quest InSync presentations March 30th, 2021 on MySQL Ind...
Open Source 1010 and Quest InSync presentations March 30th, 2021 on MySQL Ind...Open Source 1010 and Quest InSync presentations March 30th, 2021 on MySQL Ind...
Open Source 1010 and Quest InSync presentations March 30th, 2021 on MySQL Ind...
 
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
 
Data-Driven Transformation: Leveraging Big Data at Showtime with Apache Spark
Data-Driven Transformation: Leveraging Big Data at Showtime with Apache SparkData-Driven Transformation: Leveraging Big Data at Showtime with Apache Spark
Data-Driven Transformation: Leveraging Big Data at Showtime with Apache Spark
 
Streaming SQL with Apache Calcite
Streaming SQL with Apache CalciteStreaming SQL with Apache Calcite
Streaming SQL with Apache Calcite
 
Streaming SQL w/ Apache Calcite
Streaming SQL w/ Apache Calcite Streaming SQL w/ Apache Calcite
Streaming SQL w/ Apache Calcite
 
Streaming SQL (at FlinkForward, Berlin, 2016/09/12)
Streaming SQL (at FlinkForward, Berlin, 2016/09/12)Streaming SQL (at FlinkForward, Berlin, 2016/09/12)
Streaming SQL (at FlinkForward, Berlin, 2016/09/12)
 
Julian Hyde - Streaming SQL
Julian Hyde - Streaming SQLJulian Hyde - Streaming SQL
Julian Hyde - Streaming SQL
 
Streaming SQL
Streaming SQLStreaming SQL
Streaming SQL
 
Streaming SQL
Streaming SQLStreaming SQL
Streaming SQL
 
From Cardinal(ity) Sins to Cost-Efficient Metrics Aggregation
From Cardinal(ity) Sins to Cost-Efficient Metrics AggregationFrom Cardinal(ity) Sins to Cost-Efficient Metrics Aggregation
From Cardinal(ity) Sins to Cost-Efficient Metrics Aggregation
 
KSQL: The Streaming SQL Engine for Apache Kafka
KSQL: The Streaming SQL Engine for Apache KafkaKSQL: The Streaming SQL Engine for Apache Kafka
KSQL: The Streaming SQL Engine for Apache Kafka
 

Más de DataStax

Más de DataStax (20)

Is Your Enterprise Ready to Shine This Holiday Season?
Is Your Enterprise Ready to Shine This Holiday Season?Is Your Enterprise Ready to Shine This Holiday Season?
Is Your Enterprise Ready to Shine This Holiday Season?
 
Designing Fault-Tolerant Applications with DataStax Enterprise and Apache Cas...
Designing Fault-Tolerant Applications with DataStax Enterprise and Apache Cas...Designing Fault-Tolerant Applications with DataStax Enterprise and Apache Cas...
Designing Fault-Tolerant Applications with DataStax Enterprise and Apache Cas...
 
Running DataStax Enterprise in VMware Cloud and Hybrid Environments
Running DataStax Enterprise in VMware Cloud and Hybrid EnvironmentsRunning DataStax Enterprise in VMware Cloud and Hybrid Environments
Running DataStax Enterprise in VMware Cloud and Hybrid Environments
 
Best Practices for Getting to Production with DataStax Enterprise Graph
Best Practices for Getting to Production with DataStax Enterprise GraphBest Practices for Getting to Production with DataStax Enterprise Graph
Best Practices for Getting to Production with DataStax Enterprise Graph
 
Webinar | Data Management for Hybrid and Multi-Cloud: A Four-Step Journey
Webinar | Data Management for Hybrid and Multi-Cloud: A Four-Step JourneyWebinar | Data Management for Hybrid and Multi-Cloud: A Four-Step Journey
Webinar | Data Management for Hybrid and Multi-Cloud: A Four-Step Journey
 
Webinar | How to Understand Apache Cassandra™ Performance Through Read/Writ...
Webinar  |  How to Understand Apache Cassandra™ Performance Through Read/Writ...Webinar  |  How to Understand Apache Cassandra™ Performance Through Read/Writ...
Webinar | How to Understand Apache Cassandra™ Performance Through Read/Writ...
 
Webinar | Better Together: Apache Cassandra and Apache Kafka
Webinar  |  Better Together: Apache Cassandra and Apache KafkaWebinar  |  Better Together: Apache Cassandra and Apache Kafka
Webinar | Better Together: Apache Cassandra and Apache Kafka
 
Top 10 Best Practices for Apache Cassandra and DataStax Enterprise


Clock Skew and Other Annoying Realities in Distributed Systems (Donny Nadolny, PagerDuty) | Cassandra Summit 2016

  • 1. 2016−09−08 Clock Skew, and other annoying realities in distributed systems Donny Nadolny donny@pagerduty.com #CassandraSummit
  • 2. CLOCK SKEW AND OTHER ANNOYING REALITIES IN DISTRIBUTED SYSTEMS 2016−09−08
  • 3. Should I Care? Probably not: user tracking / metrics; hit counter / impressions; log data. Yes: incident management (PagerDuty); financial info / banking / stocks; online store.
  • 4. Should I Care? Probably not (individual data is low impact): user tracking / metrics; hit counter / impressions; log data. Yes (individual data is high impact): incident management (PagerDuty); financial info / banking / stocks; online store.
  • 5. 9/16/16 MAKING PAGERDUTY MORE RELIABLE USING PXC Introduction to Reads & Writes
  • 6. Cassandra Write. Cluster: 5 nodes; Replication factor: 3; Consistency: QUORUM
  • 7. Cassandra Write. INSERT INTO table1 …
  • 8. Cassandra Write. INSERT INTO table1 …; write foo; write foo; write foo
  • 9. Cassandra Write. INSERT INTO table1 …; value: foo; write foo; write foo; write foo
  • 10. Cassandra Write. INSERT INTO table1 …; value: foo; value: foo; write foo; write foo; write foo
  • 11. Cassandra Write. INSERT INTO table1 …; Success; value: foo; value: foo; write foo; write foo; write foo
  • 12. Cassandra Write. INSERT INTO table1 …; Success; value: foo; value: foo; write foo; write foo; write foo
  • 13. Cassandra Read. SELECT * FROM table1 WHERE …; value: foo; value: foo
  • 14. Cassandra Read. SELECT * FROM table1 WHERE …; value: foo; value: foo; read; read
  • 15. Cassandra Read. SELECT * FROM table1 WHERE …; value: foo; value: foo; read; read
  • 16. Cassandra Read. SELECT * FROM table1 WHERE …; value: foo; value: foo; read; read
  • 17. Cassandra Read. SELECT * FROM table1 WHERE …; Success, value: foo; value: foo; value: foo; read; read
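
The write and read walkthroughs in slides 6 to 17 use a replication factor of 3 at QUORUM. A minimal Python sketch of the arithmetic behind that setup, assuming the usual majority definition of a quorum, floor(RF / 2) + 1. The function names are illustrative and are not part of any Cassandra API:

```python
def quorum(replication_factor: int) -> int:
    """Majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def read_overlaps_write(rf: int, write_acks: int, read_acks: int) -> bool:
    """True when any set of read replicas must share a node with any set of write replicas."""
    return write_acks + read_acks > rf


rf = 3
q = quorum(rf)
print(q)                               # 2
print(read_overlaps_write(rf, q, q))   # True: 2 + 2 > 3
print(read_overlaps_write(rf, 1, 1))   # False: one ack each may touch different replicas
```

The overlap only guarantees that a quorum read reaches a replica that holds the quorum write. Which value the read returns is still decided by comparing timestamps, which is what the update and bank example slides turn to next.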
  • 18. Cassandra Update. UPDATE table1 …; value: foo, t=5; value: foo, t=5
  • 19. Cassandra Update. UPDATE table1 …; value: foo, t=5; write bar, t=7; write bar, t=7; write bar, t=7; value: foo, t=5
  • 20. Cassandra Update. UPDATE table1 …; value: foo, t=5; value: bar, t=7; write bar, t=7; write bar, t=7; write bar, t=7; value: foo, t=5; value: bar, t=7
  • 21. Successful Write?
  • 22. Bank Example. t=5; savings: 10000, t=5; savings: 10000, t=5; write …; write …; write …; t=2; INSERT INTO balances …; savings: 10000, t=5
  • 23. Bank Example. savings: 10000, t=5; savings: 10000, t=5; t=5; t=2; Success; INSERT INTO balances …; savings: 10000, t=5
  • 24. Bank Example. Withdraw 8,000 from ATM: read current balance: 10,000. savings: 10000, t=5; savings: 10000, t=5; read; read; t=6; t=3; savings: 10000, t=5
  • 25. Bank Example. Withdraw 8,000 from ATM: read current balance: 10,000; update to 2,000. savings: 10000, t=5; savings: 2000, t=4; write …; write …; t=7; t=4; write savings: 2000, t=4; savings: 10000, t=5; savings: 2000, t=4; s: 10000, t=5; s: 2000, t=4
  • 26. Bank Example. Withdraw 8,000 from ATM: read current balance: 10,000; update to 2,000; dispense 8,000 cash. Success; t=7; t=4; savings: 10000, t=5; savings: 2000, t=4; savings: 10000, t=5; savings: 2000, t=4; s: 10000, t=5; s: 2000, t=4
  • 27. Clock Skew. A successful write can really fail; your clocks are not perfectly synchronized; “I’m running NTP, I’m good” - oh really?
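
The bank example in slides 22 to 26 can be reproduced with a toy last-write-wins model. This is a sketch under stated assumptions, not Cassandra code: `Replica`, `quorum_write` and `quorum_read` are hypothetical names, each replica keeps only the winning cell (the slides show both cells stored and reconciled later), and the coordinator's clock is passed in explicitly to stand in for skew.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Cell:
    value: int
    timestamp: int


class Replica:
    """Holds one cell and resolves conflicts by highest timestamp."""

    def __init__(self) -> None:
        self.cell: Optional[Cell] = None

    def write(self, value: int, timestamp: int) -> bool:
        # The write is always acknowledged, but the incoming cell only
        # replaces the stored one if its timestamp is strictly newer.
        if self.cell is None or timestamp > self.cell.timestamp:
            self.cell = Cell(value, timestamp)
        return True


def quorum_write(replicas: List[Replica], value: int, coordinator_clock: int) -> bool:
    acks = sum(r.write(value, coordinator_clock) for r in replicas)
    return acks >= 2


def quorum_read(replicas: List[Replica]) -> int:
    cells = [r.cell for r in replicas[:2] if r.cell is not None]
    return max(cells, key=lambda c: c.timestamp).value


replicas = [Replica() for _ in range(3)]
# The deposit is stamped by a coordinator whose clock reads t=5.
deposit_ok = quorum_write(replicas, 10000, coordinator_clock=5)
# The withdrawal is stamped by a coordinator whose clock is behind: t=4.
withdraw_ok = quorum_write(replicas, 2000, coordinator_clock=4)
print(deposit_ok, withdraw_ok)   # True True
print(quorum_read(replicas))     # 10000
```

Both writes report success, yet the balance still reads 10000 after the cash has been dispensed, because the later write carries the older timestamp.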
  • 28. Failed Write?
  • 29. Failed Write? INSERT INTO stock_trades …; trade 123: buy 100 BRKA; trade 123…; trade 123…; write …; write trade 123 …; write trade 123 …
  • 30. Failed Write? INSERT INTO stock_trades …; trade 123: buy 100 BRKA; trade 123…; trade 123…; write …; write trade 123 …; write trade 123 …
  • 31. Failed Write? Connection error; trade 123: buy 100 BRKA; trade 123…; trade 123…; write …; write trade 123 …; write trade 123 …
  • 32. Failed Write? INSERT INTO stock_trades …
  • 33. Failed Write? Connection Error; Write Timeout
  • 34. Failed Write? INSERT INTO stock_trades …; trade 245: buy 100 BRKA; trade 245…; trade 245…
  • 35. Failed Write? trade 245: buy 100 BRKA; trade 245…; trade 245…; hints: tell nodeA trade 123 …; tell nodeB trade 123 …; tell nodeC trade 123 …
  • 36. Failed Write? trade 245: buy 100 BRKA; trade 123: buy 100 BRKA; trade 245…; trade 123…; trade 245…; trade 123…; write …; write trade 123 …; write trade 123 …
  • 37. Eventual Consistency. Full repair; Read repair chance; Hinted handoff
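
Slides 29 to 36 describe a write that is reported as failed, is retried under a new trade id, and then shows up anyway once stored hints are delivered. The sequence can be sketched with a toy coordinator. This is only an illustration of the idea, not how Cassandra implements hinted handoff: the `Coordinator` class, its `responding` set and `replay_hints` method are all assumptions made for the example.

```python
class Coordinator:
    """Toy coordinator that keeps hints for replicas that did not respond."""

    def __init__(self, replica_names):
        self.replicas = {name: {} for name in replica_names}
        self.responding = set(replica_names)
        self.hints = []  # (replica name, key, value) to deliver later

    def write(self, key, value, required_acks=2):
        acks = 0
        for name, rows in self.replicas.items():
            if name in self.responding:
                rows[key] = value
                acks += 1
            else:
                self.hints.append((name, key, value))
        if acks < required_acks:
            # The client sees a failure even though hints were recorded.
            raise TimeoutError(f"write of {key!r} timed out")

    def replay_hints(self):
        for name, key, value in self.hints:
            self.replicas[name][key] = value
        self.hints.clear()


coordinator = Coordinator(["nodeA", "nodeB", "nodeC"])
coordinator.responding = set()            # no replica answers in time
try:
    coordinator.write("trade 123", "buy 100 BRKA")
    first_attempt_failed = False
except TimeoutError:
    first_attempt_failed = True

coordinator.responding = {"nodeA", "nodeB", "nodeC"}
coordinator.write("trade 245", "buy 100 BRKA")   # client retries under a new id
coordinator.replay_hints()                        # hints are delivered later

print(first_attempt_failed)                       # True
print(sorted(coordinator.replicas["nodeA"]))      # ['trade 123', 'trade 245']
```

The client was told trade 123 failed, but every replica ends up holding both trades, so the order was placed twice.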
  • 38. Multiple Writes aka “I wish I had transactions”
  • 39. Another Bank Example. Rule: minimum $10,000 end of day balance, monthly fee otherwise
  • 40. Another Bank Example. Rule: minimum $10,000 end of day balance, monthly fee otherwise. Balance checker for each user: s = read savings; c = read checking; if s + c < 10000: mark user for monthly fee
  • 41. Another Bank Example. Rule: minimum $10,000 end of day balance, monthly fee otherwise. Balance checker for each user: s = read savings; c = read checking; if s + c < 10000: mark user for monthly fee. Transfer money: amount = …; s = read savings; c = read checking; write_savings(s - amount); write_checking(c + amount)
  • 42. Another Bank Example. Rule: minimum $10,000 end of day balance, monthly fee otherwise. Balance checker for each user: s = read savings; c = read checking; if s + c < 10000: mark user for monthly fee. Transfer money: amount = 5000; s = read savings //7000; c = read checking //6000; write_savings(2000); write_checking(11000)
  • 43. Another Bank Example. Rule: minimum $10,000 end of day balance, monthly fee otherwise. Balance checker for each user: s = read savings //2000; c = read checking //6000; if s + c < 10000 //true: mark user for monthly fee. Transfer money: amount = 5000; s = read savings //7000; c = read checking //6000; write_savings(2000); write_checking(11000)
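
The interleaving in slide 43 can be replayed deterministically. The sketch below uses a plain in-memory dict in place of the two account rows and a generator to pause the transfer between its two writes. Both are assumptions made for illustration, with no database involved.

```python
MINIMUM = 10000
accounts = {"savings": 7000, "checking": 6000}


def transfer_steps(amount):
    """Transfer from savings to checking, pausing after each write."""
    s = accounts["savings"]
    c = accounts["checking"]
    accounts["savings"] = s - amount
    yield "savings written"
    accounts["checking"] = c + amount
    yield "checking written"


def should_charge_fee() -> bool:
    """Balance checker: two separate reads, then the rule."""
    s = accounts["savings"]
    c = accounts["checking"]
    return s + c < MINIMUM


transfer = transfer_steps(5000)
next(transfer)                            # savings = 2000, checking still 6000
fee_mid_transfer = should_charge_fee()    # sees 2000 + 6000 = 8000
next(transfer)                            # checking = 11000
fee_after_transfer = should_charge_fee()  # sees 2000 + 11000 = 13000

print(fee_mid_transfer, fee_after_transfer)  # True False
```

The customer holds 13,000 in total before and after the transfer, but a checker that runs between the two writes sees 8,000 and marks the account for the fee.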
  • 44. Solutions? 1. “Window of vulnerability is small, hope it doesn’t happen”: the client (your application) can crash. 2. “Do the writes in reverse order”: works for balance checker, but allows overdrawing your account. 3. “Use a lock!”: the write can propagate out anyway; how long will you hold the lock for a failed write?
  • 45. Isolation Guarantees in Cassandra. Writes to multiple columns in the same row (when issued at the same time); writes to multiple rows in one table that have the same partition key (when issued at the same time). Partition key: the primary key of a table, or the first part of the primary key if it is a compound key
  • 46. Atomic Batches
  • 47. Atomicity. “An atomic transaction is an indivisible and irreducible series of database operations such that either all occur, or nothing occurs… the transaction cannot be observed to be in progress by another database client” (https://en.wikipedia.org/wiki/Atomicity_(database_systems))
  • 48. Atomicity. “An atomic transaction is an indivisible and irreducible series of database operations such that either all occur, or nothing occurs… the transaction cannot be observed to be in progress by another database client” “An example of an atomic transaction is a monetary transfer from bank account A to account B.” (https://en.wikipedia.org/wiki/Atomicity_(database_systems))
  • 49. Atomic Batch Write. BEGIN BATCH INSERT INTO table1 … INSERT INTO table2 … INSERT INTO table1 … APPLY BATCH;
  • 50. Atomic Batch Write. BEGIN BATCH INSERT INTO table1 … INSERT INTO table2 … INSERT INTO table1 … APPLY BATCH; write batch; write batch
  • 51. Atomic Batch Write. BEGIN BATCH INSERT INTO table1 … INSERT INTO table2 … INSERT INTO table1 … APPLY BATCH; write batch; write batch
  • 52. Atomic Batch Write. BEGIN BATCH INSERT INTO table1 … INSERT INTO table2 … INSERT INTO table1 … APPLY BATCH; write table2; write table1; write table1
  • 53. Atomic Batch Write. BEGIN BATCH INSERT INTO table1 … INSERT INTO table2 … INSERT INTO table1 … APPLY BATCH; Success; write table2; write table1; write table1
  • 54. Atomic Batch Write. BEGIN BATCH INSERT INTO table1 … INSERT INTO table2 … INSERT INTO table1 … APPLY BATCH; delete batch; delete batch
  • 55. Atomic Batch Write. BEGIN BATCH INSERT INTO table1 … INSERT INTO table2 … INSERT INTO table1 … APPLY BATCH; write table1; write table1
  • 56. Atomic Batch Write. BEGIN BATCH INSERT INTO table1 … INSERT INTO table2 … INSERT INTO table1 … APPLY BATCH; Connection error
  • 57. Atomic Batch Write. BEGIN BATCH INSERT INTO table1 … INSERT INTO table2 … INSERT INTO table1 … APPLY BATCH; write table2; write table1; write table1
  • 58. Summary
  • 59. Summary. No isolation - you can read partial results (… even without any failures)
  • 60. Summary. No isolation - you can read partial results (… even without any failures); atomic batches aren't really atomic (also, you give up sequential ordering)
  • 61. Summary. No isolation - you can read partial results (… even without any failures); atomic batches aren't really atomic (also, you give up sequential ordering); a write can say it failed but really it succeeded (or it didn’t yet, but will hours later)
  • 62. Summary. No isolation - you can read partial results (… even without any failures); atomic batches aren't really atomic (also, you give up sequential ordering); a write can say it failed but really it succeeded (or it didn’t yet, but will hours later); a write can say it succeeded but really it failed :(
  • 64. How do you deal with it? Idempotency - useful overall in distributed systems; avoid modifying data; critical deletes get a new delete column written + row delete; truly mutable data can be written to a new column (incrementing a version number in the column name); monitor NTP; distributed locks with ZooKeeper and a sleep(100) before release; think hard about ordering & partial failure; test by adding “if (rng < …) exit or sleep” in between various writes
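
Two of the suggestions on the last slide, idempotent writes and injecting a random exit between writes, can be combined in a small test harness. The sketch below is one possible shape, not the speaker's implementation: `maybe_crash`, `record_trade`, the `crash_probability` parameter and the `trades` / `trades_by_symbol` tables are all hypothetical, and an exception stands in for the process exiting.

```python
import random


class InjectedCrash(Exception):
    """Stands in for the process exiting between two writes."""


def maybe_crash(rng: random.Random, probability: float) -> None:
    # Hypothetical fault-injection hook; the probability is an assumed knob.
    if rng.random() < probability:
        raise InjectedCrash()


def record_trade(store: dict, trade_id: int, details: str,
                 rng: random.Random, crash_probability: float) -> None:
    # Both writes are keyed by trade_id and always write the same values,
    # so replaying the whole function after a crash is safe (idempotent).
    store.setdefault("trades", {})[trade_id] = details
    maybe_crash(rng, crash_probability)
    store.setdefault("trades_by_symbol", {})[trade_id] = details.split()[-1]


rng = random.Random(0)
store = {}

# First attempt: force a crash between the two writes.
try:
    record_trade(store, 123, "buy 100 BRKA", rng, crash_probability=1.0)
except InjectedCrash:
    pass
partial_state_seen = "trades_by_symbol" not in store

# Retry with injection disabled: the same writes are applied again.
record_trade(store, 123, "buy 100 BRKA", rng, crash_probability=0.0)

print(partial_state_seen)  # True
print(store)  # {'trades': {123: 'buy 100 BRKA'}, 'trades_by_symbol': {123: 'BRKA'}}
```

Because the retry rewrites the same keys with the same values, the partial state left by the crash is repaired rather than duplicated. In a real test the probability would sit somewhere between 0 and 1 and the run would be repeated many times.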