SlideShare una empresa de Scribd logo
1 de 75
Descargar para leer sin conexión
© DataStax, All Rights Reserved.
Cassandra and the Cloud (2018 edition)
Jonathan Ellis
© DataStax, All Rights Reserved.
“Databases”
© DataStax, All Rights Reserved.
Stuff everyone agrees on
© DataStax, All Rights Reserved.
Stuff (Almost) Everyone Agrees On
1. Eventual Consistency is useful
© DataStax, All Rights Reserved.
Eventual Consistency (CP edition)
© DataStax, All Rights Reserved.
Eventual Consistency (AP edition)
© DataStax, All Rights Reserved.
Default consistency levels
1. Cassandra: Eventual
2. Dynamo: Eventual
3. CosmosDB: Eventual (“Session”)
4. Spanner: ACID (no EC)
© DataStax, All Rights Reserved.
Stuff (Almost) Everyone Agrees On
1. Eventual Consistency is useful
2. Automatic partitioning doesn’t work
© DataStax, All Rights Reserved.
H-Store (2012)
© DataStax, All Rights Reserved.
Partitioning approaches
1. Cassandra: Explicit
2. Dynamo: Explicit
3. CosmosDB: Explicit
4. Spanner: Explicit
© DataStax, All Rights Reserved.
Stuff (Almost) Everyone Agrees On
1. Eventual Consistency is useful
2. Automatic partitioning doesn’t work
3. SQL is a pretty okay query language
© DataStax, All Rights Reserved.
© DataStax, All Rights Reserved.
Query APIs
1. Cassandra: CQL, inspired by SQL
2. DynamoDB: Actually still pretty first-gen NoSQL
3. CosmosDB: “SQL”
4. Spanner: “SQL”
© DataStax, All Rights Reserved.
Stuff (Almost) Everyone Agrees On
1. Eventual Consistency is useful
2. Automatic partitioning doesn’t work
3. SQL is a pretty okay query language
4. … that’s about it
© DataStax, All Rights Reserved.
Thomas Sowell
There are no solutions.
Only tradeoffs.
© DataStax, All Rights Reserved.
Cassandra
© DataStax, All Rights Reserved.
Data Model: tabular, with nested content
CREATE TABLE notifications (
target_user text,
notification_id timeuuid,
source_id uuid,
source_type text,
activity text,
PRIMARY KEY (target_user, notification_id)
)
WITH CLUSTERING ORDER BY (notification_id DESC);
© DataStax, All Rights Reserved.
target_user notification_id source_id source_type activity
nick e1bd2bcb- d972b679- photo tom liked
nick 321998c- d972b679- photo jake commented
nick ea1c5d35- 88a049d5- user mike created
account
nick 5321998c- 64613f27- photo tom commented
nick 07581439- 076eab7e- user tyler created
account
mike 1c34467a- f04e309f- user tom created
account
© DataStax, All Rights Reserved.
Collections
CREATE TABLE users (
id uuid PRIMARY KEY,
name text,
state text,
birth_date int,
email_addresses set<text>
);
© DataStax, All Rights Reserved.
User-defined Types
CREATE TYPE address (
street text,
city text,
zip_code int,
phones set<text>
)
CREATE TABLE users (
id uuid PRIMARY KEY,
name text,
addresses map<text, address>
)
SELECT id, name, addresses.city, addresses.phones FROM users;
id | name | addresses.city | addresses.phones
--------------------+----------------+--------------------------
63bf691f | jbellis | Austin | {'512-4567', '512-9999'}
© DataStax, All Rights Reserved.
JSON
INSERT INTO users JSON
'{"id": "0514e410-",
"name": "jbellis",
"addresses": {"home": {"street": "9920 Cassandra Ave",
"city": "Austin",
"zip_code": 78700,
"phones": ["1238614789"]}}}';
© DataStax, All Rights Reserved.
Without nesting
© DataStax, All Rights Reserved.
With nesting
© DataStax, All Rights Reserved.
Consistency Levels
© DataStax, All Rights Reserved.
Consistency Levels
© DataStax, All Rights Reserved.
Consistency Levels
© DataStax, All Rights Reserved.
Multi-region
1. Synchronous writes locally; async globally
2. Serve reads and writes for any row in any region
3. DR is “free”
4. Client-level support
© DataStax, All Rights Reserved.
Notable features
● Lightweight Transactions (Paxos, expensive)
● Materialized Views
● User-defined functions
● Strict schema
© DataStax, All Rights Reserved.
© DataStax, All Rights Reserved.
© DataStax, All Rights Reserved.
Schema confusion
{"userid": "2452347",
"name": "jbellis",
... }
{"userid": 2452348,
"name": "jshook",
... }
{"user_id": 2452349,
"name": "jlacefield",
... }
© DataStax, All Rights Reserved.
DynamoDB
© DataStax, All Rights Reserved.
CP single partition
● Original Dynamo was AP
● DynamoDB offers Strong/Eventual read consistency
● But
Conditional writes are the same price as regular writes
And: “All write requests are applied in the order in which
they were received”
© DataStax, All Rights Reserved.
Data model
● “Map of maps”
● Primary key, sort key
© DataStax, All Rights Reserved.
Data Model
© DataStax, All Rights Reserved.
Multi-region
● New feature (late 2017): Global tables
● Shards and replicates a table across regions
● Each shard can only be written to by its master region
© DataStax, All Rights Reserved.
Notable features
● Global indexes
● Change feed
● “DynamoDB Transaction Library”
“A put that does not contend with any other
simultaneous puts can be expected to perform 7N + 4
writes as the original operation, where N is the number
of requests in the transaction.”
© DataStax, All Rights Reserved.
Sidebar: cross-partition txns in AP?
© DataStax, All Rights Reserved.
CosmosDB
© DataStax, All Rights Reserved.
Data model:
CP Single Partition
© DataStax, All Rights Reserved.
“Multi model”
● NOT just “APIs”
● More like MyRocks/MongoRocks than C* CQL/JSON
● What they have in common:
● Hash partitioning
● Undefined sorting within partitions; use ORDER BY
© DataStax, All Rights Reserved.
Data model
© DataStax, All Rights Reserved.
“Multi model”
● Not all features supported everywhere
● Azure Functions
● Change Feed
● TLDR use SQL/Document API
© DataStax, All Rights Reserved.
© DataStax, All Rights Reserved.
SQL support and extensions
SELECT c.givenName
FROM Families f
JOIN c IN f.children
WHERE f.id = 'WakefieldFamily'
ORDER BY f.address.city ASC
“The language lets you refer to nodes of the tree at any
arbitrary depth, like Node1.Node2.Node3…..NodeM”
© DataStax, All Rights Reserved.
© DataStax, All Rights Reserved.
Consistency Levels
© DataStax, All Rights Reserved.
Consistency Levels
● This is probably still too many
(“About 73% of Azure Cosmos DB tenants use session
consistency and 20% prefer bounded staleness.”)
© DataStax, All Rights Reserved.
Implementation clue?
© DataStax, All Rights Reserved.
Multi-region
● Claims local read/writes with async replication between
regions
● But, also claims ACID single-partition transactions in
stored procedures
● You can’t have both! Something doesn’t add up!
© DataStax, All Rights Reserved.
Notable features
● Everything is indexed
99p < 20% overhead
● “Attachment” special document type for blobs
Main purpose seems to be to allow PUTing data easily
● Stored procedures
Including (single-partition) transactions
Only for SQL API
● Change feed
© DataStax, All Rights Reserved.
Spanner
© DataStax, All Rights Reserved.
Data model: CP multi-partition
● “Reuse existing SQL skills to query data in Cloud
Spanner using familiar, industry-standard ANSI 2011
SQL.”
(Actually a fairly small subset)
(And only for SELECT)
© DataStax, All Rights Reserved.
Interleaved/child tables
CREATE TABLE Singers (
SingerId INT64 NOT NULL,
FirstName STRING(1024),
LastName STRING(1024),
SingerInfo BYTES(MAX),
) PRIMARY KEY (SingerId);
CREATE TABLE Albums (
SingerId INT64 NOT NULL,
AlbumId INT64 NOT NULL,
AlbumTitle STRING(MAX),
) PRIMARY KEY (SingerId, AlbumId),
INTERLEAVE IN PARENT Singers ON DELETE CASCADE;
© DataStax, All Rights Reserved.
© DataStax, All Rights Reserved.
Multi-region
© DataStax, All Rights Reserved.
Multi-region, TLDR
● You can replicate to multiple regions
● Only one region can accept writes at a time
● Opinion: it is not often useful to scale reads without also
scaling writes
© DataStax, All Rights Reserved.
Notable features
● Full multi-partition ACID 2PC
(using Paxos replication groups)
● DFS-based, not local storage
© DataStax, All Rights Reserved.
The price of ACID
© DataStax, All Rights Reserved.
Writes slow down (indexed) reads
Quizlet:
“Bulk writes severely impact the performance of queries
using the secondary index [becuase] a write with a
secondary index updates many splits, which [since
Spanner uses pessimistic locking] creates contention for
reads that use that secondary index.”
© DataStax, All Rights Reserved.
Practical considerations
© DataStax, All Rights Reserved.
Cassandra
● Run anywhere you like--but you have to run it
○ But: DataStax Managed Cloud, Instaclustr
○ Also: DataStax Remote DBA
● Storage closely tied to compute
○ But: everyone struggles with this
© DataStax, All Rights Reserved.
Multi-cloud
● JP Morgan: “We have seen increasingly all the
customers we talk to, almost exclusively large
mid-market to large enterprise, all now are embracing
multi-cloud as a specific strategy.”
●
© DataStax, All Rights Reserved.
DynamoDB
● Request capacity tied to “partitions” [pp]
○ pp count = max (rc / 3000, wc / 1000, st / 10 GB)
● Subtle implication: capacity / pp decreases as storage
volume increases
○ Non-uniform: pp request capacity halved when shard splits
● Subtle implication 2: bulk loads will wreck your planning
© DataStax, All Rights Reserved.
“Best practices for tables”
● Bulk load 20M items = 20 GB
● Target 30 minutes = 11,000 write capacity = 11 pps
● Post bulk load steady state 200 req/s = 18 req/pp
● No way to reduce partition count
© DataStax, All Rights Reserved.
DynamoDB provisioning in the wild
● You Probably Shouldn’t Use DynamoDB
○ Hacker News Discussion
● The Million Dollar Engineering Challenge
○ Hacker News discussion
© DataStax, All Rights Reserved.
CosmosDB
● Like DynamoDB, but underdocumented
● Partition and scale in Azure Cosmos DB
○ Unspecified: max pp storage size, max pp request capacity
© DataStax, All Rights Reserved.
Spanner
● DFS architecture means it doesn’t have the provisioning
problem
● Priced per node (~$8500 min per 2 TB data, per year) +
$3600 / TB / yr
○ Doesn’t appear to be part of the GCP free usage tier
● Competing with Cloud Datastore, Cloud BigTable
© DataStax, All Rights Reserved.
Recommendations
© DataStax, All Rights Reserved.
Never recommended
● DynamoDB: Lack of nesting is too limiting
● Spanner: All ACID, all the time is too expensive
© DataStax, All Rights Reserved.
CosmosDB vs Cassandra
● Rough feature parity
○ Partitioned rows (documents) with nesting
○ Default EC with opt-in to stronger options
● Cassandra
○ Materialized views
○ True multi-model
○ Predictable provisioning
● CosmosDB
○ Stored procedures
○ ORDER BY
○ Change feed
© DataStax, All Rights Reserved.
CosmosDB vs Cassandra
● Given rough feature parity, why would you pick the one
that only runs in a single cloud?
● Hobbyist / less than one (three) C* VM?
© DataStax, All Rights Reserved.
DataStax Enterprise
© DataStax, All Rights Reserved.
Distributed Systems Reading List
● Bigtable: A Distributed Storage System for Structured
Data
● Dynamo: Amazon’s Highly Available Key-value Store
● Cassandra - A Decentralized Structured Storage
System [annotated by Jonathan Ellis]
● Skew-aware automatic database partitioning in
shared-nothing, parallel OLTP systems
● Calvin: Fast Distributed Transactions for Partitioned
Database Systems
● Spanner: Google's Globally-Distributed Database
© DataStax, All Rights Reserved.
Thank you

Más contenido relacionado

La actualidad más candente

Lecture 40 1
Lecture 40 1Lecture 40 1
Lecture 40 1patib5
 
Large partition in Cassandra
Large partition in CassandraLarge partition in Cassandra
Large partition in CassandraShogo Hoshii
 
N1QL New Features in couchbase 7.0
N1QL New Features in couchbase 7.0N1QL New Features in couchbase 7.0
N1QL New Features in couchbase 7.0Keshav Murthy
 
Introduction to Real-Time Analytics with Cassandra and Hadoop
Introduction to Real-Time Analytics with Cassandra and HadoopIntroduction to Real-Time Analytics with Cassandra and Hadoop
Introduction to Real-Time Analytics with Cassandra and HadoopPatricia Gorla
 
Scalable Data Modeling by Example (Carlos Alonso, Job and Talent) | Cassandra...
Scalable Data Modeling by Example (Carlos Alonso, Job and Talent) | Cassandra...Scalable Data Modeling by Example (Carlos Alonso, Job and Talent) | Cassandra...
Scalable Data Modeling by Example (Carlos Alonso, Job and Talent) | Cassandra...DataStax
 
Couchbase Tutorial: Big data Open Source Systems: VLDB2018
Couchbase Tutorial: Big data Open Source Systems: VLDB2018Couchbase Tutorial: Big data Open Source Systems: VLDB2018
Couchbase Tutorial: Big data Open Source Systems: VLDB2018Keshav Murthy
 
5 Ways to Use Spark to Enrich your Cassandra Environment
5 Ways to Use Spark to Enrich your Cassandra Environment5 Ways to Use Spark to Enrich your Cassandra Environment
5 Ways to Use Spark to Enrich your Cassandra EnvironmentJim Hatcher
 
Using Approximate Data for Small, Insightful Analytics (Ben Kornmeier, Protec...
Using Approximate Data for Small, Insightful Analytics (Ben Kornmeier, Protec...Using Approximate Data for Small, Insightful Analytics (Ben Kornmeier, Protec...
Using Approximate Data for Small, Insightful Analytics (Ben Kornmeier, Protec...DataStax
 
Myths of Big Partitions (Robert Stupp, DataStax) | Cassandra Summit 2016
Myths of Big Partitions (Robert Stupp, DataStax) | Cassandra Summit 2016Myths of Big Partitions (Robert Stupp, DataStax) | Cassandra Summit 2016
Myths of Big Partitions (Robert Stupp, DataStax) | Cassandra Summit 2016DataStax
 
Boundary Front end tech talk: how it works
Boundary Front end tech talk: how it worksBoundary Front end tech talk: how it works
Boundary Front end tech talk: how it worksBoundary
 
Mongodb - NoSql Database
Mongodb - NoSql DatabaseMongodb - NoSql Database
Mongodb - NoSql DatabasePrashant Gupta
 
Hadoop + Cassandra: Fast queries on data lakes, and wikipedia search tutorial.
Hadoop + Cassandra: Fast queries on data lakes, and  wikipedia search tutorial.Hadoop + Cassandra: Fast queries on data lakes, and  wikipedia search tutorial.
Hadoop + Cassandra: Fast queries on data lakes, and wikipedia search tutorial.Natalino Busa
 
ClickHouse Introduction by Alexander Zaitsev, Altinity CTO
ClickHouse Introduction by Alexander Zaitsev, Altinity CTOClickHouse Introduction by Alexander Zaitsev, Altinity CTO
ClickHouse Introduction by Alexander Zaitsev, Altinity CTOAltinity Ltd
 
Writing A Foreign Data Wrapper
Writing A Foreign Data WrapperWriting A Foreign Data Wrapper
Writing A Foreign Data Wrapperpsoo1978
 
Accessing Databases from R
Accessing Databases from RAccessing Databases from R
Accessing Databases from RJeffrey Breen
 
Андрей Козлов (Altoros): Оптимизация производительности Cassandra
Андрей Козлов (Altoros): Оптимизация производительности CassandraАндрей Козлов (Altoros): Оптимизация производительности Cassandra
Андрей Козлов (Altoros): Оптимизация производительности CassandraOlga Lavrentieva
 
23 October 2013 - AWS 201 - A Walk through the AWS Cloud: Introduction to Ama...
23 October 2013 - AWS 201 - A Walk through the AWS Cloud: Introduction to Ama...23 October 2013 - AWS 201 - A Walk through the AWS Cloud: Introduction to Ama...
23 October 2013 - AWS 201 - A Walk through the AWS Cloud: Introduction to Ama...Amazon Web Services
 

La actualidad más candente (20)

Lecture 40 1
Lecture 40 1Lecture 40 1
Lecture 40 1
 
Large partition in Cassandra
Large partition in CassandraLarge partition in Cassandra
Large partition in Cassandra
 
N1QL New Features in couchbase 7.0
N1QL New Features in couchbase 7.0N1QL New Features in couchbase 7.0
N1QL New Features in couchbase 7.0
 
Introduction to Real-Time Analytics with Cassandra and Hadoop
Introduction to Real-Time Analytics with Cassandra and HadoopIntroduction to Real-Time Analytics with Cassandra and Hadoop
Introduction to Real-Time Analytics with Cassandra and Hadoop
 
NoSQL Introduction
NoSQL IntroductionNoSQL Introduction
NoSQL Introduction
 
Scalable Data Modeling by Example (Carlos Alonso, Job and Talent) | Cassandra...
Scalable Data Modeling by Example (Carlos Alonso, Job and Talent) | Cassandra...Scalable Data Modeling by Example (Carlos Alonso, Job and Talent) | Cassandra...
Scalable Data Modeling by Example (Carlos Alonso, Job and Talent) | Cassandra...
 
Couchbase Tutorial: Big data Open Source Systems: VLDB2018
Couchbase Tutorial: Big data Open Source Systems: VLDB2018Couchbase Tutorial: Big data Open Source Systems: VLDB2018
Couchbase Tutorial: Big data Open Source Systems: VLDB2018
 
5 Ways to Use Spark to Enrich your Cassandra Environment
5 Ways to Use Spark to Enrich your Cassandra Environment5 Ways to Use Spark to Enrich your Cassandra Environment
5 Ways to Use Spark to Enrich your Cassandra Environment
 
Using Approximate Data for Small, Insightful Analytics (Ben Kornmeier, Protec...
Using Approximate Data for Small, Insightful Analytics (Ben Kornmeier, Protec...Using Approximate Data for Small, Insightful Analytics (Ben Kornmeier, Protec...
Using Approximate Data for Small, Insightful Analytics (Ben Kornmeier, Protec...
 
Myths of Big Partitions (Robert Stupp, DataStax) | Cassandra Summit 2016
Myths of Big Partitions (Robert Stupp, DataStax) | Cassandra Summit 2016Myths of Big Partitions (Robert Stupp, DataStax) | Cassandra Summit 2016
Myths of Big Partitions (Robert Stupp, DataStax) | Cassandra Summit 2016
 
druid.io
druid.iodruid.io
druid.io
 
Boundary Front end tech talk: how it works
Boundary Front end tech talk: how it worksBoundary Front end tech talk: how it works
Boundary Front end tech talk: how it works
 
Mongodb - NoSql Database
Mongodb - NoSql DatabaseMongodb - NoSql Database
Mongodb - NoSql Database
 
Hadoop + Cassandra: Fast queries on data lakes, and wikipedia search tutorial.
Hadoop + Cassandra: Fast queries on data lakes, and  wikipedia search tutorial.Hadoop + Cassandra: Fast queries on data lakes, and  wikipedia search tutorial.
Hadoop + Cassandra: Fast queries on data lakes, and wikipedia search tutorial.
 
ClickHouse Introduction by Alexander Zaitsev, Altinity CTO
ClickHouse Introduction by Alexander Zaitsev, Altinity CTOClickHouse Introduction by Alexander Zaitsev, Altinity CTO
ClickHouse Introduction by Alexander Zaitsev, Altinity CTO
 
Writing A Foreign Data Wrapper
Writing A Foreign Data WrapperWriting A Foreign Data Wrapper
Writing A Foreign Data Wrapper
 
Accessing Databases from R
Accessing Databases from RAccessing Databases from R
Accessing Databases from R
 
Андрей Козлов (Altoros): Оптимизация производительности Cassandra
Андрей Козлов (Altoros): Оптимизация производительности CassandraАндрей Козлов (Altoros): Оптимизация производительности Cassandra
Андрей Козлов (Altoros): Оптимизация производительности Cassandra
 
23 October 2013 - AWS 201 - A Walk through the AWS Cloud: Introduction to Ama...
23 October 2013 - AWS 201 - A Walk through the AWS Cloud: Introduction to Ama...23 October 2013 - AWS 201 - A Walk through the AWS Cloud: Introduction to Ama...
23 October 2013 - AWS 201 - A Walk through the AWS Cloud: Introduction to Ama...
 
Druid
DruidDruid
Druid
 

Similar a Data day texas: Cassandra and the Cloud

Five Lessons in Distributed Databases
Five Lessons  in Distributed DatabasesFive Lessons  in Distributed Databases
Five Lessons in Distributed Databasesjbellis
 
implementation of a big data architecture for real-time analytics with data s...
implementation of a big data architecture for real-time analytics with data s...implementation of a big data architecture for real-time analytics with data s...
implementation of a big data architecture for real-time analytics with data s...Joseph Arriola
 
Cassandra Exports as a Trivially Parallelizable Problem (Emilio Del Tessandor...
Cassandra Exports as a Trivially Parallelizable Problem (Emilio Del Tessandor...Cassandra Exports as a Trivially Parallelizable Problem (Emilio Del Tessandor...
Cassandra Exports as a Trivially Parallelizable Problem (Emilio Del Tessandor...DataStax
 
Flight on Zeppelin with Apache Spark & Cassandra
Flight on Zeppelin with Apache Spark & CassandraFlight on Zeppelin with Apache Spark & Cassandra
Flight on Zeppelin with Apache Spark & CassandraAlex Ott
 
Safer restarts, faster streaming, and better repair, just a glimpse of cassan...
Safer restarts, faster streaming, and better repair, just a glimpse of cassan...Safer restarts, faster streaming, and better repair, just a glimpse of cassan...
Safer restarts, faster streaming, and better repair, just a glimpse of cassan...Vinay Kumar Chella
 
Everything We Learned About In-Memory Data Layout While Building VoltDB
Everything We Learned About In-Memory Data Layout While Building VoltDBEverything We Learned About In-Memory Data Layout While Building VoltDB
Everything We Learned About In-Memory Data Layout While Building VoltDBjhugg
 
Introduction to NoSql
Introduction to NoSqlIntroduction to NoSql
Introduction to NoSqlOmid Vahdaty
 
NoSQL Solutions - a comparative study
NoSQL Solutions - a comparative studyNoSQL Solutions - a comparative study
NoSQL Solutions - a comparative studyGuillaume Lefranc
 
AWS Big Data Demystified #2 | Athena, Spectrum, Emr, Hive
AWS Big Data Demystified #2 |  Athena, Spectrum, Emr, Hive AWS Big Data Demystified #2 |  Athena, Spectrum, Emr, Hive
AWS Big Data Demystified #2 | Athena, Spectrum, Emr, Hive Omid Vahdaty
 
Scio - A Scala API for Google Cloud Dataflow & Apache Beam
Scio - A Scala API for Google Cloud Dataflow & Apache BeamScio - A Scala API for Google Cloud Dataflow & Apache Beam
Scio - A Scala API for Google Cloud Dataflow & Apache BeamNeville Li
 
Getting Started With Amazon Redshift
Getting Started With Amazon Redshift Getting Started With Amazon Redshift
Getting Started With Amazon Redshift Matillion
 
Sorry - How Bieber broke Google Cloud at Spotify
Sorry - How Bieber broke Google Cloud at SpotifySorry - How Bieber broke Google Cloud at Spotify
Sorry - How Bieber broke Google Cloud at SpotifyNeville Li
 
Understanding the architecture of MariaDB ColumnStore
Understanding the architecture of MariaDB ColumnStoreUnderstanding the architecture of MariaDB ColumnStore
Understanding the architecture of MariaDB ColumnStoreMariaDB plc
 
Make your data fly - Building data platform in AWS
Make your data fly - Building data platform in AWSMake your data fly - Building data platform in AWS
Make your data fly - Building data platform in AWSKimmo Kantojärvi
 
Using Pluggable Apache Spark SQL Filters to Help GridPocket Users Keep Up wit...
Using Pluggable Apache Spark SQL Filters to Help GridPocket Users Keep Up wit...Using Pluggable Apache Spark SQL Filters to Help GridPocket Users Keep Up wit...
Using Pluggable Apache Spark SQL Filters to Help GridPocket Users Keep Up wit...Spark Summit
 
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon RedshiftBest Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon RedshiftAmazon Web Services
 
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon Redshift Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon Redshift Amazon Web Services
 

Similar a Data day texas: Cassandra and the Cloud (20)

Five Lessons in Distributed Databases
Five Lessons  in Distributed DatabasesFive Lessons  in Distributed Databases
Five Lessons in Distributed Databases
 
DataStax 6 and Beyond
DataStax 6 and BeyondDataStax 6 and Beyond
DataStax 6 and Beyond
 
implementation of a big data architecture for real-time analytics with data s...
implementation of a big data architecture for real-time analytics with data s...implementation of a big data architecture for real-time analytics with data s...
implementation of a big data architecture for real-time analytics with data s...
 
Cassandra Exports as a Trivially Parallelizable Problem (Emilio Del Tessandor...
Cassandra Exports as a Trivially Parallelizable Problem (Emilio Del Tessandor...Cassandra Exports as a Trivially Parallelizable Problem (Emilio Del Tessandor...
Cassandra Exports as a Trivially Parallelizable Problem (Emilio Del Tessandor...
 
Galaxy Big Data with MariaDB
Galaxy Big Data with MariaDBGalaxy Big Data with MariaDB
Galaxy Big Data with MariaDB
 
Flight on Zeppelin with Apache Spark & Cassandra
Flight on Zeppelin with Apache Spark & CassandraFlight on Zeppelin with Apache Spark & Cassandra
Flight on Zeppelin with Apache Spark & Cassandra
 
Safer restarts, faster streaming, and better repair, just a glimpse of cassan...
Safer restarts, faster streaming, and better repair, just a glimpse of cassan...Safer restarts, faster streaming, and better repair, just a glimpse of cassan...
Safer restarts, faster streaming, and better repair, just a glimpse of cassan...
 
Everything We Learned About In-Memory Data Layout While Building VoltDB
Everything We Learned About In-Memory Data Layout While Building VoltDBEverything We Learned About In-Memory Data Layout While Building VoltDB
Everything We Learned About In-Memory Data Layout While Building VoltDB
 
Dancing with the Elephant
Dancing with the ElephantDancing with the Elephant
Dancing with the Elephant
 
Introduction to NoSql
Introduction to NoSqlIntroduction to NoSql
Introduction to NoSql
 
NoSQL Solutions - a comparative study
NoSQL Solutions - a comparative studyNoSQL Solutions - a comparative study
NoSQL Solutions - a comparative study
 
AWS Big Data Demystified #2 | Athena, Spectrum, Emr, Hive
AWS Big Data Demystified #2 |  Athena, Spectrum, Emr, Hive AWS Big Data Demystified #2 |  Athena, Spectrum, Emr, Hive
AWS Big Data Demystified #2 | Athena, Spectrum, Emr, Hive
 
Scio - A Scala API for Google Cloud Dataflow & Apache Beam
Scio - A Scala API for Google Cloud Dataflow & Apache BeamScio - A Scala API for Google Cloud Dataflow & Apache Beam
Scio - A Scala API for Google Cloud Dataflow & Apache Beam
 
Getting Started With Amazon Redshift
Getting Started With Amazon Redshift Getting Started With Amazon Redshift
Getting Started With Amazon Redshift
 
Sorry - How Bieber broke Google Cloud at Spotify
Sorry - How Bieber broke Google Cloud at SpotifySorry - How Bieber broke Google Cloud at Spotify
Sorry - How Bieber broke Google Cloud at Spotify
 
Understanding the architecture of MariaDB ColumnStore
Understanding the architecture of MariaDB ColumnStoreUnderstanding the architecture of MariaDB ColumnStore
Understanding the architecture of MariaDB ColumnStore
 
Make your data fly - Building data platform in AWS
Make your data fly - Building data platform in AWSMake your data fly - Building data platform in AWS
Make your data fly - Building data platform in AWS
 
Using Pluggable Apache Spark SQL Filters to Help GridPocket Users Keep Up wit...
Using Pluggable Apache Spark SQL Filters to Help GridPocket Users Keep Up wit...Using Pluggable Apache Spark SQL Filters to Help GridPocket Users Keep Up wit...
Using Pluggable Apache Spark SQL Filters to Help GridPocket Users Keep Up wit...
 
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon RedshiftBest Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon Redshift
 
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon Redshift Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon Redshift
 

Más de jbellis

Cassandra Summit 2015
Cassandra Summit 2015Cassandra Summit 2015
Cassandra Summit 2015jbellis
 
Cassandra summit keynote 2014
Cassandra summit keynote 2014Cassandra summit keynote 2014
Cassandra summit keynote 2014jbellis
 
Cassandra 2.1
Cassandra 2.1Cassandra 2.1
Cassandra 2.1jbellis
 
Tokyo cassandra conference 2014
Tokyo cassandra conference 2014Tokyo cassandra conference 2014
Tokyo cassandra conference 2014jbellis
 
Cassandra Summit EU 2013
Cassandra Summit EU 2013Cassandra Summit EU 2013
Cassandra Summit EU 2013jbellis
 
London + Dublin Cassandra 2.0
London + Dublin Cassandra 2.0London + Dublin Cassandra 2.0
London + Dublin Cassandra 2.0jbellis
 
Cassandra Summit 2013 Keynote
Cassandra Summit 2013 KeynoteCassandra Summit 2013 Keynote
Cassandra Summit 2013 Keynotejbellis
 
Cassandra at NoSql Matters 2012
Cassandra at NoSql Matters 2012Cassandra at NoSql Matters 2012
Cassandra at NoSql Matters 2012jbellis
 
Top five questions to ask when choosing a big data solution
Top five questions to ask when choosing a big data solutionTop five questions to ask when choosing a big data solution
Top five questions to ask when choosing a big data solutionjbellis
 
State of Cassandra 2012
State of Cassandra 2012State of Cassandra 2012
State of Cassandra 2012jbellis
 
Massively Scalable NoSQL with Apache Cassandra
Massively Scalable NoSQL with Apache CassandraMassively Scalable NoSQL with Apache Cassandra
Massively Scalable NoSQL with Apache Cassandrajbellis
 
Cassandra 1.1
Cassandra 1.1Cassandra 1.1
Cassandra 1.1jbellis
 
Pycon 2012 What Python can learn from Java
Pycon 2012 What Python can learn from JavaPycon 2012 What Python can learn from Java
Pycon 2012 What Python can learn from Javajbellis
 
Apache Cassandra: NoSQL in the enterprise
Apache Cassandra: NoSQL in the enterpriseApache Cassandra: NoSQL in the enterprise
Apache Cassandra: NoSQL in the enterprisejbellis
 
Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)
Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)
Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)jbellis
 
Cassandra at High Performance Transaction Systems 2011
Cassandra at High Performance Transaction Systems 2011Cassandra at High Performance Transaction Systems 2011
Cassandra at High Performance Transaction Systems 2011jbellis
 
Cassandra 1.0 and the future of big data (Cassandra Tokyo 2011)
Cassandra 1.0 and the future of big data (Cassandra Tokyo 2011)Cassandra 1.0 and the future of big data (Cassandra Tokyo 2011)
Cassandra 1.0 and the future of big data (Cassandra Tokyo 2011)jbellis
 
What python can learn from java
What python can learn from javaWhat python can learn from java
What python can learn from javajbellis
 
State of Cassandra, 2011
State of Cassandra, 2011State of Cassandra, 2011
State of Cassandra, 2011jbellis
 
Brisk: more powerful Hadoop powered by Cassandra
Brisk: more powerful Hadoop powered by CassandraBrisk: more powerful Hadoop powered by Cassandra
Brisk: more powerful Hadoop powered by Cassandrajbellis
 

Más de jbellis (20)

Cassandra Summit 2015
Cassandra Summit 2015Cassandra Summit 2015
Cassandra Summit 2015
 
Cassandra summit keynote 2014
Cassandra summit keynote 2014Cassandra summit keynote 2014
Cassandra summit keynote 2014
 
Cassandra 2.1
Cassandra 2.1Cassandra 2.1
Cassandra 2.1
 
Tokyo cassandra conference 2014
Tokyo cassandra conference 2014Tokyo cassandra conference 2014
Tokyo cassandra conference 2014
 
Cassandra Summit EU 2013
Cassandra Summit EU 2013Cassandra Summit EU 2013
Cassandra Summit EU 2013
 
London + Dublin Cassandra 2.0
London + Dublin Cassandra 2.0London + Dublin Cassandra 2.0
London + Dublin Cassandra 2.0
 
Cassandra Summit 2013 Keynote
Cassandra Summit 2013 KeynoteCassandra Summit 2013 Keynote
Cassandra Summit 2013 Keynote
 
Cassandra at NoSql Matters 2012
Cassandra at NoSql Matters 2012Cassandra at NoSql Matters 2012
Cassandra at NoSql Matters 2012
 
Top five questions to ask when choosing a big data solution
Top five questions to ask when choosing a big data solutionTop five questions to ask when choosing a big data solution
Top five questions to ask when choosing a big data solution
 
State of Cassandra 2012
State of Cassandra 2012State of Cassandra 2012
State of Cassandra 2012
 
Massively Scalable NoSQL with Apache Cassandra
Massively Scalable NoSQL with Apache CassandraMassively Scalable NoSQL with Apache Cassandra
Massively Scalable NoSQL with Apache Cassandra
 
Cassandra 1.1
Cassandra 1.1Cassandra 1.1
Cassandra 1.1
 
Pycon 2012 What Python can learn from Java
Pycon 2012 What Python can learn from JavaPycon 2012 What Python can learn from Java
Pycon 2012 What Python can learn from Java
 
Apache Cassandra: NoSQL in the enterprise
Apache Cassandra: NoSQL in the enterpriseApache Cassandra: NoSQL in the enterprise
Apache Cassandra: NoSQL in the enterprise
 
Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)
Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)
Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)
 
Cassandra at High Performance Transaction Systems 2011
Cassandra at High Performance Transaction Systems 2011Cassandra at High Performance Transaction Systems 2011
Cassandra at High Performance Transaction Systems 2011
 
Cassandra 1.0 and the future of big data (Cassandra Tokyo 2011)
Cassandra 1.0 and the future of big data (Cassandra Tokyo 2011)Cassandra 1.0 and the future of big data (Cassandra Tokyo 2011)
Cassandra 1.0 and the future of big data (Cassandra Tokyo 2011)
 
What python can learn from java
What python can learn from javaWhat python can learn from java
What python can learn from java
 
State of Cassandra, 2011
State of Cassandra, 2011State of Cassandra, 2011
State of Cassandra, 2011
 
Brisk: more powerful Hadoop powered by Cassandra
Brisk: more powerful Hadoop powered by CassandraBrisk: more powerful Hadoop powered by Cassandra
Brisk: more powerful Hadoop powered by Cassandra
 

Último

Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 

Último (20)

Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 

Data day texas: Cassandra and the Cloud

  • 1. © DataStax, All Rights Reserved. Cassandra and the Cloud (2018 edition) Jonathan Ellis
  • 2. © DataStax, All Rights Reserved. “Databases”
  • 3. © DataStax, All Rights Reserved. Stuff everyone agrees on
  • 4. © DataStax, All Rights Reserved. Stuff (Almost) Everyone Agrees On 1. Eventual Consistency is useful
  • 5. © DataStax, All Rights Reserved. Eventual Consistency (CP edition)
  • 6. © DataStax, All Rights Reserved. Eventual Consistency (AP edition)
  • 7. © DataStax, All Rights Reserved. Default consistency levels 1. Cassandra: Eventual 2. Dynamo: Eventual 3. CosmosDB: Eventual (“Session”) 4. Spanner: ACID (no EC)
  • 8. © DataStax, All Rights Reserved. Stuff (Almost) Everyone Agrees On 1. Eventual Consistency is useful 2. Automatic partitioning doesn’t work
  • 9. © DataStax, All Rights Reserved. H-Store (2012)
  • 10. © DataStax, All Rights Reserved. Partitioning approaches 1. Cassandra: Explicit 2. Dynamo: Explicit 3. CosmosDB: Explicit 4. Spanner: Explicit
  • 11. © DataStax, All Rights Reserved. Stuff (Almost) Everyone Agrees On 1. Eventual Consistency is useful 2. Automatic partitioning doesn’t work 3. SQL is a pretty okay query language
  • 12. © DataStax, All Rights Reserved.
  • 13. © DataStax, All Rights Reserved. Query APIs 1. Cassandra: CQL, inspired by SQL 2. DynamoDB: Actually still pretty first-gen NoSQL 3. CosmosDB: “SQL” 4. Spanner: “SQL”
  • 14. © DataStax, All Rights Reserved. Stuff (Almost) Everyone Agrees On 1. Eventual Consistency is useful 2. Automatic partitioning doesn’t work 3. SQL is a pretty okay query language 4. … that’s about it
  • 15. © DataStax, All Rights Reserved. Thomas Sowell There are no solutions. Only tradeoffs.
  • 16. © DataStax, All Rights Reserved. Cassandra
  • 17. © DataStax, All Rights Reserved. Data Model: tabular, with nested content CREATE TABLE notifications ( target_user text, notification_id timeuuid, source_id uuid, source_type text, activity text, PRIMARY KEY (target_user, notification_id) ) WITH CLUSTERING ORDER BY (notification_id DESC);
  • 18. © DataStax, All Rights Reserved. target_user notification_id source_id source_type activity nick e1bd2bcb- d972b679- photo tom liked nick 321998c- d972b679- photo jake commented nick ea1c5d35- 88a049d5- user mike created account nick 5321998c- 64613f27- photo tom commented nick 07581439- 076eab7e- user tyler created account mike 1c34467a- f04e309f- user tom created account
  • 19. © DataStax, All Rights Reserved. Collections CREATE TABLE users ( id uuid PRIMARY KEY, name text, state text, birth_date int, email_addresses set<text> );
  • 20. © DataStax, All Rights Reserved. User-defined Types CREATE TYPE address ( street text, city text, zip_code int, phones set<text> ) CREATE TABLE users ( id uuid PRIMARY KEY, name text, addresses map<text, address> ) SELECT id, name, addresses.city, addresses.phones FROM users; id | name | addresses.city | addresses.phones --------------------+----------------+-------------------------- 63bf691f | jbellis | Austin | {'512-4567', '512-9999'}
  • 21. © DataStax, All Rights Reserved. JSON INSERT INTO users JSON '{"id": "0514e410-", "name": "jbellis", "addresses": {"home": {"street": "9920 Cassandra Ave", "city": "Austin", "zip_code": 78700, "phones": ["1238614789"]}}}';
  • 22. © DataStax, All Rights Reserved. Without nesting
  • 23. © DataStax, All Rights Reserved. With nesting
  • 24. © DataStax, All Rights Reserved. Consistency Levels
  • 25. © DataStax, All Rights Reserved. Consistency Levels
  • 26. © DataStax, All Rights Reserved. Consistency Levels
  • 27. © DataStax, All Rights Reserved. Multi-region 1. Synchronous writes locally; async globally 2. Serve reads and writes for any row in any region 3. DR is “free” 4. Client-level support
  • 28. © DataStax, All Rights Reserved. Notable features ● Lightweight Transactions (Paxos, expensive) ● Materialized Views ● User-defined functions ● Strict schema
  • 29. © DataStax, All Rights Reserved.
  • 30. © DataStax, All Rights Reserved.
  • 31. © DataStax, All Rights Reserved. Schema confusion {"userid": "2452347", "name": "jbellis", ... } {"userid": 2452348, "name": "jshook", ... } {"user_id": 2452349, "name": "jlacefield", ... }
  • 32. © DataStax, All Rights Reserved. DynamoDB
  • 33. © DataStax, All Rights Reserved. CP single partition ● Original Dynamo was AP ● DynamoDB offers Strong/Eventual read consistency ● But Conditional writes are the same price as regular writes And: “All write requests are applied in the order in which they were received”
  • 34. © DataStax, All Rights Reserved. Data model ● “Map of maps” ● Primary key, sort key
  • 35. © DataStax, All Rights Reserved. Data Model
  • 36. © DataStax, All Rights Reserved. Multi-region ● New feature (late 2017): Global tables ● Shards and replicates a table across regions ● Each shard can only be written to by its master region
  • 37. © DataStax, All Rights Reserved. Notable features ● Global indexes ● Change feed ● “DynamoDB Transaction Library” “A put that does not contend with any other simultaneous puts can be expected to perform 7N + 4 writes as the original operation, where N is the number of requests in the transaction.”
  • 38. © DataStax, All Rights Reserved. Sidebar: cross-partition txns in AP?
  • 39. © DataStax, All Rights Reserved. CosmosDB
  • 40. © DataStax, All Rights Reserved. Data model: CP Single Partition
  • 41. © DataStax, All Rights Reserved. “Multi model” ● NOT just “APIs” ● More like MyRocks/MongoRocks than C* CQL/JSON ● What they have in common: ● Hash partitioning ● Undefined sorting within partitions; use ORDER BY
  • 42. © DataStax, All Rights Reserved. Data model
  • 43. © DataStax, All Rights Reserved. “Multi model” ● Not all features supported everywhere ● Azure Functions ● Change Feed ● TLDR use SQL/Document API
  • 44. © DataStax, All Rights Reserved.
  • 45. © DataStax, All Rights Reserved. SQL support and extensions SELECT c.givenName FROM Families f JOIN c IN f.children WHERE f.id = 'WakefieldFamily' ORDER BY f.address.city ASC “The language lets you refer to nodes of the tree at any arbitrary depth, like Node1.Node2.Node3…..NodeM”
  • 46. © DataStax, All Rights Reserved.
  • 47. © DataStax, All Rights Reserved. Consistency Levels
  • 48. © DataStax, All Rights Reserved. Consistency Levels ● This is probably still too many (“About 73% of Azure Cosmos DB tenants use session consistency and 20% prefer bounded staleness.”)
  • 49. © DataStax, All Rights Reserved. Implementation clue?
  • 50. © DataStax, All Rights Reserved. Multi-region ● Claims local read/writes with async replication between regions ● But, also claims ACID single-partition transactions in stored procedures ● You can’t have both! Something doesn’t add up!
  • 51. © DataStax, All Rights Reserved. Notable features ● Everything is indexed 99p < 20% overhead ● “Attachment” special document type for blobs Main purpose seems to be to allow PUTing data easily ● Stored procedures Including (single-partition) transactions Only for SQL API ● Change feed
  • 52. © DataStax, All Rights Reserved. Spanner
  • 53. © DataStax, All Rights Reserved. Data model: CP multi-partition ● “Reuse existing SQL skills to query data in Cloud Spanner using familiar, industry-standard ANSI 2011 SQL.” (Actually a fairly small subset) (And only for SELECT)
  • 54. © DataStax, All Rights Reserved. Interleaved/child tables CREATE TABLE Singers ( SingerId INT64 NOT NULL, FirstName STRING(1024), LastName STRING(1024), SingerInfo BYTES(MAX), ) PRIMARY KEY (SingerId); CREATE TABLE Albums ( SingerId INT64 NOT NULL, AlbumId INT64 NOT NULL, AlbumTitle STRING(MAX), ) PRIMARY KEY (SingerId, AlbumId), INTERLEAVE IN PARENT Singers ON DELETE CASCADE;
  • 55. © DataStax, All Rights Reserved.
  • 56. © DataStax, All Rights Reserved. Multi-region
  • 57. © DataStax, All Rights Reserved. Multi-region, TLDR ● You can replicate to multiple regions ● Only one region can accept writes at a time ● Opinion: it is not often useful to scale reads without also scaling writes
  • 58. © DataStax, All Rights Reserved. Notable features ● Full multi-partition ACID 2PC (using Paxos replication groups) ● DFS-based, not local storage
  • 59. © DataStax, All Rights Reserved. The price of ACID
  • 60. © DataStax, All Rights Reserved. Writes slow down (indexed) reads Quizlet: “Bulk writes severely impact the performance of queries using the secondary index [becuase] a write with a secondary index updates many splits, which [since Spanner uses pessimistic locking] creates contention for reads that use that secondary index.”
  • 61. © DataStax, All Rights Reserved. Practical considerations
  • 62. © DataStax, All Rights Reserved. Cassandra ● Run anywhere you like--but you have to run it ○ But: DataStax Managed Cloud, Instaclustr ○ Also: DataStax Remote DBA ● Storage closely tied to compute ○ But: everyone struggles with this
  • 63. © DataStax, All Rights Reserved. Multi-cloud ● JP Morgan: “We have seen increasingly all the customers we talk to, almost exclusively large mid-market to large enterprise, all now are embracing multi-cloud as a specific strategy.” ●
  • 64. © DataStax, All Rights Reserved. DynamoDB ● Request capacity tied to “partitions” [pp] ○ pp count = max (rc / 3000, wc / 1000, st / 10 GB) ● Subtle implication: capacity / pp decreases as storage volume increases ○ Non-uniform: pp request capacity halved when shard splits ● Subtle implication 2: bulk loads will wreck your planning
  • 65. © DataStax, All Rights Reserved. “Best practices for tables” ● Bulk load 20M items = 20 GB ● Target 30 minutes = 11,000 write capacity = 11 pps ● Post bulk load steady state 200 req/s = 18 req/pp ● No way to reduce partition count
  • 66. © DataStax, All Rights Reserved. DynamoDB provisioning in the wild ● You Probably Shouldn’t Use DynamoDB ○ Hacker News Discussion ● The Million Dollar Engineering Challenge ○ Hacker News discussion
  • 67. © DataStax, All Rights Reserved. CosmosDB ● Like DynamoDB, but underdocumented ● Partition and scale in Azure Cosmos DB ○ Unspecified: max pp storage size, max pp request capacity
  • 68. © DataStax, All Rights Reserved. Spanner ● DFS architecture means it doesn’t have the provisioning problem ● Priced per node (~$8500 min per 2 TB data, per year) + $3600 / TB / yr ○ Doesn’t appear to be part of the GCP free usage tier ● Competing with Cloud Datastore, Cloud BigTable
  • 69. © DataStax, All Rights Reserved. Recommendations
  • 70. © DataStax, All Rights Reserved. Never recommended ● DynamoDB: Lack of nesting is too limiting ● Spanner: All ACID, all the time is too expensive
  • 71. © DataStax, All Rights Reserved. CosmosDB vs Cassandra ● Rough feature parity ○ Partitioned rows (documents) with nesting ○ Default EC with opt-in to stronger options ● Cassandra ○ Materialized views ○ True multi-model ○ Predictable provisioning ● CosmosDB ○ Stored procedures ○ ORDER BY ○ Change feed
  • 72. © DataStax, All Rights Reserved. CosmosDB vs Cassandra ● Given rough feature parity, why would you pick the one that only runs in a single cloud? ● Hobbyist / less than one (three) C* VM?
  • 73. © DataStax, All Rights Reserved. DataStax Enterprise
  • 74. © DataStax, All Rights Reserved. Distributed Systems Reading List ● Bigtable: A Distributed Storage System for Structured Data ● Dynamo: Amazon’s Highly Available Key-value Store ● Cassandra - A Decentralized Structured Storage System [annotated by Jonathan Ellis] ● Skew-aware automatic database partitioning in shared-nothing, parallel OLTP systems ● Calvin: Fast Distributed Transactions for Partitioned Database Systems ● Spanner: Google's Globally-Distributed Database
  • 75. © DataStax, All Rights Reserved. Thank you