Cassandra - lesson learned
Andrzej Ludwikowski
About me?
- www.ludwikowski.info
- github.com/aludwiko
- @aludwikowski
Why Cassandra?
- BigData!!!
- Volume (petabytes of data, trillions of entities)
- Velocity (real-time, streams, millions of transactions per second)
- Variety (un-, semi-, structured)
- writes are cheap, reads are ???
- near-linear horizontal scaling (in proper use cases)
- fully distributed, with no single point of failure
- data replication by default
Cassandra vs CAP?
- CAP Theorem - pick two
Origins?
2010
Name?
Write path
- Any node can coordinate any request (NSPOF)
- Replication Factor
- Consistency Level
[Diagram: client (driver) writing to a ring of four nodes (Node 1 to Node 4) with RF=3 and CL=2]
Write path - consistent hashing
- Token ring from -2^63 to 2^63-1
- Partitioner: partition key -> token
[Diagram: the client's partitioner maps a key to token 77 on a simplified 0-100 ring of four nodes (Node 1 to Node 4) with token ranges 0-25, 26-50, 51-75 and 76-100; the row with token 77 ends up on three nodes]
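A minimal sketch of the token lookup on the simplified 0-100 ring, not Cassandra's implementation. The node-to-range assignment, the helper names `toy_token` and `replicas_for`, and the "owner plus next nodes clockwise" replica placement are assumptions made for illustration; a real cluster uses a 64-bit partitioner and a configurable replication strategy.

```python
import hashlib
from bisect import bisect_left
from typing import List

# Toy ring: each node owns the tokens up to and including its upper bound.
# Which node owns which range is an assumption of this sketch.
RING = [(25, "Node 1"), (50, "Node 2"), (75, "Node 3"), (100, "Node 4")]


def toy_token(partition_key: str) -> int:
    """Hypothetical partitioner: hash the partition key onto the 0..100 toy ring."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 101


def replicas_for(token: int, replication_factor: int) -> List[str]:
    """Return the node owning `token`, followed by the next nodes clockwise."""
    if not 0 <= token <= 100:
        raise ValueError("token must be within the toy ring 0..100")
    upper_bounds = [upper for upper, _ in RING]
    owner = bisect_left(upper_bounds, token)  # first range whose upper bound >= token
    return [RING[(owner + i) % len(RING)][1] for i in range(replication_factor)]
```

With these assumptions, `replicas_for(77, 3)` picks the node that owns 76-100 and the two nodes that follow it on the ring, which matches the three copies of token 77 in the diagram.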
DEMO
Write path - problems?
- Hinted handoff
- Retry idempotent inserts
- built-in policies
- Lightweight transactions (Paxos)
- Batches
[Diagram: client writing the row with token 77 to three replicas on the four-node ring (token ranges 0-25, 26-50, 51-75, 76-100)]
Write path - node level
Write path - why so fast?
- Commit log - append only
- Periodic (10s) or batch sync to disk
- Network topology aware
50,000 t/s = 50 t/ms = 5 t/100us = 1 t/20us
[Diagram: client writing to the four-node ring with RF=2 and CL=2; replicas (R) return acks]
[Diagram: client writing to replicas in two data centers, Asia DC and Europe DC]
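The throughput figures on this slide are the same rate expressed in different units. A small sketch of the arithmetic (the function name is hypothetical):

```python
def time_budget_us(transactions_per_second: int) -> float:
    """Average time available per transaction, in microseconds."""
    if transactions_per_second <= 0:
        raise ValueError("transactions_per_second must be positive")
    return 1_000_000 / transactions_per_second


# 50,000 transactions per second leave 20 microseconds per transaction.
budget = time_budget_us(50_000)
```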
Read path
- Most recent wins
- Eager retries
- In-memory
- MemTable
- Row Cache
- Bloom Filters
- Key Caches
- Partition Summaries
- On disk
- Partition Indexes
- SSTables
[Diagram: client reading from the four-node ring with RF=3 and CL=3; the three replicas answer with timestamps 67, 99 and 88]
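A minimal sketch of the "most recent wins" rule using the three timestamps from the read path diagram. The `ReplicaResponse` type, the node names and the values are hypothetical; only the timestamps come from the slide.

```python
from typing import Iterable, NamedTuple


class ReplicaResponse(NamedTuple):
    node: str
    value: str
    timestamp: int


def resolve(responses: Iterable[ReplicaResponse]) -> ReplicaResponse:
    """Pick the response with the highest write timestamp."""
    return max(responses, key=lambda response: response.timestamp)


responses = [
    ReplicaResponse("Node 1", "old value", 67),
    ReplicaResponse("Node 2", "newest value", 99),
    ReplicaResponse("Node 3", "newer value", 88),
]
winner = resolve(responses)  # the response carrying timestamp 99
```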
Immediate vs. Eventual Consistency
- if (writeCL + readCL) > replication_factor then immediate consistency
- writeCL=ALL, readCL=1
- writeCL=1, readCL=ALL
- writeCL=QUORUM, readCL=QUORUM
- https://www.ecyrd.com/cassandracalculator/
[Diagram: client and the four-node ring (Node 1 to Node 4) with RF=3]
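The rule `(writeCL + readCL) > replication_factor` can be checked with a few lines. This is a sketch with hypothetical function names; consistency levels are given as replica counts, so ALL equals the replication factor and QUORUM is a majority of replicas.

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that make up a majority."""
    return replication_factor // 2 + 1


def is_immediately_consistent(write_cl: int, read_cl: int, replication_factor: int) -> bool:
    """True when every read set must overlap every write set in at least one replica."""
    return write_cl + read_cl > replication_factor


RF = 3
all_then_one = is_immediately_consistent(RF, 1, RF)                # writeCL=ALL, readCL=1
one_then_all = is_immediately_consistent(1, RF, RF)                # writeCL=1, readCL=ALL
quorum_both = is_immediately_consistent(quorum(RF), quorum(RF), RF)
one_then_one = is_immediately_consistent(1, 1, RF)                 # eventual consistency only
```

For RF=3 the first three combinations satisfy the rule, while writeCL=1 with readCL=1 does not.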
Modeling - new mindset
- QDD, Query Driven Development
- Nesting is ok
- Duplication is ok
- Writes are cheap
- No joins
QDD - Conceptual model
- Technology independent
- Chen notation
QDD - Application workflow
QDD - Logical model
- Chebotko diagram
QDD - Physical model
- Technology dependent
- Analysis and validation (finding problems)
- Physical optimization (fixing problems)
- Data types
Physical storage
- Primary key
- Partition key
CREATE TABLE videos (
    id int,
    title text,
    runtime int,
    year int,
    PRIMARY KEY (id)
);
id | title | runtime | year
----+---------------------+---------+------
1 | dzien swira | 93 | 2002
2 | chlopaki nie placza | 96 | 2000
3 | psy | 104 | 1992
4 | psy 2 | 96 | 1994
Partitions (partition key -> columns):
1 -> title: dzien swira | runtime: 93 | year: 2002
2 -> title: chlopaki... | runtime: 96 | year: 2000
3 -> title: psy | runtime: 104 | year: 1992
4 -> title: psy 2 | runtime: 96 | year: 1994
-- Rejected: title is not part of the primary key (would need an index or ALLOW FILTERING)
SELECT * FROM videos
WHERE title = 'dzien swira';
Physical storage
CREATE TABLE videos_with_clustering (
    title text,
    runtime int,
    year int,
    PRIMARY KEY ((title), year)
);
- Primary key (could be compound)
- Partition key
- Clustering column (order, uniqueness)
title | year | runtime
-------------+------+---------
godzilla | 1954 | 98
godzilla | 1998 | 140
godzilla | 2014 | 123
psy | 1992 | 104
Partitions (partition key -> clustering value: columns):
godzilla -> 1954: runtime 98 | 1998: runtime 140 | 2014: runtime 123
psy -> 1992: runtime 104
SELECT * FROM videos_with_clustering
WHERE title = 'godzilla';
SELECT * FROM videos_with_clustering
WHERE title = 'godzilla' AND year > 1998;
Physical storage
CREATE TABLE videos_with_composite_pk (
    title text,
    runtime int,
    year int,
    PRIMARY KEY ((title, year))
);
- Primary key (could be compound)
- Partition key (could be composite)
- Clustering column (order, uniqueness)
title | year | runtime
-------------+------+---------
godzilla | 1954 | 98
godzilla | 1998 | 140
godzilla | 2014 | 123
psy | 1992 | 104
Partitions (partition key -> columns):
godzilla:1954 -> runtime: 98
godzilla:1998 -> runtime: 140
godzilla:2014 -> runtime: 123
psy:1992 -> runtime: 104
SELECT * FROM videos_with_composite_pk
WHERE title = 'godzilla'
AND year = 1954;
Modeling - clustering column(s)
Q: Retrieve videos an actor has appeared in (newest first).
CREATE TABLE videos_by_actor (
    actor text,
    added_date timestamp,
    video_id timeuuid,
    character_name text,
    description text,
    encoding frozen<video_encoding>,
    tags set<text>,
    title text,
    user_id uuid,
    PRIMARY KEY ((actor), added_date, video_id, character_name)
) WITH CLUSTERING ORDER BY (added_date DESC);
The primary key is built up step by step:
- PRIMARY KEY ((actor), added_date) - one partition per actor, rows sorted newest first
- PRIMARY KEY ((actor), added_date, video_id) - video_id keeps videos with the same added_date as separate rows
- PRIMARY KEY ((actor), added_date, video_id, character_name) - character_name keeps several characters played in the same video as separate rows
Modeling - compound partition key
Q: Retrieve the last 1000 measurements from a given day.
First attempt, one partition per weather station:
CREATE TABLE temperature_by_day (
    weather_station_id text,
    date text,
    event_time timestamp,
    temperature text,
    PRIMARY KEY ((weather_station_id), date, event_time)
) WITH CLUSTERING ORDER BY (date DESC, event_time DESC);
With one measurement per second, the partition keeps growing:
1 day = 86 400 rows
1 week = 604 800 rows
1 month = 2 592 000 rows
1 year = 31 536 000 rows
Fix, one partition per weather station and day:
CREATE TABLE temperature_by_day (
    weather_station_id text,
    date text,
    event_time timestamp,
    temperature text,
    PRIMARY KEY ((weather_station_id, date), event_time)
) WITH CLUSTERING ORDER BY (event_time DESC);
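The row counts quoted for the single-station partition correspond to one measurement per second. A sketch of that arithmetic follows; the one-per-second rate is inferred from 86 400 rows per day, and the function name is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def rows_per_partition(days: int, measurements_per_second: int = 1) -> int:
    """Rows accumulated in one partition that covers `days` days of measurements."""
    return days * SECONDS_PER_DAY * measurements_per_second


growth = {
    "1 day": rows_per_partition(1),
    "1 week": rows_per_partition(7),
    "1 month": rows_per_partition(30),
    "1 year": rows_per_partition(365),
}
```

Putting the date into the partition key caps every partition at the one-day figure, whereas a partition keyed only by the weather station keeps growing.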
Modeling - TTL
Retention policy - keep data only from the last week.
CREATE TABLE temperature_by_day (
    weather_station_id text,
    date text,
    event_time timestamp,
    temperature text,
    PRIMARY KEY ((weather_station_id, date), event_time)
) WITH CLUSTERING ORDER BY (event_time DESC);
INSERT INTO temperature_by_day … USING TTL 604800;
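The TTL value is given in seconds, and 604800 is one week. A short sketch for deriving the number from a retention period (the helper name is hypothetical):

```python
from datetime import timedelta


def ttl_seconds(retention: timedelta) -> int:
    """Convert a retention period into a TTL expressed in whole seconds."""
    return int(retention.total_seconds())


one_week_ttl = ttl_seconds(timedelta(weeks=1))  # 7 * 24 * 60 * 60 seconds
```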
Modeling - bitmap index
Q: Find car by year and/or model and/or color.
CREATE TABLE car (
    year text,
    model text,
    color text,
    vehicle_id int,
    //other columns
    PRIMARY KEY ((year, model, color), vehicle_id)
);
INSERT INTO car (year, model, color, vehicle_id, ...) VALUES ('2000', 'Multipla', 'blue', 13, ...);
INSERT INTO car (year, model, color, vehicle_id, ...) VALUES ('2000', 'Multipla', '', 13, ...);
INSERT INTO car (year, model, color, vehicle_id, ...) VALUES ('2000', '', 'blue', 13, ...);
INSERT INTO car (year, model, color, vehicle_id, ...) VALUES ('2000', '', '', 13, ...);
INSERT INTO car (year, model, color, vehicle_id, ...) VALUES ('', 'Multipla', 'blue', 13, ...);
INSERT INTO car (year, model, color, vehicle_id, ...) VALUES ('', 'Multipla', '', 13, ...);
INSERT INTO car (year, model, color, vehicle_id, ...) VALUES ('', '', 'blue', 13, ...);
SELECT * FROM car WHERE year = '2000' AND model = '' AND color = 'blue';
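The seven inserts are every combination of "real value or empty placeholder" for the three key columns, minus the combination where all three are empty. A sketch that generates those key combinations; the function name and the `wildcard` parameter are hypothetical, and the values are passed through unchanged.

```python
from itertools import product
from typing import List, Tuple


def index_rows(year, model, color, wildcard="") -> List[Tuple]:
    """All (year, model, color) keys to insert for one car, skipping the all-wildcard key."""
    choices = [(year, wildcard), (model, wildcard), (color, wildcard)]
    return [
        combo
        for combo in product(*choices)
        if any(value != wildcard for value in combo)
    ]
```

Each additional searchable column doubles the number of rows written per car (2^n - 1 for n columns), which is the price of answering any "and/or" combination with a single-partition read.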
Modeling - wide rows
Q: Find user by email.
One partition per email:
CREATE TABLE user (
    email text,
    name text,
    age int,
    PRIMARY KEY (email)
);
One partition per domain, one row per user:
CREATE TABLE user (
    domain text,
    user text,
    name text,
    age int,
    PRIMARY KEY ((domain), user)
);
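With the domain as partition key and the user as clustering column, the application has to split the email address before querying. A minimal sketch of that split (the helper name is hypothetical, and validation is deliberately simplistic):

```python
from typing import Tuple


def split_email(email: str) -> Tuple[str, str]:
    """Split an address into (domain, user) at the last '@'."""
    user, separator, domain = email.rpartition("@")
    if not separator or not user or not domain:
        raise ValueError("not a valid email address: %r" % email)
    return domain, user
```

The two returned values map onto the `domain` and `user` columns of the second table.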
Modeling - versioning with lightweight transactions
CREATE TABLE document (
    id text,
    content text,
    version int,
    locked_by text,
    PRIMARY KEY ((id))
);
INSERT INTO document (id, content, version) VALUES ('my doc', 'some content', 1)
IF NOT EXISTS;
UPDATE document SET locked_by = 'andrzej' WHERE id = 'my doc' IF locked_by = null;
UPDATE document SET content = 'better content', version = 2, locked_by = null
WHERE id = 'my doc' IF locked_by = 'andrzej';
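The three statements form a lock, edit, unlock cycle in which each step applies only if its condition holds. The sketch below imitates that compare-and-set behaviour in a single process with a plain dictionary; it is not how Cassandra implements lightweight transactions (those run Paxos across replicas), the class and method names are hypothetical, and unknown document ids are simply rejected.

```python
from typing import Dict, Optional


class DocumentStore:
    """In-memory stand-in for the document table; each method is one conditional write."""

    def __init__(self) -> None:
        self._rows: Dict[str, dict] = {}

    def insert_if_not_exists(self, doc_id: str, content: str, version: int) -> bool:
        # Mirrors: INSERT ... IF NOT EXISTS
        if doc_id in self._rows:
            return False
        self._rows[doc_id] = {"content": content, "version": version, "locked_by": None}
        return True

    def lock(self, doc_id: str, user: str) -> bool:
        # Mirrors: UPDATE ... SET locked_by = user ... IF locked_by = null
        row = self._rows.get(doc_id)
        if row is None or row["locked_by"] is not None:
            return False
        row["locked_by"] = user
        return True

    def update_and_unlock(self, doc_id: str, user: str, content: str, version: int) -> bool:
        # Mirrors: UPDATE ... SET content, version, locked_by = null ... IF locked_by = user
        row = self._rows.get(doc_id)
        if row is None or row["locked_by"] != user:
            return False
        row.update(content=content, version=version, locked_by=None)
        return True

    def get(self, doc_id: str) -> Optional[dict]:
        return self._rows.get(doc_id)
```

A second user calling `lock` while the document is held gets `False` back, just as the conditional UPDATE would report that it was not applied.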
Modeling - JSON with UDT and tuples
{
    "title": "Example Schema",
    "type": "object",
    "properties": {
        "firstName": "andrzej",
        "lastName": "ludwikowski",
        "age": {
            "description": "Age in years",
            "type": "integer",
            "minimum": 0
        }
    },
    "x_dimension": "1",
    "y_dimension": "2"
}
CREATE TYPE age (
    description text,
    type text,
    minimum int
);
CREATE TYPE prop (
    firstName text,
    lastName text,
    age frozen<age>
);
CREATE TABLE json (
    title text,
    type text,
    properties list<frozen<prop>>,
    dimensions tuple<int, int>,
    PRIMARY KEY (title)
);
Common use cases
- Sensor data (Zonar)
- Fraud detection (Barracuda)
- Playlists and collections (Spotify)
- Personalization and recommendation engines (eBay)
- Messaging (Instagram)
- Event Sourcing!
Common anti use cases
- Queue
- Search engine
Tombstones
- Understanding Cassandra tombstones
DataStax Academy
- Introduction to Apache Cassandra
- Data Modeling
- DataStax Enterprise Foundations of Apache Cassandra
- DataStax Enterprise Operations with Apache Cassandra
- DataStax Enterprise Search
- DataStax Enterprise Analytics with Apache Spark
- DataStax Enterprise Graph
Competition?
ScyllaDB
- Cassandra without JVM
- same protocol, SSTable compatibility
- C++ and Seastar lib
- 1,000,000 IOPS
Not covered
- schema migrations
- backups
- DSE
Strategies for using alternative queries to mitigate zero resultsJean Silva
 
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...Bert Jan Schrijver
 
The Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptx
The Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptxThe Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptx
The Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptxRTS corp
 
eSoftTools IMAP Backup Software and migration tools
eSoftTools IMAP Backup Software and migration toolseSoftTools IMAP Backup Software and migration tools
eSoftTools IMAP Backup Software and migration toolsosttopstonverter
 
Effectively Troubleshoot 9 Types of OutOfMemoryError
Effectively Troubleshoot 9 Types of OutOfMemoryErrorEffectively Troubleshoot 9 Types of OutOfMemoryError
Effectively Troubleshoot 9 Types of OutOfMemoryErrorTier1 app
 
VictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News UpdateVictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News UpdateVictoriaMetrics
 
Understanding Plagiarism: Causes, Consequences and Prevention.pptx
Understanding Plagiarism: Causes, Consequences and Prevention.pptxUnderstanding Plagiarism: Causes, Consequences and Prevention.pptx
Understanding Plagiarism: Causes, Consequences and Prevention.pptxSasikiranMarri
 
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...OnePlan Solutions
 
Ronisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited CatalogueRonisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited Catalogueitservices996
 
Best Angular 17 Classroom & Online training - Naresh IT
Best Angular 17 Classroom & Online training - Naresh ITBest Angular 17 Classroom & Online training - Naresh IT
Best Angular 17 Classroom & Online training - Naresh ITmanoharjgpsolutions
 
[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdf
[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdf[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdf
[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdfSteve Caron
 
Keeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository worldKeeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository worldRoberto Pérez Alcolea
 

Último (20)

Osi security architecture in network.pptx
Osi security architecture in network.pptxOsi security architecture in network.pptx
Osi security architecture in network.pptx
 
Amazon Bedrock in Action - presentation of the Bedrock's capabilities
Amazon Bedrock in Action - presentation of the Bedrock's capabilitiesAmazon Bedrock in Action - presentation of the Bedrock's capabilities
Amazon Bedrock in Action - presentation of the Bedrock's capabilities
 
What’s New in VictoriaMetrics: Q1 2024 Updates
What’s New in VictoriaMetrics: Q1 2024 UpdatesWhat’s New in VictoriaMetrics: Q1 2024 Updates
What’s New in VictoriaMetrics: Q1 2024 Updates
 
Understanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM ArchitectureUnderstanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM Architecture
 
Zer0con 2024 final share short version.pdf
Zer0con 2024 final share short version.pdfZer0con 2024 final share short version.pdf
Zer0con 2024 final share short version.pdf
 
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4jGraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
 
SAM Training Session - How to use EXCEL ?
SAM Training Session - How to use EXCEL ?SAM Training Session - How to use EXCEL ?
SAM Training Session - How to use EXCEL ?
 
Large Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLarge Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and Repair
 
Strategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero resultsStrategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero results
 
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...
 
The Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptx
The Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptxThe Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptx
The Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptx
 
eSoftTools IMAP Backup Software and migration tools
eSoftTools IMAP Backup Software and migration toolseSoftTools IMAP Backup Software and migration tools
eSoftTools IMAP Backup Software and migration tools
 
Effectively Troubleshoot 9 Types of OutOfMemoryError
Effectively Troubleshoot 9 Types of OutOfMemoryErrorEffectively Troubleshoot 9 Types of OutOfMemoryError
Effectively Troubleshoot 9 Types of OutOfMemoryError
 
VictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News UpdateVictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News Update
 
Understanding Plagiarism: Causes, Consequences and Prevention.pptx
Understanding Plagiarism: Causes, Consequences and Prevention.pptxUnderstanding Plagiarism: Causes, Consequences and Prevention.pptx
Understanding Plagiarism: Causes, Consequences and Prevention.pptx
 
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
 
Ronisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited CatalogueRonisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited Catalogue
 
Best Angular 17 Classroom & Online training - Naresh IT
Best Angular 17 Classroom & Online training - Naresh ITBest Angular 17 Classroom & Online training - Naresh IT
Best Angular 17 Classroom & Online training - Naresh IT
 
[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdf
[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdf[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdf
[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdf
 
Keeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository worldKeeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository world
 

Cassandra lesson learned - extended

  • 1. Cassandra - lesson learned Andrzej Ludwikowski
  • 2. About me? - www.ludwikowski.info - github.com/aludwiko - @aludwikowski
  • 3. Why Cassandra? - BigData!!! - Volume (petabytes of data, trillions of entities) - Velocity (real-time, streams, millions of transactions per second) - Variety (un-, semi-, structured) - writes are cheap, reads are ??? - near-linear horizontal scaling (in proper use cases) - fully distributed, with no single point of failure - data replication by default
  • 4. Cassandra vs CAP? - CAP Theorem - pick two
  • 10. Write path [diagram: client (driver) and Node 1 to Node 4]
  • 11. Write path - Any node can coordinate any request (NSPOF) [diagram: client (driver) and Node 1 to Node 4]
  • 12. Write path - Any node can coordinate any request (NSPOF) - Replication Factor [diagram: client and Node 1 to Node 4, RF=3]
  • 13. Write path - Any node can coordinate any request (NSPOF) - Replication Factor - Consistency Level [diagram: client and Node 1 to Node 4, RF=3, CL=2]
  • 14. Write path - consistent hashing - Token ring from -2^63 to 2^63-1 [diagram: Node 1 to Node 4 on a ring, labels 0 and 100]
  • 15. Write path - consistent hashing - Token ring from -2^63 to 2^63-1 - Partitioner: partition key -> token [diagram: client, partitioner, and Node 1 to Node 4 with simplified token ranges 0-25, 25-50, 51-75, 76-100; the partitioner yields token 77]
  • 16. Write path - consistent hashing - Token ring from -2^63 to 2^63-1 - Partitioner: partition key -> token [same diagram; token 77 is routed to the node owning 76-100]
  • 17. Write path - consistent hashing - Token ring from -2^63 to 2^63-1 - Partitioner: partition key -> token [same diagram; token 77 is stored on three nodes]
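The consistent-hashing slides above can be sketched in a few lines of Python. This is a toy model, not Cassandra's implementation: it uses the simplified 0-100 ring from the diagram rather than the real token range, and the assignment of ranges to nodes, the `RING` table, and the `replicas_for_token` helper are all assumptions made for illustration. Replicas are chosen by walking the ring clockwise from the owner.

```python
from bisect import bisect_left

# Toy ring (assumption): each entry is (highest token owned, node name).
RING = [(25, "Node 1"), (50, "Node 2"), (75, "Node 3"), (100, "Node 4")]


def replicas_for_token(token, rf=3):
    """Return the owner of `token` plus the next rf-1 nodes clockwise."""
    if not 0 <= token <= RING[-1][0]:
        raise ValueError("token outside the toy ring")
    upper_bounds = [upper for upper, _ in RING]
    # First node whose upper bound is >= token owns the token.
    owner = bisect_left(upper_bounds, token)
    count = min(rf, len(RING))
    return [RING[(owner + i) % len(RING)][1] for i in range(count)]


print(replicas_for_token(77, rf=3))  # ['Node 4', 'Node 1', 'Node 2']
```

With RF=3, token 77 lands on its owner and the next two nodes on the ring, which matches the three copies of 77 shown on the slide.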
  • 18. DEMO
  • 19. Write path - problems? [diagram: client and Node 1 to Node 4 with token ranges 0-25, 25-50, 51-75, 76-100; token 77 stored on three nodes]
  • 20. Write path - problems? - Hinted handoff [same diagram]
  • 22. Write path - problems? - Hinted handoff - Retry idempotent inserts - built-in policies [same diagram]
  • 23. Write path - problems? - Hinted handoff - Retry idempotent inserts - built-in policies - Lightweight transactions (Paxos) [same diagram]
  • 24. Write path - problems? - Hinted handoff - Retry idempotent inserts - built-in policies - Lightweight transactions (Paxos) - Batches [same diagram]
  • 25. Write path - node level
  • 26. Write path - why so fast? - Commit log - append only
  • 27. Write path - why so fast?
  • 28. Write path - why so fast? 50,000 t/s = 50 t/ms = 5 t/100 us = 1 t/20 us
  • 29. Write path - why so fast? - Commit log - append only - Periodic (10s) or batch sync to disk [diagram: client and Node 1 to Node 4, RF=2, CL=2]
  • 30. Write path - why so fast? - Commit log - append only - Periodic or batch sync to disk - Network topology aware [diagram: client and Node 1 to Node 4 split across Rack 1 and Rack 2, RF=2, CL=2]
  • 31. Write path - why so fast? - Commit log - append only - Periodic or batch sync to disk - Network topology aware [diagram: client, Asia DC, Europe DC]
  • 32. Read path - Most recent wins - Eager retries - In-memory: MemTable, Row Cache, Bloom Filters, Key Caches, Partition Summaries - On disk: Partition Indexes, SSTables [diagram: client and Node 1 to Node 4, RF=3, CL=3; replicas answer with timestamp 67, timestamp 99, timestamp 88]
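The "most recent wins" rule on the read path can be illustrated as follows. This is a minimal sketch, assuming the coordinator simply keeps the replica response with the highest write timestamp; the `ReplicaResponse` type, the `resolve` helper, and the example values are hypothetical, and only the three timestamps come from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicaResponse:
    node: str
    value: str
    timestamp: int


def resolve(responses):
    """Pick the response with the newest write timestamp."""
    return max(responses, key=lambda r: r.timestamp)


responses = [
    ReplicaResponse("replica A", "old value", 67),
    ReplicaResponse("replica B", "newest value", 99),
    ReplicaResponse("replica C", "newer value", 88),
]
print(resolve(responses).timestamp)  # 99
```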
  • 33. Immediate vs. Eventual Consistency - if (writeCL + readCL) > replication_factor then immediate consistency - writeCL=ALL, readCL=ONE - writeCL=ONE, readCL=ALL - writeCL,readCL=QUORUM - https://www.ecyrd.com/cassandracalculator/ [diagram: client and Node 1 to Node 4, RF=3]
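The rule on this slide is easy to check by hand or in code. The sketch below assumes QUORUM means a majority of replicas (`rf // 2 + 1`); the function names are made up for illustration.

```python
def replicas_required(level, rf):
    """Number of replicas that must acknowledge for a consistency level."""
    levels = {"ONE": 1, "TWO": 2, "THREE": 3, "QUORUM": rf // 2 + 1, "ALL": rf}
    return levels[level]


def is_immediately_consistent(write_cl, read_cl, rf):
    """True when the read and write replica sets are guaranteed to overlap."""
    return replicas_required(write_cl, rf) + replicas_required(read_cl, rf) > rf


for write_cl, read_cl in [("ALL", "ONE"), ("ONE", "ALL"), ("QUORUM", "QUORUM"), ("ONE", "ONE")]:
    print(write_cl, read_cl, is_immediately_consistent(write_cl, read_cl, rf=3))
```

With RF=3, the first three combinations overlap on at least one replica, while ONE/ONE (1 + 1 = 2, not greater than 3) is only eventually consistent.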
  • 34. Modeling - new mindset - QDD, Query Driven Development - Nesting is ok - Duplication is ok - Writes are cheap - No joins
  • 35. QDD - Conceptual model - Technology independent - Chen notation
  • 36. QDD - Application workflow
  • 37. QDD - Logical model - Chebotko diagram
  • 38. QDD - Physical model - Technology dependent - Analysis and validation (finding problems) - Physical optimization (fixing problems) - Data types
  • 39. Physical storage - Primary key - Partition key CREATE TABLE videos ( id int, title text, runtime int, year int, PRIMARY KEY (id) ); id | title | runtime | year ----+---------------------+---------+------ 1 | dzien swira | 93 | 2002 2 | chlopaki nie placza | 96 | 2000 3 | psy | 104 | 1992 4 | psy 2 | 96 | 1994 [storage layout: partition 1 -> title: dzien swira, runtime: 93, year: 2002; partition 2 -> title: chlopaki..., runtime: 96, year: 2000; partition 3 -> title: psy, runtime: 104, year: 1992; partition 4 -> title: psy 2, runtime: 96, year: 1994] SELECT * FROM videos WHERE title = 'dzien swira';
  • 40. Physical storage - Primary key (could be compound) - Partition key - Clustering column (order, uniqueness) CREATE TABLE videos_with_clustering ( title text, runtime int, year int, PRIMARY KEY ((title), year) ); title | year | runtime -------------+------+--------- godzilla | 1954 | 98 godzilla | 1998 | 140 godzilla | 2014 | 123 psy | 1992 | 104 [storage layout: partition godzilla -> 1954: runtime 98, 1998: runtime 140, 2014: runtime 123; partition psy -> 1992: runtime 104] SELECT * FROM videos_with_clustering WHERE title = 'godzilla'; SELECT * FROM videos_with_clustering WHERE title = 'godzilla' AND year > 1998;
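One way to picture the clustering column is as a sorted list inside each partition, so a range predicate becomes a binary search followed by a slice. The sketch below is an in-memory analogy only, not how SSTables are laid out; `PARTITIONS` and `select_newer_than` are hypothetical names, and the rows are the ones from the slide.

```python
from bisect import bisect_right

# partition key (title) -> rows as (year, runtime), kept sorted by year
PARTITIONS = {
    "godzilla": [(1954, 98), (1998, 140), (2014, 123)],
    "psy": [(1992, 104)],
}


def select_newer_than(title, year):
    """Rough equivalent of: WHERE title = ? AND year > ?"""
    rows = PARTITIONS.get(title, [])
    years = [row_year for row_year, _ in rows]
    # Everything to the right of `year` in the sorted partition matches.
    return rows[bisect_right(years, year):]


print(select_newer_than("godzilla", 1998))  # [(2014, 123)]
```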
  • 41. Physical storage - Primary key (could be compound) - Partition key (could be composite) - Clustering column (order, uniqueness) CREATE TABLE videos_with_composite_pk ( title text, runtime int, year int, PRIMARY KEY ((title, year)) ); title | year | runtime -------------+------+--------- godzilla | 1954 | 98 godzilla | 1998 | 140 godzilla | 2014 | 123 psy | 1992 | 104 [storage layout: partition godzilla:1954 -> runtime 98; partition godzilla:1998 -> runtime 140; partition godzilla:2014 -> runtime 123; partition psy:1992 -> runtime 104] SELECT * FROM videos_with_composite_pk WHERE title = 'godzilla' AND year = 1954;
  • 42. Modeling - clustering column(s) Q: Retrieve videos an actor has appeared in (newest first).
  • 43. Modeling - clustering column(s) CREATE TABLE videos_by_actor ( actor text, added_date timestamp, video_id timeuuid, character_name text, description text, encoding frozen<video_encoding>, tags set<text>, title text, user_id uuid, PRIMARY KEY ( ) ) WITH CLUSTERING ORDER BY ( ); Q: Retrieve videos an actor has appeared in (newest first).
  • 44. Modeling - clustering column(s) CREATE TABLE videos_by_actor ( actor text, added_date timestamp, video_id timeuuid, character_name text, description text, encoding frozen<video_encoding>, tags set<text>, title text, user_id uuid, PRIMARY KEY ((actor), added_date) ) WITH CLUSTERING ORDER BY (added_date desc); Q: Retrieve videos an actor has appeared in (newest first).
  • 45. Modeling - clustering column(s) CREATE TABLE videos_by_actor ( actor text, added_date timestamp, video_id timeuuid, character_name text, description text, encoding frozen<video_encoding>, tags set<text>, title text, user_id uuid, PRIMARY KEY ((actor), added_date, video_id) ) WITH CLUSTERING ORDER BY (added_date desc); Q: Retrieve videos an actor has appeared in (newest first).
  • 46. Modeling - clustering column(s) CREATE TABLE videos_by_actor ( actor text, added_date timestamp, video_id timeuuid, character_name text, description text, encoding frozen<video_encoding>, tags set<text>, title text, user_id uuid, PRIMARY KEY ((actor), added_date, video_id, character_name) ) WITH CLUSTERING ORDER BY (added_date desc); Q: Retrieve videos an actor has appeared in (newest first).
  • 47. Modeling - compound partition key CREATE TABLE temperature_by_day ( weather_station_id text, date text, event_time timestamp, temperature text, PRIMARY KEY ( ) ) WITH CLUSTERING ORDER BY ( ); Q: Retrieve the last 1000 measurements from a given day.
  • 48. Modeling - compound partition key CREATE TABLE temperature_by_day ( weather_station_id text, date text, event_time timestamp, temperature text, PRIMARY KEY ((weather_station_id), date, event_time) ) WITH CLUSTERING ORDER BY (date desc, event_time desc); Q: Retrieve the last 1000 measurements from a given day.
  • 49. Modeling - compound partition key CREATE TABLE temperature_by_day ( weather_station_id text, date text, event_time timestamp, temperature text, PRIMARY KEY ((weather_station_id), date, event_time) ) WITH CLUSTERING ORDER BY (date desc, event_time desc); Q: Retrieve the last 1000 measurements from a given day. 1 day = 86 400 rows 1 week = 604 800 rows 1 month = 2 592 000 rows 1 year = 31 536 000 rows
  • 50. Modeling - compound partition key CREATE TABLE temperature_by_day ( weather_station_id text, date text, event_time timestamp, temperature text, PRIMARY KEY ((weather_station_id, date), event_time) ) WITH CLUSTERING ORDER BY (event_time desc); Q: Retrieve the last 1000 measurements from a given day.
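The row counts quoted for this table work out if each weather station writes one measurement per second and a month has 30 days; both are assumptions read off the numbers, not stated on the slides. A short calculation shows why partitioning by station alone grows without bound, while partitioning by (station, date) caps a partition at one day of rows.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def rows_per_partition(days, measurements_per_second=1):
    """Rows accumulated in one partition after `days` of continuous writes."""
    return days * SECONDS_PER_DAY * measurements_per_second


for label, days in [("1 day", 1), ("1 week", 7), ("1 month", 30), ("1 year", 365)]:
    print(f"{label}: {rows_per_partition(days):,} rows")
```

With `(weather_station_id, date)` as the partition key, only the "1 day" figure applies to any single partition.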
  • 51. Modeling - TTL CREATE TABLE temperature_by_day ( weather_station_id text, date text, event_time timestamp, temperature text, PRIMARY KEY ((weather_station_id, date), event_time) ) WITH CLUSTERING ORDER BY (event_time desc); Retention policy - keep data only from last week. INSERT INTO temperature_by_day … USING TTL 604800;
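The TTL value is expressed in seconds, so 604800 is simply one week. A small helper avoids hard-coding the number; `ttl_seconds` and `insert_with_ttl` are hypothetical names, and the statement uses `?` placeholders as it would for a prepared statement rather than the elided values from the slide.

```python
from datetime import timedelta


def ttl_seconds(retention):
    """Convert a retention period into a TTL in whole seconds."""
    return int(retention.total_seconds())


def insert_with_ttl(retention):
    """Build a parameterised INSERT that expires rows after `retention`."""
    return (
        "INSERT INTO temperature_by_day "
        "(weather_station_id, date, event_time, temperature) "
        f"VALUES (?, ?, ?, ?) USING TTL {ttl_seconds(retention)};"
    )


print(insert_with_ttl(timedelta(weeks=1)))
```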
  • 52. Modeling - bitmap index CREATE TABLE car ( year timestamp, model text, color text, vehicle_id int, //other columns PRIMARY KEY ((year, model, color), vehicle_id) ); Q: Find car by year and/or model and/or color. INSERT INTO car (year, model, color, vehicle_id, ...) VALUES (2000, 'Multipla', 'blue', 13, ...); INSERT INTO car (year, model, color, vehicle_id, ...) VALUES (2000, 'Multipla', '', 13, ...); INSERT INTO car (year, model, color, vehicle_id, ...) VALUES (2000, '', 'blue', 13, ...); INSERT INTO car (year, model, color, vehicle_id, ...) VALUES (2000, '', '', 13, ...); INSERT INTO car (year, model, color, vehicle_id, ...) VALUES ('', 'Multipla', 'blue', 13, ...); INSERT INTO car (year, model, color, vehicle_id, ...) VALUES ('', 'Multipla', '', 13, ...); INSERT INTO car (year, model, color, vehicle_id, ...) VALUES ('', '', 'blue', 13, ...); SELECT * FROM car WHERE year=2000 and model='' and color='blue';
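The seven INSERT statements follow a pattern: every combination in which each attribute is either its real value or an empty placeholder, except the combination where all three are empty. A sketch of generating those partition keys, and of building the key for a lookup, could look like this (`index_rows`, `lookup_key`, and `WILDCARD` are names invented for the example):

```python
from itertools import product

WILDCARD = ""  # placeholder meaning "any value", as in the slide


def index_rows(year, model, color):
    """All partition keys to write for one car (2^3 - 1 = 7 combinations)."""
    options = [(year, WILDCARD), (model, WILDCARD), (color, WILDCARD)]
    return [
        combo
        for combo in product(*options)
        if any(value != WILDCARD for value in combo)
    ]


def lookup_key(year=WILDCARD, model=WILDCARD, color=WILDCARD):
    """Partition key for a query that constrains only some attributes."""
    return (year, model, color)


rows = index_rows(2000, "Multipla", "blue")
print(len(rows))                                   # 7
print(lookup_key(year=2000, color="blue") in rows)  # True
```

The cost is write amplification: each car is written once per combination, so the approach only suits a small number of indexed attributes.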
  • 53. Modeling - wide rows CREATE TABLE user ( email text, name text, age int, PRIMARY KEY (email) ); Q: Find user by email.
  • 54. Modeling - wide rows CREATE TABLE user ( domain text, user text, name text, age int, PRIMARY KEY ((domain), user) ); Q: Find user by email.
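With the second table, the email address is no longer stored as one column, so the application has to split it into the partition key (`domain`) and the clustering column (`user`) before querying. A minimal sketch, assuming the part after the last `@` is the domain; `split_email` and the example address are hypothetical.

```python
def split_email(email):
    """Split an address into (domain, user) for the wide-row table."""
    user, separator, domain = email.rpartition("@")
    if not separator or not user or not domain:
        raise ValueError(f"not a valid email address: {email!r}")
    return domain, user


domain, user = split_email("andrzej@example.com")
print(domain, user)  # example.com andrzej
```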
  • 55. Modeling - versioning with lightweight transactions CREATE TABLE document ( id text, content text, version int, locked_by text, PRIMARY KEY ((id)) ); INSERT INTO document (id, content , version ) VALUES ( 'my doc', 'some content', 1) IF NOT EXISTS; UPDATE document SET locked_by = 'andrzej' WHERE id = 'my doc' IF locked_by = null; UPDATE document SET content = 'better content', version = 2, locked_by = null WHERE id = 'my doc' IF locked_by = 'andrzej';
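The three statements on this slide form a create, lock, and update-then-unlock sequence, where each step applies only if its condition holds. The in-memory sketch below models just that compare-and-set behaviour; it does not model Paxos or replication, the `DocumentStore` class is hypothetical, and treating a missing row as a failed condition is a simplification of this sketch.

```python
class DocumentStore:
    """Toy model of conditional writes on the document table."""

    def __init__(self):
        self._rows = {}

    def insert_if_not_exists(self, doc_id, content, version):
        """Like INSERT ... IF NOT EXISTS; returns whether it was applied."""
        if doc_id in self._rows:
            return False
        self._rows[doc_id] = {"content": content, "version": version, "locked_by": None}
        return True

    def update_if(self, doc_id, expected_locked_by, **changes):
        """Like UPDATE ... IF locked_by = <expected>; returns whether it was applied."""
        row = self._rows.get(doc_id)
        if row is None or row["locked_by"] != expected_locked_by:
            return False
        row.update(changes)
        return True

    def get(self, doc_id):
        return dict(self._rows[doc_id])


store = DocumentStore()
store.insert_if_not_exists("my doc", "some content", 1)
store.update_if("my doc", None, locked_by="andrzej")  # take the lock
store.update_if("my doc", "andrzej", content="better content", version=2, locked_by=None)
print(store.get("my doc"))
```

A second writer that tries to take the lock while `locked_by` is `'andrzej'` gets a "not applied" result and has to retry later.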
  • 56. Modeling - JSON with UDT and tuples { "title": "Example Schema", "type": "object", "properties": { "firstName": "andrzej", "lastName": "ludwikowski", "age": { "description": "Age in years", "type": "integer", "minimum": 0 } }, "x_dimension": "1", "y_dimension": "2" } CREATE TYPE age ( description text, type text, minimum int ); CREATE TYPE prop ( firstName text, lastName text, age frozen <age> ); CREATE TABLE json ( title text, type text, properties list<frozen <prop>>, dimensions tuple<int, int>, PRIMARY KEY (title) );
  • 57. Common use cases - Sensor data (Zonar) - Fraud detection (Barracuda) - Playlists and collections (Spotify) - Personalization and recommendation engines (eBay) - Messaging (Instagram) - Event Sourcing!
  • 58. Common anti use cases - Queue - Search engine
  • 60. DataStax Academy - Introduction to Apache Cassandra - Data Modeling - DataStax Enterprise Foundations of Apache Cassandra - DataStax Enterprise Operations with Apache Cassandra - DataStax Enterprise Search - DataStax Enterprise Analytics with Apache Spark - DataStax Enterprise Graph
  • 61. Competition? ScyllaDB - Cassandra without JVM - same protocol, SSTable compatibility - C++ and Seastar lib - 1,000,000 IOPS
  • 62. Not covered - schema migrations - backups - DSE
  • 64. About me? - www.ludwikowski.info - github.com/aludwiko - @aludwikowski