Cassandra - lesson learned
Andrzej Ludwikowski
About me?
- http://aludwikowski.blogspot.com/
- https://github.com/aludwiko
- @aludwikowski
- SoftwareMill
Why Cassandra?
- BigData!!!
  - Volume (petabytes of data, trillions of entities)
  - Velocity (real-time, streams, millions of transactions per second)
  - Variety (unstructured, semi-structured, structured)
- Near-linear horizontal scaling (in proper use cases)
- Fully distributed, with no single point of failure
- Data replication by default
What is Cassandra vs CAP?
- CAP Theorem - pick two
Origins?
2010
Name?
Write path
- Any node can coordinate any request (NSPOF, no single point of failure)
- Replication Factor
- Consistency Level
[Diagram: client (driver) sending a write to a ring of four nodes (Node 1 to Node 4), RF=3, CL=2]
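The interplay of replication factor and consistency level can be sketched in a few lines of Python. This is a toy model, not driver or server code: the `Replica` class, the `coordinate_write` function and the node names are hypothetical, and the coordinator simply counts acknowledgements from the replicas that hold the key.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical stand-in for one Cassandra node holding a copy of the data."""
    name: str
    up: bool = True
    data: dict = field(default_factory=dict)

    def write(self, key, value):
        """Apply the write and return True (an ack) if the node is reachable."""
        if not self.up:
            return False
        self.data[key] = value
        return True


def coordinate_write(replicas, key, value, consistency_level):
    """Send the write to every replica; report success if enough of them ack."""
    acks = sum(1 for replica in replicas if replica.write(key, value))
    return acks >= consistency_level


# RF=3: three of the four nodes hold a copy of this key; one of them is down.
replicas = [Replica("node2"), Replica("node3"), Replica("node4", up=False)]
ok_cl2 = coordinate_write(replicas, "k", "v", consistency_level=2)
ok_cl3 = coordinate_write(replicas, "k", "v", consistency_level=3)
print(ok_cl2, ok_cl3)
```

With RF=3 and one replica down, a write at CL=2 still succeeds while a write at CL=3 does not, which is the trade-off the consistency level controls.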
Write path - consistent hashing
- Token ring from -2^63 to 2^63 - 1 (Murmur3Partitioner)
- Partitioner: partition key -> token
[Diagram: simplified ring labelled 0 to 100 and split into four token ranges (0-25, 26-50, 51-75, 76-100), one per node. The partitioner on the client side maps the partition key to token 77, which falls in the 76-100 range; the write is then shown on three nodes.]
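The token lookup on the simplified 0-100 ring can be sketched as follows. Several things here are assumptions made for illustration: the assignment of ranges to `node1`..`node4`, the MD5-based toy partitioner `token_for` (Cassandra's default partitioner is Murmur3 over a 64-bit range), and the placement rule of "owner plus the next nodes clockwise", which resembles a simple non-topology-aware strategy.

```python
import bisect
import hashlib

# Upper bound of each token range on the simplified ring (assumed node mapping).
RING = [(25, "node1"), (50, "node2"), (75, "node3"), (100, "node4")]


def token_for(partition_key):
    """Toy partitioner: hash the partition key into the range 0..100."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 101


def replicas_for(token, rf):
    """Return the owner of the token's range plus the next rf-1 nodes clockwise."""
    bounds = [upper for upper, _ in RING]
    start = bisect.bisect_left(bounds, token)  # first range whose upper bound >= token
    return [RING[(start + i) % len(RING)][1] for i in range(rf)]


print(replicas_for(77, rf=3))
print(token_for("godzilla"), replicas_for(token_for("godzilla"), rf=3))
```

Because every client computes the same token for the same partition key, any coordinator can find the replicas without asking a central directory.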
DEMO
Write path - problems?
- Hinted handoff
- Retry idempotent inserts
  - built-in policies
- Lightweight transactions (Paxos)
- Batches
[Diagram: the same four-node ring (ranges 0-25, 26-50, 51-75, 76-100) with token 77 written to three nodes]
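Hinted handoff can be illustrated with a small in-memory model. This is only a sketch of the idea: the `Node` and `Coordinator` classes are hypothetical, hints are kept in a Python list, and details of the real mechanism (how hints are stored and how long they are kept) are left out.

```python
class Node:
    """Hypothetical replica that can be marked as down."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}


class Coordinator:
    """Keeps a hint for every write it could not deliver to a replica."""

    def __init__(self):
        self.hints = []  # (target node, key, value)

    def write(self, replicas, key, value):
        """Write to reachable replicas, store hints for the others, return ack count."""
        acks = 0
        for node in replicas:
            if node.up:
                node.data[key] = value
                acks += 1
            else:
                self.hints.append((node, key, value))
        return acks

    def replay_hints(self):
        """Deliver stored hints to nodes that are back up; keep the rest."""
        remaining = []
        for node, key, value in self.hints:
            if node.up:
                node.data[key] = value
            else:
                remaining.append((node, key, value))
        self.hints = remaining


n1, n2, n3 = Node("node1"), Node("node2"), Node("node4")
n3.up = False
coordinator = Coordinator()
acks = coordinator.write([n1, n2, n3], "k", "v")  # two acks, one hint stored
n3.up = True
coordinator.replay_hints()                        # the missed write reaches node4
print(acks, n3.data)
```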
Write path - node level
Write path - why so fast?
- Commit log - append only
- Periodic (10s) or batch sync to disk
- Network topology aware
Throughput: 50,000 t/s = 50 t/ms = 5 t/100us = 1 t/20us
[Diagram: client write to a ring of four nodes, RF=2, CL=2; both replicas acknowledge the write (ack 1, ack 2)]
[Diagram: cluster spanning two data centers, Asia DC and Europe DC]
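Why an append-only commit log keeps writes cheap can be sketched with a toy storage engine. The `NodeStorage` class and its `memtable_limit` parameter are hypothetical; Python lists and dicts stand in for the on-disk log, the memtable and the SSTables.

```python
class NodeStorage:
    """Toy node-level write path: commit log append, memtable update, flush."""

    def __init__(self, memtable_limit=3):
        self.commit_log = []  # append-only, stands in for the on-disk log
        self.memtable = {}    # in-memory, latest value per key
        self.sstables = []    # immutable, sorted snapshots of flushed memtables
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        # A write is a sequential append plus an in-memory update:
        # no existing data is read or modified in place.
        self.commit_log.append((key, value))
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        """Persist the memtable as a sorted, immutable table and start fresh."""
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}
        self.commit_log = []  # flushed data no longer needs the log for recovery


storage = NodeStorage(memtable_limit=3)
for key in ["c", "a", "b", "d"]:
    storage.write(key, key.upper())
print(storage.sstables, storage.memtable)
```

After the third write the memtable is flushed as one sorted table, and the fourth write lands in a fresh memtable.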
Read path
- Most recent wins
- Eager retries
- In-memory
  - MemTable
  - Row Cache
  - Bloom Filters
  - Key Caches
  - Partition Summaries
- On disk
  - Partition Indexes
  - SSTables
[Diagram: client read from a ring of four nodes, RF=3, CL=3; the three replicas answer with timestamps 67, 99 and 88, and the value with timestamp 99 is the most recent]
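The "most recent wins" rule amounts to picking the response with the highest write timestamp. A minimal sketch, using the three timestamps from the diagram; the `reconcile` function and the values attached to each timestamp are made up for illustration.

```python
def reconcile(responses):
    """Return the (timestamp, value) pair with the highest write timestamp."""
    return max(responses, key=lambda response: response[0])


# One response per replica: (write timestamp, value). The values are hypothetical.
responses = [(67, "oldest"), (99, "newest"), (88, "newer")]
winner = reconcile(responses)
print(winner)
```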
Immediate vs. Eventual Consistency
- if (writeCL + readCL) > replication_factor then immediate consistency
- writeCL=ALL, readCL=1
- writeCL=1, readCL=ALL
- writeCL,readCL=QUORUM
- If "stale" is measured in milliseconds, how much are those milliseconds worth?
[Diagram: client and a ring of four nodes, RF=3]
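The rule (writeCL + readCL) > replication_factor can be checked for the combinations listed above. A small sketch, assuming RF=3 and translating the level names into replica counts; the helper names `quorum` and `is_immediately_consistent` are hypothetical.

```python
def quorum(replication_factor):
    """Number of replicas that form a majority."""
    return replication_factor // 2 + 1


def is_immediately_consistent(write_cl, read_cl, replication_factor):
    """True when every read set must overlap every write set in at least one replica."""
    return write_cl + read_cl > replication_factor


RF = 3
ALL, ONE, QUORUM = RF, 1, quorum(RF)

results = {
    "write=ALL, read=ONE": is_immediately_consistent(ALL, ONE, RF),
    "write=ONE, read=ALL": is_immediately_consistent(ONE, ALL, RF),
    "write=QUORUM, read=QUORUM": is_immediately_consistent(QUORUM, QUORUM, RF),
    "write=ONE, read=ONE": is_immediately_consistent(ONE, ONE, RF),
}
for combination, consistent in results.items():
    print(combination, "->", "immediate" if consistent else "eventual")
```

Only the last combination (ONE/ONE) leaves room for a read to hit a replica that has not yet seen the write.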
Modeling - new mindset
- QDD, Query Driven Development
- Nesting is ok
- Duplication is ok
- Writes are cheap
QDD - Conceptual model
- Technology independent
- Chen notation
QDD - Application workflow
QDD - Logical model
- Chebotko diagram
QDD - Physical model
- Technology dependent
- Analysis and validation (finding problems)
- Physical optimization (fixing problems)
- Data types
Physical storage
- Primary key
- Partition key
CREATE TABLE videos (
    id int,
    title text,
    runtime int,
    year int,
    PRIMARY KEY (id)
);
 id | title               | runtime | year
----+---------------------+---------+------
  1 | dzien swira         |      93 | 2002
  2 | chlopaki nie placza |      96 | 2000
  3 | psy                 |     104 | 1992
  4 | psy 2               |      96 | 1994
Stored as one partition per id:
1 -> title: dzien swira, runtime: 93, year: 2002
2 -> title: chlopaki..., runtime: 96, year: 2000
3 -> title: psy, runtime: 104, year: 1992
4 -> title: psy 2, runtime: 96, year: 1994
SELECT * FROM videos
WHERE title = 'dzien swira';
-- title is not part of the primary key, so Cassandra rejects this query
-- unless ALLOW FILTERING or a secondary index is used
Physical storage
- Primary key (could be compound)
- Partition key
- Clustering column (order, uniqueness)
CREATE TABLE videos_with_clustering (
    title text,
    runtime int,
    year int,
    PRIMARY KEY ((title), year)
);
 title    | year | runtime
----------+------+---------
 godzilla | 1954 |      98
 godzilla | 1998 |     140
 godzilla | 2014 |     123
 psy      | 1992 |     104
Stored as one partition per title, rows ordered by year:
godzilla -> 1954: runtime 98 | 1998: runtime 140 | 2014: runtime 123
psy -> 1992: runtime 104
SELECT * FROM videos_with_clustering
WHERE title = 'godzilla';
Physical storage
- Primary key (could be compound)
- Partition key (could be composite)
- Clustering column (order, uniqueness)
CREATE TABLE videos_with_composite_pk (
    title text,
    runtime int,
    year int,
    PRIMARY KEY ((title, year))
);
 title    | year | runtime
----------+------+---------
 godzilla | 1954 |      98
 godzilla | 1998 |     140
 godzilla | 2014 |     123
 psy      | 1992 |     104
Stored as one partition per (title, year) pair:
godzilla:1954 -> runtime: 98
godzilla:1998 -> runtime: 140
godzilla:2014 -> runtime: 123
psy:1992 -> runtime: 104
SELECT * FROM videos_with_composite_pk
WHERE title = 'godzilla'
AND year = 1954;
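How the choice of partition key and clustering column changes the physical grouping of the same four rows can be sketched in Python. The `layout` function is a hypothetical helper that groups rows by partition key and sorts them by clustering columns; it only mimics the logical layout, not the on-disk SSTable format.

```python
from collections import defaultdict

ROWS = [
    {"title": "godzilla", "year": 1998, "runtime": 140},
    {"title": "godzilla", "year": 1954, "runtime": 98},
    {"title": "psy", "year": 1992, "runtime": 104},
    {"title": "godzilla", "year": 2014, "runtime": 123},
]


def layout(rows, partition_key, clustering=()):
    """Group rows into partitions and order each partition by its clustering columns."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[tuple(row[c] for c in partition_key)].append(row)
    for partition_rows in partitions.values():
        partition_rows.sort(key=lambda r: tuple(r[c] for c in clustering))
    return dict(partitions)


# PRIMARY KEY ((title), year): one partition per title, rows sorted by year.
clustered = layout(ROWS, partition_key=("title",), clustering=("year",))
# PRIMARY KEY ((title, year)): one single-row partition per (title, year) pair.
composite = layout(ROWS, partition_key=("title", "year"))

print(len(clustered), [r["year"] for r in clustered[("godzilla",)]])
print(len(composite), sorted(composite))
```

With the clustering column, all godzilla rows sit in one partition and can be read with the title alone; with the composite partition key, both title and year are needed to locate a row.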
Modeling - clustering column(s)
Q: Retrieve videos an actor has appeared in (newest first).
CREATE TABLE videos_by_actor (
    actor text,
    added_date timestamp,
    video_id timeuuid,
    character_name text,
    description text,
    encoding frozen<video_encoding>,
    tags set<text>,
    title text,
    user_id uuid,
    PRIMARY KEY ((actor), added_date, video_id, character_name)
) WITH CLUSTERING ORDER BY (added_date desc);
The primary key is built up in steps:
- PRIMARY KEY ((actor), added_date): one partition per actor, newest first, but two videos with the same added_date would overwrite each other
- PRIMARY KEY ((actor), added_date, video_id): video_id makes each video unique
- PRIMARY KEY ((actor), added_date, video_id, character_name): character_name keeps a separate row for each character the actor plays in the same video
Modeling - compound partition key
Q: Retrieve the last 1000 measurements from a given day.
First attempt: PRIMARY KEY ((weather_station_id), date, event_time)
One partition per station keeps growing (the figures correspond to one measurement per second):
1 day = 86 400 rows
1 week = 604 800 rows
1 month = 2 592 000 rows
1 year = 31 536 000 rows
Moving date into the partition key limits each partition to one day of data:
CREATE TABLE temperature_by_day (
    weather_station_id text,
    date text,
    event_time timestamp,
    temperature text,
    PRIMARY KEY ((weather_station_id, date), event_time)
) WITH CLUSTERING ORDER BY (event_time desc);
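The row counts quoted for the single-station partition follow from simple arithmetic. The sketch below assumes one measurement per second (which is what 86 400 rows per day implies), a 30-day month and a 365-day year; `rows_per_partition` is a hypothetical helper.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def rows_per_partition(days, measurements_per_second=1):
    """Rows accumulated in one partition that spans the given number of days."""
    return days * SECONDS_PER_DAY * measurements_per_second


growth = {
    label: rows_per_partition(days)
    for label, days in [("day", 1), ("week", 7), ("month", 30), ("year", 365)]
}
for label, rows in growth.items():
    print(f"1 {label} = {rows:,} rows")
```

With date in the partition key, a partition never grows past the "1 day" figure, no matter how long the station keeps reporting.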
Modeling - TTL
CREATE TABLE temperature_by_day (
    weather_station_id text,
    date text,
    event_time timestamp,
    temperature text,
    PRIMARY KEY ((weather_station_id, date), event_time)
) WITH CLUSTERING ORDER BY (event_time desc);
Retention policy - keep data only from the last week.
INSERT INTO temperature_by_day … USING TTL 604800;  -- 604800 s = 7 days
Modeling - bitmap index
CREATE TABLE car (
    year text,
    model text,
    color text,
    vehicle_id int,
    //other columns
    PRIMARY KEY ((year, model, color), vehicle_id)
);
Q: Find car by year and/or model and/or color.
INSERT INTO car (year, model, color, vehicle_id, ...) VALUES ('2000', 'Multipla', 'blue', 13, ...);
INSERT INTO car (year, model, color, vehicle_id, ...) VALUES ('2000', 'Multipla', '', 13, ...);
INSERT INTO car (year, model, color, vehicle_id, ...) VALUES ('2000', '', 'blue', 13, ...);
INSERT INTO car (year, model, color, vehicle_id, ...) VALUES ('2000', '', '', 13, ...);
INSERT INTO car (year, model, color, vehicle_id, ...) VALUES ('', 'Multipla', 'blue', 13, ...);
INSERT INTO car (year, model, color, vehicle_id, ...) VALUES ('', 'Multipla', '', 13, ...);
INSERT INTO car (year, model, color, vehicle_id, ...) VALUES ('', '', 'blue', 13, ...);
SELECT * FROM car WHERE year = '2000' AND model = '' AND color = 'blue';
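The seven inserts are every combination of "real value or empty string" for the three attributes, except the all-empty one. A sketch of generating them with `itertools.product`; the `index_rows` helper is hypothetical and all attribute values are plain Python strings here.

```python
from itertools import product


def index_rows(car, attributes=("year", "model", "color"), wildcard=""):
    """Build one row per non-empty subset of attributes; the rest become wildcards."""
    rows = []
    for mask in product([True, False], repeat=len(attributes)):
        if not any(mask):
            continue  # skip the row where every attribute is a wildcard
        row = {a: (car[a] if keep else wildcard) for a, keep in zip(attributes, mask)}
        row["vehicle_id"] = car["vehicle_id"]
        rows.append(row)
    return rows


car = {"year": "2000", "model": "Multipla", "color": "blue", "vehicle_id": 13}
rows = index_rows(car)
for row in rows:
    print(row)
```

Each car costs 2^n - 1 rows for n searchable attributes, so the pattern trades write volume and storage for single-partition reads.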
Modeling - wide rows
Q: Find user by email.
CREATE TABLE user (
    email text,
    name text,
    age int,
    PRIMARY KEY (email)
);
The same query with the email split into domain (partition key) and user (clustering column):
CREATE TABLE user (
    domain text,
    user text,
    name text,
    age int,
    PRIMARY KEY ((domain), user)
);
Modeling - versioning with lightweight transactions
CREATE TABLE document (
    id text,
    content text,
    version int,
    locked_by text,
    PRIMARY KEY ((id))
);
INSERT INTO document (id, content, version) VALUES ('my doc', 'some content', 1)
    IF NOT EXISTS;
UPDATE document SET locked_by = 'andrzej' WHERE id = 'my doc' IF locked_by = null;
UPDATE document SET content = 'better content', version = 2, locked_by = null
    WHERE id = 'my doc' IF locked_by = 'andrzej';
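The conditional statements behave like compare-and-set operations. The sketch below mimics that behaviour in a single process with a plain dictionary; the `DocumentStore` class is hypothetical, it does not model Paxos or replication, and as a simplification it rejects conditional updates on rows that do not exist.

```python
class DocumentStore:
    """Single-process stand-in for the conditional INSERT and UPDATE statements."""

    def __init__(self):
        self.rows = {}

    def insert_if_not_exists(self, doc_id, content, version):
        """Mimics INSERT ... IF NOT EXISTS; returns whether the insert was applied."""
        if doc_id in self.rows:
            return False
        self.rows[doc_id] = {"content": content, "version": version, "locked_by": None}
        return True

    def update_if_locked_by(self, doc_id, expected_locked_by, **changes):
        """Mimics UPDATE ... IF locked_by = <expected>; returns whether it was applied."""
        row = self.rows.get(doc_id)
        if row is None or row["locked_by"] != expected_locked_by:
            return False
        row.update(changes)
        return True


store = DocumentStore()
created = store.insert_if_not_exists("my doc", "some content", 1)
locked = store.update_if_locked_by("my doc", None, locked_by="andrzej")
stolen = store.update_if_locked_by("my doc", None, locked_by="someone else")
saved = store.update_if_locked_by(
    "my doc", "andrzej", content="better content", version=2, locked_by=None
)
print(created, locked, stolen, saved, store.rows["my doc"])
```

The second lock attempt fails because the condition no longer holds, which is what prevents two editors from overwriting each other's version.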
Modeling - JSON with UDT and tuples
{
    "title": "Example Schema",
    "type": "object",
    "properties": {
        "firstName": "andrzej",
        "lastName": "ludwikowski",
        "age": {
            "description": "Age in years",
            "type": "integer",
            "minimum": 0
        }
    },
    "x_dimension": "1",
    "y_dimension": "2"
}
CREATE TYPE age (
    description text,
    type text,
    minimum int
);
CREATE TYPE prop (
    firstName text,
    lastName text,
    age frozen <age>
);
CREATE TABLE json (
    title text,
    type text,
    properties list<frozen <prop>>,
    dimensions tuple<int, int>,
    PRIMARY KEY (title)
);
Common use cases
- Sensor data (Zonar)
- Fraud detection (Barracuda)
- Playlists and collections (Spotify)
- Personalization and recommendation engines (eBay)
- Messaging (Instagram)
Common anti use cases
- Queue
- Search engine
Cassandra - lesson learned

  • 1. Cassandra - lesson learned Andrzej Ludwikowski
  • 2. About me? - http://aludwikowski.blogspot.com/ - https://github.com/aludwiko - @aludwikowski - SoftwareMill
  • 3. Why cassandra? - BigData!!! - Volume (petabytes of data, trillions of entities) - Velocity (real-time, streams, millions of transactions per second) - Variety (un-, semi-, structured) - Near-linear horizontal scaling (in proper use cases) - Fully distributed, with no single point of failure - Data replication - By default
  • 4. What is cassandra vs CAP? - CAP Theorem - pick two
  • 5. What is cassandra vs CAP? - CAP Theorem - pick two
  • 6. What is cassandra vs CAP? - CAP Theorem - pick two
  • 10. Write path Node 1 Node 2 Node 3 Node 4 Client (driver)
  • 11. Write path Node 1 Node 2 Node 3 Node 4 Client (driver) - Any node can coordinate any request (NSPOF)
  • 12. - Any node can coordinate any request (NSPOF) - Replication Factor Write path Node 1 Node 2 Node 3 Node 4 Client RF=3
  • 13. - Any node can coordinate any request (NSPOF) - Replication Factor - Consistency Level Write path Node 1 Node 2 Node 3 Node 4 Client RF=3 CL=2
  • 14. - Token ring from -2^63 to 2^64 Write path - consistent hashing Node 1 Node 2 Node 3 Node 4 0100
  • 15. - Token ring from -2^63 to 2^64 - Partitioner: partition key -> token Write path - consistent hashing Node 1 Node 2 Node 3 Node 4 Client Partitioner 0-25 25-50 51-75 76-100 77
  • 16. - Token ring from -2^63 to 2^64 - Partitioner: primary key -> token Write path - consistent hashing Node 1 Node 2 Node 3 Node 4 Client Partitioner 0-25 25-50 51-75 76-100 77
  • 17. - Token ring from -2^63 to 2^64 - Partitioner: primary key -> token Write path - consistent hashing Node 1 Node 2 Node 3 Node 4 Client Partitioner 0-25 25-50 51-75 76-100 77 77 77
  • 18. - Token ring from -2^63 to 2^64 - Partitioner: primary key -> token Write path - consistent hashing Node 1 Node 2 Node 3 Node 4 Client 0-25 Partitioner 77 25-50 51-75 76-100 77 77
  • 19. DEMO
  • 20. Write path - problems? Node 1 Node 2 Node 3 Node 4 Client 0-25 77 25-50 51-75 76-100 77 77
  • 21. - Hinted handoff Write path - problems? Node 1 Node 2 Node 3 Node 4 Client 0-25 77 25-50 51-75 76-100 77 77
  • 22. - Hinted handoff - Retry idempotent inserts - build-in policies Write path - problems? Node 1 Node 2 Node 3 Node 4 Client 0-25 77 25-50 51-75 76-100 77 77
  • 23. - Hinted handoff - Retry idempotent inserts - build-in policies - Lightweight transactions (Paxos) Write path - problems? Node 1 Node 2 Node 3 Node 4 Client 0-25 77 25-50 51-75 76-100 77 77
  • 24. - Hinted handoff - Retry idempotent inserts - build-in policies - Lightweight transactions (Paxos) - Batches Write path - problems? Node 1 Node 2 Node 3 Node 4 Client 0-25 77 25-50 51-75 76-100 77 77
  • 25. Write path - node level
  • 26. Write path - why so fast? - Commit log - append only
  • 27. Write path - why so fast?
  • 28. Write path - why so fast? 50,000 t/s 50 t/ms 5 t/100us 1 t/20us
  • 29. Write path - why so fast? - Commit log - append only - Periodic (10s) or batch sync to disk Node 1 Node 2 Node 3 Node 4 Client RF=2 CL=2
  • 30. D asdd R ack 2 R ack 1 Write path - why so fast? - Commit log - append only - Periodic or batch sync to disk - Network topology aware Node 1 Node 2 Node 3 Node 4 Client RF=2 CL=2
  • 31. Write path - why so fast? Client - Commit log - append only - Periodic or batch sync to disk - Network topology aware Asia DC Europe DC
  • 32. - Most recent win - Eager retries - In-memory - MemTable - Row Cache - Bloom Filters - Key Caches - Partition Summaries - On disk - Partition Indexes - SSTables Node 1 Node 2 Node 3 Node 4 Client RF=3 CL=3 Read path timestamp 67 timestamp 99 timestamp 88
  • 33. Immediate vs. Eventual Consistency - if (writeCL + readCL) > replication_factor then immediate consistency - writeCL=ALL, readCL=1 - writeCL=1, readCL=ALL - writeCL,readCL=QUORUM - If "stale" is measured in milliseconds, how much are those milliseconds worth? Node 1 Node 2 Node 3 Node 4 Client RF=3
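The rule on the slide above (writeCL + readCL > replication_factor) can be checked with a small Python sketch. The function names are hypothetical, and consistency levels are expressed as the number of replicas that must acknowledge, with QUORUM computed as floor(RF / 2) + 1.

```python
def quorum(replication_factor):
    """Number of replicas in a quorum: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def is_immediately_consistent(write_cl, read_cl, replication_factor):
    """True when the write and read replica sets must overlap in at least one node."""
    return write_cl + read_cl > replication_factor


RF = 3
combinations = {
    "writeCL=ALL, readCL=ONE": (RF, 1),
    "writeCL=ONE, readCL=ALL": (1, RF),
    "writeCL=QUORUM, readCL=QUORUM": (quorum(RF), quorum(RF)),
    "writeCL=ONE, readCL=ONE": (1, 1),
}
for name, (write_cl, read_cl) in combinations.items():
    print(name, "->", is_immediately_consistent(write_cl, read_cl, RF))
```

With RF=3 the first three combinations overlap on at least one replica, while ONE/ONE (1 + 1 = 2, not greater than 3) is only eventually consistent.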
  • 34. Modeling - new mindset - QDD, Query Driven Development - Nesting is ok - Duplication is ok - Writes are cheap
  • 35. QDD - Conceptual model - Technology independent - Chen notation
  • 36. QDD - Application workflow
  • 37. QDD - Logical model - Chebotko diagram
  • 38. QDD - Physical model - Technology dependent - Analysis and validation (finding problems) - Physical optimization (fixing problems) - Data types
  • 39. Physical storage - Primary key - Partition key CREATE TABLE videos ( id int, title text, runtime int, year int, PRIMARY KEY (id) ); id | title | runtime | year ----+---------------------+---------+------ 1 | dzien swira | 93 | 2002 2 | chlopaki nie placza | 96 | 2000 3 | psy | 104 | 1992 4 | psy 2 | 96 | 1994 1 title runtime year dzien swira 93 2002 2 title runtime year chlopaki... 96 2000 3 title runtime year psy 104 1992 4 title runtime year psy 2 96 1994 SELECT * FROM videos WHERE title = 'dzien swira' (rejected without an index or ALLOW FILTERING, because title is not part of the primary key)
  • 40. Physical storage CREATE TABLE videos_with_clustering ( title text, runtime int, year int, PRIMARY KEY ((title), year) ); - Primary key (could be compound) - Partition key - Clustering column (order, uniqueness) title | year | runtime -------------+------+--------- godzilla | 1954 | 98 godzilla | 1998 | 140 godzilla | 2014 | 123 psy | 1992 | 104 godzilla 1954 runtime 98 1998 runtime 140 2014 runtime 123 psy 1992 runtime 104 SELECT * FROM videos_with_clustering WHERE title = 'godzilla'
  • 41. Physical storage CREATE TABLE videos_with_composite_pk ( title text, runtime int, year int, PRIMARY KEY ((title, year)) ); - Primary key (could be compound) - Partition key (could be composite) - Clustering column (order, uniqueness) title | year | runtime -------------+------+--------- godzilla | 1954 | 98 godzilla | 1998 | 140 godzilla | 2014 | 123 psy | 1992 | 104 godzilla:1954 runtime 98 godzilla:1998 runtime 140 godzilla:2014 runtime 123 psy:1992 runtime 104 SELECT * FROM videos_with_composite_pk WHERE title = 'godzilla' AND year = 1954
  • 42. Modeling - clustering column(s) Q: Retrieve videos an actor has appeared in (newest first).
  • 43. Modeling - clustering column(s) CREATE TABLE videos_by_actor ( actor text, added_date timestamp, video_id timeuuid, character_name text, description text, encoding frozen<video_encoding>, tags set<text>, title text, user_id uuid, PRIMARY KEY ( ) ) WITH CLUSTERING ORDER BY ( ); Q: Retrieve videos an actor has appeared in (newest first).
  • 44. Modeling - clustering column(s) CREATE TABLE videos_by_actor ( actor text, added_date timestamp, video_id timeuuid, character_name text, description text, encoding frozen<video_encoding>, tags set<text>, title text, user_id uuid, PRIMARY KEY ((actor), added_date) ) WITH CLUSTERING ORDER BY (added_date desc); Q: Retrieve videos an actor has appeared in (newest first).
  • 45. Modeling - clustering column(s) CREATE TABLE videos_by_actor ( actor text, added_date timestamp, video_id timeuuid, character_name text, description text, encoding frozen<video_encoding>, tags set<text>, title text, user_id uuid, PRIMARY KEY ((actor), added_date, video_id) ) WITH CLUSTERING ORDER BY (added_date desc); Q: Retrieve videos an actor has appeared in (newest first).
  • 46. Modeling - clustering column(s) CREATE TABLE videos_by_actor ( actor text, added_date timestamp, video_id timeuuid, character_name text, description text, encoding frozen<video_encoding>, tags set<text>, title text, user_id uuid, PRIMARY KEY ((actor), added_date, video_id, character_name) ) WITH CLUSTERING ORDER BY (added_date desc); Q: Retrieve videos an actor has appeared in (newest first).
  • 47. Modeling - composite partition key CREATE TABLE temperature_by_day ( weather_station_id text, date text, event_time timestamp, temperature text, PRIMARY KEY ( ) ) WITH CLUSTERING ORDER BY ( ); Q: Retrieve the last 1000 measurements from a given day.
  • 48. Modeling - composite partition key CREATE TABLE temperature_by_day ( weather_station_id text, date text, event_time timestamp, temperature text, PRIMARY KEY ((weather_station_id), date, event_time) ) WITH CLUSTERING ORDER BY (event_time desc); Q: Retrieve the last 1000 measurements from a given day. At one measurement per second: 1 day = 86 400 rows 1 week = 604 800 rows 1 month = 2 592 000 rows 1 year = 31 536 000 rows
  • 49. Modeling - composite partition key CREATE TABLE temperature_by_day ( weather_station_id text, date text, event_time timestamp, temperature text, PRIMARY KEY ((weather_station_id, date), event_time) ) WITH CLUSTERING ORDER BY (event_time desc); Q: Retrieve the last 1000 measurements from a given day.
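The row counts quoted on the slides above follow from simple arithmetic, sketched here in Python. The sketch assumes one measurement per second per weather station (the rate implied by 86 400 rows per day) and a 30-day month; `rows_per_partition` is a hypothetical helper name.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86 400


def rows_per_partition(days, measurements_per_second=1):
    """Rows accumulated in one partition that spans the given number of days."""
    return days * SECONDS_PER_DAY * measurements_per_second


for label, days in [("1 day", 1), ("1 week", 7), ("1 month", 30), ("1 year", 365)]:
    print(f"{label}: {rows_per_partition(days):,} rows")
```

With weather_station_id alone as the partition key, a partition keeps growing for as long as the station reports. Adding date to the partition key caps each partition at one day of data, 86 400 rows under the assumption above.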
  • 50. Modeling - TTL CREATE TABLE temperature_by_day ( weather_station_id text, date text, event_time timestamp, temperature text, PRIMARY KEY ((weather_station_id, date), event_time) ) WITH CLUSTERING ORDER BY (event_time desc); Retention policy - keep data only from the last week. INSERT INTO temperature_by_day … USING TTL 604800;
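The TTL value on the slide above is expressed in seconds. A short Python sketch, with the hypothetical helper `ttl_seconds`, shows where 604800 comes from:

```python
from datetime import timedelta


def ttl_seconds(retention):
    """Convert a retention period into the number of seconds expected by USING TTL."""
    return int(retention.total_seconds())


print(ttl_seconds(timedelta(weeks=1)))  # 604800
```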
  • 51. Modeling - bitmap index CREATE TABLE car ( year text, model text, color text, vehicle_id int, //other columns PRIMARY KEY ((year, model, color), vehicle_id) ); Q: Find car by year and/or model and/or color. INSERT INTO car (year, model, color, vehicle_id, ...) VALUES ('2000', 'Multipla', 'blue', 13, ...); INSERT INTO car (year, model, color, vehicle_id, ...) VALUES ('2000', 'Multipla', '', 13, ...); INSERT INTO car (year, model, color, vehicle_id, ...) VALUES ('2000', '', 'blue', 13, ...); INSERT INTO car (year, model, color, vehicle_id, ...) VALUES ('2000', '', '', 13, ...); INSERT INTO car (year, model, color, vehicle_id, ...) VALUES ('', 'Multipla', 'blue', 13, ...); INSERT INTO car (year, model, color, vehicle_id, ...) VALUES ('', 'Multipla', '', 13, ...); INSERT INTO car (year, model, color, vehicle_id, ...) VALUES ('', '', 'blue', 13, ...); SELECT * FROM car WHERE year='2000' AND model='' AND color='blue';
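The seven inserts on the slide above are every non-empty combination of the three searchable attributes, with an empty string standing in for "not specified". A minimal Python sketch of that enumeration follows. The helper name `index_rows` is hypothetical, and all values are treated as strings for simplicity.

```python
from itertools import product


def index_rows(year, model, color):
    """All partition-key combinations to write for one car: 2^3 - 1 = 7 rows."""
    values = (year, model, color)
    rows = []
    for mask in product([True, False], repeat=len(values)):
        if not any(mask):
            continue  # skip the all-empty key: it would match every car
        rows.append(tuple(v if keep else "" for v, keep in zip(values, mask)))
    return rows


for row in index_rows("2000", "Multipla", "blue"):
    print(row)
```

Each car costs 2^n - 1 writes for n searchable attributes, so the pattern relies on writes being cheap and on n staying small.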
  • 52. Modeling - wide rows CREATE TABLE user ( email text, name text, age int, PRIMARY KEY (email) ); Q: Find user by email.
  • 53. Modeling - wide rows CREATE TABLE user ( domain text, user text, name text, age int, PRIMARY KEY ((domain), user) ); Q: Find user by email.
  • 54. Modeling - versioning with lightweight transactions CREATE TABLE document ( id text, content text, version int, locked_by text, PRIMARY KEY ((id)) ); INSERT INTO document (id, content, version) VALUES ('my doc', 'some content', 1) IF NOT EXISTS; UPDATE document SET locked_by = 'andrzej' WHERE id = 'my doc' IF locked_by = null; UPDATE document SET content = 'better content', version = 2, locked_by = null WHERE id = 'my doc' IF locked_by = 'andrzej';
  • 55. Modeling - JSON with UDT and tuples { "title": "Example Schema", "type": "object", "properties": { "firstName": "andrzej", "lastName": "ludwikowski", "age": { "description": "Age in years", "type": "integer", "minimum": 0 } }, "x_dimension": "1", "y_dimension": "2" } CREATE TYPE age ( description text, type text, minimum int ); CREATE TYPE prop ( firstName text, lastName text, age frozen <age> ); CREATE TABLE json ( title text, type text, properties list<frozen <prop>>, dimensions tuple<int, int>, PRIMARY KEY (title) );
  • 56. Common use cases - Sensor data (Zonar) - Fraud detection (Barracuda) - Playlists and collections (Spotify) - Personalization and recommendation engines (eBay) - Messaging (Instagram)
  • 57. Common anti-use cases - Queue - Search engine