TDC2017 | São Paulo - NoSQL Track How we figured out we had a SRE team at - Cassandra: why will relational thinking destroy your system's performance?
1. Globalcode – Open4education
Cassandra
Why will relational thinking destroy your system's performance?
Paulo Ricardo R. Almeida
OCJP, 2 years working with Cassandra
2. Globalcode – Open4education
Agenda
• What is Cassandra?
• Why Cassandra?
• Quick Review
• The Problem to tackle
• Relational solution and its drawbacks
• Addressing the problem with C* thinking
• Goals and Non-Goals
• Query First
• The Cassandra solution
• Benchmarking
• Additional resources
5. Globalcode – Open4education
Why Cassandra?
● Distributed Cache (Netflix EVCache)
● Real time Processing
● Data doesn't fit in one place
● High write workload
○ Time series data
○ Log storage/analysis
● Geographical distribution
● Performance
18. Globalcode – Open4education
Secondary Index
0312 Paulo Almeida
2315 Gessica Dutra
...
0003 Jefferson
….
5 lookups, 1 response = poor performance
SELECT * FROM tdc.speaker
WHERE name = 'Paulo Almeida';
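The cost shown on this slide can be illustrated with a toy model. The sketch below is not Cassandra code: it assumes a hypothetical 5-node cluster (`NODES`) where rows are placed by a hash of the partition key and each node can only search its own rows by `name`. The helper names `node_for`, `find_by_id`, and `find_by_name` are made up for this illustration.

```python
import hashlib

NODES = 5  # assumed cluster size, matching the "5 lookups" on the slide


def node_for(partition_key: str) -> int:
    """Pick a node from a hash of the partition key (toy stand-in for the token ring)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).hexdigest()
    return int(digest, 16) % NODES


# Each node stores only its own rows, so an index on `name` is local to a node.
cluster = [dict() for _ in range(NODES)]
speakers = {"0312": "Paulo Almeida", "2315": "Gessica Dutra", "0003": "Jefferson"}
for speaker_id, name in speakers.items():
    cluster[node_for(speaker_id)][speaker_id] = name


def find_by_id(speaker_id: str):
    """Partition-key lookup: the hash tells us the single node to contact."""
    node = cluster[node_for(speaker_id)]
    return node.get(speaker_id), 1


def find_by_name(name: str):
    """Lookup on a non-key column: every node has to be asked."""
    matches, contacted = [], 0
    for node in cluster:
        contacted += 1
        matches += [sid for sid, n in node.items() if n == name]
    return matches, contacted


print(find_by_id("0312"))             # one node contacted
print(find_by_name("Paulo Almeida"))  # all five nodes contacted
```

In this simplified model the query by `name` always touches all five nodes to return a single row, which is the "5 lookups, 1 response" pattern above.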
19. Globalcode – Open4education
Limitations
● No JOIN, LIKE… support
● No constraints
● No ACID transactions
● No strong consistency by default
● Secondary indexes don't scale well
20. Globalcode – Open4education
Goals and Non-Goals
● Non-Goals
○ Minimize number of writes
○ Minimize data duplication
● Goals
○ Spread data evenly around the cluster
○ Minimize the number of partitions read
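The second goal can be made concrete with a small count. The sketch below is a toy model under stated assumptions: the sample rows are hypothetical (the speaker "Ana" and the ids `s1`..`s4` are invented), a "partition" is just a group of rows sharing a key, and `partitions_read` counts only the partitions that hold matching rows, which is a lower bound on what a real query would touch.

```python
from collections import defaultdict

# Hypothetical sample rows: (speaker_id, name, state)
rows = [
    ("s1", "Paulo Almeida", "PR"),
    ("s2", "Gessica Dutra", "PR"),
    ("s3", "Jefferson", "SP"),
    ("s4", "Ana", "SP"),
]


def partition_by(rows, key_index):
    """Group rows into partitions using the column at key_index as partition key."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[key_index]].append(row)
    return partitions


def partitions_read(partitions, state):
    """Count partitions holding at least one speaker from the given state."""
    return sum(1 for part in partitions.values() if any(r[2] == state for r in part))


by_id = partition_by(rows, 0)     # one partition per speaker
by_state = partition_by(rows, 2)  # one partition per state

print(partitions_read(by_id, "PR"))     # 2
print(partitions_read(by_state, "PR"))  # 1
```

Keying by `state` answers the query from a single partition, at the price of storing the speaker data a second time, which is why minimizing duplication is listed as a non-goal.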
21. Globalcode – Open4education
Query first!
● Know your queries first and model around them
○ Don't model around relations
○ Don't model around objects
○ Try to create a table (column family, CF) that can satisfy the query by reading a single partition
22. Globalcode – Open4education
Queries
● Speaker by state
● Speaker by name
● Talks by speaker name
● Talks by keywords
● Talks by track
25. Globalcode – Open4education
Data Modeling
CREATE KEYSPACE tdc WITH REPLICATION = {
'class': 'SimpleStrategy',
'replication_factor': 3
};
26. Globalcode – Open4education
Data Modeling
CREATE TABLE tdc.speaker (
id uuid,
name text,
email text,
bio text,
city text,
state text,
PRIMARY KEY (id)
);
Keyspace: tdc
Partition Key: id
27. Globalcode – Open4education
Data Modeling
CREATE TABLE tdc.speaker_by_name (
speaker_id uuid,
name text,
PRIMARY KEY (name, speaker_id)
);
SELECT speaker_id FROM tdc.speaker_by_name WHERE name = 'Paulo Almeida';
SELECT * FROM tdc.speaker WHERE id = $speaker_id;
Better approach, but it always requires 2 lookups
Partition Key: name
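The two-step read can be sketched with plain dictionaries. This is only an illustration of the lookup-table pattern, not driver code: the dictionaries `speaker` and `speaker_by_name` stand in for the two tables, and `add_speaker`, `find_speakers_by_name`, and the e-mail address are hypothetical.

```python
import uuid

speaker = {}          # stands in for tdc.speaker, keyed by id
speaker_by_name = {}  # stands in for tdc.speaker_by_name, keyed by name


def add_speaker(name, email):
    """Write the same speaker to both tables (denormalized write)."""
    speaker_id = uuid.uuid4()
    speaker[speaker_id] = {"id": speaker_id, "name": name, "email": email}
    speaker_by_name.setdefault(name, []).append(speaker_id)
    return speaker_id


def find_speakers_by_name(name):
    """First read the lookup table, then fetch each full row by id."""
    lookups = 1  # read of speaker_by_name
    found = []
    for speaker_id in speaker_by_name.get(name, []):
        lookups += 1  # read of speaker for this id
        found.append(speaker[speaker_id])
    return found, lookups


add_speaker("Paulo Almeida", "paulo@example.com")
found, lookups = find_speakers_by_name("Paulo Almeida")
print(found[0]["email"], lookups)
```

For a name that matches one speaker, the read costs two lookups, each against a single known partition, instead of a query that fans out to every node.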
28. Globalcode – Open4education
Data Modeling
SELECT * FROM tdc.speaker_by_state
WHERE state = 'PR';
CREATE TABLE tdc.speaker_by_state (
speaker_id uuid,
name text,
state text,
bio text,
PRIMARY KEY (state, name, speaker_id)
) WITH CLUSTERING ORDER BY (name ASC, speaker_id ASC);
Partition Key: state
Clustering Key: name, speaker_id
29. Globalcode – Open4education
Data Modeling
CREATE TABLE tdc.speaker_by_state (
speaker_id uuid,
name text,
city text,
state text,
bio text,
PRIMARY KEY (state, city, name, speaker_id)
) WITH CLUSTERING ORDER BY (city ASC, name ASC, speaker_id ASC);
Partition Key: state
Clustering Key: city, name, speaker_id
30. Globalcode – Open4education
Data Modeling
BEGIN BATCH
INSERT INTO speaker (id, …) VALUES (...);
INSERT INTO speaker_by_name (name, ...) VALUES (...);
INSERT INTO speaker_by_state (state, ...) VALUES (...);
APPLY BATCH;
31. Globalcode – Open4education
Data Modeling
CREATE TABLE tdc.talk_by_speaker_name (
talk_id uuid,
talk_name text,
speaker_name text,
date timestamp,
PRIMARY KEY (speaker_name, date, talk_id)
) WITH CLUSTERING ORDER BY (date DESC, talk_id ASC);
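The effect of a descending clustering order on `date` can be sketched as follows. This is a toy model, not Cassandra code: the rows, talk names, and dates are hypothetical, `talk_id` is a short string instead of a uuid, and `read_partition` is an invented helper that returns one speaker's rows newest first, with `talk_id` ascending as a tie-breaker.

```python
from datetime import datetime

# Hypothetical rows; talk names and dates are made up for illustration.
rows = [
    {"speaker_name": "Paulo Almeida", "date": datetime(2016, 7, 5), "talk_id": "t1", "talk_name": "Talk A"},
    {"speaker_name": "Paulo Almeida", "date": datetime(2017, 7, 18), "talk_id": "t3", "talk_name": "Talk C"},
    {"speaker_name": "Paulo Almeida", "date": datetime(2017, 7, 18), "talk_id": "t2", "talk_name": "Talk B"},
    {"speaker_name": "Gessica Dutra", "date": datetime(2017, 7, 19), "talk_id": "t4", "talk_name": "Talk D"},
]


def read_partition(rows, speaker_name, limit=None):
    """Return one speaker's talks ordered by date DESC, then talk_id ASC."""
    partition = [r for r in rows if r["speaker_name"] == speaker_name]
    # Two stable sorts: secondary key first, then the primary key in reverse.
    partition.sort(key=lambda r: r["talk_id"])
    partition.sort(key=lambda r: r["date"], reverse=True)
    return partition[:limit]


print([r["talk_id"] for r in read_partition(rows, "Paulo Almeida")])     # ['t2', 't3', 't1']
print([r["talk_id"] for r in read_partition(rows, "Paulo Almeida", 1)])  # ['t2']
```

Because rows are kept in this order inside the partition, a "latest talks by speaker" query is a read of the first rows of a single partition, with no sorting at query time.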
32. Globalcode – Open4education
Data Modeling
CREATE INDEX talk_by_track_name ON tdc.talk (track_name);
SELECT * FROM tdc.talk WHERE track_name = 'Test';