NoSQL Data Modeling 101

Tzach Livyatan

Content
■ Basic Data Modeling
● CQL
● Partition Key
● Clustering Key
■ Materialized Views

NoSQL Vs. Relational
[Diagram: Relational vs. NoSQL, each relating Application, Data, and Model (Schema)]

Data Modeling Terminology
➔ Cluster
◆ Keyspace
● Table
● Partition
● Row
○ Column - name / value pair

What is CQL
■ Cassandra Query Language
■ Similar to SQL (Structured Query Language)
■ Data Definition (DDL)
● CREATE / DROP / ALTER Keyspace
● CREATE / DROP / ALTER Table
■ Data Manipulation (DML)
● SELECT
● INSERT
● UPDATE
● DELETE
● BATCH
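
As a rough sketch of how these statement families look in practice, the following uses the pet_owner table from the Key / Value example in this deck. The vet_name column and its value are hypothetical, added only for illustration.

```cql
-- DDL: add a column to an existing table (vet_name is a hypothetical column)
ALTER TABLE pet_owner ADD vet_name text;

-- DML: group several writes into one BATCH
BEGIN BATCH
    INSERT INTO pet_owner (pet_chip_id, owner, pet_name)
        VALUES (80d39c78-9dc0-11eb-a8b3-0242ac130003, 642adfee-6ad9-4ca5-aa32-a72e506b8ad8, 'Rocky');
    UPDATE pet_owner SET vet_name = 'Dr. Example'
        WHERE pet_chip_id = 80d39c78-9dc0-11eb-a8b3-0242ac130003;
APPLY BATCH;

-- DDL: remove the column again
ALTER TABLE pet_owner DROP vet_name;
```

Both statements in the batch target the same partition, which is the case a batch is best suited to.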

Keyspace
A top-level object that controls replication per data center (DC).
It contains tables, indexes, materialized views, and user-defined types.
CREATE KEYSPACE Excalibur
    WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 1, 'DC2': 3}
    AND durable_writes = true;

Keyspace Example
CREATE KEYSPACE mykeyspace
    WITH replication = {'class': 'NetworkTopologyStrategy', 'AWS_US_EAST_1': 3}
    AND durable_writes = true;
USE mykeyspace;

Common Data Types
■ ASCII
■ BIGINT
■ BLOB
■ BOOLEAN
■ COUNTER
■ DATE
■ DECIMAL
■ DOUBLE
■ DURATION
■ FLOAT
■ INET
■ INT
■ SMALLINT
■ TEXT
■ TIME
■ TIMESTAMP
■ TIMEUUID
■ TINYINT
■ UUID
■ VARCHAR
■ VARINT
* https://docs.scylladb.com/getting-started/types/

Collections
■ Sets
■ Lists
■ Maps
■ UDT
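
A minimal sketch of the collection types listed above, assuming a hypothetical pet_profile table and a hypothetical address type; neither appears in the original slides.

```cql
-- A user-defined type (UDT); the name and fields are hypothetical
CREATE TYPE IF NOT EXISTS address (
    street text,
    city text
);

-- Hypothetical table with one column per collection kind
CREATE TABLE IF NOT EXISTS pet_profile (
    pet_chip_id uuid PRIMARY KEY,
    nicknames set<text>,             -- unique values
    vaccinations list<text>,         -- ordered, duplicates allowed
    weight_by_year map<int, float>,  -- key / value pairs
    home frozen<address>             -- UDT stored as a single frozen value
);

INSERT INTO pet_profile (pet_chip_id, nicknames, vaccinations, weight_by_year, home)
VALUES (80d39c78-9dc0-11eb-a8b3-0242ac130003,
        {'Rocky', 'Rock'},
        ['rabies', 'parvo'],
        {2020: 11.5, 2021: 12.0},
        {street: '1 Main St', city: 'Springfield'});

-- Add an element to a set without rewriting the whole collection
UPDATE pet_profile SET nicknames = nicknames + {'Rocco'}
    WHERE pet_chip_id = 80d39c78-9dc0-11eb-a8b3-0242ac130003;
```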

Key / Value Example
SELECT pet_chip_id,owner,pet_name FROM pet_owner;
 pet_chip_id                          | owner             | pet_name
--------------------------------------+-------------------+----------
 80d39c78-9dc0-11eb-a8b3-0242ac130003 | 642adfee-6ad9-... | Buddy
 80d39c78-9dc0-11eb-a8b3-0242ac130003 | 642adfee-6ad9-... | Rocky
 80d39c78-9dc0-11eb-a8b3-0242ac130003 | 642adfee-6ad9-... | Cat
 ...                                  | ...               | ...

Key / Value Example
CREATE TABLE IF NOT EXISTS pet_owner (
    pet_chip_id uuid,
    owner uuid,
    pet_name text,
    PRIMARY KEY (pet_chip_id)
);
Partition Key: pet_chip_id
 pet_chip_id                          | owner             | pet_name
--------------------------------------+-------------------+----------
 80d39c78-9dc0-11eb-a8b3-0242ac130003 | 642adfee-6ad9-... | Buddy
 80d39c78-9dc0-11eb-a8b3-0242ac130003 | 642adfee-6ad9-... | Rocky
 80d39c78-9dc0-11eb-a8b3-0242ac130003 | 642adfee-6ad9-... | Cat
 ...                                  | ...               | ...

Key / Value Example
INSERT INTO pet_owner (pet_chip_id, owner, pet_name) VALUES (a2a60505-3e17-4ad4-8e1a-f11139caa1cc, 642adfee-6ad9-4ca5-aa32-a72e506b8ad8, 'Buddy');
INSERT INTO pet_owner (pet_chip_id, owner, pet_name) VALUES (80d39c78-9dc0-11eb-a8b3-0242ac130003, 642adfee-6ad9-4ca5-aa32-a72e506b8ad8, 'Rocky');
INSERT INTO pet_owner (pet_chip_id, owner, pet_name) VALUES (92cf4f94-9dc0-11eb-a8b3-0242ac130003, b4a63c18-9dc0-11eb-a8b3-0242ac130003, 'Rin Tin Tin');
SELECT * FROM pet_owner;
SELECT * FROM pet_owner WHERE pet_chip_id = 80d39c78-9dc0-11eb-a8b3-0242ac130003;
SELECT * FROM pet_owner WHERE pet_name = 'Rocky'; (?)

Key / Value Example
UPDATE pet_owner SET pet_name = 'Cat' WHERE pet_chip_id = 92cf4f94-9dc0-11eb-a8b3-0242ac130003;
DELETE FROM pet_owner WHERE pet_chip_id = 80d39c78-9dc0-11eb-a8b3-0242ac130003;
SELECT * FROM pet_owner;

Choosing a Partition Key
■ High Cardinality
■ Even Distribution
Avoid
■ Low Cardinality
■ Hot Partition
■ Large Partition
https://www.codedrome.com/zipfs-law-in-python/

Choosing a Partition Key
■ User Name
■ User ID
■ User ID + Time
■ Sensor ID
■ Sensor ID + Time
■ Customer
■ State
■ Age
■ Favorite NBA Team
■ Team Angel or Team Spike
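
To make the trade-off concrete, here is a hedged sketch of three of the candidates above expressed as table definitions. All table and column names are hypothetical and chosen only for illustration.

```cql
-- Low cardinality: only a few dozen distinct values, so a few partitions
-- hold all the data and some grow much larger and hotter than others
CREATE TABLE users_by_state (
    state text,
    user_id uuid,
    user_name text,
    PRIMARY KEY (state, user_id)
);

-- High cardinality: one partition per user, spread across the cluster
CREATE TABLE users_by_id (
    user_id uuid,
    user_name text,
    state text,
    PRIMARY KEY (user_id)
);

-- Composite partition key (sensor ID + day): each partition holds
-- at most one day of readings for one sensor
CREATE TABLE readings_by_sensor_day (
    sensor_id uuid,
    day date,
    time timestamp,
    reading double,
    PRIMARY KEY ((sensor_id, day), time)
);
```

Which of these is appropriate still depends on the queries the application needs to run, since every read must supply the full partition key.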

Wide Partition Example
Query:
SELECT * from heartrate_v10 WHERE
    pet_chip_id = 80d39c78-9dc0-11eb-a8b3-0242ac130003 LIMIT 1;
SELECT * from heartrate_v10 WHERE
    pet_chip_id = 80d39c78-9dc0-11eb-a8b3-0242ac130003 AND
    time >= '2021-05-01 01:00+0000' AND
    time < '2021-05-01 01:03+0000';
https://gist.github.com/tzach/7486f1a0cc904c52f4514f20f14d2a97

CREATE TABLE heartrate_v10 (
    pet_chip_id uuid,
    owner uuid,
    time timestamp,
    heart_rate int,
    PRIMARY KEY (pet_chip_id, time)
);
Partition Key: pet_chip_id
Clustering Key: time
 pet_chip_id                          | time                            | heart_rate
--------------------------------------+---------------------------------+------------
 80d39c78-9dc0-11eb-a8b3-0242ac130003 | 2021-05-01 01:00:00.000000+0000 | 120
 80d39c78-9dc0-11eb-a8b3-0242ac130003 | 2021-05-01 01:01:00.000000+0000 | 121
 80d39c78-9dc0-11eb-a8b3-0242ac130003 | 2021-05-01 01:02:00.000000+0000 | 120

Large Partition?

Choosing a Clustering Key
■ Allow useful range queries
■ Allow useful LIMIT

SELECT * from heartrate_v10 WHERE
    pet_chip_id = 80d39c78-9dc0-11eb-a8b3-0242ac130003 LIMIT 1;
CREATE TABLE heartrate_v10 (
    pet_chip_id uuid,
    time timestamp,
    heart_rate int,
    PRIMARY KEY (pet_chip_id, time)
) WITH CLUSTERING ORDER BY (time DESC);

Too Wide Partition?
CREATE TABLE ... (
    pet_chip_id uuid,
    date text,
    time timestamp,
    heart_rate int,
    PRIMARY KEY ((pet_chip_id, date), time));
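
The composite partition key above can be sketched end to end as follows. The table name heartrate_by_day is hypothetical, and the descending clustering order is an added assumption carried over from the earlier clustering key example; the columns follow the slide.

```cql
-- Hypothetical table name; one partition per pet per day
CREATE TABLE heartrate_by_day (
    pet_chip_id uuid,
    date text,
    time timestamp,
    heart_rate int,
    PRIMARY KEY ((pet_chip_id, date), time)
) WITH CLUSTERING ORDER BY (time DESC);

INSERT INTO heartrate_by_day (pet_chip_id, date, time, heart_rate)
VALUES (80d39c78-9dc0-11eb-a8b3-0242ac130003, '2021-05-01', '2021-05-01 01:00+0000', 120);

-- Both partition key columns must be supplied to read a partition
SELECT * FROM heartrate_by_day
WHERE pet_chip_id = 80d39c78-9dc0-11eb-a8b3-0242ac130003
  AND date = '2021-05-01'
LIMIT 1;
```

The cost of bounding partition size this way is that a query spanning several days has to read one partition per day.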

Example - Query by Owner
SELECT * FROM heartrate_v10 WHERE pet_chip_id = a2a60505-3e17-4ad4-8e1a-f11139caa1cc;
SELECT * FROM heartrate_v10 WHERE owner = 642adfee-6ad9-4ca5-aa32-a72e506b8ad8;
SELECT * FROM heartrate_v10 WHERE owner = 642adfee-6ad9-4ca5-aa32-a72e506b8ad8 ALLOW FILTERING;
Try:
TRACING ON;
TRACING OFF;
https://gist.github.com/tzach/4b9dadbc6e8a9c50369da05631c5e13e

Solution - Materialized Views
CREATE TABLE heartrate_v10 (
    pet_chip_id uuid, owner uuid, time timestamp, heart_rate int,
    PRIMARY KEY (pet_chip_id, time)
);
CREATE MATERIALIZED VIEW heartrate_by_owner AS
    SELECT * FROM heartrate_v10
    WHERE owner IS NOT NULL AND pet_chip_id IS NOT NULL AND time IS NOT NULL
    PRIMARY KEY (owner, pet_chip_id, time);
SELECT * FROM heartrate_by_owner WHERE owner = 642adfee-6ad9-4ca5-aa32-a72e506b8ad8;
DROP MATERIALIZED VIEW heartrate_by_owner;
ALTER MATERIALIZED VIEW heartrate_by_owner [WITH table_options];
https://docs.scylladb.com/getting-started/mv/
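
Because the view's primary key is (owner, pet_chip_id, time), it can also be narrowed by its clustering columns. The following is a sketch reusing the UUIDs and timestamps from the earlier slides; it assumes the heartrate_by_owner view has been created as shown.

```cql
-- One pet of a given owner, within a time window:
-- equality on the partition key (owner) and the first clustering column
-- (pet_chip_id), then a range on the second clustering column (time)
SELECT * FROM heartrate_by_owner
WHERE owner = 642adfee-6ad9-4ca5-aa32-a72e506b8ad8
  AND pet_chip_id = 80d39c78-9dc0-11eb-a8b3-0242ac130003
  AND time >= '2021-05-01 01:00+0000'
  AND time < '2021-05-01 01:03+0000';
```

Neither form needs ALLOW FILTERING, since owner is the view's partition key.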

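Example
Base Table
 pet_chip_id                          | time                            | owner                                | heart_rate
--------------------------------------+---------------------------------+--------------------------------------+------------
 80d39c78-9dc0-11eb-a8b3-0242ac130003 | 2021-05-01 01:00:00.000000+0000 | 642adfee-6ad9-4ca5-aa32-a72e506b8ad8 | 120
 80d39c78-9dc0-11eb-a8b3-0242ac130003 | 2021-05-01 01:01:00.000000+0000 | 642adfee-6ad9-4ca5-aa32-a72e506b8ad8 | 121
 80d39c78-9dc0-11eb-a8b3-0242ac130003 | 2021-05-01 01:02:00.000000+0000 | 642adfee-6ad9-4ca5-aa32-a72e506b8ad8 | 120
View
 owner                                | pet_chip_id                          | time                            | heart_rate
--------------------------------------+--------------------------------------+---------------------------------+------------
 642adfee-6ad9-4ca5-aa32-a72e506b8ad8 | 80d39c78-9dc0-11eb-a8b3-0242ac130003 | 2021-05-01 01:00:00.000000+0000 | 120
 642adfee-6ad9-4ca5-aa32-a72e506b8ad8 | 80d39c78-9dc0-11eb-a8b3-0242ac130003 | 2021-05-01 01:01:00.000000+0000 | 121
 642adfee-6ad9-4ca5-aa32-a72e506b8ad8 | 80d39c78-9dc0-11eb-a8b3-0242ac130003 | 2021-05-01 01:02:00.000000+0000 | 120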

MV - Write Path
1. Client to Coordinator: INSERT INTO heartrate (pet_chip_id, owner, time, heart_rate) VALUES (..);
2. Coordinator to Base replica: INSERT INTO heartrate
3. Base replica to View replica: INSERT INTO heartrate_by_owner
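
From the client's side, the write path can be sketched as below. This assumes the heartrate_v10 base table and heartrate_by_owner view defined earlier in the deck (the diagram abbreviates the base table name to heartrate); the timestamp and heart rate value are illustrative.

```cql
-- The client writes only to the base table
INSERT INTO heartrate_v10 (pet_chip_id, owner, time, heart_rate)
VALUES (80d39c78-9dc0-11eb-a8b3-0242ac130003,
        642adfee-6ad9-4ca5-aa32-a72e506b8ad8,
        '2021-05-01 01:03+0000', 119);

-- The database propagates the row to the view, where it is keyed by owner
SELECT pet_chip_id, time, heart_rate FROM heartrate_by_owner
WHERE owner = 642adfee-6ad9-4ca5-aa32-a72e506b8ad8;
```

View updates are applied asynchronously, so a read from the view issued immediately after the write may not yet include the new row. The view itself is read-only; writes always go to the base table.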

MV - Read Path
1. Client to Coordinator: SELECT * FROM heartrate_by_owner WHERE owner = 642a..;
2. Coordinator to View replica: SELECT * FROM heartrate_by_owner WHERE owner = 642a..;
The base replica is not on the read path.

http://localhost:3000/d/overview-2019-1/overview

Keep in touch!
Tzach Livyatan
ScyllaDB
tzach@scylladb.com
@tzachL
