2. Overview
● General guiding goals for Cassandra data models
● Interesting and/or common examples/questions
to get us started
● Should be plenty of time at the end for
questions, so bring them up if you have them!
3. Data Modeling Goals
● Keep data queried together on disk together
● In a more general sense, think about the efficiency of querying your data and work backward from there to a model in Cassandra
● Usually, you shouldn't try to normalize your data
(contrary to many use cases in relational
databases)
● Usually better to keep a record that something
happened as opposed to changing a value (not
always the best approach though)
4. Time Series Data
● Easily the most common use of Cassandra
● Financial tick data
● Click streams
● Sensor data
● Performance metrics
● GPS data
● Event logs
● etc, etc, etc ...
● All of the above are essentially the same as far as
C* is concerned
5. Time Series Thought Model
● Things happen in some timestamp-ordered
stream and consist of values associated with
the given timestamp (i.e. “data points”)
– Every 30 seconds record location, speed, heading and
engine temp
– Every 5 minutes record CPU, IO and Memory usage
● We are interested in recreating, aggregating
and/or analyzing arbitrary time slices of the
stream
– Where was agent:007 and what was he doing between
11:21am and 2:38pm yesterday?
– What are the last N actions foo did on my site?
6. Data Points Defined
● Each data point has 1-N values
● Each data point corresponds to a specific point
in time or an interval/bucket (e.g. the 5th minute of the 17th hour on some date)
7. Data Points Mapped to Cassandra
● Row Key is the id of the data point stream bucketed by time
– e.g. plane01:jan_2011 or plane01:jan_01_2011 for month or day buckets
respectively
● Column Name is TimeUUID(timestamp of data point)
● Column Value is serialized data point
– JSON, XML, pickle, msgpack, thrift, protobuf, avro, BSON, WTFe
● Bucketing
– Avoids always requiring multiple seeks when only small slices of the stream are requested (e.g. stream is 5 years old but I'm only interested in Jan 5th 3 years ago and/or yesterday between 2pm and 3pm).
– Makes it easy to lazily aggregate old stream activity
– Reduces compaction overhead since old rows will never have to be merged again
(until you “back fill” and/or delete something)
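A minimal sketch of this mapping, assuming a Thrift-era Python client (pycassa) and a column family named TimeSeries whose comparator is TimeUUIDType; the Telemetry keyspace, the names and the JSON serialization are illustrative, not prescribed by the slides:

    import json, time
    import pycassa
    from pycassa.util import convert_time_to_uuid

    pool = pycassa.ConnectionPool('Telemetry', ['localhost:9160'])  # hypothetical keyspace
    events = pycassa.ColumnFamily(pool, 'TimeSeries')               # comparator: TimeUUIDType

    def record(stream_id, data_point, ts=None):
        ts = ts or time.time()
        # Row key: stream id bucketed by month, e.g. plane01:jan_2011
        row_key = '%s:%s' % (stream_id, time.strftime('%b_%Y', time.gmtime(ts)).lower())
        # Column name: TimeUUID built from the data point's timestamp
        col_name = convert_time_to_uuid(ts, randomize=True)
        # Column value: the serialized data point (JSON here; msgpack/protobuf/... work too)
        events.insert(row_key, {col_name: json.dumps(data_point)})

    record('plane01', {'lat': 28.90, 'long': 124.30, 'alt_ft': 45000, 'wine_pct': 70})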
8. A Slightly More Concrete Example
● Sensor data from airplanes
● Every 30 seconds each plane sends
latitude+longitude, altitude and wine remaining
in mdennis' glass.
9. The Visual
plane5:jan_2011 (p5:j11)
                 TimeUUID0        TimeUUID1        TimeUUID2
  lat/long       28.90, 124.30    28.85, 124.25    28.81, 124.22
  altitude       45K feet         44K feet         44K feet
  wine level     70%              50%              95%
(middle of the ocean and half a glass of wine at 44K feet)
● Row Key is the id of the stream being recorded (e.g.
plane5:jan_2011)
● Column Name is timestamp (or TimeUUID) associated with
the data point
● Column Value is the value of the event (e.g. protobuf
serialized lat/long+alt+wine_level)
10. Querying
● When querying, construct TimeUUIDs for
the min/max of the time range in question
and use them as the start/end in your
get_slice call
● Or use an empty start and/or end along with
a count
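A sketch of such a slice query, reusing the events column family from the write sketch above; convert_time_to_uuid with lowest_val picks the smallest (or largest) TimeUUID for a boundary timestamp:

    from datetime import datetime
    from pycassa.util import convert_time_to_uuid

    # Everything plane01 reported between 2pm and 3pm on Jan 5th 2011
    start = convert_time_to_uuid(datetime(2011, 1, 5, 14, 0), lowest_val=True)
    end   = convert_time_to_uuid(datetime(2011, 1, 5, 15, 0), lowest_val=False)
    cols = events.get('plane01:jan_2011',
                      column_start=start,
                      column_finish=end,
                      column_count=10000)

    # Or an empty start/end plus a count for "the first N data points in the bucket"
    first_100 = events.get('plane01:jan_2011', column_count=100)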
11. Bucket Sizes?
● Depends greatly on
● Average size of time slice queried
● Average data point size
● Write rate of data points to a stream
● IO capacity of the nodes
12. So... Bucket Sizes?
● No bigger than a few GB per row
● Estimated row size ≈ bucket_size * write_rate * sizeof(avg_data_point)
● Bucket size >= average size of time slice queried
● No more than maybe 10M entries per row
● No more than a month if you have lots of different
streams
● NB: there are exceptions to all of the above, which
are really nothing more than guidelines
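A back-of-the-envelope check of those guidelines, with made-up numbers (one ~200 byte data point every 30 seconds, month-sized buckets):

    # Illustrative numbers only
    write_rate     = 1 / 30.0          # data points per second (one every 30s)
    avg_data_point = 200               # bytes per serialized data point
    bucket_seconds = 30 * 24 * 3600    # one month bucket

    row_bytes   = bucket_seconds * write_rate * avg_data_point
    row_columns = bucket_seconds * write_rate

    print(row_bytes / 2**20, 'MB per row')    # ~16.5 MB: well under a few GB
    print(row_columns, 'columns per row')     # 86400: well under ~10M entries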
13. Ordering
● In cases where the most recent data is the
most interesting (e.g. last N events for entity foo
or last hour of events for entity bar), you can
reverse the comparator (i.e. sort descending
instead of ascending)
● http://thelastpickle.com/2011/10/03/Reverse-Comparators/
● https://issues.apache.org/jira/browse/CASSANDRA-2355
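Reads can ask for the slice newest-first regardless of schema; the reversed comparator from the links above additionally stores the newest columns at the front of the row so those reads stay cheap. A sketch, reusing the events column family from the earlier write sketch:

    # Last 50 data points in the bucket, newest first
    last_50 = events.get('plane01:jan_2011',
                         column_reversed=True,
                         column_count=50)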
14. Spanning Buckets
● If your time slice spans buckets, you'll need to
construct all the row keys in question (i.e. number of
unique row keys = spans+1)
● If you want all the results between the dates, pass
all the row keys to multiget_slice with the start and
end of the desired time slice
● If you only want the first N results within your time
slice, lowest latency comes from multiget_slice as
above but best efficiency comes from serially paging
one row key at a time until your desired count is
reached
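A sketch of the multiget approach with the assumed pycassa client; the slice here crosses a month boundary, so it touches two row keys:

    from datetime import datetime
    from pycassa.util import convert_time_to_uuid

    start_dt = datetime(2011, 1, 31, 22, 0)
    end_dt   = datetime(2011, 2, 1, 2, 0)

    # One row key per bucket the slice touches
    row_keys = ['plane01:jan_2011', 'plane01:feb_2011']

    rows = events.multiget(row_keys,
                           column_start=convert_time_to_uuid(start_dt, lowest_val=True),
                           column_finish=convert_time_to_uuid(end_dt, lowest_val=False),
                           column_count=10000)

    # Stitch the buckets back together into one time-ordered stream
    data_points = [col for key in row_keys for col in rows.get(key, {}).items()]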
15. Expiring Streams
(e.g. “I only care about the past year”)
● Just set the TTL to the age you want to keep
● yeah, that's pretty much it ...
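A sketch of an expiring write, identical to the earlier write sketch except for the ttl argument (pycassa's insert takes the TTL in seconds):

    import json, time
    from pycassa.util import convert_time_to_uuid

    ONE_YEAR = 365 * 24 * 3600   # seconds

    def record_expiring(stream_id, data_point, ts=None):
        ts = ts or time.time()
        row_key = '%s:%s' % (stream_id, time.strftime('%b_%Y', time.gmtime(ts)).lower())
        col_name = convert_time_to_uuid(ts, randomize=True)
        # Same write as before, but the column silently disappears after a year
        events.insert(row_key, {col_name: json.dumps(data_point)}, ttl=ONE_YEAR)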
16. Counters
● Sometimes you're only interested in counting
things that happened within some time slice
● Minor adaptation to the previous content to use
counters (be aware they are not idempotent)
● Column names become buckets
● Values become counters
17. Example: Counting User Logins
user3:system5:logins:by_day (U3:S5:L:D)
                 20110107      ...      20110523
                 2             ...      7
(2 logins on Jan 7th 2011 for user 3 on system 5; 7 logins on May 23rd 2011 for user 3 on system 5)

user3:system5:logins:by_hour (U3:S5:L:H)
                 2011010710    ...      2011052316
                 1             ...      2
(one login for user 3 on system 5 on Jan 7th 2011 in the 10th hour; 2 logins for user 3 on system 5 on May 23rd 2011 in the 16th hour)
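A sketch of the login-counting example with the assumed pycassa client; LoginCounters is a hypothetical counter column family, and add() increments a counter column by one:

    import time
    import pycassa

    pool   = pycassa.ConnectionPool('Telemetry', ['localhost:9160'])
    logins = pycassa.ColumnFamily(pool, 'LoginCounters')   # counter column family

    def count_login(user_id, system_id, ts=None):
        ts = ts or time.time()
        base = 'user%d:system%d:logins' % (user_id, system_id)
        # Column names are the time buckets, values are counters
        logins.add(base + ':by_day', time.strftime('%Y%m%d', time.gmtime(ts)))
        logins.add(base + ':by_hour', time.strftime('%Y%m%d%H', time.gmtime(ts)))

    count_login(3, 5)   # one more login for user 3 on system 5, in today's and this hour's buckets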
18. Eventually Atomic
● In a legacy RDBMS atomicity is “easy”
● Attempting full ACID compliance in distributed systems is a
bad idea (and actually impossible in the strictest sense)
● However, consistency is important and can certainly be
achieved in C*
● Many approaches / alternatives
● I like a transaction log approach, especially in the context
of C*
19. Transaction Logs
(in this context)
● Records what is going to be performed before it
is actually performed
● Performs the actions that need to be atomic (in
the indivisible sense, not the all at once sense
which is usually what people mean when they
say isolation)
● Marks that the actions were performed
20. In Cassandra
● Serialize all actions that need to be performed
in a single column – JSON, XML, YAML (yuck!),
pickle, JSO, msgpack, protobuf, et cetera
● Row Key = randomly chosen C* node token
● Column Name = TimeUUID(nowish)
● Perform actions
● Delete Column
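A sketch of that flow with the assumed pycassa client; the XACT_LOG name, the token list and the JSON serialization are illustrative:

    import json, random, time
    import pycassa
    from pycassa.util import convert_time_to_uuid

    pool     = pycassa.ConnectionPool('Telemetry', ['localhost:9160'])
    xact_log = pycassa.ColumnFamily(pool, 'XACT_LOG')

    # Illustrative ring tokens; in practice, the tokens owned by the cluster's nodes
    NODE_TOKENS = ['0', '85070591730234615865843651857942052864']

    def atomically(actions, perform):
        row_key  = random.choice(NODE_TOKENS)                          # randomly chosen node token
        col_name = convert_time_to_uuid(time.time(), randomize=True)   # TimeUUID(nowish)
        # 1. Record what is about to be performed, durably (QUORUM, per the next slide)
        xact_log.insert(row_key, {col_name: json.dumps(actions)},
                        write_consistency_level=pycassa.ConsistencyLevel.QUORUM)
        # 2. Perform the (idempotent) actions themselves
        perform(actions)
        # 3. Mark them done by deleting the log column
        xact_log.remove(row_key, columns=[col_name])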
21. Configuration Details
● Short gc_grace_seconds on the XACT_LOG
Column Family (e.g. 5 minutes)
● Write to XACT_LOG at CL.QUORUM or
CL.LOCAL_QUORUM for durability
● if it fails with an unavailable exception, pick a
different node token and/or node and try again
(gives same semantics as a relational DB in terms
of knowing the state of your transaction)
22. Failures
● Before insert into the XACT_LOG
● After insert, before actions
● After insert, in middle of actions
● After insert, after actions, before delete
● After insert, after actions, after delete
23. Recovery
● Each C* node has a cron job offset from every other by some time period
● Each job runs the same code: multiget_slice for
all node tokens for all columns older than some
time period (the “recovery period”)
● Any columns found need to be replayed in their
entirety and are deleted after replay (normally
there are no columns because normally things
are working)
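A sketch of such a recovery job, reusing xact_log and NODE_TOKENS from the transaction-log sketch above; the 10 minute recovery period is illustrative:

    import json, time
    from pycassa.util import convert_time_to_uuid

    RECOVERY_PERIOD = 10 * 60   # seconds; must comfortably exceed normal transaction time

    def recover(perform):
        cutoff = convert_time_to_uuid(time.time() - RECOVERY_PERIOD, lowest_val=False)
        # All columns older than the recovery period, across every node-token row
        stale = xact_log.multiget(NODE_TOKENS, column_finish=cutoff, column_count=1000)
        for row_key, cols in stale.items():
            for col_name, payload in cols.items():
                perform(json.loads(payload))                   # replay in its entirety
                xact_log.remove(row_key, columns=[col_name])   # then delete it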
24. XACT_LOG Comments
● Idempotent writes are awesome (that's why this
works so well)
● Doesn't work so well for counters (they're not
idempotent)
● Clients must be able to deal with temporarily
inconsistent data (they have to do this anyway)