©2013 DataStax Confidential. Do not distribute without consent.
@PatrickMcFadin
Patrick McFadin

Chief Evangelist
Data Modeling Time Series
Internet Of Things
• 15B devices by 2015
• 40B devices by 2020!
Why Cassandra for Time Series
Scales
Resilient
Good data model
Efficient Storage Model
What about that?
Example 1: Weather Station
• Weather station collects data
• Cassandra stores in sequence
• Application reads in sequence
Use case
• Store data per weather station
• Store time series in order: first to last
Needed Queries
• Get all data for one weather station
• Get data for a single date and time
• Get data for a range of dates and times
Data Model to support queries
Data Model
• Weather station ID and event time together are unique
• Store as many as needed
CREATE TABLE temperature (
weatherstation_id text,
event_time timestamp,
temperature text,
PRIMARY KEY (weatherstation_id,event_time)
);
INSERT INTO temperature(weatherstation_id,event_time,temperature)
VALUES ('1234ABCD','2013-04-03 07:01:00','72F');

INSERT INTO temperature(weatherstation_id,event_time,temperature)
VALUES ('1234ABCD','2013-04-03 07:02:00','73F');

INSERT INTO temperature(weatherstation_id,event_time,temperature)
VALUES ('1234ABCD','2013-04-03 07:03:00','73F');

INSERT INTO temperature(weatherstation_id,event_time,temperature)
VALUES ('1234ABCD','2013-04-03 07:04:00','74F');
Storage Model - Logical View
SELECT weatherstation_id,event_time,temperature
FROM temperature
WHERE weatherstation_id='1234ABCD';
weatherstation_id  event_time           temperature
1234ABCD           2013-04-03 07:01:00  72F
1234ABCD           2013-04-03 07:02:00  73F
1234ABCD           2013-04-03 07:03:00  73F
1234ABCD           2013-04-03 07:04:00  74F
Storage Model - Disk Layout
SELECT weatherstation_id,event_time,temperature
FROM temperature
WHERE weatherstation_id='1234ABCD';
Row Key: 1234ABCD
2013-04-03 07:01:00  72F
2013-04-03 07:02:00  73F
2013-04-03 07:03:00  73F
2013-04-03 07:04:00  74F
Merged, Sorted and Stored Sequentially
2013-04-03 07:05:00  74F
2013-04-03 07:06:00  75F
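The merge step on this slide can be sketched in Python. This is only an illustration, assuming each flushed segment is an already-sorted list of (event_time, temperature) pairs; the names `older_segment`, `newer_segment`, and `merge_segments` are hypothetical and are not Cassandra APIs.

```python
import heapq

def merge_segments(*segments):
    """Merge already-sorted (event_time, temperature) runs into one sorted list."""
    # heapq.merge walks the sorted inputs in step and preserves overall order,
    # so the combined data never needs a full re-sort.
    return list(heapq.merge(*segments))

# Hypothetical segments holding the rows shown on the slide.
older_segment = [
    ("2013-04-03 07:01:00", "72F"),
    ("2013-04-03 07:02:00", "73F"),
    ("2013-04-03 07:03:00", "73F"),
    ("2013-04-03 07:04:00", "74F"),
]
newer_segment = [
    ("2013-04-03 07:05:00", "74F"),
    ("2013-04-03 07:06:00", "75F"),
]

merged = merge_segments(older_segment, newer_segment)
for event_time, temperature in merged:
    print(event_time, temperature)
```

The timestamps are compared as strings here, which works only because this fixed-width format sorts the same way lexically and chronologically.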
Query patterns
• Range queries
• “Slice” operation on disk
SELECT weatherstation_id,event_time,temperature
FROM temperature
WHERE weatherstation_id='1234ABCD'
AND event_time >= '2013-04-03 07:01:00'
AND event_time <= '2013-04-03 07:04:00';
Row Key: 1234ABCD
2013-04-03 07:01:00  72F
2013-04-03 07:02:00  73F
2013-04-03 07:03:00  73F
2013-04-03 07:04:00  74F
2013-04-03 07:05:00  74F
2013-04-03 07:06:00  75F
Single seek on disk
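A rough Python sketch of why a range query becomes one contiguous slice: because the rows in a partition are already sorted by event_time, two binary searches locate the start and end, and everything in between is returned in order. The `partition` list and the `slice_range` helper are hypothetical stand-ins, not how Cassandra is implemented internally.

```python
from bisect import bisect_left, bisect_right

# Hypothetical in-memory stand-in for one partition, sorted by event_time.
partition = [
    ("2013-04-03 07:01:00", "72F"),
    ("2013-04-03 07:02:00", "73F"),
    ("2013-04-03 07:03:00", "73F"),
    ("2013-04-03 07:04:00", "74F"),
    ("2013-04-03 07:05:00", "74F"),
    ("2013-04-03 07:06:00", "75F"),
]

def slice_range(rows, start, end):
    """Return rows with start <= event_time <= end from a list sorted by event_time."""
    keys = [event_time for event_time, _ in rows]
    lo = bisect_left(keys, start)   # first row at or after start
    hi = bisect_right(keys, end)    # one past the last row at or before end
    return rows[lo:hi]

result = slice_range(partition, "2013-04-03 07:01:00", "2013-04-03 07:04:00")
for event_time, temperature in result:
    print(event_time, temperature)
```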
Query patterns
• Range queries
• “Slice” operation on disk
SELECT weatherstation_id,event_time,temperature
FROM temperature
WHERE weatherstation_id='1234ABCD'
AND event_time >= '2013-04-03 07:01:00'
AND event_time <= '2013-04-03 07:04:00';
weatherstation_id  event_time           temperature
1234ABCD           2013-04-03 07:01:00  72F
1234ABCD           2013-04-03 07:02:00  73F
1234ABCD           2013-04-03 07:03:00  73F
1234ABCD           2013-04-03 07:04:00  74F
Programmers like this
Sorted by event_time
Additional help on the storage engine
SSTable seeks
• Each read costs at least 1 seek
• Caches and bloom filters help minimize seeks
Total seek time = Disk Latency * number of seeks
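The formula above as a small worked example in Python. The 10 ms latency is an assumed figure for a spinning disk, chosen only for illustration; it does not come from the slides.

```python
def total_seek_time_ms(disk_latency_ms, number_of_seeks):
    """Total seek time = disk latency * number of seeks."""
    return disk_latency_ms * number_of_seeks

ASSUMED_DISK_LATENCY_MS = 10.0  # assumption, not a measured value

one_sstable = total_seek_time_ms(ASSUMED_DISK_LATENCY_MS, 1)
five_sstables = total_seek_time_ms(ASSUMED_DISK_LATENCY_MS, 5)
print(one_sstable, five_sstables)
```

Under that assumption, a read that touches five SSTables spends five times as long seeking as a read that touches one, which is why the following slides focus on skipping SSTables.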
The key to speed
Use the first part of the primary key to get the node
(data localization)
Minimize seeks for SSTables
(Key Cache, Bloom Filter)
Find the data fast in the SSTable
(Indexes)
Min/Max Value Hint
• New since 2.0
• Range index on primary key values per SSTable
• Minimizes seeks on range data
CASSANDRA-5514 if you are interested in details
SELECT event_time,temperature
FROM temperature
WHERE weatherstation_id='1234ABCD'
AND event_time > '2013-04-03 07:01:00'
AND event_time < '2013-04-03 07:04:00';
Which SSTable?
Row Key: 1234ABCD
Min event_time: 2013-04-01 00:00:00
Max event_time: 2013-04-04 23:59:59
This one (the queried range falls between its min and max)
Row Key: 1234ABCD
Min event_time: 2013-04-05 00:00:00
Max event_time: 2013-04-09 23:59:59
Row Key: 1234ABCD
Min event_time: 2013-03-27 00:00:00
Max event_time: 2013-03-31 23:59:59
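The idea behind the min/max hint can be sketched as an interval-overlap check. This is a simplified illustration, not the actual CASSANDRA-5514 code; the `SSTableHint` type, the `sstable-N` names, and the `candidate_sstables` helper are all hypothetical.

```python
from datetime import datetime
from typing import NamedTuple

class SSTableHint(NamedTuple):
    name: str                 # hypothetical label, not a real SSTable file name
    min_event_time: datetime
    max_event_time: datetime

def candidate_sstables(hints, start, end):
    """Names of SSTables whose [min, max] range can hold rows with start < event_time < end."""
    return [
        hint.name
        for hint in hints
        if hint.min_event_time < end and hint.max_event_time > start
    ]

# The three min/max ranges shown on the slide.
hints = [
    SSTableHint("sstable-1", datetime(2013, 4, 1, 0, 0, 0), datetime(2013, 4, 4, 23, 59, 59)),
    SSTableHint("sstable-2", datetime(2013, 4, 5, 0, 0, 0), datetime(2013, 4, 9, 23, 59, 59)),
    SSTableHint("sstable-3", datetime(2013, 3, 27, 0, 0, 0), datetime(2013, 3, 31, 23, 59, 59)),
]

to_read = candidate_sstables(
    hints,
    start=datetime(2013, 4, 3, 7, 1, 0),
    end=datetime(2013, 4, 3, 7, 4, 0),
)
print(to_read)
```

Only the first range overlaps the query, so the other two SSTables can be skipped without a seek.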
Ingestion models
• Apache Kafka
• Apache Flume
• Storm
• Custom Applications
Apache Kafka
Your totally killer application
Kafka + Storm
• Kafka provides reliable queuing
• Storm processes (rollups, counts)
• Cassandra stores at the same speed
• Storm lookup on Cassandra
Apache Kafka
Apache Storm
Queue Process Store
Flume
• Source accepts data
• Channel buffers data
• Sink processes and stores
• Popular for log processing
Sink
Channel
Source
Application
Load Balancer
Syslog
Dealing with data at speed
• 1 million writes per second?
• 1 insert every microsecond
• Collisions?
• Primary Key determines node placement
• Random partitioning
• Special data type - TimeUUID
Your totally killer application
weatherstation_id='1234ABCD'
weatherstation_id='5678EFGH'
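To make the collision concern concrete: two events stamped with the same plain timestamp would share a primary key and overwrite each other, while time-based UUIDs stay distinct. The sketch below uses Python's standard `uuid.uuid1()` as a stand-in for a driver's TimeUUID generator; that substitution and the `generate_event_ids` helper are assumptions for illustration.

```python
import uuid

def generate_event_ids(count):
    """Generate `count` version 1 (time-based) UUIDs back to back."""
    return [uuid.uuid1() for _ in range(count)]

ids = generate_event_ids(1000)

# Even in a tight loop, every generated id is distinct.
print(len(ids), len(set(ids)))
```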
How does data replicate?
Primary key determines placement*
Partitioning
jim     age: 36  car: camaro  gender: M
carol   age: 37  car: subaru  gender: F
johnny  age: 12               gender: M
suzy    age: 10               gender: F
Key Hashing
PK      MD5 Hash
jim     5e02739678...
carol   a9a0198010...
johnny  f4eb27cea7...
suzy    78b421309e...
MD5* hash operation yields a 128-bit number for keys of any size.
Node A
Node D Node C
Node B
The Token Ring
Key     Token
jim     5e02739678...
carol   a9a0198010...
johnny  f4eb27cea7...
suzy    78b421309e...
Node  Start            End
A     0xc000000000..1  0x0000000000..0
B     0x0000000000..1  0x4000000000..0
C     0x4000000000..1  0x8000000000..0
D     0x8000000000..1  0xc000000000..0
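A minimal Python sketch of the lookup these two tables describe, following the slide's simplified picture of a 128-bit MD5 ring split evenly across four nodes. The function names are hypothetical, and a real partitioner may derive its token from the hash differently.

```python
import hashlib

QUARTER = 1 << 126  # 0x4000...0, one quarter of the 128-bit token space

# (node, inclusive end of its range), in ring order, mirroring the slide's table.
# Node A owns the wrap-around range that ends at 0x000...0.
NODE_RANGE_ENDS = [("B", QUARTER), ("C", 2 * QUARTER), ("D", 3 * QUARTER)]

def token_for(key):
    """Hash a partition key to a 128-bit integer token with MD5."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")

def token_from_hex_prefix(prefix):
    """Build a token from a truncated hex value such as the slide's '5e02739678...'."""
    return int(prefix.ljust(32, "0"), 16)

def node_for_token(token):
    """Find the node whose token range contains the given token."""
    if token == 0:
        return "A"  # A's range ends exactly at 0
    for node, range_end in NODE_RANGE_ENDS:
        if token <= range_end:
            return node
    return "A"  # everything above D's range wraps around to A

for name, prefix in [("jim", "5e02739678"), ("carol", "a9a0198010"),
                     ("johnny", "f4eb27cea7"), ("suzy", "78b421309e")]:
    print(name, node_for_token(token_from_hex_prefix(prefix)))
```

With the slide's token values, jim and suzy land on node C, carol on node D, and johnny on node A.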
Node A
Node D Node C
Node B
carol a9a0198010...
Replication
Replication factor = 3
Consistency is a different topic for later
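Replica placement for carol can be sketched as "the owning node plus the next nodes around the ring". This assumes the simple clockwise placement that SimpleStrategy uses and the ring order implied by the token table; `RING_ORDER` and `replicas` are hypothetical names.

```python
RING_ORDER = ["A", "B", "C", "D"]  # assumed clockwise order, taken from the token ranges

def replicas(primary, replication_factor, ring=RING_ORDER):
    """Return the primary node followed by the next nodes clockwise on the ring."""
    start = ring.index(primary)
    count = min(replication_factor, len(ring))  # cannot have more replicas than nodes
    return [ring[(start + step) % len(ring)] for step in range(count)]

# carol's token (a9a0198010...) falls in node D's range.
carol_replicas = replicas("D", 3)
print(carol_replicas)
```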
TimeUUID
• Also known as a Version 1 UUID
• Sortable
• Reversible
Timestamp to Microsecond + UUID = TimeUUID
04d580b0-9412-11e3-baa8-0800200c9a66 = Wednesday, February 12, 2014 6:18:06 PM GMT
http://www.famkruithof.net/uuid/uuidgen
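"Reversible" means the timestamp can be read back out of the UUID. A short Python sketch using the standard `uuid` module on the example value above; the helper name `timeuuid_to_datetime` is hypothetical. A version 1 UUID stores its time as a count of 100-nanosecond intervals since 1582-10-15, so the sketch subtracts the offset to the Unix epoch.

```python
import uuid
from datetime import datetime, timedelta, timezone

# 100-ns intervals between the UUID epoch (1582-10-15) and the Unix epoch (1970-01-01).
UUID_EPOCH_OFFSET = 0x01B21DD213814000

def timeuuid_to_datetime(value):
    """Recover the UTC timestamp embedded in a version 1 UUID."""
    if value.version != 1:
        raise ValueError("not a version 1 (time-based) UUID")
    microseconds = (value.time - UUID_EPOCH_OFFSET) // 10
    return datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(microseconds=microseconds)

example = uuid.UUID("04d580b0-9412-11e3-baa8-0800200c9a66")
decoded = timeuuid_to_datetime(example)
print(decoded.strftime("%A, %B %d, %Y %H:%M:%S %Z"))
```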
Example 2: Financial Transactions
• Trading of stocks
• When did they happen?
• Massive speeds and volumes
“Sirca, a non-profit university consortium based in Sydney, is the world’s biggest broker of
financial data, ingesting into its database 2million pieces of information a second from every
major trading exchange.”*
* http://www.theage.com.au/it-pro/business-it/help-poverty-theres-an-app-for-that-20140120-hv948.html
Use case
• Store data per symbol and date
• Store time series in reverse order: last to first
• Make sure every transaction is unique
Needed Queries
• Get all trades for symbol and day
• Get trade for a single date and time
• Get last 10 trades for symbol and date
Data Model to support queries
Data Model
• date is int of days since epoch
• timeuuid keeps it unique
• Reverse the times for later queries
CREATE TABLE stock_ticks (
symbol text,
date int,
trade timeuuid,
trade_details text,
PRIMARY KEY ((symbol, date), trade)
) WITH CLUSTERING ORDER BY (trade DESC);
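One way an application might compute the `date` part of the partition key, as "days since epoch". This is a sketch under the assumption that the epoch is 1970-01-01; the slides use 340 only as a sample value, and the helper names here are hypothetical.

```python
from datetime import date

UNIX_EPOCH = date(1970, 1, 1)  # assumed epoch

def days_since_epoch(day):
    """Number of whole days between the Unix epoch and the given date."""
    return (day - UNIX_EPOCH).days

def partition_key(symbol, day):
    """Composite (symbol, date) partition key matching PRIMARY KEY ((symbol, date), trade)."""
    return (symbol, days_since_epoch(day))

key = partition_key("NFLX", date(2014, 2, 12))
print(key)
```

Bucketing by day keeps each partition bounded to one trading day per symbol rather than growing without limit.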
INSERT INTO stock_ticks(symbol, date, trade, trade_details)
VALUES ('NFLX',340,04d580b0-1431-1e33-baf8-0833200c98a6,'BUY:2000');

INSERT INTO stock_ticks(symbol, date, trade, trade_details)
VALUES ('NFLX',340,05d580b0-6472-1ef3-a3a8-0430200c9a66,'BUY:300');

INSERT INTO stock_ticks(symbol, date, trade, trade_details)
VALUES ('NFLX',340,02d580b0-9412-d223-55a8-0976200c9a25,'SELL:450');

INSERT INTO stock_ticks(symbol, date, trade, trade_details)
VALUES ('NFLX',340,08d580b0-4482-11e3-5fd3-3421200c9a65,'SELL:3000');
Storage Model - Logical View
SELECT trade,trade_details
FROM stock_ticks
WHERE symbol='NFLX' AND date=340;
symbol:date  trade                                 trade_details
NFLX:340     08d580b0-4482-11e3-5fd3-3421200c9a65  SELL:3000
NFLX:340     02d580b0-9412-d223-55a8-0976200c9a25  SELL:450
NFLX:340     05d580b0-6472-1ef3-a3a8-0430200c9a66  BUY:300
NFLX:340     04d580b0-1431-1e33-baf8-0833200c98a6  BUY:2000
Storage Model - Disk Layout
SELECT trade,trade_details
FROM stock_ticks
WHERE symbol='NFLX' AND date=340;
Row Key: NFLX:340
08d580b0-4482-11e3-5fd3-3421200c9a65  SELL:3000
02d580b0-9412-d223-55a8-0976200c9a25  SELL:450
05d580b0-6472-1ef3-a3a8-0430200c9a66  BUY:300
04d580b0-1431-1e33-baf8-0833200c98a6  BUY:2000
Order is from last trade to first
Query patterns
• Limit queries
• Get last X trades
SELECT trade,trade_details
FROM stock_ticks
WHERE symbol='NFLX' AND date=340
LIMIT 3;
Row Key: NFLX:340
From here
08d580b0-4482-11e3-5fd3-3421200c9a65  SELL:3000
02d580b0-9412-d223-55a8-0976200c9a25  SELL:450
05d580b0-6472-1ef3-a3a8-0430200c9a66  BUY:300
to here
04d580b0-1431-1e33-baf8-0833200c98a6  BUY:2000
Query patterns
• Limit queries
• Get last X trades
SELECT trade,trade_details
FROM stock_ticks
WHERE symbol='NFLX' AND date=340
LIMIT 3;
Reverse sorted by trade
Last 3 trades
symbol:date  trade                                 trade_details
NFLX:340     08d580b0-4482-11e3-5fd3-3421200c9a65  SELL:3000
NFLX:340     02d580b0-9412-d223-55a8-0976200c9a25  SELL:450
NFLX:340     05d580b0-6472-1ef3-a3a8-0430200c9a66  BUY:300
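Why the reverse clustering order makes "last X trades" cheap can be sketched in Python: if rows are kept newest-first, a LIMIT query is just the first X entries. The `TradePartition` class and the integer timestamps below are hypothetical stand-ins for the timeuuid clustering column, not Cassandra internals.

```python
from bisect import insort

class TradePartition:
    """Keeps trades newest-first, in the spirit of CLUSTERING ORDER BY (trade DESC)."""

    def __init__(self):
        # Rows are stored as (negated timestamp, details) so that an ascending
        # sort puts the most recent trade at index 0.
        self._rows = []

    def insert(self, timestamp, details):
        insort(self._rows, (-timestamp, details))

    def last(self, limit):
        """Return the `limit` most recent trades, newest first."""
        return [(-negated, details) for negated, details in self._rows[:limit]]

partition = TradePartition()
# Hypothetical timestamps; only their relative order matters here.
for timestamp, details in [(1000, "BUY:2000"), (2000, "BUY:300"),
                           (3000, "SELL:450"), (4000, "SELL:3000")]:
    partition.insert(timestamp, details)

last_three = partition.last(3)
print(last_three)
```

The read touches only the head of the partition, however many older trades are stored behind it.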
Way more examples
• 5 minute interviews
• Use cases
• Free training!
www.planetcassandra.org
Thank You!
Follow me for more updates all the time: @PatrickMcFadin

Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusZilliz
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbuapidays
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 

Último (20)

Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source Milvus
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 

Cassandra Day SV 2014: Fundamentals of Apache Cassandra Data Modeling

  • 1. Data Modeling Time Series. Patrick McFadin (@PatrickMcFadin), Chief Evangelist. ©2013 DataStax Confidential. Do not distribute without consent.
  • 2. Internet Of Things • 15B devices by 2015 • 40B devices by 2020!
  • 3. Why Cassandra for Time Series • Scales • Resilient • Good data model • Efficient storage model. What about that?
  • 4. Example 1: Weather Station • Weather station collects data • Cassandra stores in sequence • Application reads in sequence
  • 5. Use case: • Store data per weather station • Store time series in order: first to last. Needed queries: • Get all data for one weather station • Get data for a single date and time • Get data for a range of dates and times. Data model to support queries.
  • 6. Data Model • Weather Station Id and Time are unique • Store as many as needed
      CREATE TABLE temperature (
        weatherstation_id text,
        event_time timestamp,
        temperature text,
        PRIMARY KEY (weatherstation_id,event_time)
      );
      INSERT INTO temperature(weatherstation_id,event_time,temperature)
      VALUES ('1234ABCD','2013-04-03 07:01:00','72F');
      INSERT INTO temperature(weatherstation_id,event_time,temperature)
      VALUES ('1234ABCD','2013-04-03 07:02:00','73F');
      INSERT INTO temperature(weatherstation_id,event_time,temperature)
      VALUES ('1234ABCD','2013-04-03 07:03:00','73F');
      INSERT INTO temperature(weatherstation_id,event_time,temperature)
      VALUES ('1234ABCD','2013-04-03 07:04:00','74F');
  • 7. Storage Model - Logical View
      SELECT weatherstation_id,event_time,temperature
      FROM temperature
      WHERE weatherstation_id='1234ABCD';

      weatherstation_id   event_time            temperature
      1234ABCD            2013-04-03 07:01:00   72F
      1234ABCD            2013-04-03 07:02:00   73F
      1234ABCD            2013-04-03 07:03:00   73F
      1234ABCD            2013-04-03 07:04:00   74F
  • 8. Storage Model - Disk Layout
      SELECT weatherstation_id,event_time,temperature
      FROM temperature
      WHERE weatherstation_id='1234ABCD';

      Merged, Sorted and Stored Sequentially:
      1234ABCD | 2013-04-03 07:01:00 = 72F | 2013-04-03 07:02:00 = 73F | 2013-04-03 07:03:00 = 73F | 2013-04-03 07:04:00 = 74F | 2013-04-03 07:05:00 = 74F | 2013-04-03 07:06:00 = 75F
  • 9. Query patterns • Range queries • “Slice” operation on disk
      SELECT weatherstation_id,event_time,temperature
      FROM temperature
      WHERE weatherstation_id='1234ABCD'
      AND event_time >= '2013-04-03 07:01:00'
      AND event_time <= '2013-04-03 07:04:00';

      Single seek on disk:
      1234ABCD | 2013-04-03 07:01:00 = 72F | 2013-04-03 07:02:00 = 73F | 2013-04-03 07:03:00 = 73F | 2013-04-03 07:04:00 = 74F | 2013-04-03 07:05:00 = 74F | 2013-04-03 07:06:00 = 75F
  • 10. Query patterns • Range queries • “Slice” operation on disk
      SELECT weatherstation_id,event_time,temperature
      FROM temperature
      WHERE weatherstation_id='1234ABCD'
      AND event_time >= '2013-04-03 07:01:00'
      AND event_time <= '2013-04-03 07:04:00';

      Programmers like this (sorted by event_time):
      weatherstation_id   event_time            temperature
      1234ABCD            2013-04-03 07:01:00   72F
      1234ABCD            2013-04-03 07:02:00   73F
      1234ABCD            2013-04-03 07:03:00   73F
      1234ABCD            2013-04-03 07:04:00   74F
  • 11. Additional help on the storage engine
  • 12. SSTable seeks • Each read needs a minimum of 1 seek • Cache and bloom filter help minimize seeks • Total seek time = disk latency * number of seeks
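
The seek arithmetic on this slide can be sketched in a few lines of Python. The 10 ms latency and the seek counts below are illustrative assumptions, not figures from the talk; the function name is hypothetical.

```python
def total_seek_time_ms(disk_latency_ms: float, seeks: int) -> float:
    """Total seek time = disk latency * number of seeks."""
    if seeks < 1:
        raise ValueError("each read needs at least one seek")
    return disk_latency_ms * seeks


# Assumed latency, for illustration only.
ASSUMED_LATENCY_MS = 10.0

# One seek when the data sits in a single SSTable.
best_case = total_seek_time_ms(ASSUMED_LATENCY_MS, 1)
# Four seeks when the read has to touch four SSTables.
four_sstables = total_seek_time_ms(ASSUMED_LATENCY_MS, 4)
print(best_case, four_sstables)
```

Under these assumed numbers, every SSTable that the cache or bloom filter rules out saves one full disk latency.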
  • 13. The key to speed • Use the first part of the primary key to get the node (data localization) • Minimize seeks for SSTables (Key Cache, Bloom Filter) • Find the data fast in the SSTable (Indexes)
  • 14. Min/Max Value Hint • New since 2.0 • Range index on primary key values per SSTable • Minimizes seeks on range data • See CASSANDRA-5514 if you are interested in details
      SELECT event_time,temperature
      FROM temperature
      WHERE weatherstation_id='1234ABCD'
      AND event_time > '2013-04-03 07:01:00'
      AND event_time < '2013-04-03 07:04:00';

      Which SSTable?
      Row Key: 1234ABCD   Min event_time: 2013-04-01 00:00:00   Max event_time: 2013-04-04 23:59:59   (this one)
      Row Key: 1234ABCD   Min event_time: 2013-04-05 00:00:00   Max event_time: 2013-04-09 23:59:59
      Row Key: 1234ABCD   Min event_time: 2013-03-27 00:00:00   Max event_time: 2013-03-31 23:59:59
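
The idea behind the min/max hint can be sketched as a simple range-overlap filter. This is a minimal illustration of the concept, not Cassandra's internal implementation; the `SSTableHint` type and the SSTable names are hypothetical, while the time ranges are the ones shown on the slide.

```python
from datetime import datetime
from typing import List, NamedTuple


class SSTableHint(NamedTuple):
    name: str  # hypothetical label, not a real SSTable file name
    min_event_time: datetime
    max_event_time: datetime


def sstables_to_read(hints: List[SSTableHint], start: datetime, end: datetime) -> List[str]:
    """Keep only SSTables whose [min, max] range overlaps the queried range."""
    return [
        h.name
        for h in hints
        if h.min_event_time <= end and h.max_event_time >= start
    ]


hints = [
    SSTableHint("sstable-1", datetime(2013, 4, 1, 0, 0, 0), datetime(2013, 4, 4, 23, 59, 59)),
    SSTableHint("sstable-2", datetime(2013, 4, 5, 0, 0, 0), datetime(2013, 4, 9, 23, 59, 59)),
    SSTableHint("sstable-3", datetime(2013, 3, 27, 0, 0, 0), datetime(2013, 3, 31, 23, 59, 59)),
]

# The range from the slide's query: 07:01 to 07:04 on 2013-04-03.
selected = sstables_to_read(hints, datetime(2013, 4, 3, 7, 1), datetime(2013, 4, 3, 7, 4))
print(selected)
```

Only the SSTable covering 2013-04-01 through 2013-04-04 overlaps the query, so the other two can be skipped without a seek.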
  • 15. Ingestion models • Apache Kafka • Apache Flume • Storm • Custom Applications. Diagram labels: Apache Kafka; Your totally killer application
  • 16. Kafka + Storm • Kafka provides reliable queuing • Storm processes (rollups, counts) • Cassandra stores at the same speed • Storm lookup on Cassandra. Diagram labels: Apache Kafka, Apache Storm; Queue, Process, Store
  • 17. Flume • Source accepts data • Channel buffers data • Sink processes and stores • Popular for log processing. Diagram labels: Source, Channel, Sink; Application, Load Balancer, Syslog
  • 18. Dealing with data at speed • 1 million writes per second? • 1 insert every microsecond • Collisions? • Primary Key determines node placement • Random partitioning • Special data type - TimeUUID. Diagram labels: Your totally killer application; weatherstation_id='1234ABCD'; weatherstation_id='5678EFGH'
  • 19. How does data replicate?
  • 20. Partitioning: primary key determines placement*
      jim      age: 36   car: camaro   gender: M
      carol    age: 37   car: subaru   gender: F
      johnny   age: 12   gender: M
      suzy     age: 10   gender: F
  • 22. The Token Ring: Node A, Node B, Node C, Node D
  • 23.
      jim      5e02739678...
      carol    a9a0198010...
      johnny   f4eb27cea7...
      suzy     78b421309e...

      Node   Start             End
      A      0xc000000000..1   0x0000000000..0
      B      0x0000000000..1   0x4000000000..0
      C      0x4000000000..1   0x8000000000..0
      D      0x8000000000..1   0xc000000000..0
  • 28. Replication: carol a9a0198010... on the ring of Node A, Node B, Node C, Node D
  • 30. Replication: carol a9a0198010... on the ring of Node A, Node B, Node C, Node D • Replication factor = 3 • Consistency is a different topic for later
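
The token table and the replication slides can be tied together with a small sketch. It uses only the 10 hex digits of each token that the slides show, so range boundaries are approximate. It also assumes replicas are placed by walking the ring clockwise from the owning node, which the slides do not state explicitly. The function names are hypothetical.

```python
from bisect import bisect_left
from typing import List

# Inclusive upper bounds of the ranges for B, C and D, truncated to the
# 10 hex digits shown in the slides. Node A owns the wrap-around range.
RANGE_ENDS = [0x4000000000, 0x8000000000, 0xC000000000]
RANGE_OWNERS = ["B", "C", "D", "A"]

# Assumed clockwise order of the nodes on the ring.
RING_ORDER = ["A", "B", "C", "D"]


def owner(token: int) -> str:
    """Return the node whose token range contains the (truncated) token."""
    if token == 0:
        return "A"  # the wrap-around range ends at 0x0000000000..0
    return RANGE_OWNERS[bisect_left(RANGE_ENDS, token)]


def replicas(token: int, replication_factor: int) -> List[str]:
    """Owner first, then the next nodes clockwise (assumed placement rule)."""
    start = RING_ORDER.index(owner(token))
    return [RING_ORDER[(start + i) % len(RING_ORDER)] for i in range(replication_factor)]


tokens = {
    "jim": 0x5E02739678,
    "carol": 0xA9A0198010,
    "johnny": 0xF4EB27CEA7,
    "suzy": 0x78B421309E,
}
for key, token in tokens.items():
    print(key, owner(token), replicas(token, 3))
```

With replication factor 3 and this assumed placement rule, carol's row lands on node D and is copied to the next two nodes on the ring.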
  • 31. TimeUUID • Also known as a Version 1 UUID • Sortable • Reversible • Timestamp to microsecond + UUID = TimeUUID. Example: 04d580b0-9412-11e3-baa8-0800200c9a66 = Wednesday, February 12, 2014 6:18:06 PM GMT (http://www.famkruithof.net/uuid/uuidgen)
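
The "reversible" property can be checked with Python's standard `uuid` module, which exposes a version 1 UUID's timestamp as a count of 100-nanosecond intervals since 1582-10-15. The helper name below is hypothetical; the UUID is the one from the slide.

```python
import uuid
from datetime import datetime, timedelta, timezone

# 100-ns intervals between the UUID epoch (1582-10-15) and the Unix epoch.
UUID_EPOCH_OFFSET = 0x01B21DD213814000


def timeuuid_to_datetime(value: str) -> datetime:
    """Recover the UTC timestamp embedded in a version 1 UUID."""
    u = uuid.UUID(value)
    if u.version != 1:
        raise ValueError("not a version 1 (time-based) UUID")
    # Convert 100-ns ticks since the UUID epoch to microseconds since 1970.
    micros = (u.time - UUID_EPOCH_OFFSET) // 10
    return datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(microseconds=micros)


decoded = timeuuid_to_datetime("04d580b0-9412-11e3-baa8-0800200c9a66")
print(decoded.isoformat())
```

The decoded value falls on February 12, 2014 at 18:18:06 UTC, which agrees with the time shown on the slide.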
  • 32. Example 2: Financial Transactions • Trading of stocks • When did they happen? • Massive speeds and volumes
      “Sirca, a non-profit university consortium based in Sydney, is the world’s biggest broker of financial data, ingesting into its database 2 million pieces of information a second from every major trading exchange.”*
      * http://www.theage.com.au/it-pro/business-it/help-poverty-theres-an-app-for-that-20140120-hv948.html
  • 33. Use case: • Store data per symbol and date • Store time series in reverse order: last to first • Make sure every transaction is unique. Needed queries: • Get all trades for symbol and day • Get trade for a single date and time • Get last 10 trades for symbol and date. Data model to support queries.
  • 34. Data Model • date is int of days since epoch • timeuuid keeps it unique • Reverse the times for later queries
      CREATE TABLE stock_ticks (
        symbol text,
        date int,
        trade timeuuid,
        trade_details text,
        PRIMARY KEY ((symbol, date), trade)
      ) WITH CLUSTERING ORDER BY (trade DESC);
      INSERT INTO stock_ticks(symbol, date, trade, trade_details)
      VALUES ('NFLX',340,04d580b0-1431-1e33-baf8-0833200c98a6,'BUY:2000');
      INSERT INTO stock_ticks(symbol, date, trade, trade_details)
      VALUES ('NFLX',340,05d580b0-6472-1ef3-a3a8-0430200c9a66,'BUY:300');
      INSERT INTO stock_ticks(symbol, date, trade, trade_details)
      VALUES ('NFLX',340,02d580b0-9412-d223-55a8-0976200c9a25,'SELL:450');
      INSERT INTO stock_ticks(symbol, date, trade, trade_details)
      VALUES ('NFLX',340,08d580b0-4482-11e3-5fd3-3421200c9a65,'SELL:3000');
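
The "int of days since epoch" bucket can be computed on the client side as sketched below, assuming the epoch meant here is the Unix epoch (1970-01-01). The helper names are hypothetical. Under that assumption the sample value 340 maps to a day in 1970, so it reads as a placeholder rather than a real trading day.

```python
from datetime import date, timedelta

UNIX_EPOCH = date(1970, 1, 1)  # assumed meaning of "epoch" on the slide


def days_since_epoch(day: date) -> int:
    """Bucket value to store in the `date int` column."""
    return (day - UNIX_EPOCH).days


def day_from_bucket(bucket: int) -> date:
    """Inverse mapping, from the stored int back to a calendar day."""
    return UNIX_EPOCH + timedelta(days=bucket)


print(days_since_epoch(date(2014, 2, 12)))
print(day_from_bucket(340))
```

Bucketing by day keeps each (symbol, date) partition bounded to one day's trades, which is what the composite partition key in the table relies on.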
  • 35. Storage Model - Logical View
      SELECT trade,trade_details
      FROM stock_ticks
      WHERE symbol='NFLX' AND date=340;

      symbol:date   trade                                  trade_details
      NFLX:340      08d580b0-4482-11e3-5fd3-3421200c9a65   SELL:3000
      NFLX:340      02d580b0-9412-d223-55a8-0976200c9a25   SELL:450
      NFLX:340      05d580b0-6472-1ef3-a3a8-0430200c9a66   BUY:300
      NFLX:340      04d580b0-1431-1e33-baf8-0833200c98a6   BUY:2000
  • 36. Storage Model - Disk Layout
      SELECT trade,trade_details
      FROM stock_ticks
      WHERE symbol='NFLX' AND date=340;

      Order is from last trade to first:
      NFLX:340 | 08d580b0-4482-11e3-5fd3-3421200c9a65 = SELL:3000 | 02d580b0-9412-d223-55a8 = SELL:450 | 05d580b0-6472-1ef3-a3a8-0430200c9a66 = BUY:300 | 04d580b0-1431-1e33-baf8-0833200c98a6 = BUY:2000
  • 37. Query patterns • Limit queries • Get last X trades
      SELECT trade,trade_details
      FROM stock_ticks
      WHERE symbol='NFLX' AND date=340
      LIMIT 3;

      Read from the first cell ("from here") to the third ("to here"):
      NFLX:340 | 08d580b0-4482-11e3-5fd3-3421200c9a65 = SELL:3000 | 02d580b0-9412-d223-55a8-0976200c9a25 = SELL:450 | 05d580b0-6472-1ef3-a3a8-0430200c9a66 = BUY:300 | 04d580b0-1431-1e33-baf8-0833200c98a6 = BUY:2000
  • 38. Query patterns • Limit queries • Get last X trades
      SELECT trade,trade_details
      FROM stock_ticks
      WHERE symbol='NFLX' AND date=340
      LIMIT 3;

      Last 3 trades (reverse sorted by trade):
      symbol:date   trade                                  trade_details
      NFLX:340      08d580b0-4482-11e3-5fd3-3421200c9a65   SELL:3000
      NFLX:340      02d580b0-9412-d223-55a8-0976200c9a25   SELL:450
      NFLX:340      05d580b0-6472-1ef3-a3a8-0430200c9a66   BUY:300
  • 39. Way more examples • 5 minute interviews • Use cases • Free training! www.planetcassandra.org
  • 40. Thank You! Follow me for more updates all the time: @PatrickMcFadin