Time series data is everywhere: IoT, sensor data and financial transactions are typical examples. The industry has moved to databases like Cassandra to handle the high velocity and high volume of data that is now commonplace. In this talk I will present how we have used Cassandra to store time series data. I will highlight both the Cassandra data model and the architecture we put in place for collecting and ingesting data into Cassandra, using Apache Kafka and Apache Storm.
2. Guido Schmutz
Working for Trivadis for more than 19 years
Oracle ACE Director for Fusion Middleware and SOA
Co-Author of different books
Consultant, Trainer, Software Architect for Java, Oracle, SOA and Big Data / Fast Data
Member of Trivadis Architecture Board
Technology Manager @ Trivadis
More than 25 years of software development experience
Contact: guido.schmutz@trivadis.com
Blog: http://guidoschmutz.wordpress.com
Slideshare: http://de.slideshare.net/gschmutz
Twitter: gschmutz
7. Data Science Lab @ Armasuisse W&T
W+T flagship project, standing for innovation & tech transfer
Building capabilities in the areas of:
• Social Media Intelligence (SOCMINT)
• Big Data Technologies & Architectures
Invest in new, innovative and not yet widely proven technology
• Batch / Real-time analysis
• NoSQL databases
• Text analysis (NLP)
• Graph Data
• …
3 Phases: June 2013 – June 2015
8. SOCMINT System – Time Dimension
Major data model: Time series (TS)
TS reflect user behaviors over time
Activities correlate with events
Anomaly detection
Event detection & prediction
9. SOCMINT System – Social Dimension
User-user networks (social graphs)
Twitter: follower, retweet and mention graphs
Who is central in a social network?
Who has retweeted a given tweet to whom?
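The two questions above can be illustrated with a minimal sketch over a retweet graph. The edge list, the user names and both helper functions are hypothetical and only for illustration; the slides do not say which graph store or centrality measure was used, so plain in-degree centrality is assumed here.

```python
from collections import Counter

# Hypothetical retweet edges: (retweeter, original_author)
retweets = [
    ("alice", "carol"),
    ("bob", "carol"),
    ("dave", "carol"),
    ("carol", "bob"),
    ("alice", "bob"),
]

def in_degree_centrality(edges):
    """Count how often each user is retweeted, normalised by (n - 1)."""
    users = {user for edge in edges for user in edge}
    counts = Counter(target for _, target in edges)
    denom = max(len(users) - 1, 1)  # avoid division by zero for tiny graphs
    return {user: counts.get(user, 0) / denom for user in users}

def who_retweeted(edges, author):
    """Return the users who retweeted the given author, sorted by name."""
    return sorted(src for src, target in edges if target == author)

centrality = in_degree_centrality(retweets)
most_central = max(centrality, key=centrality.get)
retweeters_of_bob = who_retweeted(retweets, "bob")
```

In-degree is the simplest notion of "central"; other measures (betweenness, PageRank) would answer the same question differently.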
10. SOCMINT System - “Lambda Architecture” for Big Data
[Figure: Lambda Architecture. Building blocks shown: Data Sources (Social, RDBMS, Sensor, ERP, Logfiles, Mobile, Machine); Data Collection; Channel; Messaging; (Analytical) Batch Data Processing with Raw Data (Reservoir), Batch compute and Batch Result Store; (Analytical) Real-Time Data Processing with Stream/Event Processing and Real-Time Result Store; Result Store; Query Engine; Computed Information; Data Access (Reports, Service, Analytic Tools, Alerting Tools). Legend: Data in Motion, Data at Rest.]
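As a rough sketch of how the batch and real-time result stores in a Lambda Architecture can be combined at query time: the batch view is recomputed from the raw data up to a cutoff, the real-time view is updated per event after that cutoff, and a query merges both. All names here (`batch_view`, `RealtimeView`, `query`, the hashtag events) are hypothetical and not taken from the SOCMINT system.

```python
from collections import Counter

def batch_view(raw_events, cutoff):
    """Recompute counts per key from all raw data up to the batch cutoff."""
    return Counter(key for ts, key in raw_events if ts <= cutoff)

class RealtimeView:
    """Incrementally updated counts for events newer than the batch cutoff."""

    def __init__(self):
        self.counts = Counter()

    def on_event(self, key):
        self.counts[key] += 1

def query(batch, realtime, key):
    """Merge the batch result and the real-time result for one key."""
    return batch.get(key, 0) + realtime.counts.get(key, 0)

# Hypothetical events: (timestamp, hashtag)
events = [(1, "#cassandra"), (2, "#kafka"), (3, "#cassandra"), (4, "#cassandra")]
cutoff = 3  # the last timestamp covered by the most recent batch run

batch = batch_view(events, cutoff)
realtime = RealtimeView()
for ts, key in events:
    if ts > cutoff:  # the speed layer only covers what the batch has not seen
        realtime.on_event(key)

total_cassandra = query(batch, realtime, "#cassandra")
```

Once the next batch run covers the newer events, the corresponding real-time counts would be discarded; that step is omitted here.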
11. SOCMINT System – Frameworks & Components in Use
[Figure: the Lambda Architecture diagram of slide 10, with the frameworks and components in use mapped onto its building blocks (logos not reproduced). Data source shown: Social. Legend: Data in Motion, Data at Rest.]
14. Cassandra Data Modelling
• Don’t think relational!
• Denormalize, denormalize, denormalize …
• Rows are gigantic and sorted = one row is stored on one node
• Know your application/use cases => from query to model
• Index is not an afterthought anymore => “index” upfront
• Control physical storage structure
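To make the "gigantic, sorted rows" and "from query to model" points concrete, here is a minimal in-memory sketch of a time series laid out the way a wide row is: one partition per (series, day) bucket, with cells kept sorted by timestamp so that a time-range query becomes a contiguous slice. The class `TimeSeriesTable`, the daily bucketing and the sample data are assumptions for illustration, not the schema used in the talk. In CQL, the same idea would typically be expressed as a compound primary key with a partition key and a timestamp clustering column.

```python
import bisect
from collections import defaultdict
from datetime import datetime, timezone

class TimeSeriesTable:
    """In-memory stand-in for a wide-row table: partition key -> sorted cells."""

    def __init__(self):
        # (series_id, day) -> list of (timestamp, value), kept sorted
        self._partitions = defaultdict(list)

    @staticmethod
    def partition_key(series_id, ts):
        # Bucket by day so that a single row does not grow without bound.
        return (series_id, ts.strftime("%Y-%m-%d"))

    def insert(self, series_id, ts, value):
        row = self._partitions[self.partition_key(series_id, ts)]
        bisect.insort(row, (ts, value))  # keep cells ordered by timestamp

    def range_query(self, series_id, start, end):
        """Return cells with start <= ts < end from the day bucket of `start`."""
        row = self._partitions.get(self.partition_key(series_id, start), [])
        lo = bisect.bisect_left(row, (start,))
        hi = bisect.bisect_left(row, (end,))
        return row[lo:hi]

table = TimeSeriesTable()
day = datetime(2015, 6, 1, tzinfo=timezone.utc)
table.insert("user42", day.replace(hour=10), 3)
table.insert("user42", day.replace(hour=8), 1)
table.insert("user42", day.replace(hour=9), 2)

cells = table.range_query("user42", day.replace(hour=8), day.replace(hour=10))
values = [value for _, value in cells]
```

The query pattern (one series, one time range) dictates the key design: everything a query needs lives in one sorted partition, which is what "index upfront" amounts to in this sketch. A range spanning several days would need one lookup per day bucket.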
25. Summary - Know your domain
[Figure: database categories arranged along an axis "Connectedness of Data" (low to high). Categories shown: Document Data Store, Key-Value Stores, Wide-Column Store, Graph Databases, Relational Databases.]