Details And Data Modeling
Agenda
• Quick Review Of Cassandra
• New Developments In Cassandra
• Basic Data Modeling Concepts
• Materialized Views
• Secondary Indexes
• Counters
• Time Series Data
• Expiring Data
Cassandra High Level
Cassandra's architecture is based on the combination of two technologies:
• Google BigTable – Data Model
• Amazon Dynamo – Distributed Architecture
• Cassandra = C*
Architecture Basics & Terminology
• Nodes are single instances of C*
• A cluster is a group of nodes
• Data is organized by keys (tokens), which are distributed across the cluster
• Replication Factor (RF) determines how many copies of each key are stored
• Data Center Aware
• Consistency Level – powerful feature to tune consistency vs. speed vs. availability
C* Ring 
More Architecture
• Information on who has what data and who is available is exchanged between nodes using gossip.
• No single point of failure (SPOF); every node can service requests.
• Data Center Aware
CAP Theorem
• Distributed Systems Law:
  • Consistency
  • Availability
  • Partition Tolerance
(you can only really have two in a distributed system)
• Cassandra is AP with Eventual Consistency
Consistency
• Cassandra uses the concept of Tunable Consistency: each read or write chooses how many replicas must respond, which makes it very powerful and flexible for system needs.
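As a rough illustration of tunable consistency, the cqlsh session below switches the consistency level between requests. It is a sketch under assumptions: it presumes a keyspace with a replication factor of 3 and a users table keyed by emailaddress like the one shown in the later slides. CONSISTENCY is a cqlsh shell command rather than a CQL statement; drivers expose the same setting per request.

```cql
-- cqlsh session (assumed: replication factor 3, a users table keyed by emailaddress).

-- QUORUM needs a majority of replicas: floor(3 / 2) + 1 = 2 of 3.
-- Reading and writing at QUORUM means the two replica sets overlap (2 + 2 > 3),
-- so a read sees the latest acknowledged write.
CONSISTENCY QUORUM;
SELECT firstname, lastname
FROM users
WHERE emailaddress = 'tpeters@example.com';

-- ONE waits for a single replica: lower latency and higher availability,
-- but the answer may be stale until replicas converge.
CONSISTENCY ONE;
SELECT firstname, lastname
FROM users
WHERE emailaddress = 'tpeters@example.com';
```

Which level to use is a per-query trade-off between consistency, speed, and availability, which is the point the slide makes.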
C* Persistence Model 
Read Path 
Write Path 
Data Model Architecture
• Keyspace – container of column families (tables). Defines the replication factor (RF), among other settings.
• Table – column family. Contains the schema definition.
• Row – a “record” identified by a key
• Column – a key (name) and a value
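To make the keyspace level concrete, here is a minimal sketch of a keyspace definition that sets the replication factor. The keyspace name demo is hypothetical, and SimpleStrategy is chosen only because it suits a single data center test setup; a multi data center cluster would normally use NetworkTopologyStrategy with one replica count per data center.

```cql
-- Hypothetical keyspace name; SimpleStrategy assumes a single data center.
CREATE KEYSPACE IF NOT EXISTS demo
  WITH replication = {
    'class': 'SimpleStrategy',
    'replication_factor': 3   -- keep three copies of every partition
  };

-- Make the keyspace the default for the statements that follow.
USE demo;
```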
Keys
• Primary Key
  • Partition Key – identifies a partition (the storage row) and determines which nodes hold it
  • Clustering Key – sorts the rows within a partition
• Using CQL these are defined together as a compound (composite) key
• Compound keys are how you implement “wide rows”, which we will look at a lot!
Single Primary Key
create table users (
  user_id UUID PRIMARY KEY,
  firstname text,
  lastname text,
  emailaddress text
);
** Cassandra Data Types
http://www.datastax.com/documentation/cql/3.0/cql/cql_reference/cql_data_types_c.html
Compound Key
create table users (
  emailaddress text,
  department text,
  firstname text,
  lastname text,
  PRIMARY KEY (emailaddress, department)
);
• Partition Key plus Clustering Key
• emailaddress is the partition key
• department is the clustering key
Compound Key
create table users (
  emailaddress text,
  department text,
  country text,
  firstname text,
  lastname text,
  PRIMARY KEY ((emailaddress, department), country)
);
• Partition Key plus Clustering Key
• emailaddress and department together form the (composite) partition key
• country is the clustering key
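The key definition decides which queries are allowed. The sketch below runs against the table with PRIMARY KEY ((emailaddress, department), country) shown above; the literal values are made up for illustration.

```cql
-- Works: both partition key columns are supplied, so Cassandra can
-- compute the token and go straight to the owning replicas.
SELECT * FROM users
WHERE emailaddress = 'tpeters@example.com'
  AND department = 'engineering';

-- Works: adding the clustering column narrows the result to one row.
SELECT * FROM users
WHERE emailaddress = 'tpeters@example.com'
  AND department = 'engineering'
  AND country = 'DE';

-- Rejected as written: only half of the composite partition key is given,
-- so the partition cannot be located.
-- SELECT * FROM users WHERE emailaddress = 'tpeters@example.com';
```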
Deletions
• Distributed systems present a unique problem for deletes. If Cassandra physically removed the data and a node that was down missed the delete, that node would recreate the record when it came back online. So…
• Tombstone – the data is replaced with a special marker called a tombstone, which is replicated like any other write and so works within the distributed architecture.
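A short sketch of what produces tombstones, assuming the users table with PRIMARY KEY (emailaddress, department) from the first Compound Key slide. The literal values are illustrative only.

```cql
-- Deleting a whole partition writes a single partition-level tombstone.
DELETE FROM users
WHERE emailaddress = 'jsmith@example.com';

-- Deleting one column writes a tombstone for that cell only;
-- the full primary key is needed to address the row.
DELETE lastname FROM users
WHERE emailaddress = 'arogers@example.com'
  AND department = 'sales';

-- gc_grace_seconds controls how long tombstones are kept before compaction
-- may purge them. 864000 seconds (10 days) is the default; it gives a node
-- that was down time to learn about the delete.
ALTER TABLE users WITH gc_grace_seconds = 864000;
```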
New Rules
• Writes Are Cheap
• Denormalize All You Need
• Model Your Queries, Not Data (understand access patterns)
• Application Worries About Joins
What’s New In 2.0
• Conditional DDL
IF EXISTS or IF NOT EXISTS
• Drop Column Support
ALTER TABLE users DROP lastname;
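Conditional DDL makes schema scripts safe to run more than once. A minimal sketch follows; the table name users_old is hypothetical and exists only to show the DROP form.

```cql
-- Creates the table only when it is missing; no error if it already exists.
CREATE TABLE IF NOT EXISTS users (
  emailaddress text,
  department text,
  firstname text,
  lastname text,
  PRIMARY KEY (emailaddress, department)
);

-- Drops the table only when it is present; no error if it is already gone.
DROP TABLE IF EXISTS users_old;
```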
More New Stuff
• Triggers
CREATE TRIGGER myTrigger
ON myTable
USING 'com.thejavaexperts.cassandra.updateevt';
• Lightweight Transactions (CAS)
UPDATE users
SET firstname = 'tim'
WHERE emailaddress = 'tpeters@example.com'
IF firstname = 'tom';
** Not like an ACID Transaction!!
CAS & Transactions
• CAS – compare-and-set operations. A single, atomic operation compares the value of a column in the database and applies a modification depending on the result of the comparison.
• Consider the performance hit. CAS is (was) considered an anti-pattern.
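Besides conditional updates, compare-and-set also covers "insert only if absent". The sketch below assumes the users table with PRIMARY KEY (emailaddress, department); the values are illustrative.

```cql
-- Insert only if no row with this primary key exists yet.
-- The result contains an [applied] column: true if the row was written,
-- false (plus the existing row) if it was not.
INSERT INTO users (emailaddress, department, firstname, lastname)
VALUES ('tpeters@example.com', 'engineering', 'tom', 'peters')
IF NOT EXISTS;

-- Conditional update: applied only while firstname is still 'tom'.
-- A conditional update must address the row by its full primary key.
UPDATE users
SET firstname = 'tim'
WHERE emailaddress = 'tpeters@example.com'
  AND department = 'engineering'
IF firstname = 'tom';
```

Each conditional statement costs extra coordination between replicas, which is the performance hit the slide warns about, so it is best reserved for the few writes that really need it.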
Data Modeling… The Basics
• Cassandra now looks very familiar to RDBMS/SQL users.
• It very nicely hides the underlying data storage model.
• You still have all the power of Cassandra; it is all in the key definition.
RDBMS = model data
Cassandra = model access (queries)
Side-Note On Querying
• Create table with compound key
• Select using ALLOW FILTERING
• Counts
• Select using IN or =
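The query forms listed above could look like the following sketch. It assumes the users table with PRIMARY KEY (emailaddress, department); the values are made up.

```cql
-- Equality on the partition key: the cheapest kind of read.
SELECT * FROM users
WHERE emailaddress = 'tpeters@example.com';

-- IN on the partition key: one statement, several partitions.
SELECT * FROM users
WHERE emailaddress IN ('tpeters@example.com', 'arogers@example.com');

-- Count the rows inside one partition.
SELECT COUNT(*) FROM users
WHERE emailaddress = 'tpeters@example.com';

-- Restricting the clustering column without the partition key forces a scan
-- across partitions; Cassandra only runs it with ALLOW FILTERING.
SELECT * FROM users
WHERE department = 'engineering'
ALLOW FILTERING;
```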
Batch Operations
• Saves network round trips
• Can contain INSERT, UPDATE, DELETE
• Atomic by default (all or nothing)
• Can use a timestamp for specific ordering
Batch Operation Example
BEGIN BATCH
  INSERT INTO users (emailaddress, firstname, lastname, country)
  VALUES ('brian.enochson@gmail.com', 'brian', 'enochson', 'USA');
  INSERT INTO users (emailaddress, firstname, lastname, country)
  VALUES ('tpeters@example.com', 'tom', 'peters', 'DE');
  INSERT INTO users (emailaddress, firstname, lastname, country)
  VALUES ('jsmith@example.com', 'jim', 'smith', 'USA');
  INSERT INTO users (emailaddress, firstname, lastname, country)
  VALUES ('arogers@example.com', 'alan', 'rogers', 'USA');
  DELETE FROM users WHERE emailaddress = 'jsmith@example.com';
APPLY BATCH;
• select in cqlsh
• List in cassandra-cli with timestamp
More Data Modeling…
• No Joins
• No Foreign Keys
• No Third (or any other) Normal Form Concerns
• Redundant Data Encouraged. Apps maintain consistency.
Secondary Indexes
• Allow defining indexes so data can be queried by columns other than the partition key.
• Each node has a local index for its data.
• They have uses, but shouldn’t be used all the time without consideration.
• We will look at alternatives.
Secondary Index Example
• Create a table
• Try to select with column not in PK
• Add Secondary Index
• Try select again.
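The four steps above could be sketched as follows. The table name users_by_id and the index name are hypothetical, chosen so the example does not clash with the users tables defined earlier.

```cql
-- 1. Create a table keyed by user_id only (hypothetical name).
CREATE TABLE IF NOT EXISTS users_by_id (
  user_id uuid PRIMARY KEY,
  firstname text,
  lastname text,
  country text
);

-- 2. Selecting by a column that is not in the primary key is rejected
--    as written, because there is no index to look it up with.
-- SELECT * FROM users_by_id WHERE country = 'USA';

-- 3. Add a secondary index on the low-cardinality column.
CREATE INDEX IF NOT EXISTS users_by_id_country_idx
  ON users_by_id (country);

-- 4. The same select now succeeds; each node consults its local index.
SELECT * FROM users_by_id WHERE country = 'USA';
```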
When to use?
• Low Cardinality – small number of unique values
• High Cardinality – high number of distinct values
• Secondary indexes are good for low cardinality, such as country codes or department codes. Not email addresses.
Materialized View
• If you want full distribution, you can use what is called the Materialized View pattern.
• Remember redundant data is fine.
• Model the queries
Materialized View Example
• Show normal table with compound key and querying limitations
• Create Materialized View table with a different compound key to support alternate access.
• Selects use partition key.
• Secondary indexes are local, not distributed
• ALLOW FILTERING can cause performance issues
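A minimal sketch of the Materialized View pattern: the same user data is written to a second table whose key matches the alternate query. Both table names are hypothetical. Cassandra 3.0 and later also offer a built-in CREATE MATERIALIZED VIEW statement; the example below shows the manual, application-maintained pattern the slides describe.

```cql
-- Base table: lookup by email address (hypothetical name).
CREATE TABLE IF NOT EXISTS users_by_email (
  emailaddress text PRIMARY KEY,
  firstname text,
  lastname text,
  country text
);

-- "View" table: the same data, keyed for lookup by country.
CREATE TABLE IF NOT EXISTS users_by_country (
  country text,
  emailaddress text,
  firstname text,
  lastname text,
  PRIMARY KEY (country, emailaddress)
);

-- The application writes both copies; a logged batch keeps them in step.
BEGIN BATCH
  INSERT INTO users_by_email (emailaddress, firstname, lastname, country)
  VALUES ('tpeters@example.com', 'tom', 'peters', 'DE');
  INSERT INTO users_by_country (country, emailaddress, firstname, lastname)
  VALUES ('DE', 'tpeters@example.com', 'tom', 'peters');
APPLY BATCH;

-- The alternate access path is now a plain partition key lookup.
SELECT emailaddress, firstname, lastname
FROM users_by_country
WHERE country = 'DE';
```

One thing to weigh with this particular key: a country with very many users becomes one very large partition, so a real model might add a second partition key component to spread the load.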
Counters
• Updated in 2.1 and now work in a more distributed and accurate manner.
• Table organization, example
• How to update, view etc.
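A small sketch of a counter table, with a hypothetical page_views table. In a counter table every column outside the primary key must be of type counter, and values are changed with UPDATE rather than INSERT.

```cql
-- Hypothetical table: one counter per page.
CREATE TABLE IF NOT EXISTS page_views (
  page text PRIMARY KEY,
  views counter
);

-- Counters are only ever incremented or decremented; the first update
-- creates the row implicitly.
UPDATE page_views SET views = views + 1 WHERE page = '/home';
UPDATE page_views SET views = views + 5 WHERE page = '/home';
UPDATE page_views SET views = views - 1 WHERE page = '/home';

-- Read the current value like any other column.
SELECT page, views FROM page_views WHERE page = '/home';
```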
Time Series Example….
• Time series table model.
• Need to consider interval for event frequency and wide row size.
• Make what is tracked, together with the unit of the time interval, the partition key.
Time Series Data
• Due to its fast write path, Cassandra is well suited for storing time series data.
• The Cassandra wide row is a perfect fit for modeling time series / time based events.
• Let’s look at an example….
Event Data
• Notice the primary key and clustering key.
• Insert some data
• View in CQL, then in CLI as wide row
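One way the event table could look is sketched below. The table name, columns, and the choice of one partition per sensor per day are assumptions made for illustration; the right interval depends on how many events arrive and how wide a partition you are willing to accept.

```cql
-- Hypothetical time series table.
-- Partition key: what is tracked (sensor_id) plus the interval unit (day).
-- Clustering key: event_time, so events are stored sorted within the day.
CREATE TABLE IF NOT EXISTS sensor_events (
  sensor_id text,
  day text,               -- e.g. '2014-10-09'
  event_time timestamp,
  reading double,
  PRIMARY KEY ((sensor_id, day), event_time)
) WITH CLUSTERING ORDER BY (event_time DESC);

INSERT INTO sensor_events (sensor_id, day, event_time, reading)
VALUES ('sensor-1', '2014-10-09', '2014-10-09 08:00:00', 21.5);
INSERT INTO sensor_events (sensor_id, day, event_time, reading)
VALUES ('sensor-1', '2014-10-09', '2014-10-09 08:05:00', 21.9);

-- A time slice is a range scan over the clustering column
-- inside a single partition.
SELECT event_time, reading
FROM sensor_events
WHERE sensor_id = 'sensor-1'
  AND day = '2014-10-09'
  AND event_time >= '2014-10-09 08:00:00'
  AND event_time < '2014-10-09 09:00:00';
```

Because the clustering order is descending, the newest events come back first, which suits "latest readings" queries.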
TTL – Self Expiring Data
• Another technique is storing data that has a defined lifespan.
• For instance session identifiers, temporary passwords etc.
• For this Cassandra provides a Time To Live (TTL) mechanism.
TTL Example…
• Create table
• Insert data using TTL
• Can update a specific column with a TTL
• Show using selects.
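The steps above could be sketched like this. The sessions table, its columns, and the UUID value are hypothetical; TTL values are given in seconds.

```cql
-- Hypothetical table for short-lived session data.
CREATE TABLE IF NOT EXISTS sessions (
  session_id uuid PRIMARY KEY,
  user_email text,
  temp_password text
);

-- Every column written by this insert expires after one hour.
INSERT INTO sessions (session_id, user_email, temp_password)
VALUES (123e4567-e89b-12d3-a456-426655440000, 'tpeters@example.com', 's3cret')
USING TTL 3600;

-- Rewrite a single column with its own, shorter TTL (five minutes).
UPDATE sessions USING TTL 300
SET temp_password = 'n3w-s3cret'
WHERE session_id = 123e4567-e89b-12d3-a456-426655440000;

-- TTL() reports the seconds each column has left before it expires.
SELECT TTL(user_email), TTL(temp_password)
FROM sessions
WHERE session_id = 123e4567-e89b-12d3-a456-426655440000;
```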
Questions
• Email: brian.enochson@gmail.com
• Twitter: @benochso
• G+: https://plus.google.com/+BrianEnochson

Cassandra20141009

  • 1. Details And Data Modeling
  • 2. Agenda: Quick Review of Cassandra; New Developments in Cassandra; Basic Data Modeling Concepts; Materialized Views; Secondary Indexes; Counters; Time Series Data; Expiring Data
  • 3. Cassandra High Level: Cassandra's architecture is based on the combination of two technologies: Google BigTable (data model) and Amazon Dynamo (distributed architecture). Cassandra = C*
  • 4. Architecture Basics & Terminology: A node is a single instance of C*. A cluster is a group of nodes. Data is organized by keys (tokens), which are distributed across the cluster. The replication factor (RF) determines how many copies of each key are kept. Cassandra is data center aware. Consistency level is a powerful feature for tuning consistency vs. speed vs. availability.
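The key distribution and replication factor described on slide 4 can be sketched with a toy token ring. This is a minimal model, not Cassandra's implementation: it assumes one token per node, uses MD5 as a stand-in hash, and places replicas on the next nodes clockwise. The `Ring` class and the node names are hypothetical.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a ring position (MD5 is only a stand-in hash here)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    def __init__(self, nodes, rf):
        if rf > len(nodes):
            raise ValueError("replication factor cannot exceed node count")
        self.rf = rf
        # Each node owns one token; sorting lets us walk the ring clockwise.
        self.entries = sorted((token(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.entries]

    def replicas(self, key: str):
        """Return the rf nodes that hold a copy of `key`."""
        start = bisect.bisect_left(self.tokens, token(key)) % len(self.entries)
        return [self.entries[(start + i) % len(self.entries)][1]
                for i in range(self.rf)]


ring = Ring(["node-a", "node-b", "node-c", "node-d"], rf=3)
owners = ring.replicas("tpeters@example.com")
print(owners)  # three distinct node names
```

Because every node can compute the same ring, any node can work out which replicas own a key, which is why any node can service a request.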
  • 6. More Architecture: Information about which node holds which data, and which nodes are available, is exchanged using gossip. There is no single point of failure (SPOF); every node can service requests. Cassandra is data center aware.
  • 7. CAP Theorem: A law of distributed systems. Of Consistency, Availability, and Partition Tolerance, you can only really have two in a distributed system. Cassandra is AP with eventual consistency.
  • 8. Consistency: Cassandra uses the concept of tunable consistency, which makes it very powerful and flexible for differing system needs.
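One way to reason about tunable consistency is the overlap rule: if the number of replicas that acknowledge a write plus the number consulted on a read exceeds the replication factor, every read set must include at least one replica that saw the latest write. The helper names below are hypothetical; this is arithmetic only, not a driver API.

```python
def quorum(rf: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return rf // 2 + 1


def overlaps(read_replicas: int, write_replicas: int, rf: int) -> bool:
    """True when every read set must intersect every write set."""
    return read_replicas + write_replicas > rf


rf = 3
print(quorum(rf))                            # 2
print(overlaps(quorum(rf), quorum(rf), rf))  # True: quorum reads + quorum writes
print(overlaps(1, 1, rf))                    # False: fast, but only eventually consistent
```

Reading and writing at a single replica trades the overlap guarantee for lower latency and higher availability, which is the tuning knob the slide refers to.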
  • 12. Data Model Architecture: Keyspace – container of column families (tables); defines the RF, among other settings. Table – column family; contains the definition of the schema. Row – a “record” identified by a key. Column – a key and a value.
  • 14. Keys: The primary key consists of a partition key, which identifies a row, and a cluster key, which sorts within a row. In CQL these are defined together as a compound (composite) key. Compound keys are how you implement “wide rows”, which we will look at a lot!
  • 15. Single Primary Key:
      create table users (
        user_id UUID PRIMARY KEY,
        firstname text,
        lastname text,
        emailaddress text
      );
      ** Cassandra Data Types: http://www.datastax.com/documentation/cql/3.0/cql/cql_reference/cql_data_types_c.html
  • 16. Compound Key:
      create table users (
        emailaddress text,
        department text,
        firstname text,
        lastname text,
        PRIMARY KEY (emailaddress, department)
      );
      Partition key plus cluster key: emailaddress is the partition key; department is the cluster key.
  • 17. Compound Key:
      create table users (
        emailaddress text,
        department text,
        country text,
        firstname text,
        lastname text,
        PRIMARY KEY ((emailaddress, department), country)
      );
      Partition key plus cluster key: emailaddress and department together form the partition key; country is the cluster key.
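The split between partition key and cluster key in slides 14 to 17 can be pictured as a two-level map: the partition key selects a wide row, and the cluster key orders the entries inside it. The `Table` class below is a hypothetical in-memory model of that layout, not how Cassandra stores data on disk.

```python
from collections import defaultdict


class Table:
    def __init__(self, partition_cols, clustering_cols):
        self.partition_cols = partition_cols
        self.clustering_cols = clustering_cols
        self.partitions = defaultdict(dict)  # partition key -> {cluster key: row}

    def insert(self, row: dict):
        pk = tuple(row[c] for c in self.partition_cols)
        ck = tuple(row[c] for c in self.clustering_cols)
        # Same partition key and same cluster key overwrites (an upsert).
        self.partitions[pk][ck] = row

    def select(self, *pk):
        """Rows of one partition, ordered by cluster key."""
        part = self.partitions.get(tuple(pk), {})
        return [part[ck] for ck in sorted(part)]


# Mirrors slide 16: PRIMARY KEY (emailaddress, department)
users = Table(["emailaddress"], ["department"])
users.insert({"emailaddress": "tpeters@example.com", "department": "sales", "firstname": "tom"})
users.insert({"emailaddress": "tpeters@example.com", "department": "engineering", "firstname": "tom"})

for row in users.select("tpeters@example.com"):
    print(row["department"])  # engineering, then sales
```

Both rows land in the same partition because they share an emailaddress, and they come back sorted by department regardless of insertion order.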
  • 18. Deletions: Distributed systems present a unique problem for deletes. If Cassandra physically deleted the data while a node was down, that node would miss the delete notice and would try to re-create the record when it came back online. So the data is instead replaced with a special value called a tombstone, which works within the distributed architecture.
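The resurrection problem on slide 18 can be reproduced with two toy replicas that reconcile by keeping the newest timestamp for each key. This is a simplified sketch: the `merge` and `read` helpers and the `(timestamp, value)` cell shape are assumptions made for illustration.

```python
TOMBSTONE = object()  # marker meaning "this cell was deleted"


def merge(a: dict, b: dict) -> dict:
    """Reconcile two replicas: per key, keep the cell with the newest timestamp."""
    merged = dict(a)
    for key, cell in b.items():
        if key not in merged or cell[0] > merged[key][0]:
            merged[key] = cell
    return merged


def read(replica: dict, key):
    cell = replica.get(key)
    if cell is None or cell[1] is TOMBSTONE:
        return None
    return cell[1]


# Both replicas hold the row at timestamp 1; replica_b then goes offline.
replica_a = {"jsmith": (1, "jim")}
replica_b = {"jsmith": (1, "jim")}

# Hard delete on the live replica: the stale copy wins when replica_b returns.
hard = dict(replica_a)
del hard["jsmith"]
resurrected = read(merge(hard, replica_b), "jsmith")
print(resurrected)  # jim

# Tombstone delete: the newer marker outranks the stale copy.
soft = dict(replica_a)
soft["jsmith"] = (2, TOMBSTONE)
deleted = read(merge(soft, replica_b), "jsmith")
print(deleted)  # None
```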
  • 19. New Rules: Writes are cheap. Denormalize all you need. Model your queries, not your data (understand access patterns). The application worries about joins.
  • 20. What’s New In 2.0: Conditional DDL (IF EXISTS or IF NOT EXISTS). Drop column support:
      ALTER TABLE users DROP lastname;
  • 21. More New Stuff: Triggers:
      CREATE TRIGGER myTrigger ON myTable USING 'com.thejavaexperts.cassandra.updateevt'
      Lightweight transactions (CAS):
      UPDATE users SET firstname = 'tim' WHERE emailaddress = 'tpeters@example.com' IF firstname = 'tom';
      ** Not like an ACID transaction!!
  • 22. CAS & Transactions: CAS stands for compare-and-set. In a single, atomic operation it compares the value of a column in the database and applies a modification depending on the result of the comparison. Consider the performance hit: CAS is (was) considered an anti-pattern.
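The compare-and-set semantics on slide 22 can be shown in a single process. In this sketch a lock stands in for the cross-replica coordination Cassandra needs to make the operation atomic, which is where the performance hit comes from. The `Row` class is hypothetical.

```python
import threading


class Row:
    def __init__(self, **columns):
        self._columns = dict(columns)
        self._lock = threading.Lock()

    def compare_and_set(self, column, expected, new_value) -> bool:
        """Apply the update only if the column still holds `expected`."""
        with self._lock:
            if self._columns.get(column) != expected:
                return False  # condition failed, nothing is written
            self._columns[column] = new_value
            return True

    def get(self, column):
        with self._lock:
            return self._columns.get(column)


# Mirrors: UPDATE users SET firstname = 'tim' ... IF firstname = 'tom';
user = Row(firstname="tom")
first = user.compare_and_set("firstname", "tom", "tim")
second = user.compare_and_set("firstname", "tom", "tim")
print(first, second)  # True False
```

The second attempt fails because the stored value is no longer 'tom', so a concurrent writer cannot silently overwrite the first change.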
  • 23. Data Modeling… The Basics: Cassandra is now very familiar to RDBMS/SQL users. It very nicely hides the underlying data storage model. You still have all the power of Cassandra; it is all in the key definition. RDBMS = model data; Cassandra = model access (queries).
  • 24. Side-Note On Querying: Create a table with a compound key. Select using ALLOW FILTERING. Counts. Select using IN or =.
  • 25. Batch Operations: Save network round trips. Can contain INSERT, UPDATE, and DELETE. Atomic by default (all or nothing). Can use a timestamp for specific ordering.
  • 26. Batch Operation Example:
      BEGIN BATCH
        INSERT INTO users (emailaddress, firstname, lastname, country) values ('brian.enochson@gmail.com', 'brian', 'enochson', 'USA');
        INSERT INTO users (emailaddress, firstname, lastname, country) values ('tpeters@example.com', 'tom', 'peters', 'DE');
        INSERT INTO users (emailaddress, firstname, lastname, country) values ('jsmith@example.com', 'jim', 'smith', 'USA');
        INSERT INTO users (emailaddress, firstname, lastname, country) values ('arogers@example.com', 'alan', 'rogers', 'USA');
        DELETE FROM users WHERE emailaddress = 'jsmith@example.com';
      APPLY BATCH;
      Then select in cqlsh, and list in cassandra-cli with timestamps.
  • 27. More Data Modeling: No joins. No foreign keys. No third (or any other) normal form concerns. Redundant data is encouraged; apps maintain consistency.
  • 28. Secondary Indexes: Allow defining indexes to support access by columns other than the partition key. Each node has a local index for its own data. They have their uses, but shouldn’t be used all the time without consideration. We will look at alternatives.
  • 29. Secondary Index Example: Create a table. Try to select with a column not in the PK. Add a secondary index. Try the select again.
  • 30. When to use? Low cardinality means a small number of unique values; high cardinality means a large number of distinct values. Secondary indexes are good for low cardinality, such as country codes or department codes, not email addresses.
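Why local indexes deserve some caution can be seen in a toy cluster where each node indexes only its own rows. This is a simplified model that assumes a replication factor of 1 and CRC32 placement; the `Node`, `owner`, and `select_by_country` names are hypothetical.

```python
import zlib


class Node:
    def __init__(self):
        self.rows = {}           # emailaddress -> row
        self.country_index = {}  # country -> emails stored on THIS node only

    def insert(self, email, row):
        self.rows[email] = row
        self.country_index.setdefault(row["country"], set()).add(email)


def owner(nodes, email):
    """Pick the node for a partition key (CRC32 is a stand-in for the partitioner)."""
    return nodes[zlib.crc32(email.encode("utf-8")) % len(nodes)]


def select_by_country(nodes, country):
    """An indexed value cannot be routed by token, so every node is asked."""
    contacted, hits = 0, []
    for node in nodes:
        contacted += 1
        for email in sorted(node.country_index.get(country, ())):
            hits.append(node.rows[email])
    return contacted, hits


nodes = [Node(), Node(), Node()]
for email, country in [("tpeters@example.com", "DE"),
                       ("jsmith@example.com", "USA"),
                       ("arogers@example.com", "USA")]:
    owner(nodes, email).insert(email, {"email": email, "country": country})

contacted, hits = select_by_country(nodes, "USA")
print(contacted, len(hits))  # 3 2
```

With a low-cardinality column each node returns many rows for the cost of one visit. With a high-cardinality column such as an email address, the same fan-out is paid to find a single row.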
  • 31. Materialized View: If you want full distribution, you can use what is called a materialized view pattern. Remember, redundant data is fine. Model the queries.
  • 32. Materialized View Example: Show a normal table with a compound key and its querying limitations. Create a materialized view table with a different compound key to support alternate access. Selects use the partition key. Secondary indexes are local, not distributed. ALLOW FILTERING can cause performance issues.
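The materialized view pattern from slides 31 and 32 amounts to the application writing the same data into a second table keyed for the other query. The sketch below models both tables as dictionaries; the table and function names are hypothetical.

```python
users_by_email = {}    # partition key: emailaddress
users_by_country = {}  # partition key: country, cluster key: emailaddress


def add_user(emailaddress, firstname, lastname, country):
    row = {"emailaddress": emailaddress, "firstname": firstname,
           "lastname": lastname, "country": country}
    # Redundant write: one copy per access pattern.
    users_by_email[emailaddress] = row
    users_by_country.setdefault(country, {})[emailaddress] = row


def find_by_country(country):
    """Single-partition read on the view table, ordered by cluster key."""
    part = users_by_country.get(country, {})
    return [part[email] for email in sorted(part)]


add_user("tpeters@example.com", "tom", "peters", "DE")
add_user("jsmith@example.com", "jim", "smith", "USA")
add_user("arogers@example.com", "alan", "rogers", "USA")

print([u["emailaddress"] for u in find_by_country("USA")])
# ['arogers@example.com', 'jsmith@example.com']
```

Each lookup touches exactly one partition, at the price of the application keeping both copies in step on every write.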
  • 33. Counters: Updated in 2.1; they now work in a more distributed and accurate manner. Table organization, with an example. How to update, view, etc.
  • 34. Time Series Example: Time series table model. Consider the interval for event frequency and the wide row size. Make the partition key the thing being tracked plus the unit of the time interval.
  • 35. Time Series Data: Due to its fast write model, Cassandra is well suited for storing time series data. The Cassandra wide row is a perfect fit for modeling time series / time-based events. Let’s look at an example…
  • 36. Event Data: Notice the primary key and cluster key. Insert some data. View in CQL, then in the CLI as a wide row.
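The time series advice above (partition by what is tracked plus a time interval, cluster by event time) can be sketched as follows. The one-day bucket, the `sensor-1` identifier, and the helper names are assumptions chosen for illustration; the right interval depends on event frequency.

```python
from datetime import datetime, timezone

events = {}  # (source_id, day) -> {event time: value}


def partition_key(source_id: str, ts: datetime) -> tuple:
    """Bucket by source and UTC day so one wide row cannot grow without bound."""
    return (source_id, ts.astimezone(timezone.utc).strftime("%Y-%m-%d"))


def record(source_id: str, ts: datetime, value: float):
    events.setdefault(partition_key(source_id, ts), {})[ts] = value


def read_day(source_id: str, day: str):
    """All events of one bucket, ordered by event time (the cluster key)."""
    part = events.get((source_id, day), {})
    return [(ts, part[ts]) for ts in sorted(part)]


# Inserted out of order on purpose; reads still come back time-ordered.
record("sensor-1", datetime(2014, 10, 9, 12, 0, tzinfo=timezone.utc), 21.0)
record("sensor-1", datetime(2014, 10, 9, 8, 0, tzinfo=timezone.utc), 20.5)
record("sensor-1", datetime(2014, 10, 10, 0, 30, tzinfo=timezone.utc), 19.0)

print(len(events))                                          # 2
print([v for _, v in read_day("sensor-1", "2014-10-09")])   # [20.5, 21.0]
```

A busier source would call for a smaller bucket, such as an hour, to keep each wide row at a manageable size.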
  • 37. TTL – Self-Expiring Data: Another technique is data that has a defined lifespan, for instance session identifiers or temporary passwords. For this, Cassandra provides a Time To Live (TTL) mechanism.
  • 38. TTL Example: Create a table. Insert data using TTL. Can update a specific column with a TTL. Show using selects.
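The effect of a TTL can be modeled with a small store that stamps each cell with an expiry time and hides it on read once that time has passed. This is a sketch only: the `ExpiringStore` class is hypothetical, and an injected clock replaces real time so the behavior is reproducible.

```python
class ExpiringStore:
    def __init__(self, clock):
        self._clock = clock  # callable returning the current time in seconds
        self._cells = {}     # key -> (value, expires_at or None)

    def put(self, key, value, ttl=None):
        """Store a value; with a ttl it expires that many seconds from now."""
        expires_at = None if ttl is None else self._clock() + ttl
        self._cells[key] = (value, expires_at)

    def get(self, key):
        value, expires_at = self._cells.get(key, (None, None))
        if expires_at is not None and self._clock() >= expires_at:
            return None  # expired cells behave as if they were deleted
        return value


now = [0]  # fake clock, advanced by hand
store = ExpiringStore(lambda: now[0])
store.put("session", "abc", ttl=30)

now[0] = 29
before = store.get("session")
now[0] = 30
after = store.get("session")
print(before, after)  # abc None
```

Writing the key again with a new `ttl` resets its lifespan, which is the kind of per-write control that suits session identifiers and temporary passwords.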
  • 39. Questions: Email: brian.enochson@gmail.com; Twitter: @benochso; G+: https://plus.google.com/+BrianEnochson