Cassandra Data Modeling
Presented By: Charmy Garg
Software Consultant
Knoldus Inc.
Our Agenda
01 Keys in Cassandra
02 Basic Goals
03 Model Your Own Queries
04 Applying Rules: Examples
05 Glance at Use Cases
What is Apache Cassandra?
Cassandra vs Relational

Cassandra Data Model    Relational Data Model
Keyspace                Database
Column family           Table
Partition key           Primary key
Column name/key         Column name
Column value            Column value
Keys to Recall for Cassandra Data Modeling
1. Primary Key: equivalent to the partition key in a single-field-key table (i.e. simple).
2. Composite Key: just any multiple-column key.
3. Partition Key: responsible for data distribution across your nodes.
4. Clustering Key: responsible for data sorting within the partition.
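To make the four terms concrete, here is a minimal sketch in Python. The `PrimaryKey` class and the column names are hypothetical and are not part of any Cassandra driver; the sketch only models how a primary key splits into a partition part and a clustering part.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class PrimaryKey:
    """Hypothetical model of a Cassandra primary key (not a driver API)."""
    partition_key: Tuple[str, ...]         # decides which node stores the row
    clustering_key: Tuple[str, ...] = ()   # decides row order inside the partition

    @property
    def is_simple(self) -> bool:
        # A simple primary key is one column, which is also the partition key.
        return len(self.partition_key) == 1 and not self.clustering_key

    @property
    def is_composite(self) -> bool:
        # Any primary key built from more than one column.
        return len(self.partition_key) + len(self.clustering_key) > 1


simple = PrimaryKey(partition_key=("id",))
compound = PrimaryKey(partition_key=("id",), clustering_key=("created_at",))
```

Here `simple` has a single-column key that doubles as the partition key, while `compound` adds a clustering column that would order rows inside each partition.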
Primary Key
Composite Key
Clustering Key &
Partition Key
How Cassandra organizes data
Partitioning and Hashing
Non-Goals
Minimize Data Duplication
Because Cassandra is a distributed database, duplicating data keeps it readily available and avoids a single point of failure.
Minimize the Number of Writes
Cassandra is optimized for high write throughput, and almost all writes are equally efficient.
Basic Goals
1. Spread data evenly around the cluster
Rows are spread around the cluster based on a hash of the partition key, which is the first element of the PRIMARY KEY. So, the key to spreading data evenly is this: pick a good primary key, and in particular a good partition key.
2. Minimize the number of partitions read
Partitions are groups of rows that share the same partition key. When you issue a read query, you want to read rows from as few partitions as possible.
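The effect of the partition key on data distribution can be sketched as follows. This is an illustration under assumptions: the cluster size is made up, and MD5 with a modulo stands in for Cassandra's default Murmur3 partitioner and its token ranges.

```python
import hashlib
from collections import Counter

NODES = 4  # hypothetical cluster size


def node_for(partition_key: str, nodes: int = NODES) -> int:
    """Map a partition key to a node index via a hash.

    MD5 plus modulo is a stand-in; Cassandra's default partitioner
    uses Murmur3 and token ranges instead.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % nodes


def rows_per_node(partition_keys):
    """Count how many rows land on each node for the given keys."""
    return Counter(node_for(key) for key in partition_keys)


# 1000 rows keyed by a high-cardinality value are spread over the nodes.
by_user = rows_per_node(f"user-{i}" for i in range(1000))

# 1000 rows sharing one partition key value all land on a single node.
by_country = rows_per_node("IN" for _ in range(1000))
```

A key with many distinct values gives every node a share of the rows, whereas a key with a single value creates one large partition on one node.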
Model Your Data
The way to minimize partition reads is to model your data to fit your queries. Don't model around relations. Don't model around objects. Model around your queries. Here's how you do that:
Step 1: Determine what queries you want to support
Step 2: Create tables according to your queries
Step 1: Determine What Queries to Support
Try to determine exactly what queries you need to support. This can include a lot of considerations that you may not think of at first. For example, you may need to think about:
● Grouping by an attribute
● Ordering by an attribute
● Filtering based on some set of conditions
● Enforcing uniqueness in the result set
Changes to just one of these query requirements will frequently warrant a data model change for maximum efficiency.
Step 2: Create Tables for Queries
Use one table per query pattern. If you need to support multiple query patterns, you usually need more than one table.
If you need different types of answers, you usually need different tables. This is how you optimize for reads.
Remember, in Cassandra data duplication is okay. Many of your tables may repeat the same data.
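The "one table per query pattern" rule can be sketched with two in-memory dictionaries standing in for two Cassandra tables. The table names, fields, and helper functions are hypothetical; the point is only that each write goes to both tables so that each read touches exactly one.

```python
# Two hypothetical "tables", each keyed by the column its query filters on.
users_by_username = {}
users_by_email = {}


def add_user(username: str, email: str, age: int) -> None:
    """Write the same user to both tables (deliberate duplication)."""
    row = {"username": username, "email": email, "age": age}
    users_by_username[username] = dict(row)
    users_by_email[email] = dict(row)


def find_by_username(username: str):
    # One lookup in one partition; no scan over other users.
    return users_by_username.get(username)


def find_by_email(email: str):
    # Served by the second table, which repeats the same data.
    return users_by_email.get(email)


add_user("alice", "alice@example.com", 30)
```

The cost is an extra write per user, which matches the non-goals above: writes are cheap and duplication is acceptable.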
Applying the Rules: Examples
Example 1: Table Music Playlist
In the example, table Music Playlist,
● SongId is the partition key, and
● SongName is the clustering column.
Example 2: Table Music Playlist
In the example, table Music Playlist,
● SongId and Year together form the partition key, and
● SongName is the clustering column.
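The difference between the two key choices can be sketched in Python. The sample rows are invented for illustration, and the grouping function only imitates how rows fall into partitions and how the clustering column orders them.

```python
from collections import defaultdict

# Hypothetical rows: (song_id, year, song_name)
ROWS = [
    (1, 2019, "Blue"),
    (1, 2019, "Amber"),
    (1, 2020, "Coral"),
    (2, 2019, "Dawn"),
]


def partitions(rows, partition_columns):
    """Group rows by the chosen partition key; sort each group by song_name."""
    grouped = defaultdict(list)
    for song_id, year, song_name in rows:
        record = {"song_id": song_id, "year": year, "song_name": song_name}
        key = tuple(record[col] for col in partition_columns)
        grouped[key].append(song_name)
    # The clustering column (song_name) fixes the order inside each partition.
    return {key: sorted(names) for key, names in grouped.items()}


by_song = partitions(ROWS, ("song_id",))
by_song_and_year = partitions(ROWS, ("song_id", "year"))
```

With SongId alone there are two partitions; adding Year to the partition key splits the first one into a partition per year, which keeps partitions smaller but means a query must supply both values.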
Glance at Use Cases
Use Case 1
Suppose that we are storing Facebook posts of different users in Cassandra.
Query: Fetch the top 'N' posts made by a given user.
We require user_id, post_id and content as fields. The Cassandra table schema for this use case would look like:
This stores all data for a particular user on a single partition, as per the above guidelines.
Using the post timestamp as the clustering key helps retrieve the top 'N' posts more efficiently.
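A possible shape for this use case is sketched below. The CQL string is an assumption (the table name, column types, and the use of a timeuuid post_id as the time-ordered clustering column are illustrative, not taken from the slides), and the Python functions are an in-memory stand-in for the table.

```python
# Assumed CQL for the use case; table name and column types are illustrative.
POSTS_BY_USER_CQL = """
CREATE TABLE posts_by_user (
    user_id  uuid,
    post_id  timeuuid,
    content  text,
    PRIMARY KEY ((user_id), post_id)
) WITH CLUSTERING ORDER BY (post_id DESC);
"""

# In-memory stand-in: one list of (post_time, content) per user partition.
_posts = {}


def add_post(user_id: str, post_time: int, content: str) -> None:
    """Insert a post and keep the user's partition sorted newest first."""
    partition = _posts.setdefault(user_id, [])
    partition.append((post_time, content))
    partition.sort(key=lambda post: post[0], reverse=True)


def top_posts(user_id: str, n: int):
    """Similar in spirit to: SELECT content ... WHERE user_id = ? LIMIT n."""
    return [content for _, content in _posts.get(user_id, [])[:n]]


add_post("u1", 100, "first")
add_post("u1", 300, "third")
add_post("u1", 200, "second")
add_post("u2", 150, "other user")
```

Because rows are already stored newest first inside the partition, fetching the top 'N' posts reads the head of a single partition and needs no sorting at query time.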
Use Case 2
Suppose that we are storing the details of different partner gyms across the different cities and states of many countries.
Query: Fetch the sorted gyms for a given city. Let's also say the results must be sorted by opening date.
We require country_code, state, city, gym_name and opening_date as fields.
The Cassandra table schema for this use case would look like:
Store the gyms located in a given city of a specific state and country on a single partition, and use the opening date and gym name as the clustering key.
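This use case could be sketched as follows. The CQL string is an assumption (table name and column types are illustrative), and the sample gyms are invented; the Python functions only imitate a composite partition key with two clustering columns.

```python
from datetime import date

# Assumed CQL for the use case; table name and column types are illustrative.
GYMS_BY_CITY_CQL = """
CREATE TABLE gyms_by_city (
    country_code  text,
    state         text,
    city          text,
    gym_name      text,
    opening_date  date,
    PRIMARY KEY ((country_code, state, city), opening_date, gym_name)
) WITH CLUSTERING ORDER BY (opening_date ASC, gym_name ASC);
"""

# In-memory stand-in: one sorted list of (opening_date, gym_name) per city.
_gyms = {}


def add_gym(country_code, state, city, gym_name, opening_date):
    """Place the gym in its city partition, ordered by date, then name."""
    partition = _gyms.setdefault((country_code, state, city), [])
    partition.append((opening_date, gym_name))
    partition.sort()  # tuples compare by opening_date first, then gym_name


def gyms_in_city(country_code, state, city):
    """Return gym names for one city, already in clustering order."""
    return [name for _, name in _gyms.get((country_code, state, city), [])]


add_gym("US", "NY", "Buffalo", "Iron Works", date(2015, 3, 1))
add_gym("US", "NY", "Buffalo", "Atlas", date(2012, 6, 15))
add_gym("US", "NY", "Buffalo", "Bolt", date(2015, 3, 1))
add_gym("US", "CA", "Fresno", "Peak", date(2010, 1, 1))
```

Including gym_name after opening_date in the clustering key keeps two gyms that opened on the same day as separate rows and gives them a stable order.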
Q&A
Please email your queries at
charmy.garg@knoldus.in
Thank You!
@charmygarg
/facebook.com/charmiigarg
