Cassandra Insider
By :
Bhavya Aggarwal
Manjot kaur
CONTENTS
● Why NoSQL
● Features of Cassandra
● Gossip Protocol
● Data Distribution in Cassandra
● Write Path
● Read Path
WHY NOSQL
● Within corporations, around 80% of data is
unstructured.
● RDBMSs run into availability and scalability issues.
● NoSQL databases offer horizontal scalability and
high availability, in some cases at the cost of strong
consistency and ACID semantics.
CASSANDRA
● Apache Cassandra is a massively scalable
NoSQL database.
Big Companies Using Cassandra
More than 30,000 companies use (or have used)
Apache Cassandra in production.
FEATURES
● Distributed
● Decentralized
● Linear scalability
● Tunable consistency
Distributed
Distributed, i.e. capable of running on multiple
machines while appearing to users as a unified
whole.
Decentralized
● Decentralized, i.e. every node is identical.
● There is no single point of failure.
Linear Scalability
Throughput increases roughly linearly as nodes are
added, and the cluster can seamlessly scale up and
scale back down.
Tunable Consistency
You can choose between stronger and weaker
consistency in Cassandra, per keyspace and per query,
with the help of the replication factor and the
consistency level.
Brewer’s CAP Theorem
Cassandra vs RDBMS
Feature             Cassandra   RDBMS
ACID                no          yes
Foreign keys        no          yes
Joins               no          yes
Secondary indexes   yes         yes
Distributed         yes         no
Linear scalability  yes         no
Fault tolerance     yes         no
Cassandra Architecture
In Cassandra, all the nodes are identical.
A Cassandra cluster has no special nodes, i.e. the
cluster has no masters, no slaves and no elected leaders.
Cassandra cluster
Cassandra supports a masterless ring architecture.
Tracking Nodes
Let's see how Cassandra keeps track of the nodes in a
cluster.
● Gossip Protocol
● Snitches
Gossip protocol
A node (the initiator) randomly chooses another node
(the peer) in the cluster to gossip with.
It sends the metadata it has about itself and other
nodes in the cluster.
It receives the metadata and updates that the peer has.
Main points
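● Once per second, every node starts a gossip round
with a small number of randomly chosen nodes (not
with every other node), so cluster state spreads
within a few rounds.
● The Gossiper class maintains lists of the nodes that
are alive and dead.
● The gossiper runs every second on a timer on
every node of the cluster.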
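The gossip exchange described above can be sketched as a small simulation. This is only an illustration, not Cassandra's Gossiper: the function names, the single peer per round and the use of a plain heartbeat counter as "metadata" are assumptions made for the example.

```python
import random


def gossip_round(states, rng):
    """One simulated round: every node syncs with one random peer.

    `states` maps node -> {endpoint: heartbeat_version}, i.e. each
    node's current view of the cluster (hypothetical structure).
    """
    nodes = list(states)
    for initiator in nodes:
        states[initiator][initiator] += 1  # bump own heartbeat
        peer = rng.choice([n for n in nodes if n != initiator])
        # Exchange views and keep the highest version per endpoint.
        merged = dict(states[peer])
        for endpoint, version in states[initiator].items():
            merged[endpoint] = max(version, merged.get(endpoint, 0))
        states[initiator] = dict(merged)
        states[peer] = dict(merged)


def rounds_until_converged(node_names, seed=0, max_rounds=100):
    """Count rounds until every node knows about every other node."""
    rng = random.Random(seed)
    states = {n: {n: 1} for n in node_names}  # each node knows only itself
    for round_no in range(1, max_rounds + 1):
        gossip_round(states, rng)
        if all(set(view) == set(node_names) for view in states.values()):
            return round_no
    return None


print(rounds_until_converged(["A", "B", "C", "D", "E", "F"]))
```

Even though each node talks to only one peer per round in this sketch, knowledge of the whole cluster typically spreads in a handful of rounds.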
3 Way Handshake
Snitches
The job of a snitch is to determine relative host
proximity for each node in a cluster, which is used to
decide which nodes to read from and write to.
Example: Snitch in Read Operation
While reading data, Cassandra must contact a number
of replicas determined by the consistency level. For
fast reads, it selects a single replica to query for
the full object and requests only hash values
(digests) from the others, in order to ensure that the
latest version of the requested data is returned.
The snitch identifies the closest replica, and the
coordinator node queries it for the full data.
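A minimal sketch of that read plan, assuming the snitch's proximity information is available as a simple latency score per replica. The function names, the latency map and the use of MD5 for digests are illustrative assumptions, not Cassandra's API.

```python
import hashlib


def digest(value):
    """Hash of a value, standing in for a digest response."""
    return hashlib.md5(value.encode("utf-8")).hexdigest()


def plan_read(replica_latency_ms, required):
    """Choose `required` replicas, closest first.

    Returns (data_replica, digest_replicas): the closest replica is
    asked for the full data, the remaining ones only for a digest.
    """
    ranked = sorted(replica_latency_ms, key=replica_latency_ms.get)
    chosen = ranked[:required]
    return chosen[0], chosen[1:]


def replicas_agree(data_value, digest_responses):
    """True when every digest matches the digest of the full data."""
    expected = digest(data_value)
    return all(d == expected for d in digest_responses)


latencies = {"n1": 4.0, "n2": 0.8, "n3": 2.5}  # hypothetical snitch scores
data_node, digest_nodes = plan_read(latencies, required=2)
print(data_node, digest_nodes)
```

If a digest does not match, the coordinator knows at least one replica is stale and has to fetch and compare the full data.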
Example: Snitch in Read Operation
Data Distribution Across Nodes
● Tokens
● Partitioners
Single Token Architecture
Rings and Tokens
● Each node in the ring is assigned one or more
ranges of data described by a token, which
determines its position in the ring.
● A token is a 64-bit integer ID used to identify each
partition.
Partitioners
● A partitioner is a hash function for computing the
token of a partition key.
● Each row of data is distributed within the ring
according to the token of its partition key; the
partitioner runs on every node, so any node can
compute it.
● Murmur3Partitioner is the default partitioner.
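How a token maps a partition key to a node can be sketched as follows. The hash used here is a stand-in (the first 8 bytes of MD5 read as a signed 64-bit integer), not Murmur3, and the ring layout and function names are assumptions for illustration.

```python
import bisect
import hashlib


def token_for(partition_key):
    """Stand-in partitioner: hash the key to a signed 64-bit token."""
    raw = hashlib.md5(partition_key.encode("utf-8")).digest()[:8]
    return int.from_bytes(raw, "big", signed=True)


def owner_of(token, ring):
    """Find the node owning `token`.

    `ring` is a list of (token, node) pairs sorted by token. A node
    owns the range ending at its own token, so the owner is the first
    node whose token is >= the key's token, wrapping around the ring.
    """
    tokens = [t for t, _ in ring]
    idx = bisect.bisect_left(tokens, token)
    return ring[idx % len(ring)][1]


ring = [(-2**62, "A"), (0, "B"), (2**62, "C")]  # hypothetical 3-node ring
key_token = token_for("user:42")
print(key_token, owner_of(key_token, ring))
```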
Virtual Nodes
● Cassandra 1.2 introduced virtual nodes (vnodes):
instead of a single token, each node is assigned
many tokens, and therefore many small token ranges.
● By default, each node is assigned 256 tokens,
meaning that it contains 256 virtual nodes. (This
was the default num_tokens value at the time;
Cassandra 4.0 lowered it to 16.)
Vnode Ring Architecture
Advantages
● Tokens are generated automatically by Cassandra.
● Smaller token ranges per vnode.
● Load is spread more evenly when nodes join or leave.
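The effect of vnodes on balance can be illustrated with a rough simulation that places random tokens on an unsigned 64-bit ring and measures the share of the ring each node owns. The ring size, the random token assignment and the function name are assumptions for the sketch; Cassandra's real token allocation is more sophisticated.

```python
import random

RING_SIZE = 2**64  # simplified unsigned token space for the sketch


def ownership(num_nodes, tokens_per_node, seed=1):
    """Return the fraction of the ring owned by each node."""
    rng = random.Random(seed)
    ring = sorted(
        (rng.randrange(RING_SIZE), node)
        for node in range(num_nodes)
        for _ in range(tokens_per_node)
    )
    share = [0.0] * num_nodes
    for i, (token, node) in enumerate(ring):
        prev = ring[i - 1][0]  # wraps to the last token when i == 0
        share[node] += ((token - prev) % RING_SIZE) / RING_SIZE
    return share


for tokens in (1, 256):
    shares = ownership(num_nodes=6, tokens_per_node=tokens)
    print(tokens, "token(s) per node, spread:", max(shares) - min(shares))
```

With many tokens per node, the gap between the most and least loaded node shrinks, which is the balancing benefit listed above.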
Replication Strategies
● Cassandra replicates data across nodes in a
manner transparent to the user, and the replication
factor is the number of nodes in your cluster that
will receive copies (replicas) of the same data.
● If your replication factor is 3, then three nodes in
the ring will have copies of each row.
Replication in SimpleStrategy
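As a rough sketch of SimpleStrategy placement: the first replica goes to the node that owns the token, and the remaining replicas go to the next distinct nodes walking clockwise around the ring. The ring representation and function name are assumptions for illustration.

```python
import bisect


def simple_strategy_replicas(token, ring, replication_factor):
    """Pick replica nodes by walking the ring clockwise.

    `ring` is a list of (token, node) pairs sorted by token and may
    hold several tokens per node; each node is counted only once.
    """
    tokens = [t for t, _ in ring]
    start = bisect.bisect_left(tokens, token)
    replicas = []
    for step in range(len(ring)):
        node = ring[(start + step) % len(ring)][1]
        if node not in replicas:
            replicas.append(node)
        if len(replicas) == replication_factor:
            break
    return replicas


ring = [(0, "A"), (100, "B"), (200, "C"), (300, "D")]  # hypothetical ring
print(simple_strategy_replicas(150, ring, replication_factor=3))
```

With a replication factor of 3, a key whose token is 150 lands on C and is then copied to D and A.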
Consistency Levels
● For read queries, the consistency level specifies
how many replica nodes must respond to a read
request before returning the data.
● For write operations, the consistency level
specifies how many replica nodes must respond
for the write to be reported as successful to the
client.
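The arithmetic behind consistency levels can be sketched in a few lines. Only the levels ONE, QUORUM and ALL are modelled, and the function names are hypothetical; the rule checked at the end is the usual overlap condition that reads see the latest write when read replicas plus write replicas exceed the replication factor.

```python
def quorum(replication_factor):
    """A quorum is a majority of the replicas."""
    return replication_factor // 2 + 1


def required_responses(level, replication_factor):
    """Replicas that must respond for a given consistency level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return quorum(replication_factor)
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level: {level}")


def read_sees_latest_write(read_level, write_level, replication_factor):
    """True when read and write replica sets must overlap (R + W > RF)."""
    r = required_responses(read_level, replication_factor)
    w = required_responses(write_level, replication_factor)
    return r + w > replication_factor


print(read_sees_latest_write("QUORUM", "QUORUM", 3))
print(read_sees_latest_write("ONE", "ONE", 3))
```

With a replication factor of 3, QUORUM reads and writes each need 2 replicas, so they always overlap on at least one replica; ONE and ONE do not.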
A Write Request in Cassandra
Write Path in Cassandra
Interactions Within a Node
Hinted Handoff
Tombstones
When you execute a delete operation, the data is not
immediately deleted. Instead, it’s treated as an
update operation that places a tombstone on the
value. A tombstone is a deletion marker that is
required to suppress older data in SSTables until
compaction can run.
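A toy model of how a tombstone suppresses older data, assuming each SSTable is a dictionary of key to (timestamp, value) and the newest timestamp wins. The names are hypothetical, and the sketch drops tombstones at the first compaction, whereas Cassandra keeps them for a grace period (the gc_grace_seconds table option) first.

```python
TOMBSTONE = object()  # marker meaning "this key was deleted"


def read_cell(sstables, key):
    """Return the live value for `key`, or None if absent or deleted."""
    newest = None
    for table in sstables:
        cell = table.get(key)
        if cell is not None and (newest is None or cell[0] > newest[0]):
            newest = cell
    if newest is None or newest[1] is TOMBSTONE:
        return None
    return newest[1]


def compact(sstables):
    """Merge SSTables, keep the newest cell per key, drop deleted keys."""
    merged = {}
    for table in sstables:
        for key, cell in table.items():
            if key not in merged or cell[0] > merged[key][0]:
                merged[key] = cell
    return {k: c for k, c in merged.items() if c[1] is not TOMBSTONE}


old = {"k": (1, "v1"), "j": (1, "x")}
new = {"k": (2, TOMBSTONE)}  # a delete is written as a newer cell
print(read_cell([old, new], "k"), read_cell([old, new], "j"))
print(compact([old, new]))
```

Until compaction runs, the old value of "k" still sits on disk in the older SSTable; the tombstone is what keeps reads from returning it.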
READ PATH
Row cache and Key cache
Request flow
Bloom Filters
● Bloom filters condense a larger data set into a
compact digest using hash functions.
● The digests are stored in memory and are
used to improve performance by reducing the
need for disk access on key lookups.
● A Bloom filter acts like a special kind of cache: when
a query is performed, the Bloom filter is checked
first, before accessing disk. It can return false
positives but never false negatives, so a "no"
safely skips the SSTable.
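A minimal Bloom filter sketch, to make the "check before disk" idea concrete. The sizes, the SHA-256 based hashing and the class name are assumptions chosen for readability; Cassandra's own implementation differs.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: may say "maybe present", never misses a key."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, key):
        # Derive several bit positions by salting the key with an index.
        for i in range(self.num_hashes):
            h = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(h[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


sstable_filter = BloomFilter()
sstable_filter.add("user:42")
if sstable_filter.might_contain("user:42"):
    print("possibly in this SSTable: go to disk")
```

A negative answer is definitive, so the read path can skip that SSTable entirely; a positive answer still has to be confirmed on disk.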
Compaction
Replica synchronization
Read repair refers to the synchronization of replicas
as data is read. If a read finds that any replicas hold
out-of-date values, a read repair is performed
immediately to update them.
Anti-entropy repair (manual repair) is a manually
initiated operation performed on nodes as part of a
regular maintenance process. It is executed by
running nodetool repair on a node, which compares
replicas using Merkle trees (built by a validation
compaction, not a major compaction) and streams
the differences.
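Read repair can be sketched as "find the newest value among the replicas that answered, then write it back to the ones that were behind". The data layout (timestamped cells per node) and the function name are assumptions for illustration.

```python
def read_with_repair(replicas, key):
    """Read `key` from all replicas and repair the stale ones.

    `replicas` maps node -> {key: (timestamp, value)}.
    Returns (value, list_of_repaired_nodes).
    """
    responses = {node: data.get(key) for node, data in replicas.items()}
    present = [cell for cell in responses.values() if cell is not None]
    if not present:
        return None, []
    newest = max(present, key=lambda cell: cell[0])
    repaired = []
    for node, cell in responses.items():
        if cell is None or cell[0] < newest[0]:
            replicas[node][key] = newest  # push the latest version
            repaired.append(node)
    return newest[1], repaired


cluster = {
    "A": {"k": (2, "new")},
    "B": {"k": (1, "old")},
    "C": {},
}
print(read_with_repair(cluster, "k"))
```

After the call, all three replicas in the example hold the newest cell, which is the state read repair aims for.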
References
● https://docs.datastax.com/en/landing_page/doc/landing_
● https://www.youtube.com/watch?v=FuP1Fvrv6ZQ
● https://www.youtube.com/watch?v=FNfiYJm1GJs&t=153
● Cassandra: The Definitive Guide, 2nd Edition,
O'Reilly.
Thank you


Editor's notes

  1. Under high load, joins make queries slow, so we tend to denormalize our tables.
  2. Big companies use Cassandra to manage their big data effectively. It started at Facebook, where it was built for Inbox Search, and became an Apache Incubator project in 2009.
  3. A cluster in Cassandra is a group of several nodes. A node is a Cassandra server (a Cassandra instance) running on a machine.
  4. There is no master-slave architecture in Cassandra and no special nodes: every node is the same and has similar responsibilities. No single point of failure means that if any node in the cluster fails, reads and writes keep working. Cassandra stores replicas on several nodes, so if a node fails, the data belonging to that node can still be retrieved.
  5. If we add nodes to our cluster, throughput increases linearly without degrading performance. Cassandra handles growing data loads gracefully.
  6. We set the replication factor per keyspace in Cassandra. Replication factor = how many replicas of our data we want in the system. Consistency can be set per read or write query.
  7. Cassandra has partition tolerance and availability and is eventually consistent.
  8. A row is indexed by its partition key and can be looked up only by partition key. We define the partition key when defining the table itself. We have to set the replication factor and strategy for every keyspace in Cassandra.
  9. So how do nodes in Cassandra store information about other nodes in the cluster?
  10. A communication protocol
  11. Explain replicas for read and write path.
  12. Partitioner is present at every node of the cluster. This partition key token generated by the partitioner is compared to the token values for the various nodes to identify the range, and therefore the node, that owns the data. Token ranges are represented by the org.apache.cassandra.dht.Range class.
  13. Early versions of Cassandra assigned a single token to each node, in a fairly static manner, requiring you to calculate tokens for each node.
  14. To understand the read and write paths, we must first understand replication strategies and consistency levels.
  15. Use SimpleStrategy for a single data center only. If you ever intend to run more than one data center, use NetworkTopologyStrategy.
  16. Because Cassandra is eventually consistent, updates to other replica nodes may continue in the background. ALL, QUORUM and ONE are some of the consistency levels available. Consistency level can be configured on a cluster, datacenter, or individual I/O operation basis. Consistency among participating nodes can be set globally and also controlled on a per-operation basis (for example insert or update) using Cassandra’s drivers and client libraries.
  17. Suppose a write request is sent to Cassandra but a replica node where the write belongs is not available. The coordinator then creates a hint for that node and stores it; once it detects via gossip that the node is back online, the coordinator sends the hint to it. Consider a cluster consisting of three nodes, A, B, and C, with a replication factor of 2. When a row K is written to the coordinator (node A in this case), even if node C is down, the consistency level of ONE or QUORUM can be met. Why? Both nodes A and B will receive the data, so the consistency level requirement is met. A hint is stored for node C and written when node C comes up. In the meantime, the coordinator can acknowledge that the write succeeded.
  18. A compaction operation in Cassandra is performed in order to merge SSTables. During compaction, the data in SSTables is merged: the keys are merged, columns are combined, tombstones are discarded, and a new index is created. Compaction is the process of freeing up space by merging large accumulated data files.
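The hinted-handoff behaviour from note 17 can be sketched with a toy coordinator. The class, its methods and the in-memory hint list are hypothetical; real hints are persisted and expire after a configurable window.

```python
class Coordinator:
    """Toy coordinator that stores hints for replicas that are down."""

    def __init__(self, node_names):
        self.storage = {n: {} for n in node_names}  # node -> its data
        self.up = {n: True for n in node_names}
        self.hints = []  # (target_node, key, value) waiting for replay

    def write(self, key, value, targets, required_acks):
        """Write to each target; hint the ones that are unavailable."""
        acks = 0
        for node in targets:
            if self.up[node]:
                self.storage[node][key] = value
                acks += 1
            else:
                self.hints.append((node, key, value))
        return acks >= required_acks

    def node_recovered(self, node):
        """Called when gossip reports the node alive: replay its hints."""
        self.up[node] = True
        remaining = []
        for target, key, value in self.hints:
            if target == node:
                self.storage[node][key] = value
            else:
                remaining.append((target, key, value))
        self.hints = remaining


coord = Coordinator(["A", "B", "C"])
coord.up["C"] = False
print(coord.write("K", "row", targets=["B", "C"], required_acks=1))
coord.node_recovered("C")
print(coord.storage["C"])
```

The write is acknowledged while C is down because enough live replicas responded; C receives the row later, when the hint is replayed.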