Cassandra Tutorial
Apache Cassandra is a free, open source,
distributed database management
system. It is highly scalable and designed
to manage very large amounts of
structured data. It provides high
availability with no single point of failure.
NoSQL Database
• A NoSQL database (sometimes called Not Only SQL) is a
database that provides a mechanism to store and retrieve data other
than the tabular relations used in relational databases. These
databases are typically schema-free, support easy replication, have a simple
API, are eventually consistent, and can handle huge amounts of data.
• The primary objectives of a NoSQL database are
• simplicity of design,
• horizontal scaling, and
• finer control over availability.
• NoSQL databases use different data structures from
relational databases, which makes some operations faster in NoSQL. The
suitability of a given NoSQL database depends on the problem it
must solve.
• Apache Cassandra is an open source distributed database
system that is designed for storing and managing large
amounts of data across commodity servers. Cassandra can
serve as both a real-time operational data store for online
transactional applications and a read-intensive database for
large-scale business intelligence systems.
• Originally created for Facebook, Cassandra is designed to have
symmetric peer-to-peer nodes, instead of master or named
nodes, so that there is no single point of failure.
Cassandra automatically partitions data across all the nodes
in the database cluster, but the administrator has the power to
determine what data will be replicated and how many copies
of the data will be created.
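The partitioning and replication described above can be sketched as a toy hash ring. This is only an illustration under stated assumptions: the `Ring` class, the `replicas_for` method, and the MD5-based `token` function are hypothetical names invented here, and Cassandra's real partitioners and replication strategies are more elaborate.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Map a key to a ring position (illustrative hash, not Cassandra's partitioner)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy hash ring: every node is a peer and there is no master."""

    def __init__(self, nodes, replication_factor):
        if replication_factor > len(nodes):
            raise ValueError("replication factor cannot exceed node count")
        # The administrator chooses how many copies of each row exist.
        self.replication_factor = replication_factor
        # Place each node on the ring, sorted by its token.
        self._ring = sorted((token(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._ring]

    def replicas_for(self, row_key):
        """Return the nodes that hold copies of row_key."""
        # The first node clockwise from the key's token owns the row;
        # the following nodes on the ring hold the extra copies.
        start = bisect.bisect_right(self._tokens, token(row_key)) % len(self._ring)
        return [self._ring[(start + i) % len(self._ring)][1]
                for i in range(self.replication_factor)]


nodes = ["node-a", "node-b", "node-c", "node-d"]
ring = Ring(nodes, replication_factor=3)
owners = ring.replicas_for("user:42")
print(owners)
```

Because the placement depends only on the hash of the row key, any node can compute where a row lives without asking a central master.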
Features of Cassandra
• Cassandra Features:
• Elastic scalability - Cassandra is highly scalable; it lets you add more hardware to
accommodate more customers and more data as required.
• Always on architecture - Cassandra has no single point of failure and it is continuously
available for business-critical applications that cannot afford a failure.
• Fast linear-scale performance - Cassandra is linearly scalable, i.e., it increases your
throughput as you increase the number of nodes in the cluster. Therefore it maintains a
quick response time.
• Flexible data storage - Cassandra accommodates structured, semi-structured, and
unstructured data. It can dynamically accommodate changes to
your data structures as your needs change.
• Easy data distribution - Cassandra provides the flexibility to distribute data where you
need by replicating data across multiple data centers.
• Transaction support - Cassandra does not offer full relational-style ACID transactions. Writes
are atomic and isolated at the row level and durable, while consistency is tunable per operation.
• Fast writes - Cassandra was designed to run on cheap commodity hardware. It performs
blazingly fast writes and can store hundreds of terabytes of data, without sacrificing the
read efficiency.
Components of Cassandra
• Cassandra uses the Gossip Protocol in the background to allow the nodes
to communicate with each other and detect any faulty nodes in the cluster.
• The key components of Cassandra are as follows −
• Node − It is the place where data is stored.
• Data center − It is a collection of related nodes.
• Cluster − A cluster is a component that contains one or more data centers.
• Commit log − The commit log is a crash-recovery mechanism in
Cassandra. Every write operation is written to the commit log.
• Mem-table − A mem-table is a memory-resident data structure. After
the commit log, the data is written to the mem-table. Sometimes, for a
single column family, there will be multiple mem-tables.
• SSTable − It is a disk file to which the data is flushed from the mem-table
when its contents reach a threshold value.
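• Bloom filter − A Bloom filter is a quick, probabilistic data structure
for testing whether an element is a member of a set. It acts as a special kind of
cache: it can report false positives but never false negatives. Bloom filters are
consulted on reads to skip SSTables that cannot contain the requested key.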
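A Bloom filter like the one listed above can be sketched in a few lines. This is a minimal illustration, not Cassandra's implementation: the `BloomFilter` class, its default sizes, and the SHA-256-based hashing are assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: may report false positives, never false negatives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits)  # one byte per bit, for clarity

    def _positions(self, item):
        # Derive several bit positions by hashing the item with different seeds.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # If any bit is unset, the item was definitely never added.
        return all(self.bits[pos] for pos in self._positions(item))


# Hypothetical usage: one filter per SSTable, holding the row keys it stores.
sstable_keys = BloomFilter()
for key in ("row-1", "row-2", "row-3"):
    sstable_keys.add(key)

print(sstable_keys.might_contain("row-2"))    # True: added keys are never missed
print(sstable_keys.might_contain("row-999"))  # usually False; a false positive is possible
```

A negative answer lets a read skip the disk file entirely, which is why the check is cheap insurance before touching an SSTable.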
Apache Cassandra data types
• The Apache Cassandra NoSQL DBMS supports the most
common data types, including ascii, bigint, blob,
boolean, counter, decimal, double, float, int, text,
timestamp, uuid, varchar and varint.
• Cassandra's data model offers the convenience of
column indexes with the performance of
log-structured updates, strong support for
denormalization and materialized views, and
built-in caching.
• Data access is performed using Cassandra Query
Language (CQL), which resembles SQL.
Cassandra Query Language
• Users can access Cassandra through its nodes using
Cassandra Query Language (CQL). CQL treats the
database (keyspace) as a container of tables.
Programmers use cqlsh, a command-line prompt for working with CQL, or
separate application language drivers.
• Clients can contact any node for their read and write
operations. That node (the coordinator) acts as a proxy
between the client and the nodes holding the data.
• Data storage in Cassandra is row-oriented, meaning that
all contents of a row are serialized together on disk.
Every row of columns has its own unique key. Each row can
hold up to 2 billion columns. Furthermore, each row
must fit onto a single server, because data is partitioned
solely by row key.
• To understand why databases like Cassandra, HBase and
BigTable (I’ll call them DSS, Distributed Storage
Services, from now on) were designed the way they are,
we’ll first have to understand what they were built to be
used for.
• DSS were designed to handle enormous
amounts of data, stored in billions of rows on large clusters.
Relational databases incorporate a lot of things that make it hard to
distribute them efficiently over multiple machines. DSS simply
remove some or all of these ties. No operations are allowed that
require scanning extensive parts of the dataset, meaning no JOINs
or rich queries.
• Cassandra is a NoSQL column-family implementation that supports
the Bigtable data model and uses architectural aspects introduced
by Amazon Dynamo.
Column family
• Cassandra consists of many storage nodes and stores each row
within a single storage node. Within each row, Cassandra
always stores columns sorted by their column names. Using
this sort order, Cassandra supports slice queries where given a
row, users can retrieve a subset of its columns falling within a
given column name range. For example, a slice query with
range tag0 to tag9999 will get all the columns whose names
fall between tag0 and tag9999.
• Keyspace – a group of many column families together. It is
only a logical grouping of column families and provides an
isolated scope for names.
• Finally, a super column resides within a column family and
groups several columns under one key.
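The slice query described above relies on columns being kept sorted by name within a row. A minimal sketch, assuming a hypothetical `Row` class with `put` and `slice` methods (these names are invented for illustration and are not a Cassandra API):

```python
import bisect


class Row:
    """Toy row that keeps its columns sorted by column name."""

    def __init__(self):
        self._names = []   # column names, always kept in sorted order
        self._values = {}  # column name -> value

    def put(self, name, value):
        if name not in self._values:
            bisect.insort(self._names, name)  # insert while preserving order
        self._values[name] = value

    def slice(self, start, end):
        """Return (name, value) pairs with start <= name <= end, in name order."""
        lo = bisect.bisect_left(self._names, start)
        hi = bisect.bisect_right(self._names, end)
        return [(n, self._values[n]) for n in self._names[lo:hi]]


row = Row()
for name in ("tag5", "author", "tag0", "title", "tag12"):
    row.put(name, f"value-of-{name}")

result = row.slice("tag0", "tag9999")
print([name for name, _ in result])  # ['tag0', 'tag12', 'tag5']
```

In this sketch names are compared as strings, so `tag12` sorts before `tag5`. Because the names are already sorted, the slice is two binary searches plus a contiguous read rather than a scan of the whole row.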
• Cassandra provides very fast writes, which are actually
faster than reads; it can transfer about 80-360 MB/sec
per node. It achieves this using two techniques. First,
Cassandra keeps most of the data in memory
at the responsible node; updates are made in
memory and written to persistent storage (the file system)
lazily. Second, to avoid losing data, Cassandra writes
all transactions to a commit log on disk. Unlike in-place updates of
data items on disk, writes to the commit log are append-only
and, therefore, avoid rotational delay while writing to the
disk.
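The write path described above (append to a commit log, update memory, flush to disk lazily) can be sketched as follows. This is a toy model under stated assumptions: the `ToyStore` class, its JSON file formats, and the `flush_threshold` parameter are hypothetical and do not reflect Cassandra's on-disk formats.

```python
import json
import os
import tempfile


class ToyStore:
    """Toy write path: commit log append, in-memory table, lazy flush to disk."""

    def __init__(self, directory, flush_threshold=3):
        self.directory = directory
        self.flush_threshold = flush_threshold
        self.commit_log_path = os.path.join(directory, "commit.log")
        self.memtable = {}
        self.sstable_paths = []

    def write(self, key, value):
        # 1. Append-only write to the commit log, so the update survives a crash.
        with open(self.commit_log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps({"key": key, "value": value}) + "\n")
        # 2. Apply the update in memory.
        self.memtable[key] = value
        # 3. Flush to disk only once the in-memory table reaches the threshold.
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Write the memory contents, sorted by key, to a new immutable file.
        path = os.path.join(self.directory, f"sstable-{len(self.sstable_paths)}.json")
        with open(path, "w", encoding="utf-8") as f:
            json.dump(dict(sorted(self.memtable.items())), f)
        self.sstable_paths.append(path)
        self.memtable = {}

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for path in reversed(self.sstable_paths):  # newest file first
            with open(path, encoding="utf-8") as f:
                table = json.load(f)
            if key in table:
                return table[key]
        return None


with tempfile.TemporaryDirectory() as tmp:
    store = ToyStore(tmp, flush_threshold=3)
    for i in range(4):
        store.write(f"k{i}", f"v{i}")
    flushed_files = len(store.sstable_paths)
    in_memory = dict(store.memtable)
    value_from_disk = store.read("k1")
    with open(store.commit_log_path, encoding="utf-8") as log:
        logged_writes = sum(1 for _ in log)

print(flushed_files, in_memory, value_from_disk, logged_writes)
# 1 {'k3': 'v3'} v1 4
```

Every write costs one sequential append plus a dictionary update, while the more expensive file write happens once per batch. The sketch never trims the commit log after a flush, which a real system would do.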
• Unless a write requests full consistency, Cassandra writes the data to enough
nodes without resolving any inconsistencies between replicas; it resolves
them only at the first read. This process is called "read repair."
• Healing from failure is manual
• If a node in a Cassandra cluster has failed, the cluster will continue to work if
you have replicas. Full recovery, which means redistributing data and compensating
for missing replicas, is a manual operation performed with a command-line tool
called nodetool. Also, while the manual operation happens, the system will be
unavailable.
• It remembers deletes
• Cassandra is designed so that it continues to work without a problem even if a
node goes down (or gets disconnected) and comes back later. One consequence is
that this complicates data deletions. For example, assume a node is down. While
it is down, a data item is deleted on the other replicas. When the unavailable node
comes back, it will reintroduce the deleted data item during the syncing process
unless Cassandra remembers that the data item has been deleted.
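The deletion problem above can be sketched with a deletion marker, which Cassandra calls a tombstone. This is a simplified model: the `Replica` class, the `sync` function, and the integer timestamps are hypothetical, chosen only to show why a remembered delete wins over a stale copy.

```python
TOMBSTONE = object()  # marker meaning "this key was deleted"


class Replica:
    """Toy replica storing key -> (timestamp, value or TOMBSTONE)."""

    def __init__(self):
        self.data = {}

    def write(self, key, value, timestamp):
        # Keep only the newest version of each key.
        current = self.data.get(key)
        if current is None or timestamp > current[0]:
            self.data[key] = (timestamp, value)

    def delete(self, key, timestamp):
        # A delete is recorded as a write of the tombstone marker.
        self.write(key, TOMBSTONE, timestamp)

    def read(self, key):
        entry = self.data.get(key)
        if entry is None or entry[1] is TOMBSTONE:
            return None
        return entry[1]


def sync(a, b):
    """Exchange entries so both replicas keep the newest version of each key."""
    for key in set(a.data) | set(b.data):
        for source, target in ((a, b), (b, a)):
            if key in source.data:
                timestamp, value = source.data[key]
                target.write(key, value, timestamp)


up, down = Replica(), Replica()
up.write("item", "hello", timestamp=1)
down.write("item", "hello", timestamp=1)

up.delete("item", timestamp=2)  # happens while `down` is unreachable
sync(up, down)                  # `down` comes back and syncs

print(up.read("item"), down.read("item"))  # None None
```

If the delete simply removed the entry from `up`, the sync would copy the old value back from `down`. Keeping the tombstone with a newer timestamp is what lets the delete propagate instead.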
