Introduction to Cassandra

Shimi Kiviti
@shimi_k

Motivation

            Scaling

How do you scale your database?
 ● reads
 ● writes
Influential Papers

 ● Bigtable: A Distributed Storage System for Structured Data,
   2006
 ● Dynamo: Amazon's Highly Available Key-value Store, 2007


Cassandra:
 ● partitioning and replication - from Dynamo
 ● log-structured column family storage - from Bigtable
Cassandra Highlights

● Symmetric - all nodes are exactly the same
   ○ No single point of failure
   ○ Linearly scalable
   ○ Ease of administration
● High availability with multiple datacenters
● Tunable consistency vs. latency trade-off
● Read/Write anywhere
● Flexible Schema
● Column TTL
● Distributed Counters
DHT - Distributed Hash Table

● O(1) node lookup
● Explicit replication
● Linear Scalability
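
A minimal sketch of the DHT idea above, assuming an MD5-based token and a "next nodes clockwise" replica placement. The names Ring, token_for and replicas are made up for this example and are not Cassandra APIs. Every node holds the whole token list, so finding the owner of a key takes no routing hops (the local search is a binary search over the tokens).

```python
import bisect
import hashlib


def token_for(key: str) -> int:
    """Hash a key to an integer token (MD5 is an assumption of this sketch)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy token ring: each node knows every token, so lookups need no hops."""

    def __init__(self, nodes, replication_factor=3):
        self.replication_factor = replication_factor
        # Sort (token, node) pairs once; a node owns the range ending at its token.
        self.entries = sorted((token_for(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.entries]

    def replicas(self, key: str):
        """Return the owner of key plus the next nodes clockwise on the ring."""
        size = len(self.entries)
        # First node whose token is >= the key's token, wrapping around the ring.
        start = bisect.bisect_left(self.tokens, token_for(key)) % size
        count = min(self.replication_factor, size)
        return [self.entries[(start + i) % size][1] for i in range(count)]


ring = Ring(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
owners = ring.replicas("shimi")  # three distinct nodes, same answer on every call
```

Adding a node only changes ownership of the token range next to it, which is what lets the cluster grow without reshuffling all data.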
Consistency

N = replication factor
R = number of replicas a read blocks on (R <= N)
W = number of replicas a write blocks on (W <= N)
Quorum = floor(N/2) + 1

When W + R > N, reads are fully consistent: every set of R replicas
overlaps every set of W replicas.
Examples:
 ● W = 1, R = N
 ● W = N, R = 1
 ● W = Quorum, R = Quorum
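
The arithmetic above can be checked with a few lines. This is only a sketch of the rule on the slide; the function names are made up for the example.

```python
def quorum(n: int) -> int:
    """Quorum = floor(N/2) + 1."""
    return n // 2 + 1


def is_fully_consistent(n: int, r: int, w: int) -> bool:
    """True when every read set of size r overlaps every write set of size w."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("R and W must be between 1 and N")
    return r + w > n


n = 3
print(quorum(n))                                      # 2
print(is_fully_consistent(n, r=1, w=n))               # True
print(is_fully_consistent(n, r=quorum(n), w=quorum(n)))  # True
print(is_fully_consistent(n, r=1, w=1))               # False
```

With N = 3, reading and writing at quorum touches 2 + 2 = 4 > 3 replicas, so at least one replica in every read has seen the latest write.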
Consistency Level

● Every request specifies a consistency level
   ○ Any (writes only)
   ○ One
   ○ Two
   ○ Three
   ○ Quorum
   ○ Local Quorum
   ○ Each Quorum
   ○ All
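
A rough sketch of how each level translates into a number of replica acknowledgements. The function required_acks, the "any_dc" label and the datacenter names are assumptions made for this example; rf_by_dc is the replication factor per datacenter.

```python
def quorum(n: int) -> int:
    return n // 2 + 1


def required_acks(level: str, rf_by_dc: dict, local_dc: str) -> dict:
    """Map a consistency level to the acknowledgements a request waits for."""
    total = sum(rf_by_dc.values())
    simple = {
        "ANY": 1,      # writes only; a stored hint counts as the acknowledgement
        "ONE": 1,
        "TWO": 2,
        "THREE": 3,
        "QUORUM": quorum(total),
        "ALL": total,
    }
    if level in simple:
        return {"any_dc": simple[level]}
    if level == "LOCAL_QUORUM":
        # A quorum of the replicas in the coordinator's own datacenter.
        return {local_dc: quorum(rf_by_dc[local_dc])}
    if level == "EACH_QUORUM":
        # A quorum in every datacenter.
        return {dc: quorum(rf) for dc, rf in rf_by_dc.items()}
    raise ValueError(f"unknown consistency level: {level}")


rf = {"dc1": 3, "dc2": 3}
print(required_acks("QUORUM", rf, "dc1"))        # {'any_dc': 4}
print(required_acks("LOCAL_QUORUM", rf, "dc1"))  # {'dc1': 2}
print(required_acks("EACH_QUORUM", rf, "dc1"))   # {'dc1': 2, 'dc2': 2}
```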
Data Model

● Keyspace ~ schema
● ColumnFamilies ~ table
● Rows
● Columns
Column Family

Key1   Column   Column   Column


Key2   Column   Column
Column Family

ColumnFamily: {
  TOK: {
    chen: 1,
    ronen: 7
  },
  CityPath: {
    yuval: 5
  }
}
Super Column Family
          Super1   Column Column Column
Key
          Super2   Column Column Column

 ColumnFamily: {
   Key: {
     super1: {
       name: value,
       name: value
     },
     super2: {
       name: value
     }
   }
 }
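
One way to picture the two layouts above is as nested maps. The sketch below uses plain Python dicts with the sample data from the Column Family slide; get_slice and the name1/name2 placeholders are made up for the example and only imitate a column range query.

```python
# Column family: row key -> {column name: value}
column_family = {
    "TOK": {"chen": 1, "ronen": 7},
    "CityPath": {"yuval": 5},
}

# Super column family: row key -> {super column name -> {column name: value}}
super_column_family = {
    "Key": {
        "super1": {"name1": "value1", "name2": "value2"},
        "super2": {"name1": "value1"},
    },
}


def get_slice(row: dict, start: str, finish: str):
    """Return the columns whose names fall in [start, finish], in name order."""
    return [(name, row[name]) for name in sorted(row) if start <= name <= finish]


print(get_slice(column_family["TOK"], "a", "d"))         # [('chen', 1)]
print(get_slice(column_family["TOK"], "chen", "ronen"))  # [('chen', 1), ('ronen', 7)]
```

Rows in the same column family do not need the same columns, which is what the "Flexible Schema" highlight refers to.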
Write

● Any node
● Partitioner
● Commit log, memtable
● Wait for W responses
Write

● No reads
● No seeks
● Sequential disk access
● Atomic per row within a column family
● Fast
● Always writable (hinted handoff)
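
A toy model of the local write path described above: append to the commit log, update the memtable, and flush to an immutable SSTable when the memtable fills up. The Replica class and memtable_limit are illustrative names, not Cassandra internals, and the flush trigger is simplified to a row count.

```python
class Replica:
    """Toy model of one node's local write path (names are illustrative)."""

    def __init__(self, memtable_limit=2):
        self.memtable_limit = memtable_limit  # rows held in memory before a flush
        self.commit_log = []   # append-only, so writes need no seeks
        self.memtable = {}     # row key -> {column name: (value, timestamp)}
        self.sstables = []     # flushed, immutable tables, oldest first

    def write(self, row_key, column, value, timestamp):
        # 1. Append to the commit log for durability; nothing is read first.
        self.commit_log.append((row_key, column, value, timestamp))
        # 2. Update the in-memory table, keeping the newest timestamp per column.
        row = self.memtable.setdefault(row_key, {})
        current = row.get(column)
        if current is None or timestamp >= current[1]:
            row[column] = (value, timestamp)
        # 3. Flush once the memtable is full.
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Write the memtable out as a sorted, immutable SSTable.
        self.sstables.append({k: dict(v) for k, v in sorted(self.memtable.items())})
        self.memtable = {}
        self.commit_log = []  # flushed data no longer needs the log


replica = Replica(memtable_limit=2)
replica.write("shimi", "passwd", "mypass", timestamp=1)
replica.write("dan", "email", "dan@example.com", timestamp=2)  # triggers a flush
```

Because existing SSTables are never modified, a write never has to read or seek on disk, which is why writes are the cheap operation.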
Read

● Choose any node
● Partitioner
● Wait for R responses
 ● Tunable read repair in the background
 ● A read may need data from multiple SSTables
 ● Slower than writes
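
Why reads cost more than writes can be sketched as follows: a column may exist in the memtable and in several SSTables, and the read has to look at all of them and keep the value with the newest timestamp. The read_column function and the table layout are assumptions made for this example.

```python
def read_column(memtable, sstables, row_key, column):
    """Return the newest value of a column across the memtable and all SSTables."""
    candidates = []
    for table in [memtable, *sstables]:
        cell = table.get(row_key, {}).get(column)  # cell is (value, timestamp)
        if cell is not None:
            candidates.append(cell)
    if not candidates:
        return None
    # Reconcile: the highest timestamp wins.
    return max(candidates, key=lambda cell: cell[1])[0]


sstables = [
    {"shimi": {"passwd": ("old-pass", 1)}},
    {"shimi": {"passwd": ("new-pass", 5)}},
]
memtable = {"shimi": {"email": ("shimi@example.com", 3)}}

print(read_column(memtable, sstables, "shimi", "passwd"))  # new-pass
print(read_column(memtable, sstables, "shimi", "email"))   # shimi@example.com
```

The more SSTables a row is spread over, the more places a read has to check, which is one reason the key and row caches matter.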
Cache

● There is no need to use memcached
● There is an internal configurable cache
   ○ Key cache
   ○ Row cache
Sorting

When you perform a get, the result is sorted:
 ● Rows are sorted according to the partitioner
 ● Columns in a row are sorted according to the type of the
   column name
Partitioner

● RandomPartitioner - Uses hash values as tokens. Useful for
  distributing the load across all nodes.
  If you use it, set the nodes' tokens manually.

● OrderPreservingPartitioner - You can get rows in sorted order, but it
  will cost you an evenly balanced cluster.
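
For the manual token assignment mentioned above, a common approach is to space the tokens evenly over the RandomPartitioner range of 0 to 2**127. A small sketch, with balanced_tokens as a made-up helper name:

```python
def balanced_tokens(node_count: int):
    """Evenly spaced tokens over the RandomPartitioner range [0, 2**127)."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return [i * (2 ** 127) // node_count for i in range(node_count)]


for node, token in enumerate(balanced_tokens(4)):
    print(node, token)
```

Each node then owns an equal share of the hash space. The tokens have to be recalculated when the number of nodes changes.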
Column Types

Available types:
 ● Bytes
 ● UTF8
 ● Ascii
 ● Long
 ● Date
 ● UUID
 ● Composite - <Type1>:<Type2>
Column Types

Examples:

Sort1:
8            10
9      vs    8
10           9

Sort2:
dan:8             dan:10
dan:10      vs    dan:8
shimi:1           shimi:1
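
The two orderings in each example can be reproduced with ordinary sorting. This sketch reads the left column as a numeric (Long) comparison and the right column as a text (UTF8) comparison, which is an interpretation of the slide; utf8_long is a made-up helper.

```python
names = ["10", "8", "9"]

as_long = sorted(names, key=int)  # compared as numbers
as_utf8 = sorted(names)           # compared as text, character by character

print(as_long)  # ['8', '9', '10']
print(as_utf8)  # ['10', '8', '9']


def utf8_long(name: str):
    """Composite comparison: first part as text, second part as a number."""
    text, number = name.split(":")
    return (text, int(number))


composite = ["dan:10", "shimi:1", "dan:8"]

print(sorted(composite, key=utf8_long))  # ['dan:8', 'dan:10', 'shimi:1']
print(sorted(composite))                 # ['dan:10', 'dan:8', 'shimi:1']
```

Picking the wrong column name type does not lose data, but range queries over the columns come back in an order you did not expect.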
Clients

● Thrift - Cassandra's low-level driver interface
● CQL - Cassandra Query Language (SQL-like)
● High level clients:
   ○ Python
   ○ Java
   ○ Scala
   ○ Clojure
   ○ .Net
   ○ Ruby
   ○ PHP
   ○ Perl
   ○ C++
   ○ Haskell
Cascal - Scala client

Insert column:

session.insert("app" \ "users" \ "shimi" \ "passwd" \ "mypass")

val key = "app" \ "users" \ "shimi"
session.insert(key \ "email" \ "shimi.k@...")


Get column value:

val pass = session.get(key \ "passwd")
Cascal

Get multiple columns:

val row = session.list(key)
val cols = session.list(key, RangePredicate("email", "passwd"))
val cols = session.list(key, ColumnPredicate( List("passwd", "email") ))
Cascal

Get multiple rows:

val family = "app" \ "users"
val rows = session.list(family, RangePredicate("dan", "shimi"))
val rows = session.list(family, KeyPredicate("dan", "shimi"))
Cascal

Remove column:
session.remove("app" \ "users" \ "shimi" \ "passwd")


Remove row:
session.remove("app" \ "users" \ "shimi")


Batch operations:

val deleteCols = Delete(key, ColumnPredicate("age" :: "sex" :: Nil))
val insertEmail = Insert(key \ "email" \ "shimi.k@...")
session.batch(insertEmail :: deleteCols)
Guidelines

● Keep together the data you query together
● Think about your use case and how you should fetch your
  data.
● Don't try to normalize your data
● You can't beat the disk
● Be ready to get your hands dirty
● There is no single solution for everything. You might
  consider using different solutions together
The End

Useful links:
 ● Cassandra, http://cassandra.apache.org/
 ● Wiki http://wiki.apache.org/cassandra/
 ● Cassandra mailing list
 ● IRC
 ● Bigtable, http://labs.google.com/papers/bigtable.html
 ● Dynamo, http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html
 ● Cascal, https://github.com/shimi/cascal
