SlideShare una empresa de Scribd logo
1 de 15
UDA SE Tech Talk
Sep 20 2012




                   CASSANDRA
Agenda
•What is Cassandra?
•Architecture
•Why use Cassandra?
•Current Consumers/use cases?
•Limitations
•Demo

                2
CASSANDRA




    3
What is Cassandra?

-Free, Open Source, Distributed database
-Written by 2 Facebook Engineers
-Hybrid of
  -BigTable from Google
  -DynamoDB from Amazon
-For Structured, Semi-structured, Unstructured Data
-Designed to scale across commodity servers
-Assures AP out of CAP
  -(Consistency, Availability, Partition Tolerance)



                                    4
Architectural Overview
           •Independent nodes form cluster

           •All nodes peers

           •Gossip protocol to discover/connect nodes

           •Gossip process runs every second

           •Nodes exchanges state mesgs with max 3 nodes

           •Nodes exchange info about themselves/Others

           •Seed Nodes have cluster info in cassandra.yaml file

           •All nodes have same seed nodes in their config file

           •Nodes remember all gossip info since last restart




           5
Data Partitioning
     •Should be decided when setting up
     •Total Data managed by Cassandra like a Ring
     •Ring is divided into Ranges
     •Each node responsible for one or more
     •Before a node joins it is given a token
     •Token depends on
           •Node’s position
           •Range of data it is responsible for
     •Column Family partitioned based on row key
     •For given row key value, ring is walked clockwise
     until token is within range
     •2 High Level Partitioning Schemes:
                - Random Partitioner
                - Ordered Partitioner
     •Random Partitioner uses consistent hashing
     •Ordered Partitioner ensures sorted order.




            6
Data Partitioning (Contd)




   Partitioning in Multi-Data Center Clusters




                           7
Replication
         •Replication – process of storing copies of data

         •Replication Strategy

              •Number of Replicas

              •Distro of replicas over the nodes

         •Relies on cluster configured Snitch

         •SimpleStrategy - default

         •NetworkTopologyStrategy

         •Takes rack, data center into consideration




     8
Snitches…
    •The snitch is a configurable component of cluster

    •Defines how the nodes are grouped together

    •Types of snitches:


            •SimpleSnitch

            •BriskSimpleSnitch

            •RackInferringSnitch

            •PropertyFileSnitch

            •EC2Snitch

            •Dynamic Snitching



        9
Snitches…(contd)




         10
Why Use Cassandra?
•Very High Volume writes/reads
•All writes HAVE to succeed
•Horizontal scalability
•Commodity HW
•Integration with Hadoop/Hbase/HIVE
•SQL Like usage
•No Single point of failure
•Powerful dynamic Schema data model
  •Maximum flexibility
  •Performance at scale


                          11
Some Well known Current Customers
WebEx              Ooyala
Clearspring        Openwave
Cloudkick          OpenX
Cloudtalk          Plaxo
connex.io          Rackspace
Constant Contact   Reddit
Digg               SimpleGeo
Facebook           SoundCloud
IBM                Twitter
Netflix            Walmart Labs
Formspring         Yakaz
Mahalo.com

                   12
Limitations
Be aware of these differences when you move
  from a relational database to Cassandra.
• No transactions,
• No JOINs
• No foreign keys and keys are immutable
• Keys have to be unique
• Failed operations may leave changes
• Searching is complicated
• Super columns and order preserving partitioners are
  discouraged
• Healing from failure is manual
• It remembers deletes (until v0.8, at least)
                               13
DEMO
Questions?

Más contenido relacionado

La actualidad más candente

Apache Cassandra at the Geek2Geek Berlin
Apache Cassandra at the Geek2Geek BerlinApache Cassandra at the Geek2Geek Berlin
Apache Cassandra at the Geek2Geek BerlinChristian Johannsen
 
How to size up an Apache Cassandra cluster (Training)
How to size up an Apache Cassandra cluster (Training)How to size up an Apache Cassandra cluster (Training)
How to size up an Apache Cassandra cluster (Training)DataStax Academy
 
An Introduction to Cassandra - Oracle User Group
An Introduction to Cassandra - Oracle User GroupAn Introduction to Cassandra - Oracle User Group
An Introduction to Cassandra - Oracle User GroupCarlos Juzarte Rolo
 
The Cassandra Distributed Database
The Cassandra Distributed DatabaseThe Cassandra Distributed Database
The Cassandra Distributed DatabaseEric Evans
 
An Overview of Apache Cassandra
An Overview of Apache CassandraAn Overview of Apache Cassandra
An Overview of Apache CassandraDataStax
 
Citrix Synergy 2014 - Syn233 Building and operating a Dev Ops cloud: best pra...
Citrix Synergy 2014 - Syn233 Building and operating a Dev Ops cloud: best pra...Citrix Synergy 2014 - Syn233 Building and operating a Dev Ops cloud: best pra...
Citrix Synergy 2014 - Syn233 Building and operating a Dev Ops cloud: best pra...Citrix
 
Cassandra architecture
Cassandra architectureCassandra architecture
Cassandra architectureT Jake Luciani
 
Apache Cassandra and The Multi-Cloud by Amanda Moran
Apache Cassandra and The Multi-Cloud by Amanda MoranApache Cassandra and The Multi-Cloud by Amanda Moran
Apache Cassandra and The Multi-Cloud by Amanda MoranData Con LA
 
mParticle's Journey to Scylla from Cassandra
mParticle's Journey to Scylla from CassandramParticle's Journey to Scylla from Cassandra
mParticle's Journey to Scylla from CassandraScyllaDB
 
NewSQL overview, Feb 2015
NewSQL overview, Feb 2015NewSQL overview, Feb 2015
NewSQL overview, Feb 2015Ivan Glushkov
 
MySQL HA Sharding-Fabric
MySQL HA Sharding-FabricMySQL HA Sharding-Fabric
MySQL HA Sharding-FabricAbdul Manaf
 
MySQL NDB Cluster 8.0
MySQL NDB Cluster 8.0MySQL NDB Cluster 8.0
MySQL NDB Cluster 8.0Ted Wennmark
 
MySQL Storage Engines
MySQL Storage EnginesMySQL Storage Engines
MySQL Storage EnginesKarthik .P.R
 
MySQL NDB Cluster 101
MySQL NDB Cluster 101MySQL NDB Cluster 101
MySQL NDB Cluster 101Bernd Ocklin
 
Apache Cassandra training. Overview and Basics
Apache Cassandra training. Overview and BasicsApache Cassandra training. Overview and Basics
Apache Cassandra training. Overview and BasicsOleg Magazov
 

La actualidad más candente (20)

Apache Cassandra at the Geek2Geek Berlin
Apache Cassandra at the Geek2Geek BerlinApache Cassandra at the Geek2Geek Berlin
Apache Cassandra at the Geek2Geek Berlin
 
How to size up an Apache Cassandra cluster (Training)
How to size up an Apache Cassandra cluster (Training)How to size up an Apache Cassandra cluster (Training)
How to size up an Apache Cassandra cluster (Training)
 
No sql
No sqlNo sql
No sql
 
An Introduction to Cassandra - Oracle User Group
An Introduction to Cassandra - Oracle User GroupAn Introduction to Cassandra - Oracle User Group
An Introduction to Cassandra - Oracle User Group
 
The Cassandra Distributed Database
The Cassandra Distributed DatabaseThe Cassandra Distributed Database
The Cassandra Distributed Database
 
An Overview of Apache Cassandra
An Overview of Apache CassandraAn Overview of Apache Cassandra
An Overview of Apache Cassandra
 
Citrix Synergy 2014 - Syn233 Building and operating a Dev Ops cloud: best pra...
Citrix Synergy 2014 - Syn233 Building and operating a Dev Ops cloud: best pra...Citrix Synergy 2014 - Syn233 Building and operating a Dev Ops cloud: best pra...
Citrix Synergy 2014 - Syn233 Building and operating a Dev Ops cloud: best pra...
 
BigData Developers MeetUp
BigData Developers MeetUpBigData Developers MeetUp
BigData Developers MeetUp
 
Cassandra architecture
Cassandra architectureCassandra architecture
Cassandra architecture
 
Barcamp MySQL
Barcamp MySQLBarcamp MySQL
Barcamp MySQL
 
Apache Cassandra and The Multi-Cloud by Amanda Moran
Apache Cassandra and The Multi-Cloud by Amanda MoranApache Cassandra and The Multi-Cloud by Amanda Moran
Apache Cassandra and The Multi-Cloud by Amanda Moran
 
mParticle's Journey to Scylla from Cassandra
mParticle's Journey to Scylla from CassandramParticle's Journey to Scylla from Cassandra
mParticle's Journey to Scylla from Cassandra
 
Cassandra training
Cassandra trainingCassandra training
Cassandra training
 
NewSQL overview, Feb 2015
NewSQL overview, Feb 2015NewSQL overview, Feb 2015
NewSQL overview, Feb 2015
 
MySQL HA Sharding-Fabric
MySQL HA Sharding-FabricMySQL HA Sharding-Fabric
MySQL HA Sharding-Fabric
 
Cassandra 101
Cassandra 101Cassandra 101
Cassandra 101
 
MySQL NDB Cluster 8.0
MySQL NDB Cluster 8.0MySQL NDB Cluster 8.0
MySQL NDB Cluster 8.0
 
MySQL Storage Engines
MySQL Storage EnginesMySQL Storage Engines
MySQL Storage Engines
 
MySQL NDB Cluster 101
MySQL NDB Cluster 101MySQL NDB Cluster 101
MySQL NDB Cluster 101
 
Apache Cassandra training. Overview and Basics
Apache Cassandra training. Overview and BasicsApache Cassandra training. Overview and Basics
Apache Cassandra training. Overview and Basics
 

Similar a Cassandra tech talk

Cassandra
CassandraCassandra
Cassandraexsuns
 
CASSANDRA - Next to RDBMS
CASSANDRA - Next to RDBMSCASSANDRA - Next to RDBMS
CASSANDRA - Next to RDBMSVipul Thakur
 
Scalability, Availability & Stability Patterns
Scalability, Availability & Stability PatternsScalability, Availability & Stability Patterns
Scalability, Availability & Stability PatternsJonas Bonér
 
High Scalability Toronto: Meetup #2
High Scalability Toronto: Meetup #2High Scalability Toronto: Meetup #2
High Scalability Toronto: Meetup #2ScribbleLive
 
Highly available, scalable and secure data with Cassandra and DataStax Enterp...
Highly available, scalable and secure data with Cassandra and DataStax Enterp...Highly available, scalable and secure data with Cassandra and DataStax Enterp...
Highly available, scalable and secure data with Cassandra and DataStax Enterp...Johnny Miller
 
Introduction to cassandra
Introduction to cassandraIntroduction to cassandra
Introduction to cassandraNguyen Quang
 
The MySQL High Availability Landscape and where Galera Cluster fits in
The MySQL High Availability Landscape and where Galera Cluster fits inThe MySQL High Availability Landscape and where Galera Cluster fits in
The MySQL High Availability Landscape and where Galera Cluster fits inSakari Keskitalo
 
7. Key-Value Databases: In Depth
7. Key-Value Databases: In Depth7. Key-Value Databases: In Depth
7. Key-Value Databases: In DepthFabio Fumarola
 
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedInJay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedInLinkedIn
 
Cassandra - A Basic Introduction Guide
Cassandra - A Basic Introduction GuideCassandra - A Basic Introduction Guide
Cassandra - A Basic Introduction GuideMohammed Fazuluddin
 
Using galera replication to create geo distributed clusters on the wan
Using galera replication to create geo distributed clusters on the wanUsing galera replication to create geo distributed clusters on the wan
Using galera replication to create geo distributed clusters on the wanSakari Keskitalo
 
Using galera replication to create geo distributed clusters on the wan
Using galera replication to create geo distributed clusters on the wanUsing galera replication to create geo distributed clusters on the wan
Using galera replication to create geo distributed clusters on the wanSakari Keskitalo
 
Cassandra for mission critical data
Cassandra for mission critical dataCassandra for mission critical data
Cassandra for mission critical dataOleksandr Semenov
 
M6d cassandrapresentation
M6d cassandrapresentationM6d cassandrapresentation
M6d cassandrapresentationEdward Capriolo
 
Cassandra presentation
Cassandra presentationCassandra presentation
Cassandra presentationSergey Enin
 
End of RAID as we know it with Ceph Replication
End of RAID as we know it with Ceph ReplicationEnd of RAID as we know it with Ceph Replication
End of RAID as we know it with Ceph ReplicationCeph Community
 

Similar a Cassandra tech talk (20)

Cassandra
CassandraCassandra
Cassandra
 
CASSANDRA - Next to RDBMS
CASSANDRA - Next to RDBMSCASSANDRA - Next to RDBMS
CASSANDRA - Next to RDBMS
 
Scalability, Availability & Stability Patterns
Scalability, Availability & Stability PatternsScalability, Availability & Stability Patterns
Scalability, Availability & Stability Patterns
 
High Scalability Toronto: Meetup #2
High Scalability Toronto: Meetup #2High Scalability Toronto: Meetup #2
High Scalability Toronto: Meetup #2
 
Highly available, scalable and secure data with Cassandra and DataStax Enterp...
Highly available, scalable and secure data with Cassandra and DataStax Enterp...Highly available, scalable and secure data with Cassandra and DataStax Enterp...
Highly available, scalable and secure data with Cassandra and DataStax Enterp...
 
Introduction to cassandra
Introduction to cassandraIntroduction to cassandra
Introduction to cassandra
 
The MySQL High Availability Landscape and where Galera Cluster fits in
The MySQL High Availability Landscape and where Galera Cluster fits inThe MySQL High Availability Landscape and where Galera Cluster fits in
The MySQL High Availability Landscape and where Galera Cluster fits in
 
7. Key-Value Databases: In Depth
7. Key-Value Databases: In Depth7. Key-Value Databases: In Depth
7. Key-Value Databases: In Depth
 
Apache cassandra
Apache cassandraApache cassandra
Apache cassandra
 
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedInJay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
 
Cassandra - A Basic Introduction Guide
Cassandra - A Basic Introduction GuideCassandra - A Basic Introduction Guide
Cassandra - A Basic Introduction Guide
 
Using galera replication to create geo distributed clusters on the wan
Using galera replication to create geo distributed clusters on the wanUsing galera replication to create geo distributed clusters on the wan
Using galera replication to create geo distributed clusters on the wan
 
Using galera replication to create geo distributed clusters on the wan
Using galera replication to create geo distributed clusters on the wanUsing galera replication to create geo distributed clusters on the wan
Using galera replication to create geo distributed clusters on the wan
 
Using galera replication to create geo distributed clusters on the wan
Using galera replication to create geo distributed clusters on the wanUsing galera replication to create geo distributed clusters on the wan
Using galera replication to create geo distributed clusters on the wan
 
Cassandra for mission critical data
Cassandra for mission critical dataCassandra for mission critical data
Cassandra for mission critical data
 
BigData, NoSQL & ElasticSearch
BigData, NoSQL & ElasticSearchBigData, NoSQL & ElasticSearch
BigData, NoSQL & ElasticSearch
 
M6d cassandrapresentation
M6d cassandrapresentationM6d cassandrapresentation
M6d cassandrapresentation
 
DataStax TechDay - Munich 2014
DataStax TechDay - Munich 2014DataStax TechDay - Munich 2014
DataStax TechDay - Munich 2014
 
Cassandra presentation
Cassandra presentationCassandra presentation
Cassandra presentation
 
End of RAID as we know it with Ceph Replication
End of RAID as we know it with Ceph ReplicationEnd of RAID as we know it with Ceph Replication
End of RAID as we know it with Ceph Replication
 

Último

The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 

Último (20)

The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 

Cassandra tech talk

  • 1. UDA SE Tech Talk Sep 20 2012 CASSANDRA
  • 2. Agenda •What is Cassandra? •Architecture •Why use Cassandra? •Current Consumers/use cases? •Limitations •Demo 2
  • 4. What is Cassandra? -Free, Open Source, Distributed database -Written by 2 Facebook Engineers -Hybrid of -BigTable from Google -DynamoDB from Amazon -For Structured, Semi-structured, Unstructured Data -Designed to scale across commodity servers -Assures AP out of CAP -(Consistency, Availability, Partition Tolerance) 4
  • 5. Architectural Overview •Independent nodes form cluster •All nodes peers •Gossip protocol to discover/connect nodes •Gossip process runs every second •Nodes exchanges state mesgs with max 3 nodes •Nodes exchange info about themselves/Others •Seed Nodes have cluster info in cassandra.yaml file •All nodes have same seed nodes in their config file •Nodes remember all gossip info since last restart 5
  • 6. Data Partitioning •Should be decided when setting up •Total Data managed by Cassandra like a Ring •Ring is divided into Ranges •Each node responsible for one or more •Before a node joins it is given a token •Token depends on •Node’s position •Range of data it is responsible for •Column Family partitioned based on row key •For given row key value, ring is walked clockwise until token is within range •2 High Level Partitioning Schemes: - Random Partitioner - Ordered Partitioner •Random Partitioner uses consistent hashing •Ordered Partitioner ensures sorted order. 6
  • 7. Data Partitioning (Contd) Partitioning in Multi-Data Center Clusters 7
  • 8. Replication •Replication – process of storing copies of data •Replication Strategy •Number of Replicas •Distro of replicas over the nodes •Relies on cluster configured Snitch •SimpleStrategy - default •NetworkTopologyStrategy •Takes rack, data center into consideration 8
  • 9. Snitches… •The snitch is a configurable component of cluster •Defines how the nodes are grouped together •Types of snitches: •SimpleSnitch •BriskSimpleSnitch •RackInferringSnitch •PropertyFileSnitch •EC2Snitch •Dynamic Snitching 9
  • 11. Why Use Cassandra? •Very High Volume writes/reads •All writes HAVE to succeed •Horizontal scalability •Commodity HW •Integration with Hadoop/Hbase/HIVE •SQL Like usage •No Single point of failure •Powerful dynamic Schema data model •Maximum flexibility •Performance at scale 11
  • 12. Some Well known Current Customers WebEx Ooyala Clearspring Openwave Cloudkick OpenX Cloudtalk Plaxo connex.io Rackspace Constant Contact Reddit Digg SimpleGeo Facebook SoundCloud IBM Twitter Netflix Walmart Labs Formspring Yakaz Mahalo.com 12
  • 13. Limitations Be aware of these differences when you move from a relational database to Cassandra. • No transactions, • No JOINs • No foreign keys and keys are immutable • Keys have to be unique • Failed operations may leave changes • Searching is complicated • Super columns and order preserving partitioners are discouraged • Healing from failure is manual • It remembers deletes (until v0.8, at least) 13
  • 14. DEMO