SlideShare una empresa de Scribd logo
1 de 24
Descargar para leer sin conexión
Cassandra
Structured Storage System over a P2P Network




          Avinash Lakshman, Prashant Malik
Why Cassandra?
• Lots of data
  – Copies of messages, reverse indices of
    messages, per user data.
• Many incoming requests resulting in a lot
  of random reads and random writes.
• No existing production ready solutions in
  the market meet these requirements.
Design Goals
• High availability
• Eventual consistency
  – trade-off strong consistency in favor of high
    availability
• Incremental scalability
• Optimistic Replication
• “Knobs” to tune tradeoffs between consistency,
  durability and latency
• Low total cost of ownership
• Minimal administration
Data Model                                                       Columns are
                                                                                     added and
                              ColumnFamily1 Name : MailList                           modified
                                                                         Type : Simple Sort : Name
 KEY                          Name : tid1         Name : tid2           Name : tid3 dynamically
                                                                                        Name : tid4
                              Value : <Binary>    Value : <Binary>      Value : <Binary>        Value : <Binary>
                              TimeStamp : t1      TimeStamp : t2        TimeStamp : t3          TimeStamp : t4




                        ColumnFamily2            Name : WordList            Type : Super            Sort : Time
Column Families         Name : aloha                                                     Name : dude
  are declared
                         C1             C2             C3          C4                      C2             C6
     upfront
 SuperColumns            V1             V2             V3          V4                      V2             V6

 are added and           T1             T2             T3          T4                      T2             T6

    modified
Columns are
  dynamically
 added and
  modified        ColumnFamily3 Name : System                Type : Super       Sort : Name
dynamically       Name : hint1         Name : hint2         Name : hint3       Name : hint4
                  <Column List>        <Column List>        <Column List>      <Column List>
Write Operations
• A client issues a write request to a random
  node in the Cassandra cluster.
• The “Partitioner” determines the nodes
  responsible for the data.
• Locally, write operations are logged and
  then applied to an in-memory version.
• Commit log is stored on a dedicated disk
  local to the machine.
Write cont’d
Key (CF1 , CF2 , CF3)                                                         • Data size
                                                                              • Number of Objects
                                   Memtable ( CF1)
                                                                              • Lifetime

 Commit Log                        Memtable ( CF2)
 Binary serialized
 Key ( CF1 , CF2 , CF3 )           Memtable ( CF2)

                                                                         Data file on disk
                                               <Key name><Size of key Data><Index of columns/supercolumns><
                                               Serialized column family>
                           K128 Offset         ---
                                               ---
                           K256 Offset          BLOCK Index <Key Name> Offset, <Key Name> Offset
     Dedicated Disk
                           K384 Offset         ---
                                               ---

                           Bloom Filter        <Key name><Size of key Data><Index of columns/supercolumns><
                                               Serialized column family>

                           (Index in memory)
Compactions
                                                     K2 < Serialized data >             K4 < Serialized data >
              K1 < Serialized data >
                                                     K10 < Serialized data >            K5 < Serialized data >
              K2 < Serialized data >
                                                     K30 < Serialized data >            K10 < Serialized data >
              K3 < Serialized data >



                                   DELETED
                                                     --                                 --
              --
                                        Sorted       --                        Sorted   --
Sorted        --
                                                     --                                 --
              --




                                            MERGE SORT


   Index File
                                                   K1 < Serialized data >
          Loaded in memory                         K2 < Serialized data >
                                                   K3 < Serialized data >
         K1 Offset
                                                   K4 < Serialized data >
         K5 Offset                     Sorted
                                                   K5 < Serialized data >
         K30 Offset
                                                   K10 < Serialized data >
         Bloom Filter
                                                   K30 < Serialized data >

                                                 Data File
Write Properties
•   No locks in the critical path
•   Sequential disk access
•   Behaves like a write back Cache
•   Append support without read ahead
•   Atomicity guarantee for a key
• “Always Writable”
    – accept writes during failure scenarios
Read
                         Client


                  Query       Result

                       Cassandra Cluster


          Closest replica     Result                   Read repair if
                                                       digests differ
                        Replica A


                       Digest Query
Digest Response                            Digest Response


           Replica B                   Replica C
Partitioning And Replication
                          1 0           h(key1)
                    E
                                      A           N=3

          C

h(key2)                                    F


                                       B
              D

                          1/2
                                                        10
Cluster Membership and Failure
              Detection
•   Gossip protocol is used for cluster membership.
•   Super lightweight with mathematically provable properties.
•   State disseminated in O(logN) rounds where N is the number of
    nodes in the cluster.
•   Every T seconds each member increments its heartbeat counter and
    selects one other member to send its list to.
•   A member merges the list with its own list .
Accrual Failure Detector
•   Valuable for system management, replication, load balancing etc.
•   Defined as a failure detector that outputs a value, PHI, associated
    with each process.
•   Also known as Adaptive Failure detectors - designed to adapt to
    changing network conditions.
•   The value output, PHI, represents a suspicion level.
•   Applications set an appropriate threshold, trigger suspicions and
    perform appropriate actions.
•   In Cassandra the average time taken to detect a failure is 10-15
    seconds with the PHI threshold set at 5.
Properties of the Failure Detector
•   If a process p is faulty, the suspicion level
                  Φ(t)     ∞as t     ∞.
•   If a process p is faulty, there is a time after which Φ(t) is monotonic
    increasing.
•   A process p is correct      Φ(t) has an ub over an infinite execution.
•   If process p is correct, then for any time T,
                  Φ(t) = 0 for t >= T.
Implementation
•   PHI estimation is done in three phases
     – Inter arrival times for each member are stored in a sampling
       window.
     – Estimate the distribution of the above inter arrival times.
     – Gossip follows an exponential distribution.
     – The value of PHI is now computed as follows:
         • Φ(t) = -log10( P(tnow – tlast) )
                   where P(t) is the CDF of an exponential distribution. P(t) denotes the
                   probability that a heartbeat will arrive more than t units after the previous
                   one. P(t) = ( 1 – e-tλ )
The overall mechanism is described in the figure below.
Information Flow in the
    Implementation
Performance Benchmark
• Loading of data - limited by network
  bandwidth.
• Read performance for Inbox Search in
  production:

              Search Interactions Term Search
    Min       7.69 ms            7.78 ms
    Median    15.69 ms           18.27 ms
    Average   26.13 ms           44.41 ms
MySQL Comparison
• MySQL > 50 GB Data
  Writes Average : ~300 ms
  Reads Average : ~350 ms
• Cassandra > 50 GB Data
  Writes Average : 0.12 ms
  Reads Average : 15 ms
Lessons Learnt
• Add fancy features only when absolutely
  required.
• Many types of failures are possible.
• Big systems need proper systems-level
  monitoring.
• Value simple designs
Future work
•   Atomicity guarantees across multiple keys
•   Analysis support via Map/Reduce
•   Distributed transactions
•   Compression support
•   Granular security via ACL’s
Questions?

Más contenido relacionado

La actualidad más candente

Logging Last Resource Optimization for Distributed Transactions in Oracle We...
Logging Last Resource Optimization for Distributed Transactions in  Oracle We...Logging Last Resource Optimization for Distributed Transactions in  Oracle We...
Logging Last Resource Optimization for Distributed Transactions in Oracle We...Gera Shegalov
 
Availability and Integrity in hadoop (Strata EU Edition)
Availability and Integrity in hadoop (Strata EU Edition)Availability and Integrity in hadoop (Strata EU Edition)
Availability and Integrity in hadoop (Strata EU Edition)Steve Loughran
 
Cassandra 1.1
Cassandra 1.1Cassandra 1.1
Cassandra 1.1jbellis
 
Cassandra Intro -- TheEdge2012
Cassandra Intro -- TheEdge2012Cassandra Intro -- TheEdge2012
Cassandra Intro -- TheEdge2012robosonia Mar
 
Advanced Windows Debugging
Advanced Windows DebuggingAdvanced Windows Debugging
Advanced Windows DebuggingBala Subra
 
Postgre sql run book
Postgre sql run bookPostgre sql run book
Postgre sql run bookVasudeva Rao
 
The top five questions to ask about NoSQL. JONATHAN ELLIS at Big Data Spain 2012
The top five questions to ask about NoSQL. JONATHAN ELLIS at Big Data Spain 2012The top five questions to ask about NoSQL. JONATHAN ELLIS at Big Data Spain 2012
The top five questions to ask about NoSQL. JONATHAN ELLIS at Big Data Spain 2012Big Data Spain
 

La actualidad más candente (8)

Cheatsheet of msdos
Cheatsheet of msdosCheatsheet of msdos
Cheatsheet of msdos
 
Logging Last Resource Optimization for Distributed Transactions in Oracle We...
Logging Last Resource Optimization for Distributed Transactions in  Oracle We...Logging Last Resource Optimization for Distributed Transactions in  Oracle We...
Logging Last Resource Optimization for Distributed Transactions in Oracle We...
 
Availability and Integrity in hadoop (Strata EU Edition)
Availability and Integrity in hadoop (Strata EU Edition)Availability and Integrity in hadoop (Strata EU Edition)
Availability and Integrity in hadoop (Strata EU Edition)
 
Cassandra 1.1
Cassandra 1.1Cassandra 1.1
Cassandra 1.1
 
Cassandra Intro -- TheEdge2012
Cassandra Intro -- TheEdge2012Cassandra Intro -- TheEdge2012
Cassandra Intro -- TheEdge2012
 
Advanced Windows Debugging
Advanced Windows DebuggingAdvanced Windows Debugging
Advanced Windows Debugging
 
Postgre sql run book
Postgre sql run bookPostgre sql run book
Postgre sql run book
 
The top five questions to ask about NoSQL. JONATHAN ELLIS at Big Data Spain 2012
The top five questions to ask about NoSQL. JONATHAN ELLIS at Big Data Spain 2012The top five questions to ask about NoSQL. JONATHAN ELLIS at Big Data Spain 2012
The top five questions to ask about NoSQL. JONATHAN ELLIS at Big Data Spain 2012
 

Similar a Cassandra Structured Storage System over a P2P Network

Introduce Apache Cassandra - JavaTwo Taiwan, 2012
Introduce Apache Cassandra - JavaTwo Taiwan, 2012Introduce Apache Cassandra - JavaTwo Taiwan, 2012
Introduce Apache Cassandra - JavaTwo Taiwan, 2012Boris Yen
 
DaStor/Cassandra report for CDR solution
DaStor/Cassandra report for CDR solutionDaStor/Cassandra report for CDR solution
DaStor/Cassandra report for CDR solutionSchubert Zhang
 
Cacheconcurrencyconsistency cassandra svcc
Cacheconcurrencyconsistency cassandra svccCacheconcurrencyconsistency cassandra svcc
Cacheconcurrencyconsistency cassandra svccsrisatish ambati
 
"Mobage DBA Fight against Big Data" - NHN TE
"Mobage DBA Fight against Big Data" - NHN TE"Mobage DBA Fight against Big Data" - NHN TE
"Mobage DBA Fight against Big Data" - NHN TERyosuke IWANAGA
 
Scaling web applications with cassandra presentation
Scaling web applications with cassandra presentationScaling web applications with cassandra presentation
Scaling web applications with cassandra presentationMurat Çakal
 
Introduction to NoSQL
Introduction to NoSQLIntroduction to NoSQL
Introduction to NoSQLYan Cui
 
Linked in nosql_atnetflix_2012_v1
Linked in nosql_atnetflix_2012_v1Linked in nosql_atnetflix_2012_v1
Linked in nosql_atnetflix_2012_v1Sid Anand
 
Ben Coverston - The Apache Cassandra Project
Ben Coverston - The Apache Cassandra ProjectBen Coverston - The Apache Cassandra Project
Ben Coverston - The Apache Cassandra ProjectMorningstar Tech Talks
 
How nebula graph index works
How nebula graph index worksHow nebula graph index works
How nebula graph index worksNebula Graph
 
Cassandra Tutorial
Cassandra TutorialCassandra Tutorial
Cassandra Tutorialmubarakss
 
Cassandra deep-dive @ NoSQLNow!
Cassandra deep-dive @ NoSQLNow!Cassandra deep-dive @ NoSQLNow!
Cassandra deep-dive @ NoSQLNow!Acunu
 
SQL Server Deep Dive, Denis Reznik
SQL Server Deep Dive, Denis ReznikSQL Server Deep Dive, Denis Reznik
SQL Server Deep Dive, Denis ReznikSigma Software
 
What’s Evolving in the Elastic Stack
What’s Evolving in the Elastic StackWhat’s Evolving in the Elastic Stack
What’s Evolving in the Elastic StackElasticsearch
 
2013 london advanced-replication
2013 london advanced-replication2013 london advanced-replication
2013 london advanced-replicationMarc Schwering
 
Oracle 12.2 sharded database management
Oracle 12.2 sharded database managementOracle 12.2 sharded database management
Oracle 12.2 sharded database managementLeyi (Kamus) Zhang
 
A Deep Dive Into Understanding Apache Cassandra
A Deep Dive Into Understanding Apache CassandraA Deep Dive Into Understanding Apache Cassandra
A Deep Dive Into Understanding Apache CassandraDataStax Academy
 
Cassandra introduction apache con 2014 budapest
Cassandra introduction apache con 2014 budapestCassandra introduction apache con 2014 budapest
Cassandra introduction apache con 2014 budapestDuyhai Doan
 

Similar a Cassandra Structured Storage System over a P2P Network (20)

Cassandra NoSQL
Cassandra NoSQLCassandra NoSQL
Cassandra NoSQL
 
Introduce Apache Cassandra - JavaTwo Taiwan, 2012
Introduce Apache Cassandra - JavaTwo Taiwan, 2012Introduce Apache Cassandra - JavaTwo Taiwan, 2012
Introduce Apache Cassandra - JavaTwo Taiwan, 2012
 
DaStor/Cassandra report for CDR solution
DaStor/Cassandra report for CDR solutionDaStor/Cassandra report for CDR solution
DaStor/Cassandra report for CDR solution
 
Cacheconcurrencyconsistency cassandra svcc
Cacheconcurrencyconsistency cassandra svccCacheconcurrencyconsistency cassandra svcc
Cacheconcurrencyconsistency cassandra svcc
 
"Mobage DBA Fight against Big Data" - NHN TE
"Mobage DBA Fight against Big Data" - NHN TE"Mobage DBA Fight against Big Data" - NHN TE
"Mobage DBA Fight against Big Data" - NHN TE
 
Scaling web applications with cassandra presentation
Scaling web applications with cassandra presentationScaling web applications with cassandra presentation
Scaling web applications with cassandra presentation
 
Introduction to NoSQL
Introduction to NoSQLIntroduction to NoSQL
Introduction to NoSQL
 
Linked in nosql_atnetflix_2012_v1
Linked in nosql_atnetflix_2012_v1Linked in nosql_atnetflix_2012_v1
Linked in nosql_atnetflix_2012_v1
 
Dbms &amp; oracle
Dbms &amp; oracleDbms &amp; oracle
Dbms &amp; oracle
 
Ben Coverston - The Apache Cassandra Project
Ben Coverston - The Apache Cassandra ProjectBen Coverston - The Apache Cassandra Project
Ben Coverston - The Apache Cassandra Project
 
How nebula graph index works
How nebula graph index worksHow nebula graph index works
How nebula graph index works
 
Cassandra Tutorial
Cassandra TutorialCassandra Tutorial
Cassandra Tutorial
 
Cassandra deep-dive @ NoSQLNow!
Cassandra deep-dive @ NoSQLNow!Cassandra deep-dive @ NoSQLNow!
Cassandra deep-dive @ NoSQLNow!
 
SQL Server Deep Dive, Denis Reznik
SQL Server Deep Dive, Denis ReznikSQL Server Deep Dive, Denis Reznik
SQL Server Deep Dive, Denis Reznik
 
What’s Evolving in the Elastic Stack
What’s Evolving in the Elastic StackWhat’s Evolving in the Elastic Stack
What’s Evolving in the Elastic Stack
 
2013 london advanced-replication
2013 london advanced-replication2013 london advanced-replication
2013 london advanced-replication
 
Oracle 12.2 sharded database management
Oracle 12.2 sharded database managementOracle 12.2 sharded database management
Oracle 12.2 sharded database management
 
DBMS Chapter-3.ppsx
DBMS Chapter-3.ppsxDBMS Chapter-3.ppsx
DBMS Chapter-3.ppsx
 
A Deep Dive Into Understanding Apache Cassandra
A Deep Dive Into Understanding Apache CassandraA Deep Dive Into Understanding Apache Cassandra
A Deep Dive Into Understanding Apache Cassandra
 
Cassandra introduction apache con 2014 budapest
Cassandra introduction apache con 2014 budapestCassandra introduction apache con 2014 budapest
Cassandra introduction apache con 2014 budapest
 

Más de elliando dias

Clojurescript slides
Clojurescript slidesClojurescript slides
Clojurescript slideselliando dias
 
Why you should be excited about ClojureScript
Why you should be excited about ClojureScriptWhy you should be excited about ClojureScript
Why you should be excited about ClojureScriptelliando dias
 
Functional Programming with Immutable Data Structures
Functional Programming with Immutable Data StructuresFunctional Programming with Immutable Data Structures
Functional Programming with Immutable Data Structureselliando dias
 
Nomenclatura e peças de container
Nomenclatura  e peças de containerNomenclatura  e peças de container
Nomenclatura e peças de containerelliando dias
 
Polyglot and Poly-paradigm Programming for Better Agility
Polyglot and Poly-paradigm Programming for Better AgilityPolyglot and Poly-paradigm Programming for Better Agility
Polyglot and Poly-paradigm Programming for Better Agilityelliando dias
 
Javascript Libraries
Javascript LibrariesJavascript Libraries
Javascript Librarieselliando dias
 
How to Make an Eight Bit Computer and Save the World!
How to Make an Eight Bit Computer and Save the World!How to Make an Eight Bit Computer and Save the World!
How to Make an Eight Bit Computer and Save the World!elliando dias
 
A Practical Guide to Connecting Hardware to the Web
A Practical Guide to Connecting Hardware to the WebA Practical Guide to Connecting Hardware to the Web
A Practical Guide to Connecting Hardware to the Webelliando dias
 
Introdução ao Arduino
Introdução ao ArduinoIntrodução ao Arduino
Introdução ao Arduinoelliando dias
 
Incanter Data Sorcery
Incanter Data SorceryIncanter Data Sorcery
Incanter Data Sorceryelliando dias
 
Fab.in.a.box - Fab Academy: Machine Design
Fab.in.a.box - Fab Academy: Machine DesignFab.in.a.box - Fab Academy: Machine Design
Fab.in.a.box - Fab Academy: Machine Designelliando dias
 
The Digital Revolution: Machines that makes
The Digital Revolution: Machines that makesThe Digital Revolution: Machines that makes
The Digital Revolution: Machines that makeselliando dias
 
Hadoop - Simple. Scalable.
Hadoop - Simple. Scalable.Hadoop - Simple. Scalable.
Hadoop - Simple. Scalable.elliando dias
 
Hadoop and Hive Development at Facebook
Hadoop and Hive Development at FacebookHadoop and Hive Development at Facebook
Hadoop and Hive Development at Facebookelliando dias
 
Multi-core Parallelization in Clojure - a Case Study
Multi-core Parallelization in Clojure - a Case StudyMulti-core Parallelization in Clojure - a Case Study
Multi-core Parallelization in Clojure - a Case Studyelliando dias
 

Más de elliando dias (20)

Clojurescript slides
Clojurescript slidesClojurescript slides
Clojurescript slides
 
Why you should be excited about ClojureScript
Why you should be excited about ClojureScriptWhy you should be excited about ClojureScript
Why you should be excited about ClojureScript
 
Functional Programming with Immutable Data Structures
Functional Programming with Immutable Data StructuresFunctional Programming with Immutable Data Structures
Functional Programming with Immutable Data Structures
 
Nomenclatura e peças de container
Nomenclatura  e peças de containerNomenclatura  e peças de container
Nomenclatura e peças de container
 
Geometria Projetiva
Geometria ProjetivaGeometria Projetiva
Geometria Projetiva
 
Polyglot and Poly-paradigm Programming for Better Agility
Polyglot and Poly-paradigm Programming for Better AgilityPolyglot and Poly-paradigm Programming for Better Agility
Polyglot and Poly-paradigm Programming for Better Agility
 
Javascript Libraries
Javascript LibrariesJavascript Libraries
Javascript Libraries
 
How to Make an Eight Bit Computer and Save the World!
How to Make an Eight Bit Computer and Save the World!How to Make an Eight Bit Computer and Save the World!
How to Make an Eight Bit Computer and Save the World!
 
Ragel talk
Ragel talkRagel talk
Ragel talk
 
A Practical Guide to Connecting Hardware to the Web
A Practical Guide to Connecting Hardware to the WebA Practical Guide to Connecting Hardware to the Web
A Practical Guide to Connecting Hardware to the Web
 
Introdução ao Arduino
Introdução ao ArduinoIntrodução ao Arduino
Introdução ao Arduino
 
Minicurso arduino
Minicurso arduinoMinicurso arduino
Minicurso arduino
 
Incanter Data Sorcery
Incanter Data SorceryIncanter Data Sorcery
Incanter Data Sorcery
 
Rango
RangoRango
Rango
 
Fab.in.a.box - Fab Academy: Machine Design
Fab.in.a.box - Fab Academy: Machine DesignFab.in.a.box - Fab Academy: Machine Design
Fab.in.a.box - Fab Academy: Machine Design
 
The Digital Revolution: Machines that makes
The Digital Revolution: Machines that makesThe Digital Revolution: Machines that makes
The Digital Revolution: Machines that makes
 
Hadoop + Clojure
Hadoop + ClojureHadoop + Clojure
Hadoop + Clojure
 
Hadoop - Simple. Scalable.
Hadoop - Simple. Scalable.Hadoop - Simple. Scalable.
Hadoop - Simple. Scalable.
 
Hadoop and Hive Development at Facebook
Hadoop and Hive Development at FacebookHadoop and Hive Development at Facebook
Hadoop and Hive Development at Facebook
 
Multi-core Parallelization in Clojure - a Case Study
Multi-core Parallelization in Clojure - a Case StudyMulti-core Parallelization in Clojure - a Case Study
Multi-core Parallelization in Clojure - a Case Study
 

Último

A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 

Último (20)

A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 

Cassandra Structured Storage System over a P2P Network

  • 1. Cassandra Structured Storage System over a P2P Network Avinash Lakshman, Prashant Malik
  • 2. Why Cassandra? • Lots of data – Copies of messages, reverse indices of messages, per user data. • Many incoming requests resulting in a lot of random reads and random writes. • No existing production ready solutions in the market meet these requirements.
  • 3. Design Goals • High availability • Eventual consistency – trade-off strong consistency in favor of high availability • Incremental scalability • Optimistic Replication • “Knobs” to tune tradeoffs between consistency, durability and latency • Low total cost of ownership • Minimal administration
  • 4. Data Model Columns are added and ColumnFamily1 Name : MailList modified Type : Simple Sort : Name KEY Name : tid1 Name : tid2 Name : tid3 dynamically Name : tid4 Value : <Binary> Value : <Binary> Value : <Binary> Value : <Binary> TimeStamp : t1 TimeStamp : t2 TimeStamp : t3 TimeStamp : t4 ColumnFamily2 Name : WordList Type : Super Sort : Time Column Families Name : aloha Name : dude are declared C1 C2 C3 C4 C2 C6 upfront SuperColumns V1 V2 V3 V4 V2 V6 are added and T1 T2 T3 T4 T2 T6 modified Columns are dynamically added and modified ColumnFamily3 Name : System Type : Super Sort : Name dynamically Name : hint1 Name : hint2 Name : hint3 Name : hint4 <Column List> <Column List> <Column List> <Column List>
  • 5. Write Operations • A client issues a write request to a random node in the Cassandra cluster. • The “Partitioner” determines the nodes responsible for the data. • Locally, write operations are logged and then applied to an in-memory version. • Commit log is stored on a dedicated disk local to the machine.
  • 6. Write cont’d Key (CF1 , CF2 , CF3) • Data size • Number of Objects Memtable ( CF1) • Lifetime Commit Log Memtable ( CF2) Binary serialized Key ( CF1 , CF2 , CF3 ) Memtable ( CF2) Data file on disk <Key name><Size of key Data><Index of columns/supercolumns>< Serialized column family> K128 Offset --- --- K256 Offset BLOCK Index <Key Name> Offset, <Key Name> Offset Dedicated Disk K384 Offset --- --- Bloom Filter <Key name><Size of key Data><Index of columns/supercolumns>< Serialized column family> (Index in memory)
  • 7. Compactions K2 < Serialized data > K4 < Serialized data > K1 < Serialized data > K10 < Serialized data > K5 < Serialized data > K2 < Serialized data > K30 < Serialized data > K10 < Serialized data > K3 < Serialized data > DELETED -- -- -- Sorted -- Sorted -- Sorted -- -- -- -- MERGE SORT Index File K1 < Serialized data > Loaded in memory K2 < Serialized data > K3 < Serialized data > K1 Offset K4 < Serialized data > K5 Offset Sorted K5 < Serialized data > K30 Offset K10 < Serialized data > Bloom Filter K30 < Serialized data > Data File
  • 8. Write Properties • No locks in the critical path • Sequential disk access • Behaves like a write back Cache • Append support without read ahead • Atomicity guarantee for a key • “Always Writable” – accept writes during failure scenarios
  • 9. Read Client Query Result Cassandra Cluster Closest replica Result Read repair if digests differ Replica A Digest Query Digest Response Digest Response Replica B Replica C
  • 10. Partitioning And Replication 1 0 h(key1) E A N=3 C h(key2) F B D 1/2 10
  • 11. Cluster Membership and Failure Detection • Gossip protocol is used for cluster membership. • Super lightweight with mathematically provable properties. • State disseminated in O(logN) rounds where N is the number of nodes in the cluster. • Every T seconds each member increments its heartbeat counter and selects one other member to send its list to. • A member merges the list with its own list .
  • 12.
  • 13.
  • 14.
  • 15.
  • 16. Accrual Failure Detector • Valuable for system management, replication, load balancing etc. • Defined as a failure detector that outputs a value, PHI, associated with each process. • Also known as Adaptive Failure detectors - designed to adapt to changing network conditions. • The value output, PHI, represents a suspicion level. • Applications set an appropriate threshold, trigger suspicions and perform appropriate actions. • In Cassandra the average time taken to detect a failure is 10-15 seconds with the PHI threshold set at 5.
  • 17. Properties of the Failure Detector • If a process p is faulty, the suspicion level Φ(t) ∞as t ∞. • If a process p is faulty, there is a time after which Φ(t) is monotonic increasing. • A process p is correct Φ(t) has an ub over an infinite execution. • If process p is correct, then for any time T, Φ(t) = 0 for t >= T.
  • 18. Implementation • PHI estimation is done in three phases – Inter arrival times for each member are stored in a sampling window. – Estimate the distribution of the above inter arrival times. – Gossip follows an exponential distribution. – The value of PHI is now computed as follows: • Φ(t) = -log10( P(tnow – tlast) ) where P(t) is the CDF of an exponential distribution. P(t) denotes the probability that a heartbeat will arrive more than t units after the previous one. P(t) = ( 1 – e-tλ ) The overall mechanism is described in the figure below.
  • 19. Information Flow in the Implementation
  • 20. Performance Benchmark • Loading of data - limited by network bandwidth. • Read performance for Inbox Search in production: Search Interactions Term Search Min 7.69 ms 7.78 ms Median 15.69 ms 18.27 ms Average 26.13 ms 44.41 ms
  • 21. MySQL Comparison • MySQL > 50 GB Data Writes Average : ~300 ms Reads Average : ~350 ms • Cassandra > 50 GB Data Writes Average : 0.12 ms Reads Average : 15 ms
  • 22. Lessons Learnt • Add fancy features only when absolutely required. • Many types of failures are possible. • Big systems need proper systems-level monitoring. • Value simple designs
  • 23. Future work • Atomicity guarantees across multiple keys • Analysis support via Map/Reduce • Distributed transactions • Compression support • Granular security via ACL’s