Cassandra FTW
                           Andrew Byde
                           Principal Scientist




Monday, 15 August 2011
Menu

                   • Introduction
                   • Data model + storage architecture
                   • Partitioning + replication
                   • Consistency
                   • De-normalisation

History + design




History

                   • 2007: Started at Facebook for inbox search
                   • July 2008: Open sourced by Facebook
                   • March 2009: Apache Incubator
                   • February 2010: Apache top-level project
                   • May 2011: Version 0.8

What it’s good for

                   • Horizontal scalability
                   • No single point of failure
                   • Multi-data centre support
                   • Very high write workloads
                   • Tuneable consistency

What it’s not so good for

                   • Transactions
                   • Read heavy workloads
                   • Low latency applications
                         •   compared to in-memory dbs




Data model




Keyspaces and Column Families
                     SQL                 Cassandra

                     Database            Keyspace
                     Table               Column Family

                           Keyspaces & CFs have different
                            sets of configuration settings

Column Family

                         key: {
                            column: value,
                            column: value,
                            ...
                          }



Rows and columns
                         col1   col2   col3   col4   col5   col6   col7
                 row1            x                    x      x
                 row2     x      x      x      x      x
                 row3            x      x             x      x      x
                 row4            x      x      x             x
                 row5            x             x      x      x
                 row6            x
                 row7     x      x             x
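
A minimal sketch of the sparse layout above, assuming a plain Python dict stands in for a keyspace. The column family name `Users` and the column names are hypothetical, not from the talk. Each row stores only the columns it actually has, so two rows in the same column family need not share a schema.

```python
# keyspace -> column family -> row key -> {column name: value}
keyspace = {"Users": {}}


def insert(keyspace, column_family, key, column, value):
    """Set one column of one row, creating the row on first write."""
    keyspace[column_family].setdefault(key, {})[column] = value


insert(keyspace, "Users", "user1", "name", "Mary")
insert(keyspace, "Users", "user1", "pet", "lamb")
insert(keyspace, "Users", "user2", "name", "Tom")  # user2 has no "pet" column
```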



Reads
               • get
               • get_slice (one row, some cols)
                     •   name predicate
                     •   slice range
               • multiget_slice (multiple rows)
               • get_range_slices

get
                 (Figure: the grid above, with one column of one row highlighted)

get_slice: name predicate
                 (Figure: the grid above, with several named columns of one row highlighted)

get_slice: slice range
                 (Figure: the grid above, with a contiguous range of columns of one row highlighted)

multiget_slice: name predicate
                 (Figure: the grid above, with the same named columns highlighted in several rows)

get_range_slices: slice range
                 (Figure: the grid above, with a contiguous range of columns highlighted across a range of rows)

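
The four read calls can be sketched against an in-memory dict. This is a toy model for illustration, not the Thrift client API: the function signatures, the inclusive range bounds and the sample data are all assumptions.

```python
# Toy column family: row key -> {column name: value}.
CF = {
    "row1": {"col2": "b", "col5": "e", "col6": "f"},
    "row2": {"col1": "a", "col2": "b", "col3": "c", "col4": "d", "col5": "e"},
    "row3": {"col2": "b", "col3": "c", "col5": "e", "col6": "f", "col7": "g"},
}


def get(cf, key, column):
    """One row, one column."""
    return cf[key][column]


def get_slice(cf, key, names=None, start=None, finish=None):
    """One row, some columns: a list of names or a [start, finish] range."""
    row = cf.get(key, {})
    if names is not None:  # name predicate
        return {n: row[n] for n in names if n in row}
    return {  # slice range, in column-name order
        n: v
        for n, v in sorted(row.items())
        if (start is None or n >= start) and (finish is None or n <= finish)
    }


def multiget_slice(cf, keys, **predicate):
    """Several named rows, the same column predicate applied to each."""
    return {k: get_slice(cf, k, **predicate) for k in keys}


def get_range_slices(cf, start_key, end_key, **predicate):
    """A range of row keys, taken here in sorted key order."""
    return {
        k: get_slice(cf, k, **predicate)
        for k in sorted(cf)
        if start_key <= k <= end_key
    }
```

For example, `get_slice(CF, "row3", start="col3", finish="col6")` selects the columns of `row3` whose names fall in that range.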
Storage architecture

Data Layout
                   writes: key-value insert
                   • appended to the on-disk, un-ordered commit log
                   • applied to the in-memory, (key,col)-sorted memtable
                   • memtable is flushed to on-disk, (key,col)-sorted SSTables

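
A rough sketch of the write path described on this slide, with Python lists and dicts standing in for the on-disk structures. The class name `ToyStore`, the `memtable_limit` threshold and the choice to sort only at flush time are assumptions made for brevity; they are not how Cassandra itself is implemented.

```python
class ToyStore:
    """Commit log + memtable + SSTables, all held in memory."""

    def __init__(self, memtable_limit=3):
        self.commit_log = []  # stands in for the on-disk, un-ordered log
        self.memtable = {}    # (key, col) -> value; sorted when flushed
        self.sstables = []    # each one an immutable, (key, col)-sorted list
        self.memtable_limit = memtable_limit

    def insert(self, key, col, value):
        self.commit_log.append((key, col, value))  # sequential append only
        self.memtable[(key, col)] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        """Write the memtable out as one (key, col)-sorted SSTable."""
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}
        self.commit_log.clear()  # flushed entries no longer need replaying


store = ToyStore(memtable_limit=2)
store.insert("row2", "col1", "a")
store.insert("row1", "col3", "b")  # reaches the limit and triggers a flush
```

Every write is an append plus an in-memory update, which is why the design suits very high write workloads.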
Data Layout
                   SSTables
                   • Each SSTable has a Bloom Filter, an Index and Data

Data Layout
                   reads
                   (Figure: a read "?" is directed at three on-disk SSTables; two of them are marked X, leaving one to be read)

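
A small sketch of how a Bloom filter lets a read skip SSTables, assuming the X marks in the read figure stand for SSTables ruled out by their filters. The filter size, the number of hashes and the use of MD5 are arbitrary choices for illustration, and the rule that later tables override earlier ones replaces the real timestamp comparison.

```python
import hashlib


class BloomFilter:
    def __init__(self, size=64, hashes=3):
        self.size = size
        self.hashes = hashes
        self.bits = 0

    def _positions(self, key):
        for i in range(self.hashes):
            digest = hashlib.md5(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for p in self._positions(key):
            self.bits |= 1 << p

    def might_contain(self, key):
        """False means definitely absent; True means possibly present."""
        return all((self.bits >> p) & 1 for p in self._positions(key))


class SSTable:
    def __init__(self, rows):
        self.data = dict(rows)
        self.bloom = BloomFilter()
        for key in self.data:
            self.bloom.add(key)


def read(sstables, key):
    """Return (value, number of SSTables that had to be looked at)."""
    value, touched = None, 0
    for table in sstables:  # oldest first; later tables override earlier ones
        if not table.bloom.might_contain(key):
            continue  # skipped without touching its index or data
        touched += 1
        if key in table.data:
            value = table.data[key]
    return value, touched
```

A Bloom filter never gives a false negative, so skipping is always safe; an occasional false positive only costs one unnecessary lookup.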
Distribution: Partitioning + Replication

Partitioning + Replication
                   (Figure: on which node should a pair (k, v) be stored?)
                   • Partitioning data on to nodes
                    • load balancing
                    • row-based
                   • Replication
                    • to protect against failure
                    • better availability
Partitioning
                   • Random: take hash of row key
                         •   good for load balancing

                         •   bad for range queries

                   • Ordered: subdivide key space
                         •   bad for load balancing

                         •   good for range queries

                   • Or build your own...
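
The two built-in choices can be sketched as two ways of turning a row key into a ring position. This is an illustration under assumptions: the `ring_size` of 2**32 and the node tokens used below are made up, and the helper names are hypothetical.

```python
import hashlib
from bisect import bisect_left


def random_token(key, ring_size=2**32):
    """Random partitioning: hash the row key to a position on the ring."""
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest, "big") % ring_size


def ordered_token(key):
    """Ordered partitioning: the key itself is the position."""
    return key


def owner(token, node_tokens):
    """First node whose token is >= the given token, wrapping around."""
    ring = sorted(node_tokens)
    return ring[bisect_left(ring, token) % len(ring)]
```

With `ordered_token`, neighbouring keys land on the same or adjacent nodes, which helps range queries but lets a popular key range overload one node. With `random_token`, neighbouring keys scatter, which evens out load but breaks key-order scans.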
Simple Replication
                   • Nodes arranged on a ‘ring’
                   • (k, v) is stored at its primary location
                   • Extra copies are successors on the ring

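
A minimal sketch of simple replication, assuming nodes are identified by integer tokens on the ring. The function name `replicas` and the example tokens are hypothetical.

```python
from bisect import bisect_left


def replicas(token, node_tokens, replication_factor):
    """Primary location plus its successors on the ring."""
    ring = sorted(node_tokens)
    start = bisect_left(ring, token) % len(ring)  # primary location
    count = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)] for i in range(count)]
```

For instance, with nodes at tokens 10, 20, 30 and 40, a key with token 25 and three replicas is stored on node 30 first, then on its successors 40 and 10.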
Topology-aware Replication
                   • Snitch: node IP → (DataCenter, rack)
                   • EC2Snitch
                         •   Region → DC; availability_zone → rack
                   • PropertyFileSnitch
                         •   Configured from a file

                   (Figure: (k, v) is replicated to nodes in two data centers, DC 1 and DC 2, each with racks r1 and r2)
                   • extra copies to different data center
                   • spread across racks within a data center

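
One possible placement policy along these lines, sketched under assumptions: the snitch is a hard-coded table of made-up IP addresses, and `place` is a hypothetical helper, not Cassandra's own replication strategy. It walks the ring, takes up to `per_dc` nodes in each data center, and prefers racks it has not used yet.

```python
# Hypothetical snitch table: node IP -> (data center, rack).
SNITCH = {
    "10.0.0.1": ("DC1", "r1"),
    "10.0.0.2": ("DC1", "r2"),
    "10.0.0.3": ("DC1", "r1"),
    "10.0.1.1": ("DC2", "r1"),
    "10.0.1.2": ("DC2", "r2"),
}


def place(ring, start, per_dc, snitch=SNITCH):
    """Pick replicas for a key whose primary location is ring[start]."""
    order = ring[start:] + ring[:start]  # walk the ring from the primary
    chosen = []
    for dc in sorted({snitch[n][0] for n in ring}):
        nodes = [n for n in order if snitch[n][0] == dc]
        picked, racks = [], set()
        for n in nodes:  # first pass: at most one node per rack
            if len(picked) < per_dc and snitch[n][1] not in racks:
                picked.append(n)
                racks.add(snitch[n][1])
        for n in nodes:  # second pass: fill up if racks ran out
            if len(picked) < per_dc and n not in picked:
                picked.append(n)
        chosen += picked
    return chosen
```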
Distribution: Consistency

Consistency Level

                   • How many replicas must respond in order to
                         declare success
                   • W of the N replicas must succeed for a write to succeed
                         •   write with client-generated timestamp

                   • R of the N replicas must succeed for a read to succeed
                         •   return most recent, by timestamp

Consistency Level

                   • 1, 2, 3 responses
                   • Quorum (more than half)
                   • Quorum in local data center
                   • Quorum in each data center
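
The arithmetic behind these levels can be sketched in a few lines. The function names are hypothetical, and the `r + w > n` check is the standard quorum-overlap condition added here for illustration; it is not stated on the slides.

```python
def quorum(n):
    """More than half of n replicas."""
    return n // 2 + 1


def read_result(responses):
    """Return the most recent value among (value, timestamp) responses."""
    return max(responses, key=lambda r: r[1])[0]


def overlaps(r, w, n):
    """True when every read set shares at least one replica with every write set."""
    return r + w > n
```

With N = 3, reading and writing at quorum (R = W = 2) guarantees that a read contacts at least one replica holding the latest write, while R = W = 1 does not.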

Maintaining consistency

                   • Read repair
                   • Hinted handoff
                   • Anti-entropy


Read repair
                   • If the replicas disagree on read, send most
                         recent data back
                   (Figure, in four steps: "read k?" is sent to replicas n1, n2 and n3;
                   n1 answers (v, t1), n2 answers "not found!", n3 answers (v’, t2);
                   the most recent value is then written back with write (k, v’, t2))

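
The read repair steps can be sketched as follows, assuming each replica is a dict from key to a (value, timestamp) pair. The function name and data shapes are illustrative only.

```python
def read_with_repair(replicas, key):
    """Read from every replica, return the newest value, repair the rest."""
    answers = [replica.get(key) for replica in replicas]
    found = [a for a in answers if a is not None]
    if not found:
        return None
    latest = max(found, key=lambda a: a[1])  # most recent, by timestamp
    for replica, answer in zip(replicas, answers):
        if answer != latest:
            replica[key] = latest  # write (k, v', t2) back
    return latest[0]


n1 = {"k": ("v", 1)}
n2 = {}                 # "not found!"
n3 = {"k": ("v'", 2)}
result = read_with_repair([n1, n2, n3], "k")
```

After the call, all three replicas hold the value with timestamp 2.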
Hinted handoff

                   • When a node is unavailable, writes destined for it
                         can be written to any node as a hint
                   • The hint is delivered when the node comes back
                         online
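
A toy sketch of hinted handoff. The `Node` class, the `fallback` parameter and the explicit `deliver_hints` call are assumptions for illustration; in a real cluster, delivery is triggered when the cluster notices the node is back.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}   # key -> value
        self.hints = []  # (target node, key, value) held on behalf of others


def write(target, key, value, fallback):
    """Write to the target, or leave a hint on the fallback node."""
    if target.up:
        target.data[key] = value
    else:
        fallback.hints.append((target, key, value))


def deliver_hints(holder):
    """Replay hints whose target is back online; keep the rest."""
    remaining = []
    for target, key, value in holder.hints:
        if target.up:
            target.data[key] = value
        else:
            remaining.append((target, key, value))
    holder.hints = remaining
```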




Anti-entropy

                   • Equivalent to ‘read repair all’
                   • Requires reading all data (woah)
                         •   (Although only hashes are sent to calculate diffs)

                   • Manual process
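
The idea that only hashes are exchanged can be sketched with a flat list of bucket hashes. This is a simplification: Cassandra's repair builds Merkle trees, while the bucket count, the MD5 hashing and the function names here are assumptions for illustration.

```python
import hashlib


def bucket_hashes(data, buckets=4):
    """Read all local data and summarise it as one hash per bucket."""
    groups = [[] for _ in range(buckets)]
    for key in sorted(data):
        index = int(hashlib.md5(key.encode()).hexdigest(), 16) % buckets
        groups[index].append((key, data[key]))
    return [hashlib.md5(repr(g).encode()).hexdigest() for g in groups]


def differing_buckets(a, b, buckets=4):
    """Compare two replicas by hash; only these buckets need to be exchanged."""
    ha, hb = bucket_hashes(a, buckets), bucket_hashes(b, buckets)
    return [i for i in range(buckets) if ha[i] != hb[i]]
```

Each replica still has to read all of its data to compute the hashes, but only the buckets that differ need their contents sent across the network.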




De-normalisation




De-normalisation

                   • Disk space is much cheaper than disk seeks
                   • Read at 100 MB/s, seek at 100 IO/s
                   • => copy data to avoid seeks
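
A back-of-the-envelope version of this argument, using the two figures from the slide. The workload of 1000 records of 1000 bytes each is a made-up example.

```python
READ_BYTES_PER_S = 100 * 10**6  # "Read at 100 MB/s"
SEEKS_PER_S = 100               # "seek at 100 IO/s"


def scattered_read_seconds(records, record_bytes):
    """One seek per record, then read it."""
    return records / SEEKS_PER_S + records * record_bytes / READ_BYTES_PER_S


def contiguous_read_seconds(records, record_bytes):
    """One seek, then read all records sequentially."""
    return 1 / SEEKS_PER_S + records * record_bytes / READ_BYTES_PER_S


scattered = scattered_read_seconds(1000, 1000)    # about 10.01 s
contiguous = contiguous_read_seconds(1000, 1000)  # about 0.02 s
```

Under these assumptions the scattered read is roughly 500 times slower, almost entirely because of seek time.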


Inbox
                   (Figure: user1 sends messages msg1, msg2, msg3, ... to recipients user2, user3 and user4)

Data-centric model
                         m1: {
                           sender: user1
                           content: “Mary had a little lamb”
                           recipients: user2, user3
                         }


               • but how to do ‘recipients’ for Inbox?
               • one-to-many modelled by a join table

To join
          m1: {
            sender: user1
            subject: “A rhyme”
            content: “Mary had a little lamb”
          }
          m2: {
            sender: user1
            subject: “colours”
            content: “Its fleece was white as snow”
          }
          m3: {
            sender: user1
            subject: “loyalty”
            content: “And everywhere that Mary went”
          }

          user2: {
            m1: true
          }
          user3: {
            m1: true
            m2: true
          }
          user4: {
            m2: true
            m3: true
          }

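
A sketch of what reading an inbox costs with this join-style layout, using the data from the slide in two Python dicts. The `read_inbox` helper and its read counter are hypothetical.

```python
messages = {
    "m1": {"sender": "user1", "subject": "A rhyme",
           "content": "Mary had a little lamb"},
    "m2": {"sender": "user1", "subject": "colours",
           "content": "Its fleece was white as snow"},
    "m3": {"sender": "user1", "subject": "loyalty",
           "content": "And everywhere that Mary went"},
}
inbox_index = {
    "user2": {"m1": True},
    "user3": {"m1": True, "m2": True},
    "user4": {"m2": True, "m3": True},
}


def read_inbox(user):
    """One row read for the index, then one more row read per message."""
    ids = sorted(inbox_index.get(user, {}))
    reads = 1 + len(ids)
    return [messages[i]["subject"] for i in ids], reads
```

Listing the inbox of `user3` takes three row reads here, and each of them may land on a different node.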
.. or not to join
                 • Joins are expensive, so de-normalise to trade
                         off space for time
                 • We can have lots of columns, so think BIG:
                 • Make message id a time-typed super-column.
                 • This makes get_slice an efficient way of
                         searching for messages in a time window



Super Column Family
                         user2: {
                           m1: {
                             sender: user1
                             subject: “A rhyme”
                           }
                         }
                         user3: {
                           m1: {
                             sender: user1
                             subject: “A rhyme”
                           }
                           m2: {
                             sender: user1
                             subject: “colours”
                           }
                         }
                         ...



De-normalisation + Cassandra
                 • have to write a copy of the record for each
                         recipient ... but writes are very cheap
                 • get_slice fetches columns for a particular
                         row, so gets received messages for a user
                 • on-disk column order is optimal for this
                         query
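
The de-normalised inbox can be sketched like this, assuming small integers stand in for the time-typed message ids. The helper names `send` and `received_between` are hypothetical.

```python
inbox = {}  # user -> {message time: {"sender": ..., "subject": ...}}


def send(sent_at, sender, subject, recipients):
    """Write one copy of the message summary into each recipient's row."""
    for user in recipients:
        inbox.setdefault(user, {})[sent_at] = {"sender": sender, "subject": subject}


def received_between(user, start, finish):
    """Slice one row by super-column name, that is, by time."""
    row = inbox.get(user, {})
    return [row[t]["subject"] for t in sorted(row) if start <= t <= finish]


send(1, "user1", "A rhyme", ["user2", "user3"])
send(2, "user1", "colours", ["user3", "user4"])
send(3, "user1", "loyalty", ["user4"])
```

Sending costs one write per recipient, but reading a user's messages for a time window touches a single row.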



Conclusion




What it’s good for

                   • Horizontal scalability
                   • No single point of failure
                   • Multi-data centre support
                   • Very high write workloads
                   • Tuneable consistency

Q?




Call Girls In Panjim North Goa 9971646499 Genuine ServiceCall Girls In Panjim North Goa 9971646499 Genuine Service
Call Girls In Panjim North Goa 9971646499 Genuine Service
 

Cassandra: Two data centers and great performance

  • 1. Cassandra FTW Andrew Byde Principal Scientist Monday, 15 August 2011
  • 2. Menu • Introduction • Data model + storage architecture • Partitioning + replication • Consistency • De-normalisation
  • 3. History + design
  • 4. History • 2007: Started at Facebook for inbox search • July 2008: Open sourced by Facebook • March 2009: Apache Incubator • February 2010: Apache top-level project • May 2011: Version 0.8
  • 5. What it’s good for • Horizontal scalability • No single point of failure • Multi-data centre support • Very high write workloads • Tuneable consistency
  • 6. What it’s not so good for • Transactions • Read-heavy workloads • Low-latency applications (compared to in-memory dbs)
  • 7. Data model
  • 8. Keyspaces and Column Families • SQL Database ↔ Cassandra Keyspace • SQL Table ↔ Cassandra Column Family • Keyspaces & CFs have different sets of configuration settings
  • 9. Column Family key: { column: value, column: value, ... }
  • 10. Rows and columns (diagram: a sparse grid of rows row1 to row7 against columns col1 to col7; each row populates only some of the columns)
  • 11. Reads • get • get_slice One row, some cols • name predicate • slice range • multiget_slice Multiple rows • get_range_slices
  • 12. get (diagram: the row/column grid of slide 10)
  • 13. get_slice: name predicate (diagram: the row/column grid of slide 10)
  • 14. get_slice: slice range (diagram: the row/column grid of slide 10)
  • 15. multiget_slice: name predicate (diagram: the row/column grid of slide 10)
  • 16. get_range_slices: slice range (diagram: the row/column grid of slide 10)
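The read operations on slides 11 to 16 can be illustrated with a toy in-memory model. This is only a sketch: the function names echo the operations named on the slides, but the signatures, the split of get_slice into two helpers, and the sample data are assumptions made for illustration, not the actual Thrift API.

```python
from bisect import bisect_left, bisect_right

# Toy model (assumption): a column family is a dict of
# row key -> {column name: value}. Rows may have different columns.
column_family = {
    "row1": {"col1": "a", "col4": "b", "col6": "c"},
    "row2": {"col2": "d", "col3": "e", "col4": "f"},
    "row3": {"col1": "g", "col5": "h"},
}

def get(cf, key, column):
    """One row, one column."""
    return cf[key][column]

def get_slice_by_names(cf, key, names):
    """One row, an explicit list of column names (name predicate)."""
    row = cf.get(key, {})
    return {n: row[n] for n in names if n in row}

def get_slice_by_range(cf, key, start, finish):
    """One row, every column with start <= name <= finish (slice range)."""
    row = cf.get(key, {})
    names = sorted(row)  # columns are kept in sorted order
    lo = bisect_left(names, start)
    hi = bisect_right(names, finish)
    return {n: row[n] for n in names[lo:hi]}

def multiget_slice(cf, keys, names):
    """Several explicitly listed rows, same name predicate for each."""
    return {k: get_slice_by_names(cf, k, names) for k in keys}

def get_range_slices(cf, start_key, end_key, start, finish):
    """A contiguous range of row keys, same slice range for each."""
    return {
        k: get_slice_by_range(cf, k, start, finish)
        for k in sorted(cf)
        if start_key <= k <= end_key
    }
```

In this sketch a slice range is cheap because the column names of a row are sorted, so the matching columns form one contiguous run.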
  • 17. Storage architecture
  • 18. Data Layout: writes • key-value insert → on-disk, un-ordered commit log • → in-memory, (key,col)-sorted memtable • flush → on-disk, (key,col)-sorted SSTables
  • 19. Data Layout: SSTables • Each SSTable consists of a Bloom Filter, an Index and the Data
  • 20. Data Layout: reads (diagram: a read query is checked against several SSTables)
  • 21. Data Layout: reads (diagram: the same read query, with two of the SSTables crossed out)
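The write and read paths on slides 18 to 21 can be sketched as follows. This is a heavily simplified illustration, not Cassandra's implementation: the class names, the flush threshold, and the MD5-based Bloom filter are all assumptions, and the toy ignores timestamps, compaction, and the on-disk index.

```python
import hashlib

class BloomFilter:
    """Tiny Bloom filter: may give false positives, never false negatives."""
    def __init__(self, size=256, hashes=3):
        self.size = size
        self.hashes = hashes
        self.bits = 0

    def _positions(self, key):
        for i in range(self.hashes):
            digest = hashlib.md5(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:4], "big") % self.size

    def add(self, key):
        for p in self._positions(key):
            self.bits |= 1 << p

    def might_contain(self, key):
        return all((self.bits >> p) & 1 for p in self._positions(key))

class SSTable:
    """Immutable, (key, col)-sorted snapshot of a memtable."""
    def __init__(self, memtable):
        self.data = dict(sorted(memtable.items()))
        self.bloom = BloomFilter()
        for key, _col in self.data:
            self.bloom.add(key)

class Store:
    def __init__(self, flush_threshold=2):  # threshold is an assumption
        self.commit_log = []   # append-only, un-ordered
        self.memtable = {}     # (key, col) -> value, in memory
        self.sstables = []     # oldest first
        self.flush_threshold = flush_threshold

    def insert(self, key, col, value):
        self.commit_log.append((key, col, value))  # durability first
        self.memtable[(key, col)] = value
        if len(self.memtable) >= self.flush_threshold:
            self.sstables.append(SSTable(self.memtable))  # flush
            self.memtable = {}

    def read(self, key, col):
        if (key, col) in self.memtable:
            return self.memtable[(key, col)]
        for table in reversed(self.sstables):  # newest first in this toy
            # The Bloom filter lets us skip tables that cannot hold the key.
            if table.bloom.might_contain(key) and (key, col) in table.data:
                return table.data[(key, col)]
        return None
```

The point of the Bloom filter check is that a read can rule out most SSTables without touching their data, which is presumably what the crossed-out tables on slide 21 depict.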
  • 22. Distribution: Partitioning + Replication
  • 23. Partitioning + Replication (k, v) ?
  • 24. Partitioning + Replication • Partitioning data on to nodes • load balancing • row-based • Replication • to protect against failure • better availability
  • 25. Partitioning • Random: take hash of row key • good for load balancing • bad for range queries • Ordered: subdivide key space • bad for load balancing • good for range queries • Or build your own...
  • 26. Simple Replication • Nodes arranged on a ‘ring’
  • 27. Simple Replication • Primary location of (k, v) • Nodes arranged on a ‘ring’
  • 28. Simple Replication • Primary location of (k, v) • Extra copies are successors on the ring • Nodes arranged on a ‘ring’
  • 29. Topology-aware Replication • Snitch: node IP → (DataCenter, rack) • EC2Snitch • Region → DC; availability_zone → rack • PropertyFileSnitch • Configured from a file
  • 30. Topology-aware Replication (diagram: (k, v) and two data centers, DC 1 and DC 2, each with racks r1 and r2)
  • 31. Topology-aware Replication (diagram: as on slide 30)
  • 32. Topology-aware Replication (diagram: as on slide 30) • extra copies to different data center
  • 33. Topology-aware Replication (diagram: as on slide 30) • extra copies to different data center • spread across racks within a data center
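Random partitioning combined with the simple ring replication of slides 25 to 28 can be sketched as below. The sketch makes several assumptions: node tokens are derived by hashing the node name (in practice tokens are assigned to nodes), MD5 is used as the hash, and the names `token`, `Ring` and `replicas` are hypothetical. Topology-aware placement via a snitch is not modelled.

```python
import hashlib
from bisect import bisect_left

def token(value):
    """Hash a string to a position on the ring (MD5 is an assumption)."""
    return int.from_bytes(hashlib.md5(value.encode()).digest(), "big")

class Ring:
    def __init__(self, nodes):
        # Nodes sorted by token form the ring.
        self.ring = sorted((token(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.ring]

    def replicas(self, row_key, replication_factor):
        """Primary node, then its successors on the ring."""
        # Primary: first node whose token is >= the key's token,
        # wrapping around to the start of the ring.
        start = bisect_left(self.tokens, token(row_key)) % len(self.ring)
        count = min(replication_factor, len(self.ring))
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(count)]
```

Because the row key is hashed, neighbouring keys land on unrelated nodes. That spreads load evenly, and it is also why range queries over row keys are awkward under the random partitioner.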
  • 34. Distribution: Consistency
  • 35. Consistency Level • How many replicas must respond in order to declare success • W/N must succeed for write to succeed • write with client-generated timestamp • R/N must succeed for read to succeed • return most recent, by timestamp
  • 36. Consistency Level • 1, 2, 3 responses • Quorum (more than half) • Quorum in local data center • Quorum in each data center
  • 37. Maintaining consistency • Read repair • Hinted handoff • Anti-entropy
  • 38. Read repair • If the replicas disagree on read, send most recent data back (diagram: read k? sent to replicas n1, n2, n3)
  • 39. Read repair • If the replicas disagree on read, send most recent data back (diagram: read k? answered by n1: v, t1; n2: not found!; n3: v’, t2)
  • 40. Read repair • If the replicas disagree on read, send most recent data back (diagram: n1: v, t1; n2: not found!; n3: v’, t2)
  • 41. Read repair • If the replicas disagree on read, send most recent data back (diagram: write (k, v’, t2) sent back to the replicas)
  • 42. Hinted handoff • When a node is unavailable • Writes can be written to any node as a hint • Delivered when the node comes back online
  • 43. Anti-entropy • Equivalent to ‘read repair all’ • Requires reading all data (woah) • (Although only hashes are sent to calculate diffs) • Manual process
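The consistency-level and read-repair slides can be illustrated with a small sketch. The overlap condition R + W > N is the standard quorum argument rather than something stated on the slides, and the function names and the replica representation (a dict of key to (value, timestamp)) are assumptions for illustration.

```python
def quorum(n):
    """Smallest count that is more than half of n replicas."""
    return n // 2 + 1

def read_sees_latest_write(r, w, n):
    """If R + W > N, every read set overlaps every write set."""
    return r + w > n

def read_with_repair(replicas, key):
    """Return the newest value by timestamp and repair stale replicas.

    replicas: list of dicts mapping key -> (value, timestamp).
    """
    responses = [rep.get(key) for rep in replicas]
    found = [resp for resp in responses if resp is not None]
    if not found:
        return None
    newest = max(found, key=lambda vt: vt[1])  # most recent, by timestamp
    for rep, resp in zip(replicas, responses):
        if resp != newest:
            rep[key] = newest  # send most recent data back
    return newest[0]
```

Running this on the situation of slide 39 (n1 holds an old value, n2 holds nothing, n3 holds the newest value) returns the newest value and leaves all three replicas holding it.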
  • 45. De-normalisation • Disk space is much cheaper than disk seeks • Read at 100 MB/s, seek at 100 IO/s • => copy data to avoid seeks
  • 46. Inbox (diagram: user1 sends messages msg1, msg2, msg3 to user2, user3, user4, ...)
  • 47. Data-centric model m1: { sender: user1 content: “Mary had a little lamb” recipients: user2, user3 } • but how to do ‘recipients’ for Inbox? • one-to-many modelled by a join table
  • 48. To join • Messages: m1: { sender: user1 subject: “A rhyme” content: “Mary had a little lamb” } m2: { sender: user1 subject: “colours” content: “Its fleece was white as snow” } m3: { sender: user1 subject: “loyalty” content: “And everywhere that Mary went” } • Join table: user2: { m1: true } user3: { m1: true m2: true } user4: { m2: true m3: true }
  • 49. .. or not to join • Joins are expensive, so de-normalise to trade off space for time • We can have lots of columns, so think BIG: • Make message id a time-typed super-column. • This makes get_slice an efficient way of searching for messages in a time window
  • 50. Super Column Family user2: { m1: { sender: user1 subject: “A rhyme” } } user3: { m1: { sender: user1 subject: “A rhyme” } m2: { sender: user1 subject: “colours” } } ...
  • 51. De-normalisation + Cassandra • have to write a copy of the record for each recipient ... but writes are very cheap • get_slice fetches columns for a particular row, so gets received messages for a user • on-disk column order is optimal for this query
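The de-normalised inbox of slides 49 to 51 can be sketched as a toy model. Assumptions: message ids are plain integers standing in for time-typed ids, and `inbox`, `send` and `received` are hypothetical names, not Cassandra calls.

```python
from collections import defaultdict

# inbox[user][message_id] = {"sender": ..., "subject": ...}
# One row per recipient, one (super-)column per message.
inbox = defaultdict(dict)

def send(message_id, sender, subject, recipients):
    """Write one copy of the record per recipient (writes are cheap)."""
    for user in recipients:
        inbox[user][message_id] = {"sender": sender, "subject": subject}

def received(user, start_id, finish_id):
    """Slice one user's row by message id, i.e. by time window."""
    row = inbox.get(user, {})
    return [(mid, row[mid]) for mid in sorted(row) if start_id <= mid <= finish_id]

send(1, "user1", "A rhyme", ["user2", "user3"])
send(2, "user1", "colours", ["user3", "user4"])
```

Reading a user's inbox touches a single row and needs no join. At the figures on slide 45, one seek (1/100 s) costs about as much time as reading 1 MB sequentially, which is why duplicating small records is the cheaper option.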
  • 53. What it’s good for • Horizontal scalability • No single point of failure • Multi-data centre support • Very high write workloads • Tuneable consistency