Intro to Cassandra
  Tyler Hobbs
History

    [Diagram, top to bottom: Dynamo (clustering) and BigTable (data model); Inbox search; Cassandra]
Users
Clustering

    Every node plays the same role
    – No masters, slaves, or special nodes
    – No single point of failure
Consistent Hashing

    [Ring diagram: nodes at tokens 0, 10, 20, 30, 40, 50]
    – Key: "www.google.com"
    – md5("www.google.com") places the key at 14 on the ring
    – Replication Factor = 3
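The lookup on this ring can be sketched in Python. This is a minimal sketch, not Cassandra's partitioner: the 0..59 token space, the `token_for` and `replicas_for` helpers, and the rule "walk clockwise and take the next RF nodes" are simplifying assumptions. The slide's token 14 is passed in directly, since a real md5 value reduced onto this toy ring would not necessarily be 14.

```python
import bisect
import hashlib

RING_SIZE = 60  # assumption: toy token space matching the slide's 0..50 ring
NODE_TOKENS = [0, 10, 20, 30, 40, 50]  # node positions from the slide


def token_for(key):
    """Hash a key onto the toy ring (md5, reduced modulo RING_SIZE)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


def replicas_for(token, rf):
    """Return the rf nodes found walking clockwise from the token."""
    start = bisect.bisect_left(NODE_TOKENS, token) % len(NODE_TOKENS)
    return [NODE_TOKENS[(start + i) % len(NODE_TOKENS)] for i in range(rf)]


print(replicas_for(14, rf=3))  # the slide's example token
print(replicas_for(55, rf=3))  # wraps around past 0
```

Under these assumptions token 14 lands on the nodes at 20, 30, and 40, and a token past 50 wraps around to the node at 0.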
Clustering

    Client can talk to any node
Scaling

    [Ring diagram, RF = 2: nodes at 0, 10, 20, 30, 50]
    – The node at 50 owns the red portion (highlighted in the original diagram)
    – Add a new node at 40
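A rough sketch of why adding a node moves little data. It assumes a toy ring of integer tokens 0..59 where each token is owned by the first node clockwise from it; the `owner` helper is hypothetical, and only primary ownership is modeled, not the RF = 2 replicas.

```python
import bisect

RING_SIZE = 60  # assumption: toy token space


def owner(token, nodes):
    """Return the first node at or clockwise after the token."""
    nodes = sorted(nodes)
    i = bisect.bisect_left(nodes, token) % len(nodes)
    return nodes[i]


before = [0, 10, 20, 30, 50]  # ring from the slide
after = before + [40]         # add a new node at 40

# Tokens whose owner changes when the node at 40 joins.
moved = [t for t in range(RING_SIZE) if owner(t, before) != owner(t, after)]
print(moved)
```

In this model only tokens 31 through 40 change owner, all of them moving from the node at 50 to the new node at 40; every other node is untouched.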
Node Failures

    [Ring diagram, RF = 2: nodes at 0, 10, 20, 30, 40, 50, with the replicas labeled]
Consistency, Availability

    Consistency
    – Can I read stale data?

    Availability
    – Can I write/read at all?

    Tunable Consistency
Consistency

    N = Total number of replicas

    R = Number of replicas read from
    – (before the response is returned)

    W = Number of replicas written to
    – (before the write is considered a success)


    W + R > N gives strong consistency
Consistency
 W + R > N gives strong consistency

 N=3
 W=2
 R=2

 2 + 2 > 3 ==> strongly consistent

 Only 2 of the 3 replicas must be
 available.
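The W + R > N rule can be checked by brute force. A minimal sketch, assuming replicas are just the integers 0..N-1 (the helper name `always_overlap` is hypothetical): it enumerates every possible set of W written replicas and R read replicas and checks that they always share at least one replica, which is what lets a read see the latest acknowledged write.

```python
from itertools import combinations


def always_overlap(n, w, r):
    """True if every W-sized write set intersects every R-sized read set."""
    replicas = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )


print(always_overlap(3, 2, 2))  # W + R > N
print(always_overlap(3, 1, 2))  # W + R == N
```

With N = 3, W = 2, R = 2 the sets always overlap; with W = 1, R = 2 a read can miss the only replica that took the write.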
Consistency

    Tunable Consistency
    – Specify N (Replication Factor) per data set
    – Specify R, W per operation
    – Quorum: N/2 + 1 (N/2 rounded down)
       • R = W = Quorum
       • Strong consistency
       • Tolerate the loss of N – Quorum replicas
    – R, W can also be 1 or N
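A small sketch of the quorum arithmetic, assuming N/2 means integer division (the `quorum` helper is hypothetical). It prints the quorum size for a few replication factors and confirms that R = W = Quorum satisfies W + R > N.

```python
def quorum(n):
    """Quorum size for n replicas, using integer division."""
    return n // 2 + 1


for n in (1, 2, 3, 5):
    q = quorum(n)
    print(f"N={n} quorum={q} strong={q + q > n}")
```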
Availability

    Can tolerate the loss of:
    – N – R replicas for reads
    – N – W replicas for writes
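The same rule, sketched from the failure side: an operation that needs R (or W) replicas still succeeds as long as at least that many are up. The `can_serve` helper is hypothetical; ONE, QUORUM, and ALL are Cassandra's consistency level names, mapped here to 1, N/2 + 1, and N for N = 3.

```python
def can_serve(n, required, down):
    """An operation needing `required` replicas succeeds if enough are up."""
    return n - down >= required


N = 3
LEVELS = {"ONE": 1, "QUORUM": N // 2 + 1, "ALL": N}

for down in range(N + 1):
    ok = [name for name, need in LEVELS.items() if can_serve(N, need, down)]
    print(down, ok)
```

With one replica down, ONE and QUORUM still work; with two down, only ONE does.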
CAP Theorem
During node or network failure:

    [Chart: Availability (vertical axis, up to 100%) against Consistency (horizontal axis, up to 100%).
     The upper-right region is marked "Not Possible"; the rest is marked "Possible".
     Cassandra is labeled between the two regions.]
Clustering

    No single point of failure

    Replication that works

    Scales linearly
    – 2x nodes = 2x performance
       • For both reads and writes
    – Up to hundreds of nodes
    – See “Netflix: 1 million writes/sec on AWS”

    Operationally simple

    Multi-Datacenter Replication
Data Model

    Comes from Google BigTable

    Goals
    – Commodity Hardware
       • Spinning disks
    – Handle data sets much larger than memory
       • Minimize disk seeks
    – High throughput
    – Low latency
    – Durable
Column Families

    Static
    – Object data
    – Similar to a table in a relational database

    Dynamic
    – Precomputed query results
    – Materialized views

    (these are just educational classifications)
Static Column Families
                   Users
   zznate    password: *    name: Nate
   driftx    password: *    name: Brandon
   thobbs    password: *    name: Tyler
   jbellis   password: *    name: Jonathan   site: riptano.com
Dynamic Column Families

    Rows
    – Each row has a unique primary key
    – Sorted list of (name, value) tuples
       • Like an ordered hash
    – The (name, value) tuple is called a “column”
Dynamic Column Families
                     Following
   zznate    driftx:   thobbs:
   driftx
   thobbs    zznate:
   jbellis   driftx:   mdennis:   pcmanus:   thobbs:   xedin:   zznate:
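One way to picture a row in a dynamic column family is as a list of (name, value) tuples kept sorted by name. This is a sketch of the idea only, not Cassandra's storage engine: the dict of rows, the `insert_column` helper, and the empty-string values standing in for the valueless columns above are all assumptions.

```python
import bisect

following = {}  # row key -> sorted list of (column name, value) tuples


def insert_column(cf, row_key, name, value=""):
    """Insert or overwrite a column, keeping the row sorted by column name."""
    row = cf.setdefault(row_key, [])
    names = [n for n, _ in row]
    i = bisect.bisect_left(names, name)
    if i < len(row) and row[i][0] == name:
        row[i] = (name, value)        # same name: overwrite in place
    else:
        row.insert(i, (name, value))  # new name: insert in sorted position


# Columns arrive in arbitrary order but are stored sorted.
for followed in ["zznate", "driftx", "thobbs", "pcmanus"]:
    insert_column(following, "jbellis", followed)

print([name for name, _ in following["jbellis"]])
```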
Dynamic Column Families

    Other Examples:
    – Timeline of tweets by a user
    – Timeline of tweets by all of the people a user is
      following
    – List of comments sorted by score
    – List of friends grouped by state
The Data API

    RPC-based API
    – github.com/twitter/cassandra

    CQL (Cassandra Query Language)
    – code.google.com/a/apache-extras.org/p/cassandra-ruby/
Inserting Data
 INSERT INTO users (KEY, 'name', 'age')
     VALUES ('thobbs', 'Tyler', 24);
Updating Data
 Updates are the same as inserts:
 INSERT INTO users (KEY, 'age')
     VALUES ('thobbs', 34);


 Or
 UPDATE users SET 'age' = 34
     WHERE KEY = 'thobbs';
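A sketch of why inserts and updates behave the same: both simply write the named columns for a key, creating the row if it is missing. The dict-of-dicts model and the `upsert` helper are assumptions for illustration, not the client API.

```python
users = {}  # row key -> {column name: value}


def upsert(cf, key, **columns):
    """Write columns for a key; creates the row or overwrites existing columns."""
    cf.setdefault(key, {}).update(columns)


upsert(users, "thobbs", name="Tyler", age=24)  # behaves like the INSERT
upsert(users, "thobbs", age=34)                # behaves like the UPDATE
print(users)
```

Columns that are not named in the second write, such as `name`, are left as they were.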
Fetching Data
 Whole row select:
 SELECT * FROM users WHERE KEY = 'thobbs';
Fetching Data
 Explicit column select:
 SELECT 'name', 'age' FROM users
     WHERE KEY = 'thobbs';
Fetching Data
 Get a slice of columns
 UPDATE letters SET 1='a', 2='b', 3='c', 4='d', 5='e'
     WHERE KEY = 'key';

 SELECT 1..3 FROM letters WHERE KEY = 'key';


 Returns [(1, a), (2, b), (3, c)]
Fetching Data
 Get a slice of columns
 SELECT FIRST 2 FROM letters WHERE KEY = 'key';


 Returns [(1, a), (2, b)]

 SELECT FIRST 2 REVERSED FROM letters
     WHERE KEY = 'key';


 Returns [(5, e), (4, d)]
Fetching Data
 Get a slice of columns
 SELECT 3..'' FROM letters WHERE KEY = 'key';


 Returns [(3, c), (4, d), (5, e)]

 SELECT FIRST 2 REVERSED 4..'' FROM letters
     WHERE KEY = 'key';


 Returns [(4, d), (3, c)]
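The slice queries above can be mimicked over a sorted row in Python. This is a sketch of the semantics as the slides show them, not the server's implementation: `get_slice` and its parameters are hypothetical, bounds are inclusive, and `None` stands in for the open end written as `''`.

```python
def get_slice(row, start=None, finish=None, first=None, reverse=False):
    """Slice a row (list of (name, value) sorted by name).

    start/finish are inclusive; None leaves that end open.
    With reverse=True the row is walked from high names to low,
    so start is the upper bound and finish the lower bound.
    """
    cols = list(reversed(row)) if reverse else list(row)

    def in_range(name):
        if reverse:
            return (start is None or name <= start) and (
                finish is None or name >= finish
            )
        return (start is None or name >= start) and (
            finish is None or name <= finish
        )

    out = [(n, v) for n, v in cols if in_range(n)]
    return out if first is None else out[:first]


letters = [(1, "a"), (2, "b"), (3, "c"), (4, "d"), (5, "e")]

print(get_slice(letters, start=1, finish=3))                # SELECT 1..3
print(get_slice(letters, first=2))                          # SELECT FIRST 2
print(get_slice(letters, first=2, reverse=True))            # SELECT FIRST 2 REVERSED
print(get_slice(letters, start=3))                          # SELECT 3..''
print(get_slice(letters, start=4, first=2, reverse=True))   # SELECT FIRST 2 REVERSED 4..''
```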
Deleting Data
 Delete a whole row:
 DELETE FROM users WHERE KEY = 'thobbs';

 Delete specific columns:
 DELETE 'age' FROM users
     WHERE KEY = 'thobbs';
Secondary Indexes
 Built-in basic indexes
 CREATE INDEX ageIndex ON users (age);

 SELECT name FROM users
     WHERE age = 24 AND state = 'TX';
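A rough sketch of how a query like this can be answered with one index: look up candidate rows by the indexed column, then filter them on the remaining condition. The rows below are hypothetical sample data, and the dict-based `age_index` and `select_names` helper are assumptions for illustration, not Cassandra's index implementation.

```python
from collections import defaultdict

# Hypothetical rows, for illustration only.
users = {
    "alice": {"name": "Alice", "age": 24, "state": "TX"},
    "bob": {"name": "Bob", "age": 24, "state": "CA"},
    "carol": {"name": "Carol", "age": 31, "state": "TX"},
}

# Index on age: age value -> set of row keys.
age_index = defaultdict(set)
for key, row in users.items():
    age_index[row["age"]].add(key)


def select_names(age, state):
    """Use the age index to find candidates, then filter on state."""
    candidates = age_index.get(age, set())
    return sorted(
        users[k]["name"] for k in candidates if users[k]["state"] == state
    )


print(select_names(24, "TX"))
```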
Performance

    Writes
    – 10k – 30k per second per node
    – Sub-millisecond latency

    Reads
    – 1k – 20k per second per node (depends on data
      set, caching)
    – 0.1 to 10ms latency
Other Features

    Distributed Counters
    – Can support millions of high-volume counters

    Excellent Multi-datacenter Support
    – Disaster recovery
    – Locality

    Hadoop Integration
    – Isolation of resources
    – Hive and Pig drivers

    Compression
What Cassandra Can't Do

    Transactions
    – Unless you use a distributed lock
    – Atomicity, Isolation
    – These aren't needed as often as you'd think

    Limited support for ad-hoc queries
    – Know what you want to do with the data
Not One-size-fits-all

    Use alongside an RDBMS
Problems you shouldn't solve with C*

    Prototyping

    Distributed Locking

    Small datasets
    – (When you don't need availability)

    Complex graph processing
    – Shallow graph queries work well, though

    Fundamentally highly relational/transactional
    data
The sweet spot for Cassandra

    Large dataset, low latency queries

    Simple to medium complexity queries
    – Key/value
    – Time series, ordered data
    – Lists, sets, maps

    High Availability
The sweet spot for Cassandra

    Social
    – Texts, comments, check-ins, collaboration

    Activity
    – Feeds, timelines, clickstreams, logs, sensor data

    Metrics
    – Performance data over time
    – CloudKick, DataStax OpsCenter

    Text Search
    – Inbox search at Facebook
ORMs

    Poor integration

    ORMs are not a natural fit for Cassandra
    – In C*, we mainly care about queries, not objects
    – Beyond simple K/V, abstraction breaks

    Suggestion: don't waste time with an ORM
    – C* will only be used for a specific subset of your
      data/queries
    – Use the C* API directly in your model
Questions?

          Tyler Hobbs
               @tylhobbs
       tyler@datastax.com

Cassandra for Ruby/Rails Devs

  • 1. Intro to Cassandra Tyler Hobbs
  • 2. History: Dynamo (clustering) + BigTable (data model) → Inbox search → Cassandra
  • 4. Clustering: every node plays the same role. No masters, slaves, or special nodes; no single point of failure.
  • 5. Consistent Hashing [ring diagram: nodes at tokens 0, 10, 20, 30, 40, 50]
  • 6. Consistent Hashing Key: “www.google.com” [same ring]
  • 7-9. Consistent Hashing Key: “www.google.com”; md5(“www.google.com”) places the key at 14 on the ring [same ring]
  • 10. Consistent Hashing Key: “www.google.com”; md5(“www.google.com”) places the key at 14 on the ring; Replication Factor = 3 [same ring]
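
The ring lookup on the consistent hashing slides can be sketched roughly as follows. This is a toy illustration, not Cassandra's implementation: the 60-position ring (`RING_SIZE`), the helper names `toy_token` and `replicas_for`, and the rule that replicas are simply the next nodes clockwise are all assumptions made for the example. The value 14 is the illustrative position from the slide, not the real MD5 result.

```python
import hashlib
from bisect import bisect_left

# Node tokens taken from the ring on the slides.
NODE_TOKENS = [0, 10, 20, 30, 40, 50]
RING_SIZE = 60  # assumption: a toy ring with positions 0..59


def toy_token(key: str) -> int:
    """Map a key to a position on the toy ring using MD5 (hypothetical helper)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % RING_SIZE


def replicas_for(token: int, rf: int, nodes=NODE_TOKENS) -> list[int]:
    """Return the rf nodes found by walking clockwise from the token."""
    ring = sorted(nodes)
    # First node whose token is >= the key's token; wrap to the start if none.
    start = bisect_left(ring, token) % len(ring)
    return [ring[(start + i) % len(ring)] for i in range(min(rf, len(ring)))]


# The slide places the key at 14, so with RF = 3 it lands on nodes 20, 30, 40.
print(replicas_for(14, rf=3))  # [20, 30, 40]
# A token past the last node wraps around the ring.
print(replicas_for(55, rf=3))  # [0, 10, 20]
```

Because only the sorted token list is consulted, any client or node can compute the replica set locally, which fits the "client can talk to any node" point on the next slide.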
  • 11. Clustering: client can talk to any node.
  • 12. Scaling RF = 2 [ring diagram: nodes at 0, 10, 20, 30, 50] The node at 50 owns the red portion.
  • 13-14. Scaling RF = 2 [ring diagram: nodes at 0, 10, 20, 30, 40, 50] Add a new node at 40.
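
A small sketch of what changes when the node at 40 joins, assuming each node's primary range is the arc from the previous token (exclusive) up to its own token (inclusive), and assuming the "red portion" on the slide is that arc for node 50. The helper `primary_ranges` is hypothetical, and replica placement for RF = 2 is left out to keep the example short.

```python
def primary_ranges(tokens):
    """Map each node token to the (start, end] arc it owns as primary."""
    ring = sorted(tokens)
    # ring[i - 1] wraps around to the last node when i == 0.
    return {tok: (ring[i - 1], tok) for i, tok in enumerate(ring)}


before = primary_ranges([0, 10, 20, 30, 50])
after = primary_ranges([0, 10, 20, 30, 40, 50])

# Only the new node and its clockwise neighbour see their range change.
changed = sorted(t for t in after if before.get(t) != after[t])

print(before[50])            # (30, 50)
print(after[40], after[50])  # (30, 40) (40, 50)
print(changed)               # [40, 50]
```

Under these assumptions the rest of the ring is untouched, which is why adding capacity only moves data between neighbouring nodes.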
  • 15-16. Node Failures RF = 2 [ring diagram: nodes at 0, 10, 20, 30, 40, 50, with replicas marked]
  • 17. Node Failures RF = 2 [ring diagram: nodes at 0, 10, 20, 30, 40, 50]
  • 18. Consistency, Availability. Consistency: can I read stale data? Availability: can I write/read at all? Tunable consistency.
  • 19-20. Consistency. N = total number of replicas. R = number of replicas read from (before the response is returned). W = number of replicas written to (before the write is considered a success). W + R > N gives strong consistency.
  • 21-22. Consistency. W + R > N gives strong consistency. N=3, W=2, R=2: 2 + 2 > 3 ==> strongly consistent. Only 2 of the 3 replicas must be available.
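
One way to see why W + R > N gives strong consistency: the rule holds exactly when every possible set of W written replicas shares at least one replica with every possible set of R read replicas, so a read always touches a replica that has the latest write. The brute-force check below is only an illustration; the function name `always_overlap` is made up for this sketch.

```python
from itertools import combinations


def always_overlap(n: int, r: int, w: int) -> bool:
    """True if every w-subset of n replicas intersects every r-subset."""
    replicas = range(n)
    return all(
        set(written) & set(read)
        for written in combinations(replicas, w)
        for read in combinations(replicas, r)
    )


# Compare the brute-force result with the W + R > N rule from the slide.
for n, r, w in [(3, 2, 2), (3, 1, 1), (3, 1, 3)]:
    print(f"N={n} R={r} W={w}: overlap={always_overlap(n, r, w)}, rule={w + r > n}")
```

For N=3, W=2, R=2 both columns are true; for N=3, W=1, R=1 both are false, which is the case where a read can return stale data.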
  • 23-24. Consistency. Tunable consistency: specify N (replication factor) per data set; specify R and W per operation. Quorum: N/2 + 1 (integer division). With R = W = Quorum you get strong consistency and tolerate the loss of N – Quorum replicas. R and W can also be 1 or N.
  • 25. Availability. Can tolerate the loss of N – R replicas for reads and N – W replicas for writes.
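
The quorum and availability arithmetic from these slides, written out as a short sketch. The function names are hypothetical, and N/2 is taken to mean integer division.

```python
def quorum(n: int) -> int:
    """Quorum size for n replicas: N/2 + 1 with integer division."""
    return n // 2 + 1


def tolerated_losses(n: int, r: int, w: int) -> dict:
    """Replicas that can be down while reads and writes still succeed."""
    return {"reads": n - r, "writes": n - w}


for n in (3, 5):
    q = quorum(n)
    print(f"N={n}: quorum={q}, tolerated={tolerated_losses(n, q, q)}")
# N=3: quorum=2, tolerated={'reads': 1, 'writes': 1}
# N=5: quorum=3, tolerated={'reads': 2, 'writes': 2}
```

Choosing R = 1 and W = N instead shifts the trade-off: reads survive the loss of N – 1 replicas, but writes fail as soon as any replica is down.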
  • 26-27. CAP Theorem During node or network failure: [chart of Availability vs. Consistency, each axis up to 100%, with regions labeled “Not Possible” and “Possible”; slide 27 adds a “Cassandra” label]
  • 28. Clustering. No single point of failure. Replication that works. Scales linearly: 2x nodes = 2x performance, for both reads and writes, up to hundreds of nodes (see “Netflix: 1 million writes/sec on AWS”). Operationally simple. Multi-datacenter replication.
  • 29. Data Model. Comes from Google BigTable. Goals: commodity hardware (spinning disks); handle data sets much larger than memory (minimize disk seeks); high throughput; low latency; durable.
  • 30. Column Families. Static: object data, similar to a table in a relational database. Dynamic: precomputed query results, materialized views. (These are just educational classifications.)
  • 31. Static Column Families. Users: zznate (password: *, name: Nate); driftx (password: *, name: Brandon); thobbs (password: *, name: Tyler); jbellis (password: *, name: Jonathan, site: riptano.com)
  • 32. Dynamic Column Families. Rows: each row has a unique primary key and a sorted list of (name, value) tuples, like an ordered hash. The (name, value) tuple is called a “column”.
  • 33. Dynamic Column Families. Following: zznate (driftx:, thobbs:); driftx (no columns shown); thobbs (zznate:); jbellis (driftx:, mdennis:, pcmanus:, thobbs:, xedin:, zznate:)
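
A rough way to picture the two kinds of column family is as nested mappings, with each row's columns read back sorted by column name. The sketch below is only a mental model in plain Python, not a client API. The contents of `following` reflect one reading of the flattened slide (row key, then the users that row follows, with empty values), and the `columns` helper is hypothetical.

```python
# Static column family: every row has roughly the same, known columns.
users = {
    "thobbs": {"name": "Tyler", "password": "*"},
    "jbellis": {"name": "Jonathan", "password": "*", "site": "riptano.com"},
}

# Dynamic column family: the column names themselves carry the data
# (here, who a user follows) and the values are left empty.
following = {
    "zznate": {"driftx": "", "thobbs": ""},
    "thobbs": {"zznate": ""},
    "jbellis": {"zznate": "", "driftx": "", "xedin": "",
                "mdennis": "", "thobbs": "", "pcmanus": ""},
}


def columns(column_family: dict, row_key: str) -> list:
    """Return a row's columns as (name, value) tuples sorted by name."""
    return sorted(column_family.get(row_key, {}).items())


# Columns come back ordered by name regardless of insertion order.
print([name for name, _ in columns(following, "jbellis")])
# ['driftx', 'mdennis', 'pcmanus', 'thobbs', 'xedin', 'zznate']
```

The ordering is what makes slices over a row cheap, and it is the property the "ordered hash" comparison on slide 32 points at.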
  • 34. Dynamic Column Families. Other examples: timeline of tweets by a user; timeline of tweets by all of the people a user is following; list of comments sorted by score; list of friends grouped by state.
  • 35. The Data API. RPC-based API: github.com/twitter/cassandra. CQL (Cassandra Query Language): code.google.com/a/apache-extras.org/p/cassandra-ruby/
  • 36. Inserting Data: INSERT INTO users (KEY, "name", "age") VALUES ("thobbs", "Tyler", 24);
  • 37. Updating Data. Updates are the same as inserts: INSERT INTO users (KEY, "age") VALUES ("thobbs", 34); Or: UPDATE users SET "age" = 34 WHERE KEY = "thobbs";
  • 38. Fetching Data. Whole row select: SELECT * FROM users WHERE KEY = "thobbs";
  • 39. Fetching Data. Explicit column select: SELECT "name", "age" FROM users WHERE KEY = "thobbs";
  • 40. Fetching Data. Get a slice of columns: UPDATE letters SET 1='a', 2='b', 3='c', 4='d', 5='e' WHERE KEY = "key"; SELECT 1..3 FROM letters WHERE KEY = "key"; Returns [(1, a), (2, b), (3, c)]
  • 41. Fetching Data. Get a slice of columns: SELECT FIRST 2 FROM letters WHERE KEY = "key"; Returns [(1, a), (2, b)]. SELECT FIRST 2 REVERSED FROM letters WHERE KEY = "key"; Returns [(5, e), (4, d)]
  • 42. Fetching Data. Get a slice of columns: SELECT 3..'' FROM letters WHERE KEY = "key"; Returns [(3, c), (4, d), (5, e)]. SELECT FIRST 2 REVERSED 4..'' FROM letters WHERE KEY = "key"; Returns [(4, d), (3, c)]
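
The slice queries on slides 40 to 42 can be mimicked over a sorted list of (name, value) columns. The sketch below is a plain-Python model of the results shown on the slides, not the CQL engine: `column_slice` and its parameters are hypothetical, and an empty bound (`''` in the slides) is represented by `None`.

```python
from bisect import bisect_left, bisect_right


def column_slice(row, start=None, finish=None, first=None, reverse=False):
    """Slice a row given as a list of (name, value) tuples sorted by name.

    start/finish are inclusive bounds; None means unbounded.
    With reverse=True the slice begins at `start` and walks downwards.
    """
    names = [name for name, _ in row]
    if not reverse:
        lo = 0 if start is None else bisect_left(names, start)
        hi = len(row) if finish is None else bisect_right(names, finish)
        result = row[lo:hi]
    else:
        hi = len(row) if start is None else bisect_right(names, start)
        lo = 0 if finish is None else bisect_left(names, finish)
        result = row[lo:hi][::-1]
    return result if first is None else result[:first]


letters = [(1, "a"), (2, "b"), (3, "c"), (4, "d"), (5, "e")]

print(column_slice(letters, start=1, finish=3))              # SELECT 1..3
print(column_slice(letters, first=2))                        # SELECT FIRST 2
print(column_slice(letters, first=2, reverse=True))          # SELECT FIRST 2 REVERSED
print(column_slice(letters, start=3))                        # SELECT 3..''
print(column_slice(letters, start=4, first=2, reverse=True)) # SELECT FIRST 2 REVERSED 4..''
```

Each call reproduces the result listed on the corresponding slide, for example `[(4, 'd'), (3, 'c')]` for the last one.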
  • 43. Deleting Data. Delete a whole row: DELETE FROM users WHERE KEY = "thobbs"; Delete specific columns: DELETE "age" FROM users WHERE KEY = "thobbs";
  • 44. Secondary Indexes. Built-in basic indexes: CREATE INDEX ageIndex ON users (age); SELECT name FROM users WHERE age = 24 AND state = "TX";
  • 45. Performance. Writes: 10k to 30k per second per node, sub-millisecond latency. Reads: 1k to 20k per second per node (depends on data set, caching), 0.1 to 10 ms latency.
  • 46. Other Features. Distributed counters: can support millions of high-volume counters. Excellent multi-datacenter support: disaster recovery, locality. Hadoop integration: isolation of resources, Hive and Pig drivers. Compression.
  • 47. What Cassandra Can't Do. Transactions (atomicity, isolation), unless you use a distributed lock; these aren't needed as often as you'd think. Limited support for ad-hoc queries: know what you want to do with the data.
  • 48. Not One-size-fits-all: use alongside an RDBMS.
  • 49. Problems you shouldn't solve with C*: prototyping; distributed locking; small datasets (when you don't need availability); complex graph processing (shallow graph queries work well, though); fundamentally highly relational/transactional data.
  • 50. The sweet spot for Cassandra: large dataset, low-latency queries; simple to medium complexity queries (key/value; time series, ordered data; lists, sets, maps); high availability.
  • 51. The sweet spot for Cassandra. Social: texts, comments, check-ins, collaboration. Activity: feeds, timelines, clickstreams, logs, sensor data. Metrics: performance data over time (CloudKick, DataStax OpsCenter). Text search: inbox search at Facebook.
  • 52. ORMs. Poor integration. ORMs are not a natural fit for Cassandra: in C*, we mainly care about queries, not objects, and beyond simple K/V the abstraction breaks. Suggestion: don't waste time with an ORM. C* will only be used for a specific subset of your data/queries, so use the C* API directly in your model.
  • 53. Questions? Tyler Hobbs @tylhobbs tyler@datastax.com