Five factors to consider when
choosing a big data solution!
Jonathan Ellis
CTO, DataStax
Project Chair, Apache Cassandra
how do I model my application?

©2012 DataStax
Popular options
  • Key/value
  • Tabular
  • Document
  • Graph?




Schema is your friend

{
    "id": "e451dd42-ece3-11e1-a0a3-34159e154f4c",
    "name": "jbellis",
    "state": "TX",
    "birthdate": "1/1/1976",
    "email_addresses": ["jbellis@gmail.com", "jbellis@datastax.com"]
}




SQL can be your friend too

 CREATE TABLE users (
    id uuid PRIMARY KEY,
    name text,
    state text,
    birth_date date
 );



 CREATE INDEX ON users(state);

 SELECT * FROM users
 WHERE state='Texas' AND birth_date > '1950-01-01';




Collections
 CREATE TABLE users (
    id uuid PRIMARY KEY,
    name text,
    state text,
    birth_date date
 );

 CREATE TABLE users_addresses (
    user_id uuid REFERENCES users,
    email text
 );

 SELECT *
 FROM users NATURAL JOIN users_addresses;




Collections
 (The same users / users_addresses schema and NATURAL JOIN query as the previous slide, crossed out.)

Collections
 CREATE TABLE users (
    id uuid PRIMARY KEY,
    name text,
    state text,
    birth_date date,
    email_addresses set<text>
 );

 UPDATE users
 SET email_addresses = email_addresses + {'jbellis@gmail.com', 'jbellis@datastax.com'}
 WHERE id = e451dd42-ece3-11e1-a0a3-34159e154f4c;
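
A minimal Python sketch of what the set-valued column buys you, assuming a plain in-memory dict as a stand-in for the users table (the `users` dict and `add_emails` helper are hypothetical, not part of any Cassandra driver). Adding to a set is a union, so repeating an address changes nothing and no separate join table is needed.

```python
# Hypothetical in-memory stand-in for one row of the users table.
uid = "e451dd42-ece3-11e1-a0a3-34159e154f4c"
users = {uid: {"name": "jbellis", "email_addresses": set()}}


def add_emails(user_id, new_emails):
    """Mimic `SET email_addresses = email_addresses + {...}` as a set union."""
    users[user_id]["email_addresses"] |= set(new_emails)


add_emails(uid, {"jbellis@gmail.com", "jbellis@datastax.com"})
add_emails(uid, {"jbellis@gmail.com"})  # adding an existing element is a no-op
print(sorted(users[uid]["email_addresses"]))
```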




Joins don’t scale
  • No joins
  • No subqueries
  • No aggregation functions* or GROUP BY
  • ORDER BY?




SELECT * FROM tweets
WHERE user_id IN (SELECT follower FROM followers
                  WHERE user_id = 'driftx')

(Diagram: the followers and tweets tables, with a "?" between them.)

Clustering in Cassandra
CREATE TABLE timeline (
  user_id uuid,
  tweet_id timeuuid,
  tweet_author uuid,
  tweet_body text,
  PRIMARY KEY (user_id, tweet_id)
);

user_id   tweet_id    tweet_author  tweet_body
jbellis   3290f9da..  rbranson      lorem
jbellis   3895411a..  tjake         ipsum
...       ...         ...           ...
driftx    3290f9da..  rbranson      lorem
driftx    71b46a84..  yzhang        dolor
...       ...         ...           ...
yukim     3290f9da..  rbranson      lorem
yukim     e451dd42..  tjake         amet
...       ...         ...           ...

SELECT * FROM timeline
WHERE user_id = 'driftx';

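
A rough Python sketch of the idea behind the timeline table, assuming an in-memory dict keyed by user_id whose rows stay sorted by tweet_id. The helper names are hypothetical, and plain string comparison stands in for real timeuuid ordering. It shows why the timeline query touches one partition instead of joining followers to tweets.

```python
from bisect import insort
from collections import defaultdict

# Partition key (user_id) -> rows (tweet_id, author, body), kept sorted by tweet_id.
timeline = defaultdict(list)


def insert_row(user_id, tweet_id, author, body):
    """Store a denormalized copy of the tweet in the reader's own partition."""
    insort(timeline[user_id], (tweet_id, author, body))


def select_timeline(user_id):
    """Like SELECT * FROM timeline WHERE user_id = ...: one partition, already ordered."""
    return list(timeline[user_id])


# Rows from the slide, written out of order on purpose.
insert_row("driftx", "71b46a84", "yzhang", "dolor")
insert_row("jbellis", "3290f9da", "rbranson", "lorem")
insert_row("driftx", "3290f9da", "rbranson", "lorem")

for row in select_timeline("driftx"):
    print(row)
```

The cost is paid at write time: each tweet is copied into every follower's partition, so reads never need a join.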
how does it perform?

Larger than memory datasets




Locking




Efficiency




UPDATE users
 SET email_addresses = email_addresses + {...}
 WHERE user_id = 'jbellis';




Durability




C* storage engine very briefly

write( k1 , c1:v1 )
    Memtable (memory):        k1 c1:v1
    Commit log (hard drive):  k1 c1:v1

write( k1 , c2:v2 )
    Memtable (memory):        k1 c1:v1 c2:v2
    Commit log (hard drive):  k1 c1:v1
                              k1 c2:v2

write( k2 , c1:v1 c2:v2 )
    Memtable (memory):        k1 c1:v1 c2:v2
                              k2 c1:v1 c2:v2
    Commit log (hard drive):  k1 c1:v1
                              k1 c2:v2
                              k2 c1:v1 c2:v2

write( k1 , c1:v4 c3:v3 )
    Memtable (memory):        k1 c1:v4 c2:v2 c3:v3
                              k2 c1:v1 c2:v2
    Commit log (hard drive):  k1 c1:v1
                              k1 c2:v2
                              k2 c1:v1 c2:v2
                              k1 c1:v4 c3:v3

flush
    Memtable (memory) -> SSTable (hard drive), with index:
                              k1 c1:v4 c2:v2 c3:v3
                              k2 c1:v1 c2:v2
    Commit log (hard drive):  cleanup

No random writes
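
The write path walked through above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Cassandra's implementation: `TinyStore` is a hypothetical class, and Python lists stand in for files on disk. The point is that both disk structures are only ever appended to or written once, which is why there are no random writes.

```python
class TinyStore:
    """Toy model of the write path: commit log + memtable, flushed to SSTables."""

    def __init__(self):
        self.commit_log = []  # append-only list standing in for the on-disk log
        self.memtable = {}    # in-memory: key -> {column: value}
        self.sstables = []    # immutable sorted snapshots standing in for disk files

    def write(self, key, columns):
        # Append to the log first for durability, then update memory.
        self.commit_log.append((key, dict(columns)))
        self.memtable.setdefault(key, {}).update(columns)

    def flush(self):
        # Write the memtable out once, sorted by key and column, then start fresh.
        sstable = tuple(
            (key, tuple(sorted(self.memtable[key].items())))
            for key in sorted(self.memtable)
        )
        self.sstables.append(sstable)
        self.memtable = {}
        self.commit_log.clear()  # the "cleanup" step: logged writes now live in an SSTable
        return sstable


# Replay the four writes from the slides, then flush.
store = TinyStore()
store.write("k1", {"c1": "v1"})
store.write("k1", {"c2": "v2"})
store.write("k2", {"c1": "v1", "c2": "v2"})
store.write("k1", {"c1": "v4", "c3": "v3"})
log_entries_before_flush = len(store.commit_log)
sstable = store.flush()
print(sstable)
```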




(Chart: reads/s and writes/s for Cassandra 0.6 and Cassandra 1.0; axis from 0 to 35000.)

how does it handle failure?

Classic partitioning with SPOF
(Diagram: client -> router -> partition 1, partition 2, partition 3, partition 4)

Availability
  • “High availability implies that a single fault will not bring down your system. Not ‘we’ll recover quickly.’”
    -- Ben Coverston: DataStax

  • “The biggest problem with failover is that you're almost never using it until it really hurts. It's like backups that you never test.”
    -- Rick Branson: Instagram

Fully distributed, no SPOF
(Diagram: a client and a ring of nodes with no router; partition p1 appears on three nodes, alongside p3 and p6.)

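
One way to picture "no SPOF" is a token ring where every key lives on several nodes. The sketch below is a generic illustration in Python, not Cassandra's code: the `Ring` class, the md5-based `token` function, the node names, and the replication factor of 3 are all assumptions made for the example.

```python
import hashlib
from bisect import bisect_right


def token(value):
    """Map a string to a position on the ring (md5 chosen only for determinism)."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16)


class Ring:
    def __init__(self, nodes, replication_factor=3):
        self.rf = replication_factor
        self._ring = sorted((token(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._ring]

    def replicas(self, key):
        """Walk clockwise from the key's token and take the next rf nodes."""
        start = bisect_right(self._tokens, token(key)) % len(self._ring)
        count = min(self.rf, len(self._ring))
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(count)]

    def live_replicas(self, key, down):
        """Replicas still able to serve the key when the nodes in `down` have failed."""
        return [n for n in self.replicas(key) if n not in down]


ring = Ring(["node1", "node2", "node3", "node4", "node5", "node6"])
owners = ring.replicas("jbellis")
survivors = ring.live_replicas("jbellis", down={owners[0]})
print(owners, survivors)
```

Losing any single node leaves two copies of the key reachable, and no router or metadata server sits between the client and the data.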
Multiple datacenters




how does it scale?

Scaling antipatterns
  • Metadata servers
  • Router bottlenecks
  • Overloading existing nodes when adding capacity
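
As a hedged illustration of the last bullet, the Python sketch below compares how many keys change owner when a fifth node joins under naive `hash % n` placement versus a consistent-hash ring. Everything here is an assumption for the example (md5 hashing, 64 ring positions per node, the key and node names); it is not a description of any particular database.

```python
import hashlib
from bisect import bisect_right


def h(value):
    """Deterministic hash for the example."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16)


def owner_modulo(key, nodes):
    """Naive placement: the owner depends directly on the node count."""
    return nodes[h(key) % len(nodes)]


def build_ring(nodes, positions_per_node=64):
    """Give each node several positions on the ring to even out its share."""
    points = sorted(
        (h(f"{node}#{i}"), node) for node in nodes for i in range(positions_per_node)
    )
    return [t for t, _ in points], [n for _, n in points]


def owner_ring(key, ring):
    """The owner is the node at the first ring position clockwise of the key."""
    tokens, owners = ring
    return owners[bisect_right(tokens, h(key)) % len(tokens)]


keys = [f"user{i}" for i in range(10_000)]
old_nodes = ["n1", "n2", "n3", "n4"]
new_nodes = old_nodes + ["n5"]

moved_modulo = sum(owner_modulo(k, old_nodes) != owner_modulo(k, new_nodes) for k in keys)

old_ring, new_ring = build_ring(old_nodes), build_ring(new_nodes)
moves = [(owner_ring(k, old_ring), owner_ring(k, new_ring)) for k in keys]
moved_ring = sum(old != new for old, new in moves)

print(f"modulo placement moved {moved_modulo} keys, ring placement moved {moved_ring}")
```

With the ring, every key that moves goes to the new node, so existing nodes only give data away rather than reshuffling it among themselves.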




how flexible is it?

Data model: Realtime
  LiveStocks
          stock   last
          GOOG    $95.52
          AAPL    $186.10
          AMZN    $112.98

  Portfolios
          user      stock   shares
          jbellis   GOOG    80
          jbellis   LNKD    20
          yukim     AMZN    100

  StockHist
          stock   date         price
          GOOG    2011-01-01   $8.23
          GOOG    2011-01-02   $6.14
          GOOG    2011-01-03   $7.78

Data model: Analytics
  HistLoss
          portfolio    worst_date   loss
          Portfolio1   2011-07-23   -$34.81
          Portfolio2   2011-03-11   -$11432.24
          Portfolio3   2011-05-21   -$1476.93

Data model: Analytics
  10dayreturns
          stock      rdate     return
          GOOG    2011-07-25   $8.23
          GOOG    2011-07-24   $6.14
          GOOG    2011-07-23   $7.78
          AAPL    2011-07-25   $15.32
          AAPL    2011-07-24   $12.68


     INSERT OVERWRITE TABLE 10dayreturns
     SELECT a.stock,
            b.date as rdate,
            b.price - a.price
     FROM StockHist a
     JOIN StockHist b
     ON (a.stock = b.stock
         AND date_add(a.date, 10) = b.date);
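
For readers less familiar with Hive, the self-join above can be sketched in Python as a dictionary lookup ten days ahead. The price history below is made-up sample data and `ten_day_returns` is a hypothetical helper; only the join condition mirrors the query.

```python
from datetime import date, timedelta

# Hypothetical price history: (stock, date) -> price.
stock_hist = {
    ("GOOG", date(2011, 7, 13)): 100,
    ("GOOG", date(2011, 7, 23)): 108,
    ("GOOG", date(2011, 7, 24)): 110,
    ("AAPL", date(2011, 7, 14)): 200,
    ("AAPL", date(2011, 7, 24)): 215,
}


def ten_day_returns(hist):
    """For each row a, find row b with the same stock and b.date = a.date + 10 days."""
    rows = []
    for (stock, day), price in hist.items():
        rdate = day + timedelta(days=10)
        later_price = hist.get((stock, rdate))
        if later_price is not None:  # inner join: drop rows with no match
            rows.append((stock, rdate, later_price - price))
    return sorted(rows)


returns = ten_day_returns(stock_hist)
for row in returns:
    print(row)
```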

Data model: Analytics
  portfolio_returns
            portfolio       rdate      preturn
            Portfolio1   2011-07-25    $118.21
            Portfolio1   2011-07-24     $60.78
            Portfolio1   2011-07-23    -$34.81
            Portfolio2   2011-07-25   $2143.92
            Portfolio3   2011-07-24    -$10.19


       INSERT OVERWRITE TABLE portfolio_returns
       SELECT portfolio,
              rdate,
              SUM(b.return)
       FROM portfolios a JOIN 10dayreturns b
       ON (a.stock = b.stock)
       GROUP BY portfolio, rdate;
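
The join plus GROUP BY above amounts to summing each stock's return into every portfolio that holds it. A small Python sketch, assuming made-up holdings and returns (the data and the `sum_portfolio_returns` helper are hypothetical):

```python
from collections import defaultdict

# Hypothetical inputs: which portfolio holds which stock, and per-stock returns.
holdings = [("Portfolio1", "GOOG"), ("Portfolio1", "AAPL"), ("Portfolio2", "AAPL")]
ten_day = [
    ("GOOG", "2011-07-25", 8),
    ("AAPL", "2011-07-25", 15),
    ("AAPL", "2011-07-24", 12),
]


def sum_portfolio_returns(holdings, returns):
    """Join on stock, then SUM(return) grouped by (portfolio, rdate)."""
    by_stock = defaultdict(list)
    for stock, rdate, ret in returns:
        by_stock[stock].append((rdate, ret))

    totals = defaultdict(int)
    for portfolio, stock in holdings:
        for rdate, ret in by_stock[stock]:
            totals[(portfolio, rdate)] += ret
    return dict(totals)


totals = sum_portfolio_returns(holdings, ten_day)
for key in sorted(totals):
    print(key, totals[key])
```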

Data model: Analytics
  HistLoss
          portfolio    worst_date   loss
          Portfolio1   2011-07-23   -$34.81
          Portfolio2   2011-03-11   -$11432.24
          Portfolio3   2011-05-21   -$1476.93



    INSERT OVERWRITE TABLE HistLoss
    SELECT a.portfolio, rdate, minp
    FROM (
      SELECT portfolio, min(preturn) as minp
      FROM portfolio_returns
      GROUP BY portfolio
    ) a
    JOIN portfolio_returns b
    ON (a.portfolio = b.portfolio and a.minp = b.preturn);
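
The query above finds each portfolio's minimum return and joins back to recover the date. In Python the same result can be sketched with a single pass, using the portfolio_returns rows shown on the earlier slide (the `hist_loss` helper is hypothetical). One difference to note: where several dates tie for the minimum, the Hive join returns all of them, while this sketch keeps only the first.

```python
# Rows from the portfolio_returns slide: (portfolio, rdate, preturn).
rows = [
    ("Portfolio1", "2011-07-25", 118.21),
    ("Portfolio1", "2011-07-24", 60.78),
    ("Portfolio1", "2011-07-23", -34.81),
    ("Portfolio2", "2011-07-25", 2143.92),
    ("Portfolio3", "2011-07-24", -10.19),
]


def hist_loss(rows):
    """Return portfolio -> (worst_date, loss), keeping the lowest preturn seen."""
    worst = {}
    for portfolio, rdate, preturn in rows:
        if portfolio not in worst or preturn < worst[portfolio][1]:
            worst[portfolio] = (rdate, preturn)
    return worst


losses = hist_loss(rows)
for portfolio in sorted(losses):
    print(portfolio, *losses[portfolio])
```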

Some Cassandra users




Questions?

Image credits
•    http://www.flickr.com/photos/26817893@N05/2573006312/

•    http://www.flickr.com/photos/rowanbank/7686239548

•    http://www.flickr.com/photos/mervtheswerve/6081933265

•    http://www.flickr.com/photos/dg_pics/2526208830

•    http://www.flickr.com/photos/wainwright/351684037

•    http://www.flickr.com/photos/mikeneilson/1606662529

•    http://www.flickr.com/photos/sbisson/3852905534

•    http://www.flickr.com/photos/breadnbadger/2674928517

Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 

Top five questions to ask when choosing a big data solution

  • 1. Five factors to consider when choosing a big data solution! Jonathan Ellis CTO, DataStax Project Chair, Apache Cassandra
  • 2. how do I model my application?
  • 3. Popular options • Key/value • Tabular • Document • Graph?
  • 4. Schema is your friend { "id": "e451dd42-ece3-11e1-a0a3-34159e154f4c", "name": "jbellis", "state": "TX", "birthdate": "1/1/1976", "email_addresses": ["jbellis@gmail", "jbellis@datastax.com"] }
  • 5. SQL can be your friend too CREATE TABLE users ( id uuid PRIMARY KEY, name text, state text, birth_date date ); CREATE INDEX ON users(state); SELECT * FROM users WHERE state='Texas' AND birth_date > '1950-01-01';
  • 6. Collections CREATE TABLE users ( id uuid PRIMARY KEY, name text, state text, birth_date date ); CREATE TABLE users_addresses ( user_id uuid REFERENCES users, email text ); SELECT * FROM users NATURAL JOIN users_addresses;
  • 7. Collections [slide 6 repeated, crossed out with a large X]
  • 8. Collections CREATE TABLE users ( id uuid PRIMARY KEY, name text, state text, birth_date date, email_addresses set<text> ); UPDATE users SET email_addresses = email_addresses + {'jbellis@gmail.com', 'jbellis@datastax.com'};
  • 9. Joins don’t scale • No joins • No subqueries • No aggregation functions* or GROUP BY • ORDER BY?
  • 10. SELECT * FROM tweets WHERE user_id IN (SELECT follower FROM followers WHERE user_id = 'driftx') [diagram: followers, ?, tweets]
  • 11. Clustering in Cassandra CREATE TABLE timeline ( user_id uuid, tweet_id timeuuid, tweet_author uuid, tweet_body text, PRIMARY KEY (user_id, tweet_id) );
      user_id  tweet_id    _author   _body
      jbellis  3290f9da..  rbranson  lorem
      jbellis  3895411a..  tjake     ipsum
      ...      ...         ...
      driftx   3290f9da..  rbranson  lorem
      driftx   71b46a84..  yzhang    dolor
      ...      ...         ...
      yukim    3290f9da..  rbranson  lorem
      yukim    e451dd42..  tjake     amet
      ...      ...         ...
  • 12. Clustering in Cassandra [same schema and rows as slide 11] SELECT * FROM timeline WHERE user_id = 'driftx';
  • 13. how does it perform?
  • 14. Larger than memory datasets
  • 17. UPDATE users SET email_addresses = email_addresses + {...} WHERE user_id = 'jbellis';
  • 19. C* storage engine very briefly: write(k1, c1:v1) [diagram: memtable in memory; commit log on hard drive]
  • 20. write(k1, c1:v1) [memtable: k1: c1:v1; commit log: k1 c1:v1]
  • 21. write(k1, c2:v2) [memtable: k1: c1:v1 c2:v2; commit log: k1 c1:v1 | k1 c2:v2]
  • 22. write(k2, c1:v1 c2:v2) [memtable: k1: c1:v1 c2:v2, k2: c1:v1 c2:v2; commit log: k1 c1:v1 | k1 c2:v2 | k2 c1:v1 c2:v2]
  • 23. write(k1, c1:v4 c3:v3) [memtable: k1: c1:v4 c2:v2 c3:v3, k2: c1:v1 c2:v2; commit log: k1 c1:v1 | k1 c2:v2 | k2 c1:v1 c2:v2 | k1 c1:v4 c3:v3]
  • 24. [diagram labels: flush, index, cleanup; SSTable on hard drive: k1: c1:v4 c2:v2 c3:v3, k2: c1:v1 c2:v2]
  • 26. [chart: reads/s and writes/s, Cassandra 0.6 vs. Cassandra 1.0; axis from 0 to 35000]
  • 27. how does it handle failure?
  • 28. Classic partitioning with SPOF [diagram: client, router, partitions 1 to 4]
  • 29. Availability • “High availability implies that a single fault will not bring down your system. Not ‘we’ll recover quickly.’” -- Ben Coverston: DataStax • “The biggest problem with failover is that you're almost never using it until it really hurts. It's like backups that you never test.” -- Rick Branson: Instagram
  • 30. Fully distributed, no SPOF [diagram: client; node labels p3, p6, p1, p1, p1]
  • 33. how does it scale?
  • 34. Scaling antipatterns • Metadata servers • Router bottlenecks • Overloading existing nodes when adding capacity
  • 36. how flexible is it?
  • 38. Data model: Realtime
      LiveStocks
      stock  last
      GOOG   $95.52
      AAPL   $186.10
      AMZN   $112.98
      Portfolios
      user     stock  shares
      jbellis  GOOG   80
      jbellis  LNKD   20
      yukim    AMZN   100
      StockHist
      stock  date        price
      GOOG   2011-01-01  $8.23
      GOOG   2011-01-02  $6.14
      GOOG   2011-01-03  $7.78
  • 39. Data model: Analytics
      HistLoss
                  worst_date  loss
      Portfolio1  2011-07-23  -$34.81
      Portfolio2  2011-03-11  -$11432.24
      Portfolio3  2011-05-21  -$1476.93
  • 40. Data model: Analytics
      10dayreturns
      stock  rdate       return
      GOOG   2011-07-25  $8.23
      GOOG   2011-07-24  $6.14
      GOOG   2011-07-23  $7.78
      AAPL   2011-07-25  $15.32
      AAPL   2011-07-24  $12.68
      INSERT OVERWRITE TABLE 10dayreturns
      SELECT a.stock, b.date as rdate, b.price - a.price
      FROM StockHist a JOIN StockHist b
      ON (a.stock = b.stock AND date_add(a.date, 10) = b.date);
  • 41. Data model: Analytics
      portfolio_returns
      portfolio   rdate       preturn
      Portfolio1  2011-07-25  $118.21
      Portfolio1  2011-07-24  $60.78
      Portfolio1  2011-07-23  -$34.81
      Portfolio2  2011-07-25  $2143.92
      Portfolio3  2011-07-24  -$10.19
      INSERT OVERWRITE TABLE portfolio_returns
      SELECT portfolio, rdate, SUM(b.return)
      FROM portfolios a JOIN 10dayreturns b ON (a.stock = b.stock)
      GROUP BY portfolio, rdate;
  • 42. Data model: Analytics
      HistLoss
                  worst_date  loss
      Portfolio1  2011-07-23  -$34.81
      Portfolio2  2011-03-11  -$11432.24
      Portfolio3  2011-05-21  -$1476.93
      INSERT OVERWRITE TABLE HistLoss
      SELECT a.portfolio, rdate, minp
      FROM ( SELECT portfolio, min(preturn) as minp FROM portfolio_returns GROUP BY portfolio ) a
      JOIN portfolio_returns b ON (a.portfolio = b.portfolio and a.minp = b.preturn);
  • 45. Questions? Image credits • http://www.flickr.com/photos/26817893@N05/2573006312/ • http://www.flickr.com/photos/rowanbank/7686239548 • http://www.flickr.com/photos/mervtheswerve/6081933265 • http://www.flickr.com/photos/dg_pics/2526208830 • http://www.flickr.com/photos/wainwright/351684037 • http://www.flickr.com/photos/mikeneilson/1606662529 • http://www.flickr.com/photos/sbisson/3852905534 • http://www.flickr.com/photos/breadnbadger/2674928517
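
The write path walked through on slides 19 to 24 can be sketched in a few lines of Python. This is a toy model, not Cassandra's actual code: the class name TinyStore and its methods are hypothetical, the commit log is a plain in-memory list standing in for the on-disk log, and treating the "cleanup" label as discarding the flushed commit-log entries is an assumption.

```python
from collections import defaultdict


class TinyStore:
    """Toy model of the memtable / commit log / SSTable write path (hypothetical)."""

    def __init__(self):
        self.commit_log = []               # append-only; stands in for the on-disk log
        self.memtable = defaultdict(dict)  # row key -> {column: value}, held in memory
        self.sstables = []                 # flushed, immutable tables; newest last

    def write(self, key, columns):
        # Every write is appended to the log and merged into the in-memory row.
        self.commit_log.append((key, dict(columns)))
        self.memtable[key].update(columns)

    def flush(self):
        # Write the memtable out sorted by key, then drop what is now durable
        # (assumed meaning of the "cleanup" label on slide 24).
        sstable = {
            key: dict(sorted(cols.items()))
            for key, cols in sorted(self.memtable.items())
        }
        self.sstables.append(sstable)
        self.memtable.clear()
        self.commit_log.clear()
        return sstable


# Replay the four writes from slides 20 to 23, then flush as on slide 24.
store = TinyStore()
store.write("k1", {"c1": "v1"})
store.write("k1", {"c2": "v2"})
store.write("k2", {"c1": "v1", "c2": "v2"})
store.write("k1", {"c1": "v4", "c3": "v3"})
memtable_before_flush = {k: dict(v) for k, v in store.memtable.items()}
log_length_before_flush = len(store.commit_log)
flushed = store.flush()
```

The point of the design is that a write only appends to the log and updates memory, so no random disk I/O happens until the memtable is flushed sequentially.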
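
The clustering idea on slides 11 and 12 (PRIMARY KEY (user_id, tweet_id)) can be illustrated with a small Python sketch: rows are grouped by the first key component and ordered by the second, so fetching one user's timeline touches a single group instead of joining tweets against followers. The Timeline class is hypothetical, and ordering the shortened hex ids as strings is a simplification; real timeuuid values order by their embedded timestamp.

```python
from collections import defaultdict


class Timeline:
    """Toy layout: one partition per user_id, rows ordered by tweet_id (hypothetical)."""

    def __init__(self):
        # user_id -> {tweet_id: (tweet_author, tweet_body)}
        self.partitions = defaultdict(dict)

    def insert(self, user_id, tweet_id, author, body):
        # Writing the same (user_id, tweet_id) again overwrites the row.
        self.partitions[user_id][tweet_id] = (author, body)

    def select(self, user_id):
        # Equivalent in spirit to: SELECT * FROM timeline WHERE user_id = ?
        rows = self.partitions.get(user_id, {})
        return [(user_id, tweet_id, *rows[tweet_id]) for tweet_id in sorted(rows)]


timeline = Timeline()
timeline.insert("driftx", "71b46a84", "yzhang", "dolor")
timeline.insert("driftx", "3290f9da", "rbranson", "lorem")
timeline.insert("jbellis", "3290f9da", "rbranson", "lorem")
driftx_rows = timeline.select("driftx")
```

The trade-off is denormalization: each tweet is copied into the timeline of every follower at write time, which is what makes the read a single-partition lookup.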