Cassandra’s sweet spot

Dave Gardner
@davegardnerisme
jobs.hailocab.com
Looking for an expert backend
Java dev – speak to me!




meetup.com/Cassandra-London
Next event 21st November
Building applications with Cassandra


   • Key features
   • Creating an application
   • Data modeling
Comparing Cassandra with X

  “Can someone quickly explain the
  differences between the two? Other than
  the fact that MongoDB supports ad-hoc
  querying I don't know whats different. It also
  appears (using google trends) that MongoDB
  seems to be growing while Cassandra is
  dying off. Is this the case?”
  27th July 2010
  http://comments.gmane.org/gmane.comp.db.cassandra.user/7773
Comparing Cassandra with X


 “They have approximately nothing in
 common. And, no, Cassandra is
 definitely not dying off.”

 28th July 2010
 http://comments.gmane.org/gmane.comp.db.cassandra.user/7773
Top Tip #1

 To use a NoSQL solution effectively, we
 need to identify its sweet spot.

 This means learning about each solution:
 how is it designed? What algorithms
 does it use?
 http://www.alberton.info/nosql_databases_what_when_why_phpuk2011.html
Comparing Cassandra with X

“they say … I can’t decide between this project
and this project even though they look nothing
like each other. And the fact that you can’t
decide indicates that you don’t actually have a
problem that requires them.”

Benjamin Black – NoSQL Tapes (at 30:15)
http://nosqltapes.com/video/benjamin-black-on-nosql-cloud-computing-and-fast_ip
Headline features


 1. Elastic
 Read and write throughput increases
 linearly as new machines are added


 http://cassandra.apache.org/
Headline features


 2. Decentralised
 Fault tolerant with no single point of
 failure; no “master” node


 http://cassandra.apache.org/
The dynamo paper

 •   Consistent hashing
 •   Vector clocks
 •   Gossip protocol
 •   Hinted handoff
 •   Read repair

 http://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf
The dynamo paper

 [Diagram: a ring of six nodes, #1 to #6, with replication factor RF = 3. The client talks to a coordinator node.]
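To make the ring picture concrete, here is a minimal sketch of consistent hashing with RF = 3. It is an illustration only: the `Ring` class, the `token` helper and the idea of deriving node tokens by hashing the node names are assumptions made for this example, not Cassandra's actual partitioner or replication strategy.

```python
import hashlib
from bisect import bisect_left


def token(value: str) -> int:
    """Map a string to a position on the ring (MD5 used as a stand-in hash)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Hypothetical six-node ring; node tokens are derived from node names."""

    def __init__(self, nodes, replication_factor=3):
        self.replication_factor = replication_factor
        # Sorting by token gives the clockwise order of nodes around the ring.
        self.ring = sorted((token(node), node) for node in nodes)
        self.tokens = [t for t, _ in self.ring]

    def replicas(self, row_key):
        """Return the RF nodes that hold row_key, walking clockwise."""
        # First node whose token is >= the key's token, wrapping past the end.
        start = bisect_left(self.tokens, token(row_key)) % len(self.ring)
        count = min(self.replication_factor, len(self.ring))
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(count)]


nodes = ["#1", "#2", "#3", "#4", "#5", "#6"]
ring = Ring(nodes, replication_factor=3)
owners = ring.replicas("f97be9cc-5255-4578-8813-76701c0945bd")
print(owners)
```

Any node can act as coordinator here: it only needs the ring layout to work out which three nodes own the key.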
Headline features


 3. Rich data model
 Column based, range slices, column
 slices, secondary indexes, counters,
 expiring columns

 http://cassandra.apache.org/
The big table paper

 •   Sparse "columnar" data model
 •   SSTable disk storage
 •   Append-only commit log
 •   Memtable (buffer and sort)
 •   Immutable SSTable files
 •   Compaction
 http://labs.google.com/papers/bigtable-osdi06.pdf
 http://www.slideshare.net/geminimobile/bigtable-4820829
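The storage ideas listed above can be sketched in a few lines. This is a toy model under stated assumptions: `TinyStore` and its `memtable_limit` parameter are hypothetical names, everything lives in memory, and real systems add details (on-disk files, indexes, tombstones) that are left out here.

```python
class TinyStore:
    """Toy write path: commit log -> memtable -> immutable SSTables -> compaction."""

    def __init__(self, memtable_limit=2):
        self.commit_log = []   # append-only record of every write
        self.memtable = {}     # in-memory buffer for recent writes
        self.sstables = []     # immutable sorted tables, oldest first
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.commit_log.append((key, value))   # log first, for durability
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        """Sort the memtable and freeze it as a new SSTable."""
        if self.memtable:
            self.sstables.append(tuple(sorted(self.memtable.items())))
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):   # newest table wins
            for k, v in table:
                if k == key:
                    return v
        return None

    def compact(self):
        """Merge all SSTables into one, keeping the newest value per key."""
        merged = {}
        for table in self.sstables:             # oldest first, newer overwrite
            merged.update(table)
        self.sstables = [tuple(sorted(merged.items()))] if merged else []


store = TinyStore(memtable_limit=2)
for key, value in [("a", 1), ("b", 2), ("a", 3), ("c", 4)]:
    store.put(key, value)
tables_before = len(store.sstables)
store.compact()
print(tables_before, store.sstables)
```

Writes never modify an existing table, which is one reason this design is associated with high write throughput.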
The big table paper

 [Diagram: a Column Family. Each row key maps to a series of columns, and each column is a name/value pair.]
Headline features


 4. You're in control
 Tunable consistency, per operation



 http://cassandra.apache.org/
Consistency levels

 How many replicas must respond to
 declare success?
Consistency levels: write operations

  Level          Description
  ANY            One node, including hinted handoff
  ONE            One node
  QUORUM         N/2 + 1 replicas
  LOCAL_QUORUM   N/2 + 1 replicas in local data centre
  EACH_QUORUM    N/2 + 1 replicas in each data centre
  ALL            All replicas

 http://wiki.apache.org/cassandra/API#Write
Consistency levels: read operations

  Level          Description
  ONE            1st Response
  QUORUM         N/2 + 1 replicas
  LOCAL_QUORUM   N/2 + 1 replicas in local data centre
  EACH_QUORUM    N/2 + 1 replicas in each data centre
  ALL            All replicas


 http://wiki.apache.org/cassandra/API#Read
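A small worked example of the quorum arithmetic in the tables above, reading N as the replication factor and the division as integer division. The helper names `quorum` and `replicas_overlap` are made up for this sketch. The overlap check is the usual rule of thumb that a read is guaranteed to touch at least one replica holding the latest write when R + W > N.

```python
def quorum(replication_factor: int) -> int:
    """N/2 + 1, using integer division."""
    return replication_factor // 2 + 1


def replicas_overlap(write_acks: int, read_acks: int, replication_factor: int) -> bool:
    """True when every read set must intersect every write set (R + W > N)."""
    return write_acks + read_acks > replication_factor


N = 3
combinations = [
    ("write ONE, read ONE", 1, 1),
    ("write QUORUM, read QUORUM", quorum(N), quorum(N)),
    ("write ALL, read ONE", N, 1),
]
for label, w, r in combinations:
    print(label, "->", replicas_overlap(w, r, N))
```

With N = 3, a quorum is 2 replicas, so quorum writes plus quorum reads overlap while ONE plus ONE does not.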
Headline features


 5. Performant
 Well known for high write performance


 http://www.datastax.com/docs/1.0/introduction/index#core-strengths-of-cassandra
Benchmark*

 http://blog.cubrid.org/dev-platform/nosql-benchmarking/

 * Add pinch of salt
Recap: headline features


 1. Elastic
 2. Decentralised
 3. Rich data model
 4. You’re in control (tunable consistency)
 5. Performant
A simple ad-targeting application

 [Diagram: we choose which ad to show by combining some ads with our user knowledge.]
A simple ad-targeting application

 Allow us to capture user behaviour/data
 via “pixels” - placing users into segments
 (different buckets)



 http://pixel.wehaveyourkidneys.com/add.php?add=foo
A simple ad-targeting application

 Record clicks and impressions of each
 ad; storing data per-ad and per-segment




 http://pixel.wehaveyourkidneys.com/adImpression.php?ad=1
 http://pixel.wehaveyourkidneys.com/adClick.php?ad=1
A simple ad-targeting application

 Real-time ad performance
 analytics, broken down by segment
 (which segments are performing well?)



 http://www.wehaveyourkidneys.com/adPerformance.php?ad=1
A simple ad-targeting application

 Recommendations based on
 best-performing ads




 (this is left as an exercise for the reader)
Additional requirements


 • Large number of users
 • High volume of impressions
 • Highly available – downtime is money
A good fit for Cassandra?

 Yes!

 Big data, high availability and lots of
 writes are all good signs that Cassandra
 will fit well.

 http://www.nosqldatabases.com/main/2010/10/19/what-is-cassandra-good-for.html
A good fit for Cassandra?

 That said, people use Cassandra for
 many other things too, for example:

 Highly available HTTP request routing
 (tiny data!)

 http://blip.tv/datastax/highly-available-http-request-routing-dns-using-cassandra-5501901
Top Tip #2

 Cassandra is an excellent fit where
 availability matters, where there is a lot
 of data or where you have a large
 number of write operations.
Demo

 Live demo before we start
Data modeling

 Start from your queries, work backwards

 http://www.slideshare.net/mattdennis/cassandra-data-modeling
 http://blip.tv/datastax/data-modeling-workshop-5496906
Data model basics: conflict resolution

 Per-column timestamp-based conflict
 resolution: the write with the highest
 timestamp wins
 {                              {
     column: foo,                   column: foo,
     value: bar,                    value: zing,
     timestamp: 1000                timestamp: 1001   <- wins
 }                              }

 http://cassandra.apache.org/
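A minimal sketch of timestamp-based resolution for the two versions shown above. The `Column` type and `resolve` function are illustrative names, and the tie-breaking choice (the second argument wins when timestamps are equal) is a simplification for this example rather than a description of Cassandra's behaviour.

```python
from typing import NamedTuple


class Column(NamedTuple):
    name: str
    value: str
    timestamp: int


def resolve(a: Column, b: Column) -> Column:
    """Pick the surviving version of a column: the higher timestamp wins."""
    if a.name != b.name:
        raise ValueError("can only resolve two versions of the same column")
    # Simplification: on equal timestamps this sketch just returns b.
    return a if a.timestamp > b.timestamp else b


older = Column(name="foo", value="bar", timestamp=1000)
newer = Column(name="foo", value="zing", timestamp=1001)
winner = resolve(older, newer)
print(winner)
```

Because resolution is per column, two clients writing different columns of the same row never conflict with each other.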
Data model basics: column ordering

 Columns ordered at time of
 writing, according to Column Family
 schema
 {                              {
     column: zebra,                 column: badger,
     value: foo,                    value: foo,
     timestamp: 1000                timestamp: 1001
 }                              }



 http://cassandra.apache.org/
Data model basics: column ordering

 Columns ordered at time of
 writing, according to Column Family
 schema
 {
     badger: foo,               with AsciiType column
     zebra: foo                 schema
 }




 http://cassandra.apache.org/
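The ordering behaviour can be illustrated with a sorted insert. This is a sketch, assuming plain Python string comparison as a stand-in for an AsciiType comparator; the `Row` class is a hypothetical helper, not a client API.

```python
from bisect import bisect_left


class Row:
    """Keeps column names sorted at write time, whatever the arrival order."""

    def __init__(self):
        self.names = []    # column names, always kept sorted
        self.values = {}   # column name -> value

    def insert(self, name, value):
        if name not in self.values:
            # Find the sorted position once, at write time.
            self.names.insert(bisect_left(self.names, name), name)
        self.values[name] = value

    def columns(self):
        return [(name, self.values[name]) for name in self.names]


row = Row()
row.insert("zebra", "foo")    # written first
row.insert("badger", "foo")   # written second, but sorts first
print(row.columns())
```

Reads then get ordered columns for free, which is what makes column slices cheap.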
Data modeling: user segments

 Add user to bucket X, with expiry time Y
 Which buckets is user X in?

 ["user"][<uuid>][<bucketId>] = 1

 [CF]      [rowKey] [columnName]   = value
Data modeling: user segments

 user Column Family:
 [f97be9cc-5255-4578-8813-76701c0945bd][bar]   =   1
 [f97be9cc-5255-4578-8813-76701c0945bd][foo]   =   1
 [06a6f1b0-fcf2-41d9-8949-fe2d416bde8e][baz]   =   1
 [06a6f1b0-fcf2-41d9-8949-fe2d416bde8e][zoo]   =   1
 [503778bc-246f-4041-ac5a-fd944176b26d][aaa]   =   1

 Q: Is user in segment X?
 A: Single column fetch
Data modeling: user segments

 user Column Family:
 [f97be9cc-5255-4578-8813-76701c0945bd][bar]   =   1
 [f97be9cc-5255-4578-8813-76701c0945bd][foo]   =   1
 [06a6f1b0-fcf2-41d9-8949-fe2d416bde8e][baz]   =   1
 [06a6f1b0-fcf2-41d9-8949-fe2d416bde8e][zoo]   =   1
 [503778bc-246f-4041-ac5a-fd944176b26d][aaa]   =   1

 Q: Which segments is user X in?
 A: Column slice fetch
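The two queries above map onto a nested dictionary quite directly. The sketch below models `[CF][rowKey][columnName] = value` with plain Python dicts; `in_segment` and `segments_for` are names invented for the illustration and do not correspond to any client library call.

```python
# users column family: row key (user UUID) -> {column name (segment): value}
users = {
    "f97be9cc-5255-4578-8813-76701c0945bd": {"bar": 1, "foo": 1},
    "06a6f1b0-fcf2-41d9-8949-fe2d416bde8e": {"baz": 1, "zoo": 1},
    "503778bc-246f-4041-ac5a-fd944176b26d": {"aaa": 1},
}


def in_segment(cf, user_uuid, segment):
    """Is user in segment X? Equivalent to a single column fetch."""
    return segment in cf.get(user_uuid, {})


def segments_for(cf, user_uuid):
    """Which segments is user X in? Equivalent to a column slice over the row."""
    return sorted(cf.get(user_uuid, {}))


user = "f97be9cc-5255-4578-8813-76701c0945bd"
print(in_segment(users, user, "foo"), segments_for(users, user))
```

Both questions are answered from a single row, so each is one lookup against one set of replicas.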
Top Tip #3

 With column slices, we get the columns
 back ordered, according to our schema

 We cannot do the same for rows
 however, unless we use the Order
 Preserving Partitioner
Top Tip #4

 Don’t use the Order Preserving
 Partitioner unless you absolutely have to




 http://ria101.wordpress.com/2010/02/22/cassandra-randompartitioner-vs-orderpreservingpartitioner/
Data modeling: user segments


 Add user to bucket X, with expiry time Y
 Which buckets is user X in?

 ["user"][<uuid>][<bucketId>] = 1

 [CF]      [rowKey] [columnName]   = value
Expiring columns

 An expiring column will be automatically
 deleted after n seconds




 http://cassandra.apache.org/
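A rough sketch of what "deleted after n seconds" means from the reader's point of view. `ExpiringRow` is a hypothetical class with an injectable clock so the example is deterministic; it simply hides expired columns on read, which is a simplification of how a real store would eventually discard them.

```python
class ExpiringRow:
    """Row whose columns can carry a time-to-live (TTL) in seconds."""

    def __init__(self, clock):
        self.clock = clock    # callable returning the current time in seconds
        self.columns = {}     # name -> (value, expires_at or None)

    def insert(self, name, value, ttl=None):
        expires_at = None if ttl is None else self.clock() + ttl
        self.columns[name] = (value, expires_at)

    def live_columns(self):
        """Return only the columns that have not yet expired."""
        now = self.clock()
        return {
            name: value
            for name, (value, expires_at) in self.columns.items()
            if expires_at is None or expires_at > now
        }


fake_now = [0]                                  # mutable fake clock for the demo
row = ExpiringRow(clock=lambda: fake_now[0])
row.insert("foo", 1, ttl=3600)                  # user is in segment foo for an hour
row.insert("bar", 1)                            # no TTL: never expires

fake_now[0] = 3599
before_expiry = row.live_columns()
fake_now[0] = 3600
after_expiry = row.live_columns()
print(before_expiry, after_expiry)
```

For the segment use case this means no clean-up job is needed: membership lapses by itself.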
Data modeling: user segments

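 // Connect to the 'whyk' keyspace on a local node
 $pool = new ConnectionPool(
     'whyk', array('localhost')
     );
 $users = new ColumnFamily($pool, 'users');
 // One column per segment: name = segment, value = 1
 $users->insert(
     $userUuid,               // row key
     array($segment => 1),    // columns to write
     NULL,                    // timestamp: NULL = default TS
     $expires                 // TTL in seconds
     );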

 Using phpcassa client: https://github.com/thobbs/phpcassa
Data modeling: user segments

 UPDATE users
 USING TTL 3600
 SET 'foo' = 1
 WHERE KEY =
     'f97be9cc-5255-4578-8813-76701c0945bd';

 Using CQL
 http://www.datastax.com/dev/blog/what%E2%80%99s-new-in-cassandra-0-8-part-1-cql-the-cassandra-query-language

 http://www.datastax.com/docs/1.0/references/cql
Top Tip #5

 Try to exploit Cassandra’s columnar data
 model; avoid read-before-write and
 locking by safely mutating individual
 columns
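To see why this tip matters, compare two ways of storing a user's segments. The sketch below uses plain dicts to stand in for the database and replays two concurrent writers by hand; the store names and the interleaving are assumptions chosen to show the failure mode, not output from a real cluster.

```python
import json

# Approach A: the whole segment list is serialised into one value,
# so every update has to read, modify and write back.
blob_store = {"user-1": json.dumps(["foo"])}
snapshot_a = json.loads(blob_store["user-1"])             # writer A reads
snapshot_b = json.loads(blob_store["user-1"])             # writer B reads the same state
blob_store["user-1"] = json.dumps(snapshot_a + ["bar"])   # A writes back
blob_store["user-1"] = json.dumps(snapshot_b + ["baz"])   # B overwrites A's update
lost_update = json.loads(blob_store["user-1"])

# Approach B: one column per segment, so each writer does a blind write
# to its own column and nobody needs to read first.
column_store = {"user-1": {"foo": 1}}
column_store["user-1"]["bar"] = 1                         # writer A
column_store["user-1"]["baz"] = 1                         # writer B
all_kept = sorted(column_store["user-1"])

print(lost_update, all_kept)
```

In approach A the "bar" segment silently disappears; in approach B both writes survive without any locking.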
Data modeling: ad performance

 Track overall ad performance; how many
 clicks/impressions per ad?

 ["ads"][<adId>][<stamp>]["click"] = #
 ["ads"][<adId>][<stamp>]["impression"] = #

 [CF] [Row] [S.Col] [Col]   = value

 Using super columns
Top Tip #6

 Friends don’t let friends use Super
 Columns.




 http://rubyscale.com/2010/beware-the-supercolumn-its-a-trap-for-the-unwary/
Data modeling: ad performance

 Try again using regular columns:

 ["ads"][<adId>][<stamp>-"click"] = #
 ["ads"][<adId>][<stamp>-"impression"] = #

 [CF] [Row]     [Col]   = value
Data modeling: ad performance

 ads Column Family:
 [1][2011103015-click] = 1
 [1][2011103015-impression] = 3434
 [1][2011103016-click] = 12
 [1][2011103016-impression] = 5411
 [1][2011103017-click] = 2
 [1][2011103017-impression] = 345

 Q: Get performance of ad X between two date/times
 A: Column slice against single row specifying a start
 stamp and end stamp + 1
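The "start stamp to end stamp + 1" slice can be sketched as follows. This is an in-memory illustration using the sample row above; `column_slice` and `performance` are hypothetical helpers, and the row is modelled as a list of (column name, count) pairs already sorted by name.

```python
from bisect import bisect_left

# Row for ad 1: columns sorted by name, as they would come back from a slice.
ad_row = [
    ("2011103015-click", 1),
    ("2011103015-impression", 3434),
    ("2011103016-click", 12),
    ("2011103016-impression", 5411),
    ("2011103017-click", 2),
    ("2011103017-impression", 345),
]


def column_slice(columns, start, finish):
    """Return the columns whose names fall in [start, finish)."""
    names = [name for name, _ in columns]
    return columns[bisect_left(names, start):bisect_left(names, finish)]


def performance(columns, start_stamp, end_stamp):
    """Total clicks and impressions between two hour stamps, inclusive."""
    # "end stamp + 1" sorts after every column belonging to the end hour.
    finish = str(int(end_stamp) + 1)
    totals = {"click": 0, "impression": 0}
    for name, count in column_slice(columns, start_stamp, finish):
        metric = name.split("-", 1)[1]
        totals[metric] += count
    return totals


report = performance(ad_row, "2011103015", "2011103016")
print(report)
```

The trick relies on the hour stamp being a fixed-width prefix, so string order matches time order.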
Think carefully about your data

 This scheme works because I’m
 assuming each ad has a relatively short
 lifespan. This means that there are lots
 of rows and hence the load is spread.


 Other options:
 http://rubyscale.com/2011/basic-time-series-with-cassandra/
Counters


 • Distributed atomic counters
 • Easy to use
 • Not idempotent

 http://www.datastax.com/dev/blog/whats-new-in-cassandra-0-8-part-2-counters
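"Not idempotent" is the point most likely to bite in practice, so here is a small illustration. `FlakyCounter` and `increment_with_retry` are invented for this sketch: the counter applies the increment and then loses the acknowledgement, which stands in for a client-side timeout.

```python
class FlakyCounter:
    """Counter whose acknowledgement can get lost after the write is applied."""

    def __init__(self):
        self.value = 0

    def increment(self, delta, lose_ack=False):
        self.value += delta              # the increment is applied...
        if lose_ack:
            raise TimeoutError("ack lost")   # ...but the client never hears back


def increment_with_retry(counter, delta, lose_first_ack):
    try:
        counter.increment(delta, lose_ack=lose_first_ack)
    except TimeoutError:
        # The client cannot tell whether the first attempt landed, so it retries.
        counter.increment(delta)


impressions = FlakyCounter()
increment_with_retry(impressions, 1, lose_first_ack=True)

# Contrast: replaying a plain column write gives the same result every time.
segments = {}
for _ in range(2):
    segments["foo"] = 1

print(impressions.value, segments)
```

One logical impression was counted twice, whereas the replayed plain write is harmless. Whether to retry a timed-out increment is therefore an application decision: over-count or under-count.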
Data modeling: ad performance

 $stamp = date('YmdH');
 $ads->add(
     $adId,                              // row key
     "$stamp-impression",                // column
     1                                   // increment
     );

 We’ll store performance metrics in hour buckets for graphing.
Data modeling: ad performance

 UPDATE ads
 SET '2011103015-impression'
     = '2011103015-impression' + 1
 WHERE KEY = '1';
Data modeling: performance/segment

 We can add another dimension to our
 stats so we can break them down by segment.

 ["ads"][<adId>]
           [<stamp>-<segment>-"click"] = #

 [CF]   [Row]
           [Col]                     = value
Data modeling: performance/segment

 ads Column Family:
 [1][2011103015-bar-click] = 1
 [1][2011103015-bar-impression] = 3434
 [1][2011103015-foo-click] = 12
 [1][2011103015-foo-impression] = 5411
 [1][2011103016-bar-click] = 2

 Q: Get performance of ad X between two date/times,
 split by segment
 A: Column slice against single row specifying a start
 stamp and end stamp + 1
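Once the slice comes back, the client still has to split the compound column names. A minimal sketch using the sample row above, assuming the `<stamp>-<segment>-<metric>` layout and that the stamp and metric contain no hyphens; `by_segment` is a hypothetical helper.

```python
from collections import defaultdict

segment_row = [
    ("2011103015-bar-click", 1),
    ("2011103015-bar-impression", 3434),
    ("2011103015-foo-click", 12),
    ("2011103015-foo-impression", 5411),
    ("2011103016-bar-click", 2),
]


def by_segment(columns, start_stamp, end_stamp):
    """Sum clicks and impressions per segment between two hour stamps."""
    finish = str(int(end_stamp) + 1)        # end stamp + 1, as in the slice
    totals = defaultdict(lambda: {"click": 0, "impression": 0})
    for name, count in columns:
        if start_stamp <= name < finish:
            # Split off the stamp first and the metric last, so the segment
            # name in the middle is recovered whole.
            _stamp, rest = name.split("-", 1)
            segment, metric = rest.rsplit("-", 1)
            totals[segment][metric] += count
    return dict(totals)


breakdown = by_segment(segment_row, "2011103015", "2011103016")
print(breakdown)
```

Splitting from both ends keeps the parse unambiguous even if a segment name itself contains a hyphen.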
Data modeling: performance/segment

 $stamp = date('YmdH');
 $ads->add(
     "$adId-segments",                                // row key
     "$stamp-$segment-impression",                    // column
     1                                                // incr
     );

 We’ll store performance metrics in hour buckets for graphing.
Data modeling: segment stats

 Track overall clicks/impressions per
 bucket; which buckets are most clicky?

 ["segments"][<adId>-"segments"]
            [<stamp>-<segment>-"click"] = #

 [CF]   [Row]
            [Col]                    = value
Recap: Data modeling

 • Think about the queries, work
   backwards
 • Don’t overuse single rows; try to
   spread the load
 • Don’t use super columns
 • Ask on IRC! #cassandra
Recap: Common data modeling patterns

 1. Using column names with no value

 [cf][rowKey][columnName] = 1
Recap: Common data modeling patterns

 2. Counters

 [cf][rowKey][columnName]++
And also…

 3. Serialising a whole object

 [cf][rowKey][columnName] = {
     foo: 3,
     bar: 11
     }
There’s more: Brisk

 Integrated Hadoop distribution (without
 HDFS installed). Run Hive and Pig queries
 directly against Cassandra

 DataStax now offer this functionality in
 their “Enterprise” product

 http://www.datastax.com/products/enterprise
Hive

CREATE EXTERNAL TABLE tempUsers
    (userUuid string, segmentId string, value string)
STORED BY
'org.apache.hadoop.hive.cassandra.CassandraStorageHandler'
WITH SERDEPROPERTIES (
    "cassandra.columns.mapping" = ":key,:column,:value",
    "cassandra.cf.name" = "users"
    );


SELECT segmentId, count(1) AS total
FROM tempUsers
GROUP BY segmentId
ORDER BY total DESC;
There’s more: Supercharged Cassandra

 Acunu have reengineered the entire Unix
 storage stack, optimised specifically for
 Big Data workloads

 Includes instant snapshot of CFs

 http://www.acunu.com/products/choosing-cassandra/
In conclusion

 • Cassandra is founded on sound design
   principles
 • The Cassandra data model, sometimes
   mentioned as a weakness, is incredibly
   powerful
 • The clients are getting better; CQL is a
   step forward
 • Hadoop integration means we can
   analyse data directly from a Cassandra
   cluster
 • Cassandra’s sweet spot is highly
   available “big data” (especially
   time-series) with large numbers of writes
Thanks

Learn more about Cassandra
meetup.com/Cassandra-London
Check out the code: https://github.com/davegardnerisme/we-have-your-kidneys

Watch videos from Cassandra SF 2011
http://www.datastax.com/events/cassandrasf2011/presentations

Scylla Summit 2018: Introducing ValuStor, A Memcached Alternative Made to Run...
 
Cassandra and Rails at LA NoSQL Meetup
Cassandra and Rails at LA NoSQL MeetupCassandra and Rails at LA NoSQL Meetup
Cassandra and Rails at LA NoSQL Meetup
 
hbaseconasia2019 Phoenix Improvements and Practices on Cloud HBase at Alibaba
hbaseconasia2019 Phoenix Improvements and Practices on Cloud HBase at Alibabahbaseconasia2019 Phoenix Improvements and Practices on Cloud HBase at Alibaba
hbaseconasia2019 Phoenix Improvements and Practices on Cloud HBase at Alibaba
 
Tech-Spark: Exploring the Cosmos DB
Tech-Spark: Exploring the Cosmos DBTech-Spark: Exploring the Cosmos DB
Tech-Spark: Exploring the Cosmos DB
 
RMG203 Cloud Infrastructure and Application Monitoring with Amazon CloudWatch...
RMG203 Cloud Infrastructure and Application Monitoring with Amazon CloudWatch...RMG203 Cloud Infrastructure and Application Monitoring with Amazon CloudWatch...
RMG203 Cloud Infrastructure and Application Monitoring with Amazon CloudWatch...
 
Spark + Cassandra = Real Time Analytics on Operational Data
Spark + Cassandra = Real Time Analytics on Operational DataSpark + Cassandra = Real Time Analytics on Operational Data
Spark + Cassandra = Real Time Analytics on Operational Data
 


Cassandra's Sweet Spot - an introduction to Apache Cassandra

  • 1. Cassandra’s sweet spot Dave Gardner @davegardnerisme
  • 2. jobs.hailocab.com Looking for an expert backend Java dev – speak to me! meetup.com/Cassandra-London Next event 21st November
  • 3. Building applications with Cassandra • Key features • Creating an application • Data modeling
  • 4. Comparing Cassandra with X “Can someone quickly explain the differences between the two? Other than the fact that MongoDB supports ad-hoc querying I don't know whats different. It also appears (using google trends) that MongoDB seems to be growing while Cassandra is dying off. Is this the case?” 27th July 2010 http://comments.gmane.org/gmane.comp.db.cassandra.user/7773
  • 5. Comparing Cassandra with X “They have approximately nothing in common. And, no, Cassandra is definitely not dying off.” 28th July 2010 http://comments.gmane.org/gmane.comp.db.cassandra.user/7773
  • 6. Top Tip #1 To use a NoSQL solution effectively, we need to identify its sweet spot.
  • 7. Top Tip #1 To use a NoSQL solution effectively, we need to identify its sweet spot. This means learning about each solution: how is it designed? What algorithms does it use? http://www.alberton.info/nosql_databases_what_when_why_phpuk2011.html
  • 8. Comparing Cassandra with X “they say … I can’t decide between this project and this project even though they look nothing like each other. And the fact that you can’t decide indicates that you don’t actually have a problem that requires them.” Benjamin Black – NoSQL Tapes (at 30:15) http://nosqltapes.com/video/benjamin-black-on-nosql-cloud-computing-and-fast_ip
  • 9. Headline features 1. Elastic Read and write throughput increases linearly as new machines are added http://cassandra.apache.org/
  • 10. Headline features 2. Decentralised Fault tolerant with no single point of failure; no “master” node http://cassandra.apache.org/
  • 11. The dynamo paper • Consistent hashing • Vector clocks • Gossip protocol • Hinted handoff • Read repair http://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf
  • 12. The dynamo paper [Diagram: a ring of six nodes (#1 to #6) with RF = 3; a client sends its request to a coordinator node]
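The ring on slide 12 can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Cassandra's implementation: the node names, the MD5-based `token` function and the `Ring` class are all hypothetical, and each node owns a single position on the ring.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Hash a string onto the ring (a toy stand-in for a partitioner)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class Ring:
    def __init__(self, nodes, replication_factor=3):
        self.replication_factor = replication_factor
        # Sort nodes by token so that list order is clockwise ring order.
        self.positions = sorted((token(node), node) for node in nodes)
        self.tokens = [t for t, _ in self.positions]

    def replicas_for(self, row_key):
        """Walk clockwise from the key's token and collect RF distinct nodes."""
        size = len(self.positions)
        start = bisect.bisect_right(self.tokens, token(row_key)) % size
        count = min(self.replication_factor, size)
        return [self.positions[(start + i) % size][1] for i in range(count)]


ring = Ring(["node1", "node2", "node3", "node4", "node5", "node6"])
replicas = ring.replicas_for("f97be9cc-5255-4578-8813-76701c0945bd")
print(replicas)  # three distinct node names; which three depends on the hash
```

Because placement depends only on the key's hash and the node positions, any node can work out where a row lives, which is why no "master" is needed to route requests.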
  • 13. Headline features 3. Rich data model Column based, range slices, column slices, secondary indexes, counters, expiring columns http://cassandra.apache.org/
  • 14. The big table paper • Sparse "columnar" data model • SSTable disk storage • Append-only commit log • Memtable (buffer and sort) • Immutable SSTable files • Compaction http://labs.google.com/papers/bigtable-osdi06.pdf http://www.slideshare.net/geminimobile/bigtable-4820829
  • 15. The big table paper [Diagram: a Column Family row; a row key followed by columns, each column holding a name and a value]
  • 16. Headline features 4. You're in control Tunable consistency, per operation http://cassandra.apache.org/
  • 17. Consistency levels How many replicas must respond to declare success?
  • 18. Consistency levels: write operations
        Level          Description
        ANY            One node, including hinted handoff
        ONE            One node
        QUORUM         N/2 + 1 replicas
        LOCAL_QUORUM   N/2 + 1 replicas in local data centre
        EACH_QUORUM    N/2 + 1 replicas in each data centre
        ALL            All replicas
        http://wiki.apache.org/cassandra/API#Write
  • 19. Consistency levels: read operations
        Level          Description
        ONE            1st response
        QUORUM         N/2 + 1 replicas
        LOCAL_QUORUM   N/2 + 1 replicas in local data centre
        EACH_QUORUM    N/2 + 1 replicas in each data centre
        ALL            All replicas
        http://wiki.apache.org/cassandra/API#Read
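The arithmetic behind these levels is small enough to sketch. The Python below is illustrative only: the function names are hypothetical, only the single-data-centre levels are modelled, and the overlap rule R + W > N is the general quorum property from Dynamo-style systems rather than something stated on the slides.

```python
def quorum(replication_factor: int) -> int:
    """N/2 + 1, using integer division."""
    return replication_factor // 2 + 1


def replicas_required(level: str, replication_factor: int) -> int:
    """Replicas that must respond for a single-data-centre level."""
    required = {
        "ONE": 1,
        "QUORUM": quorum(replication_factor),
        "ALL": replication_factor,
    }
    return required[level]


def read_overlaps_write(read_level: str, write_level: str,
                        replication_factor: int) -> bool:
    """True when every read set shares a replica with every write set (R + W > N)."""
    r = replicas_required(read_level, replication_factor)
    w = replicas_required(write_level, replication_factor)
    return r + w > replication_factor


print(quorum(3))                                   # 2
print(read_overlaps_write("QUORUM", "QUORUM", 3))  # True
print(read_overlaps_write("ONE", "ONE", 3))        # False
```

With RF = 3, QUORUM reads and QUORUM writes always overlap on at least one replica, whereas ONE/ONE trades that guarantee for lower latency and higher availability.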
  • 20. Headline features 5. Performant Well known for high write performance http://www.datastax.com/docs/1.0/introduction/index#core-strengths-of-cassandra
  • 21. Benchmark* http://blog.cubrid.org/dev-platform/nosql-benchmarking/ * Add pinch of salt
  • 22. Recap: headline features 1. Elastic 2. Decentralised 3. Rich data model 4. You’re in control (tunable consistency) 5. Performant
  • 23. A simple ad-targeting application Some ads Choose which ad to show Our user knowledge
  • 24. A simple ad-targeting application Allow us to capture user behaviour/data via “pixels” - placing users into segments (different buckets) http://pixel.wehaveyourkidneys.com/add.php?add=foo
  • 25. A simple ad-targeting application Record clicks and impressions of each ad; storing data per-ad and per-segment http://pixel.wehaveyourkidneys.com/adImpression.php?ad=1 http://pixel.wehaveyourkidneys.com/adClick.php?ad=1
  • 26. A simple ad-targeting application Real-time ad performance analytics, broken down by segment (which segments are performing well?) http://www.wehaveyourkidneys.com/adPerformance.php?ad=1
  • 27. A simple ad-targeting application Recommendations based on best-performing ads (this is left as an exercise for the reader)
  • 28. Additional requirements • Large number of users • High volume of impressions • Highly available – downtime is money
  • 29. A good fit for Cassandra? Yes! Big data, high availability and lots of writes are all good signs that Cassandra will fit well. http://www.nosqldatabases.com/main/2010/10/19/what-is-cassandra-good-for.html
  • 30. A good fit for Cassandra? That said, people use Cassandra for many other things too, such as highly available HTTP request routing (tiny data!) http://blip.tv/datastax/highly-available-http-request-routing-dns-using-cassandra-5501901
  • 31. Top Tip #2 Cassandra is an excellent fit where availability matters, where there is a lot of data or where you have a large number of write operations.
  • 32. Demo Live demo before we start
  • 33. Data modeling Start from your queries, work backwards http://www.slideshare.net/mattdennis/cassandra-data-modeling http://blip.tv/datastax/data-modeling-workshop-5496906
  • 34. Data model basics: conflict resolution Per-column timestamp-based conflict resolution
        { column: foo, value: bar, timestamp: 1000 }
        { column: foo, value: zing, timestamp: 1001 }
        http://cassandra.apache.org/
  • 35. Data model basics: conflict resolution Per-column timestamp-based conflict resolution
        { column: foo, value: bar, timestamp: 1000 }
        { column: foo, value: zing, timestamp: 1001 }
        The version with the higher timestamp (1001) wins.
        http://cassandra.apache.org/
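Per-column resolution means two versions of a row are merged column by column, keeping whichever value carries the higher timestamp. A minimal Python sketch, assuming each replica's row is a dict of column name to `(value, timestamp)`; `merge_rows` is a hypothetical helper, and ties (which this sketch resolves in favour of the first argument) are not modelled faithfully.

```python
def merge_rows(replica_a: dict, replica_b: dict) -> dict:
    """Merge two versions of a row, keeping the newest value per column."""
    merged = dict(replica_a)
    for name, (value, timestamp) in replica_b.items():
        # Only replace a column when the other replica holds a newer write.
        if name not in merged or timestamp > merged[name][1]:
            merged[name] = (value, timestamp)
    return merged


older = {"foo": ("bar", 1000), "only_here": ("x", 900)}
newer = {"foo": ("zing", 1001)}

print(merge_rows(older, newer))
# {'foo': ('zing', 1001), 'only_here': ('x', 900)}
```

Because the comparison is per column, a write to one column never clobbers a concurrent write to a different column in the same row.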
  • 36. Data model basics: column ordering Columns ordered at time of writing, according to Column Family schema
        { column: zebra, value: foo, timestamp: 1000 }
        { column: badger, value: foo, timestamp: 1001 }
        http://cassandra.apache.org/
  • 37. Data model basics: column ordering Columns ordered at time of writing, according to Column Family schema
        { badger: foo, zebra: foo } with AsciiType column schema
        http://cassandra.apache.org/
  • 38. Data modeling: user segments Add user to bucket X, with expiry time Y. Which buckets is user X in?
        ["user"][<uuid>][<bucketId>] = 1
        [CF]    [rowKey][columnName] = value
  • 39. Data modeling: user segments
        user Column Family:
        [f97be9cc-5255-4578-8813-76701c0945bd][bar] = 1
        [f97be9cc-5255-4578-8813-76701c0945bd][foo] = 1
        [06a6f1b0-fcf2-41d9-8949-fe2d416bde8e][baz] = 1
        [06a6f1b0-fcf2-41d9-8949-fe2d416bde8e][zoo] = 1
        [503778bc-246f-4041-ac5a-fd944176b26d][aaa] = 1
        Q: Is user in segment X?
        A: Single column fetch
  • 40. Data modeling: user segments
        user Column Family:
        [f97be9cc-5255-4578-8813-76701c0945bd][bar] = 1
        [f97be9cc-5255-4578-8813-76701c0945bd][foo] = 1
        [06a6f1b0-fcf2-41d9-8949-fe2d416bde8e][baz] = 1
        [06a6f1b0-fcf2-41d9-8949-fe2d416bde8e][zoo] = 1
        [503778bc-246f-4041-ac5a-fd944176b26d][aaa] = 1
        Q: Which segments is user X in?
        A: Column slice fetch
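The two queries map onto very simple operations once the segment id is the column name. Here is an in-memory Python sketch of the `user` column family; the dict-of-dicts structure and the helper names are assumptions for illustration, not a client API.

```python
from collections import defaultdict

# row key (user UUID) -> {column name (segment id): value}
user_cf = defaultdict(dict)


def add_to_segment(user_uuid: str, segment: str) -> None:
    # The column name carries the information; the value 1 is a placeholder.
    user_cf[user_uuid][segment] = 1


def is_in_segment(user_uuid: str, segment: str) -> bool:
    """Single column fetch: does this column exist in the user's row?"""
    return segment in user_cf.get(user_uuid, {})


def segments_for(user_uuid: str) -> list:
    """Column slice: every column name in the row, in sorted order."""
    return sorted(user_cf.get(user_uuid, {}))


add_to_segment("f97be9cc-5255-4578-8813-76701c0945bd", "foo")
add_to_segment("f97be9cc-5255-4578-8813-76701c0945bd", "bar")
add_to_segment("06a6f1b0-fcf2-41d9-8949-fe2d416bde8e", "baz")

print(is_in_segment("f97be9cc-5255-4578-8813-76701c0945bd", "foo"))  # True
print(segments_for("f97be9cc-5255-4578-8813-76701c0945bd"))  # ['bar', 'foo']
```

Adding a user to a segment is a blind write of one column, so no read is needed before the write.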
  • 41. Top Tip #3 With column slices, we get the columns back ordered, according to our schema. We cannot do the same for rows, however, unless we use the Order Preserving Partitioner.
  • 42. Top Tip #4 Don’t use the Order Preserving Partitioner unless you absolutely have to http://ria101.wordpress.com/2010/02/22/cassandra-randompartitioner-vs-orderpreservingpartitioner/
  • 43. Data modeling: user segments Add user to bucket X, with expiry time Y. Which buckets is user X in?
        ["user"][<uuid>][<bucketId>] = 1
        [CF]    [rowKey][columnName] = value
  • 44. Expiring columns An expiring column will be automatically deleted after n seconds http://cassandra.apache.org/
  • 45. Data modeling: user segments
        $pool = new ConnectionPool(
            'whyk',
            array('localhost')
        );
        $users = new ColumnFamily($pool, 'users');
        $users->insert(
            $userUuid,
            array($segment => 1),
            NULL,      // default TS
            $expires
        );
        Using phpcassa client: https://github.com/thobbs/phpcassa
  • 46. Data modeling: user segments
        UPDATE users USING TTL 3600
        SET 'foo' = 1
        WHERE KEY = 'f97be9cc-5255-4578-8813-76701c0945bd'
        Using CQL
        http://www.datastax.com/dev/blog/what%E2%80%99s-new-in-cassandra-0-8-part-1-cql-the-cassandra-query-language
        http://www.datastax.com/docs/1.0/references/cql
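What the TTL in the two examples above buys us can be shown with a toy model. The Python below is a sketch only: `ExpiringColumns` is a hypothetical class, the clock is injected so the behaviour can be demonstrated without waiting, and expiry happens lazily on read, which is a simplification.

```python
import time


class ExpiringColumns:
    """A single row whose columns disappear once their TTL has elapsed."""

    def __init__(self, clock=time.time):
        self.clock = clock
        self.columns = {}  # column name -> (value, expires_at)

    def insert(self, name, value, ttl_seconds):
        self.columns[name] = (value, self.clock() + ttl_seconds)

    def get(self, name):
        entry = self.columns.get(name)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            # Expired: drop the column and behave as if it was never there.
            del self.columns[name]
            return None
        return value


now = [0.0]  # fake clock, advanced by hand
row = ExpiringColumns(clock=lambda: now[0])
row.insert("foo", 1, ttl_seconds=3600)

now[0] = 3599
print(row.get("foo"))  # 1
now[0] = 3600
print(row.get("foo"))  # None
```

The application never has to schedule a clean-up job: membership of a segment simply lapses unless the pixel fires again and rewrites the column.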
  • 47. Top Tip #5 Try to exploit Cassandra’s columnar data model; avoid read-before-write and locking by safely mutating individual columns
  • 48. Data modeling: ad performance Track overall ad performance; how many clicks/impressions per ad?
        ["ads"][<adId>][<stamp>]["click"] = #
        ["ads"][<adId>][<stamp>]["impression"] = #
        [CF]   [Row]   [S.Col]  [Col] = value
        Using super columns
  • 49. Top Tip #6 Friends don’t let friends use Super Columns. http://rubyscale.com/2010/beware-the-supercolumn-its-a-trap-for-the-unwary/
  • 50. Data modeling: ad performance Try again using regular columns:
        ["ads"][<adId>][<stamp>-"click"] = #
        ["ads"][<adId>][<stamp>-"impression"] = #
        [CF]   [Row]   [Col] = value
  • 51. Data modeling: ad performance
        ads Column Family:
        [1][2011103015-click] = 1
        [1][2011103015-impression] = 3434
        [1][2011103016-click] = 12
        [1][2011103016-impression] = 5411
        [1][2011103017-click] = 2
        [1][2011103017-impression] = 345
        Q: Get performance of ad X between two date/times
        A: Column slice against single row specifying a start stamp and end stamp + 1
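The "end stamp + 1" trick works because column names sort as strings and every name begins with its hour stamp. A Python sketch over the row from slide 51, assuming an in-memory dict stands in for the row; `column_slice`, `next_hour_stamp` and `performance` are hypothetical helpers.

```python
import bisect
from datetime import datetime, timedelta

ad_row = {
    "2011103015-click": 1,
    "2011103015-impression": 3434,
    "2011103016-click": 12,
    "2011103016-impression": 5411,
    "2011103017-click": 2,
    "2011103017-impression": 345,
}


def column_slice(row, start, finish):
    """Return (name, value) pairs with start <= name <= finish, in sorted order."""
    names = sorted(row)
    lo = bisect.bisect_left(names, start)
    hi = bisect.bisect_right(names, finish)
    return [(name, row[name]) for name in names[lo:hi]]


def next_hour_stamp(stamp):
    """'End stamp + 1': the YmdH stamp of the following hour."""
    following = datetime.strptime(stamp, "%Y%m%d%H") + timedelta(hours=1)
    return following.strftime("%Y%m%d%H")


def performance(row, first_hour, last_hour):
    # The bare stamp of the next hour sorts after every column of last_hour
    # and before every column of the next hour, so it closes the range.
    return column_slice(row, first_hour, next_hour_stamp(last_hour))


print(performance(ad_row, "2011103015", "2011103016"))
# [('2011103015-click', 1), ('2011103015-impression', 3434),
#  ('2011103016-click', 12), ('2011103016-impression', 5411)]
```

Using the end stamp itself as the upper bound would miss the final hour, because `2011103016` sorts before `2011103016-click`.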
  • 52. Think carefully about your data This scheme works because I’m assuming each ad has a relatively short lifespan. This means that there are lots of rows and hence the load is spread. Other options: http://rubyscale.com/2011/basic-time-series-with-cassandra/
  • 53. Counters • Distributed atomic counters • Easy to use • Not idempotent http://www.datastax.com/dev/blog/whats-new-in-cassandra-0-8-part-2-counters
  • 54. Data modeling: ad performance
        $stamp = date('YmdH');
        $ads->add(
            $adId,                // row key
            "$stamp-impression",  // column
            1                     // increment
        );
        We’ll store performance metrics in hour buckets for graphing.
  • 55. Data modeling: ad performance
        UPDATE ads
        SET '2011103015-impression' = '2011103015-impression' + 1
        WHERE KEY = '1'
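"Not idempotent" deserves a concrete illustration. The Python below is a toy model, assuming a plain in-process `Counter` in place of a distributed counter column family; `hour_bucket` and `record` are hypothetical names that mirror the `date('YmdH')` bucketing on slide 54.

```python
from collections import Counter
from datetime import datetime

counters = Counter()  # (row key, column name) -> count


def hour_bucket(when: datetime) -> str:
    """Same shape as PHP's date('YmdH'), e.g. '2011103015'."""
    return when.strftime("%Y%m%d%H")


def record(ad_id: str, event: str, when: datetime) -> str:
    """Increment the counter for this ad, event type and hour."""
    column = f"{hour_bucket(when)}-{event}"
    counters[(ad_id, column)] += 1
    return column


when = datetime(2011, 10, 30, 15, 5)
record("1", "impression", when)
# Suppose the first call timed out on the client but was in fact applied.
# Blindly retrying it counts the same impression twice:
record("1", "impression", when)

print(counters[("1", "2011103015-impression")])  # 2
```

Compare this with the user-segment write, where setting a column to 1 twice leaves the same result; with counters, a retry after a timeout may over-count, so some inaccuracy has to be acceptable.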
  • 56. Data modeling: performance/segment We can add in another dimension to our stats so we can break down by segment.
        ["ads"][<adId>][<stamp>-<segment>-"click"] = #
        [CF]   [Row]   [Col] = value
  • 57. Data modeling: performance/segment
        ads Column Family:
        [1][2011103015-bar-click] = 1
        [1][2011103015-bar-impression] = 3434
        [1][2011103015-foo-click] = 12
        [1][2011103015-foo-impression] = 5411
        [1][2011103016-bar-click] = 2
        Q: Get performance of ad X between two date/times, split by segment
        A: Column slice against single row specifying a start stamp and end stamp + 1
  • 58. Data modeling: performance/segment
        $stamp = date('YmdH');
        $ads->add(
            "$adId-segments",              // row key
            "$stamp-$segment-impression",  // column
            1                              // incr
        );
        We’ll store performance metrics in hour buckets for graphing.
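Once a slice of these columns comes back, splitting by segment is a matter of parsing the column names. A Python sketch using the data from slide 57; the helper names are hypothetical, and the parsing assumes the stamp contains no hyphen and the event name is the final hyphen-separated part.

```python
from collections import defaultdict

segment_columns = {
    "2011103015-bar-click": 1,
    "2011103015-bar-impression": 3434,
    "2011103015-foo-click": 12,
    "2011103015-foo-impression": 5411,
    "2011103016-bar-click": 2,
}


def totals_by_segment(columns):
    """Sum clicks and impressions per segment across all hour buckets."""
    totals = defaultdict(lambda: {"click": 0, "impression": 0})
    for name, count in columns.items():
        _stamp, rest = name.split("-", 1)     # "<stamp>", "<segment>-<event>"
        segment, event = rest.rsplit("-", 1)  # tolerate hyphens in the segment
        totals[segment][event] += count
    return dict(totals)


def click_through_rate(stats):
    if stats["impression"] == 0:
        return 0.0
    return stats["click"] / stats["impression"]


totals = totals_by_segment(segment_columns)
best = max(totals, key=lambda segment: click_through_rate(totals[segment]))
print(totals["bar"])  # {'click': 3, 'impression': 3434}
print(best)           # foo
```

This answers "which segments are performing well?" for one ad without any server-side aggregation: the hour buckets are summed on the client after a single column slice.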
  • 59. Data modeling: segment stats Track overall clicks/impressions per bucket; which buckets are most clicky? ["segments"][<adId>-"segments"] [<stamp>-<segment>-"click"] = # [CF] [Row] [Col] = value
  • 60. Recap: Data modeling • Think about the queries, work backwards • Don’t overuse single rows; try to spread the load • Don’t use super columns • Ask on IRC! #cassandra
  • 61. Recap: Common data modeling patterns 1. Using column names with no value [cf][rowKey][columnName] = 1
  • 62. Recap: Common data modeling patterns 2. Counters [cf][rowKey][columnName]++
  • 63. And also… 3. Serialising a whole object [cf][rowKey][columnName] = { foo: 3, bar: 11 }
  • 64. There’s more: Brisk Integrated Hadoop distribution (without HDFS installed). Run Hive and Pig queries directly against Cassandra. DataStax now offer this functionality in their “Enterprise” product http://www.datastax.com/products/enterprise
  • 65. Hive
        CREATE EXTERNAL TABLE tempUsers
            (userUuid string, segmentId string, value string)
        STORED BY 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler'
        WITH SERDEPROPERTIES (
            "cassandra.columns.mapping" = ":key,:column,:value",
            "cassandra.cf.name" = "users"
        );
        SELECT segmentId, count(1) AS total
        FROM tempUsers
        GROUP BY segmentId
        ORDER BY total DESC;
  • 66. There’s more: Supercharged Cassandra Acunu have reengineered the entire Unix storage stack, optimised specifically for Big Data workloads. Includes instant snapshot of CFs http://www.acunu.com/products/choosing-cassandra/
  • 67. In conclusion Cassandra is founded on sound design principles
  • 68. In conclusion The Cassandra data model, sometimes mentioned as a weakness, is incredibly powerful
  • 69. In conclusion The clients are getting better; CQL is a step forward
  • 70. In conclusion Hadoop integration means we can analyse data directly from a Cassandra cluster
  • 71. In conclusion Cassandra’s sweet spot is highly available “big data” (especially time-series) with large numbers of writes
  • 72. Thanks Learn more about Cassandra: meetup.com/Cassandra-London Check out the code: https://github.com/davegardnerisme/we-have-your-kidneys Watch videos from Cassandra SF 2011: http://www.datastax.com/events/cassandrasf2011/presentations