PUTTING THE X FACTOR INTO CASSANDRA:
ADVENTURES IN COUNTING
MALCOLM BOX, CTO, TELLYBUG

INTRO
  Malcolm Box, CTO & Co-Founder

  @malcolmbox

  malcolm@tellybug.com

  http://tellybug.com

WHAT I’M TALKING ABOUT
  How we started using Cassandra

  How we use it to power the X Factor and Britain’s Got Talent apps

  Counting - harder than you might think

  What we learnt along the way

THE CHALLENGE
  10-12 Million people watching these shows


  TV tells them to buzz/clap/score....


  ....Servers melt


  Design goal: handle 10K interactions/s

ROLL BACK 1 YEAR
  We’d won BGT 2011 - our first big talent show


  Existing MySQL/Django/Python stack


  Back of envelope calculations....oh dear


  Needed something quickly that could cope with anticipated load

OUR FIRST CASSANDRA SCHEMA
 create column family vote_log with comment='Log of votes'
   and comparator='UTF8Type' and key_validation_class='UUIDType'
   and default_validation_class='UTF8Type'
   and column_metadata = [
     {column_name:'ipaddr', validation_class:'AsciiType'},
     {column_name:'poll', validation_class:'LongType'},
     {column_name:'choice', validation_class:'LongType'},
     {column_name:'idtoken', validation_class:'UTF8Type'},
     {column_name:'count', validation_class:'LongType'}];
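To make the schema concrete, here is a minimal Python sketch of one vote_log row as a client might assemble it before writing: a random UUID row key plus the five columns, each checked against the validator declared above. The helper names (make_vote_row, validate_column) and the Python type chosen for each Cassandra validator are illustrative assumptions, not Tellybug's code.

```python
import uuid

# Validators copied from the vote_log schema; mapping them to Python
# types is an assumption made for this sketch.
VOTE_LOG_COLUMNS = {
    "ipaddr": "AsciiType",
    "poll": "LongType",
    "choice": "LongType",
    "idtoken": "UTF8Type",
    "count": "LongType",
}


def validate_column(validator, value):
    """Return True if value fits the named Cassandra validator (simplified)."""
    if validator == "LongType":
        # LongType holds a signed 64-bit integer.
        return isinstance(value, int) and -2**63 <= value < 2**63
    if validator == "AsciiType":
        return isinstance(value, str) and value.isascii()
    if validator == "UTF8Type":
        return isinstance(value, str)
    raise ValueError("unknown validator: " + validator)


def make_vote_row(ipaddr, poll, choice, idtoken, count=1):
    """Build one vote_log row (hypothetical helper, not from the talk).

    Returns (row_key, columns). The key is a random UUID, matching
    key_validation_class='UUIDType', so each vote is its own row.
    """
    row = {"ipaddr": ipaddr, "poll": poll, "choice": choice,
           "idtoken": idtoken, "count": count}
    for name, value in row.items():
        validator = VOTE_LOG_COLUMNS[name]
        if not validate_column(validator, value):
            raise TypeError(f"{name}={value!r} does not fit {validator}")
    return uuid.uuid4(), row
```

Because every vote gets a fresh random key, the log is pure appends with no read-modify-write, which is the access pattern that suited a write-only first use of Cassandra.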

WHAT WE LEARNT
 Cassandra scales beautifully for writes



 Cassandra has no single point of failure

 ....but it’s not hard to make it fail



 Ad-hoc questions and reporting were going to be much slower

OPERATIONS
  BGT 2011 was a write-only DB



  Ignored failures



  One cluster, one AZ



  Backup to MySQL

X FACTOR 2011
  Over 1 Million app downloads

  Over 260 Million boos/claps

IMPLEMENTING X FACTOR WITH CASSANDRA
  Counting



  Social network



  No longer write-only

WHAT ARE MY FRIENDS DOING?
  Scale makes this hard


  10K changes/s


  Which ones are relevant to which users?


     When new users (and their social graph) can arrive at any time

SOLUTION
  New Column Family - user activity


  Maps user to their interactions


  Writes are nicely randomised across users, and thus ideal for Cassandra


  Reads are the problem!
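A minimal in-memory sketch of why the user-activity column family splits into an easy write path and a hard read path: each interaction is written to one row keyed by the acting user, but showing someone what their friends are doing means reading every friend's row and merging the results. The dict-based storage, the (timestamp, interaction) column layout and the function names are assumptions for illustration.

```python
import heapq
from collections import defaultdict

# Stand-in for the user-activity column family: one row per user,
# holding (timestamp, interaction) columns. Layout is assumed.
user_activity = defaultdict(list)


def record_interaction(user_id, timestamp, interaction):
    """Write path: touches only the acting user's row (one key per write)."""
    user_activity[user_id].append((timestamp, interaction))


def friends_activity(friend_ids, limit=10):
    """Read path: fetch every friend's row, then merge them newest first.

    The fan-out (one row read per friend) is what makes reads expensive.
    """
    rows = [sorted(user_activity.get(f, []), reverse=True) for f in friend_ids]
    merged = heapq.merge(*rows, reverse=True)
    return [item for _, item in zip(range(limit), merged)]
```

Writes land on whichever user happens to act, so they spread across the cluster; a single read, by contrast, costs one row fetch per friend.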

COUNTING - HARDER THAN IT LOOKS
  Everyone can count



  But we need to count really fast



  And distribute the results to all the clients

DISTRIBUTED COUNTING
  “Memcache does counters”


  “OK, how about sharding?”


  “Well, I hear Cassandra 0.8 has counters”

ASIDE - THINGS THAT CAN’T COUNT #3
 cache.set('key', 1)
 cache.decr('key', 1)     # returns 0L
 cache.decr('key', 1)     # returns 0L: decr stops at zero
 cache.incr('key', -1)    # returns 4294967295L: -1 does not decrement
 cache.incr('key', 1)     # returns 4294967296L
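memcached stores counters as unsigned 64-bit integers: decr stops at zero, and incr only adds. A minimal sketch, assuming a hypothetical in-memory FakeCache that mimics those rules, shows one workaround for a value that must move in both directions: keep two counters that only ever increase and subtract them on read. FakeCache, init_signed, add_signed and read_signed are illustrative names, not a real client API.

```python
class FakeCache:
    """Hypothetical in-memory stand-in for a memcached client.

    Counters are unsigned 64-bit values: incr wraps at 2**64, decr
    stops at zero, and this fake rejects negative deltas outright.
    """

    def __init__(self):
        self._store = {}

    def set(self, key, value):
        self._store[key] = value
        return True

    def get(self, key):
        return self._store.get(key)

    def incr(self, key, delta=1):
        if delta < 0:
            raise ValueError("delta must be non-negative")
        self._store[key] = (self._store[key] + delta) % 2**64
        return self._store[key]

    def decr(self, key, delta=1):
        if delta < 0:
            raise ValueError("delta must be non-negative")
        self._store[key] = max(0, self._store[key] - delta)
        return self._store[key]


def init_signed(cache, key):
    """Create the two monotonic counters behind one signed value."""
    cache.set(key + ":up", 0)
    cache.set(key + ":down", 0)


def add_signed(cache, key, delta):
    """Record a signed delta using only incr, so nothing clamps or wraps."""
    if delta >= 0:
        cache.incr(key + ":up", delta)
    else:
        cache.incr(key + ":down", -delta)


def read_signed(cache, key):
    """Net value = total ups minus total downs; may be negative."""
    return cache.get(key + ":up") - cache.get(key + ":down")
```

The two gets in read_signed are not atomic, so a read taken while other clients are writing can be briefly off.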

SINGLE BOX LIMITS
  We have a single value

  Everything needs to read and write that value - from multiple servers

  EC2 limits

     Single Memcache server runs out of network I/O

     What then?

CASSANDRA HAS COUNTERS
  New (at the time) feature in Cassandra 0.8

  Special column type - CounterColumnType as the validator

  Distributed 64-bit counter, with eventual consistency

     CL.ONE writes recommended, to stop the implicit reads from hurting performance

     Reads tot up the values from the replicas to give the total

  Simple functionality

     incr()/decr(), get()
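A toy Python model of the counter behaviour described above, under simplifying assumptions: each replica owns a partial count, a write at CL.ONE bumps only the coordinating replica's partial, and a read totals whatever partials that replica has seen so far. It is not Cassandra's counter implementation (the real counter context also tracks a logical clock per shard); ReplicaCounterModel and its methods are hypothetical.

```python
from collections import defaultdict


class ReplicaCounterModel:
    """Toy model of a distributed counter; not Cassandra's actual code."""

    def __init__(self, replicas):
        # view[r][owner] = replica r's latest copy of owner's partial count
        self.view = {r: defaultdict(int) for r in replicas}

    def incr(self, replica, delta=1):
        # CL.ONE-style write: only the coordinating replica's own partial moves.
        self.view[replica][replica] += delta

    def replicate(self, src, dst):
        # Propagation: dst learns src's current partial count.
        self.view[dst][src] = self.view[src][src]

    def read(self, replica):
        # A read tots up every partial count this replica knows about.
        return sum(self.view[replica].values())
```

Until replication catches up, different replicas report different totals, which is the eventual consistency the slide mentions.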

CAN CASSANDRA COUNT?
  Yes, but....

  Performance can be an issue

      Switch off replicate_on_write, tune RF & cluster size

  Not scalable for single counter

      Scales as a function of RF, up to 4 nodes

      Above that ... you’re out of luck

      Best we achieved was ~10K/s increments to a single counter value on EC2 m1.large instances

  What do you do if an operation fails?
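The last question matters because a counter increment is not idempotent. If a write times out after the server has applied it, retrying over-counts, and not retrying under-counts whenever the write really was lost. A minimal sketch, assuming a hypothetical store that applies the increment and then drops the reply (FlakyCounterStore and incr_with_retry are illustrative names):

```python
class FlakyCounterStore:
    """Hypothetical store whose increment succeeds but whose reply can be lost."""

    def __init__(self):
        self.value = 0
        self.drop_next_ack = False

    def incr(self, delta):
        self.value += delta  # the write is applied...
        if self.drop_next_ack:
            self.drop_next_ack = False
            raise TimeoutError("no reply")  # ...but the client never hears back
        return self.value


def incr_with_retry(store, delta, attempts=3):
    """Naive retry loop: safe for idempotent writes, wrong for counters."""
    for attempt in range(attempts):
        try:
            return store.incr(delta)
        except TimeoutError:
            if attempt == attempts - 1:
                raise
```

A single vote that hits one lost reply is counted twice. Writing an idempotent record first (such as a vote_log row with its own UUID key) keeps a way to recount later.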

COUNTING AT SCALE WITH CASSANDRA
  Write throughput to a single counter is limited


     We were inside the performance limit, so writes could go to Cassandra


  No way to scale within Cassandra (yet)


  Reads have a serious performance overhead


  We used sharded counters in memcached with source of truth in Cassandra


     Few reads from Cassandra = much more predictable performance
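One plausible shape for the arrangement described above, sketched in Python under stated assumptions: plain dicts stand in for memcached and for the Cassandra counter column family, and the shard-key naming (name:shard:i) and class name are hypothetical. Each increment goes to one randomly chosen cache shard and to the source of truth; reads sum the cache shards without touching Cassandra; if the cache is lost, it is reseeded from Cassandra.

```python
import random


class ShardedCacheCounter:
    """Hot counter split across N cache keys (hypothetical names throughout).

    cache: dict standing in for memcached
    truth: dict standing in for the Cassandra counter column family
    """

    def __init__(self, name, shards, cache, truth):
        self.name = name
        self.keys = [f"{name}:shard:{i}" for i in range(shards)]
        self.cache = cache
        self.truth = truth

    def incr(self, delta=1):
        # Spread writes over shard keys so no single key or server is hot.
        key = random.choice(self.keys)
        self.cache[key] = self.cache.get(key, 0) + delta
        # Every increment also goes to the source of truth.
        self.truth[self.name] = self.truth.get(self.name, 0) + delta

    def read(self):
        # Reads sum the shards from the cache only; Cassandra is not read.
        return sum(self.cache.get(k, 0) for k in self.keys)

    def rebuild_from_truth(self):
        # After losing the cache, reseed it from the source of truth.
        for k in self.keys:
            self.cache[k] = 0
        self.cache[self.keys[0]] = self.truth.get(self.name, 0)
```

With real memcached you would use its atomic incr on each shard key and fetch all shard keys in a single multi-key get; the dict updates here are only safe because the sketch is single-threaded.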

OPERATIONS
  Cassandra GUIs & mgmt consoles still in infancy

  Hard to figure out what was going wrong when performance suffered

  Analytics (and backup) still via dump to MySQL

     Flexible, well understood

  Single cluster, single AZ

WHERE WE WERE AFTER X FACTOR
  Cassandra as a source of truth in production

  Mainly write load

  Memcached layer on top

  Simple operations

  No backups :(

BEYOND X FACTOR
  Dancing on Ice - harder counting

  Britain’s Got Talent 2012 - more social

  Backups

  Data integrity

DATA CONSISTENCY
  There’s no referential integrity

     So is the data in the database self-consistent?

     Or do you have a bug somewhere?

  How do you validate the data?

     Truth + 1
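Without referential integrity, one way to check the data (an assumption, not necessarily what Tellybug did) is to recompute totals from the raw vote_log-style rows and compare them with the stored counters. A minimal Python sketch, with rows as dicts carrying the poll, choice and count columns from the first schema; recount_votes and find_mismatches are hypothetical helpers:

```python
from collections import Counter


def recount_votes(vote_rows):
    """Recompute per-(poll, choice) totals from raw vote rows."""
    totals = Counter()
    for row in vote_rows:
        totals[(row["poll"], row["choice"])] += row["count"]
    return totals


def find_mismatches(vote_rows, stored_counters):
    """Return {(poll, choice): (recounted, stored)} wherever they disagree."""
    recounted = recount_votes(vote_rows)
    keys = set(recounted) | set(stored_counters)
    return {
        k: (recounted.get(k, 0), stored_counters.get(k, 0))
        for k in keys
        if recounted.get(k, 0) != stored_counters.get(k, 0)
    }
```

An empty result means the counters agree with the log; any entry points at either a counter that over- or under-counted or a bug in the write path.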

BACKUPS
  Backing up a cluster isn’t easy



  Restoring can be harder...

CONCLUSION
  Cassandra saved our bacon :)


  Scales to insane write loads

     Reads are easier to scale in memcached


  Beware of limitations on “hot” values


  Migrating functionality gradually let us learn the operational aspects

     There are lots of interesting failure scenarios at scale

TODO
 Scale-up/Scale-down of a cluster



 Better monitoring and operations



 Analytics using Hadoop

ANY QUESTIONS?


  We’re hiring - if you want to work on wicked scaling problems and reach millions of users, get in touch!

                                malcolm@tellybug.com

                                    @malcolmbox

Cassandra EU 2012 - Putting the X Factor into Cassandra

  • 1. PUTTING THE X FACTOR INTO CASSANDRA: ADVENTURES IN COUNTING
      ◦ MALCOLM BOX, CTO, TELLYBUG
  • 2. INTRO
      ◦ Malcolm Box, CTO & Co-Founder
      ◦ @malcolmbox
      ◦ malcolm@tellybug.com
      ◦ http://tellybug.com
  • 3. (no slide text)
  • 4. WHAT I’M TALKING ABOUT
      ◦ How we started using Cassandra
      ◦ How we use it to power the X Factor and Britain’s Got Talent apps
      ◦ Counting - harder than you might think
      ◦ What we learnt along the way
  • 5. THE CHALLENGE
      ◦ 10-12 Million people watching these shows
      ◦ TV tells them to buzz/clap/score....
      ◦ ....Servers melt
      ◦ Design goals to handle 10K interactions/s
  • 6. ROLL BACK 1 YEAR
      ◦ We’d won BGT 2011 - our first big talent show
      ◦ Existing MySQL/Django/Python stack
      ◦ Back of envelope calculations....oh dear
      ◦ Needed something quickly that could cope with anticipated load
  • 7. OUR FIRST CASSANDRA SCHEMA
        create column family vote_log with comment='Log of votes'
          and comparator='UTF8Type' and key_validation_class='UUIDType'
          and default_validation_class='UTF8Type'
          and column_metadata = [
            {column_name:'ipaddr', validation_class:'AsciiType'},
            {column_name:'poll', validation_class:'LongType'},
            {column_name:'choice', validation_class:'LongType'},
            {column_name:'idtoken', validation_class:'UTF8Type'},
            {column_name:'count', validation_class:'LongType'}];
  • 8. WHAT WE LEARNT
      ◦ Cassandra scales beautifully for writes
      ◦ Cassandra has no single point of failure
      ◦ ....but it’s not hard to make it fail
      ◦ Ad-hoc questions and reporting were going to be much slower
  • 9. OPERATIONS
      ◦ BGT 2011 was a write-only DB
      ◦ Ignored failures
      ◦ One cluster, one AZ
      ◦ Backup to MySQL
  • 10. X FACTOR 2011
      ◦ Over 1 Million app downloads
      ◦ Over 260 Million boos/claps
  • 11. IMPLEMENTING X FACTOR WITH CASSANDRA
      ◦ Counting
      ◦ Social network
      ◦ No longer write-only
  • 12. WHAT ARE MY FRIENDS DOING?
      ◦ Scale makes this hard
      ◦ 10K changes/s
      ◦ Which ones are relevant to which users?
      ◦ When new users (and their social graph) can arrive at any time
  • 13. SOLUTION
      ◦ New column family: user activity
      ◦ Maps user to their interactions
      ◦ Write problem nicely randomised and thus ideal for Cassandra
      ◦ Read problem!
  • 14. COUNTING - HARDER THAN IT LOOKS
      ◦ Everyone can count
      ◦ But we need to count really fast
      ◦ And distribute the results to all the clients
  • 15. DISTRIBUTED COUNTING
      ◦ “Memcache does counters”
      ◦ “OK, how about sharding?”
      ◦ “Well, I hear Cassandra 0.8 has counters”
  • 16. ASIDE - THINGS THAT CAN’T COUNT #3
        cache.set('key', 1)
        cache.decr('key', 1)    # 0L
        cache.decr('key', 1)    # 0L
        cache.incr('key', -1)   # 4294967295L
        cache.incr('key', 1)    # 4294967296L
  • 20. SINGLE BOX LIMITS
      ◦ We have a single value
      ◦ Everything needs to read and write that value - from multiple servers
      ◦ EC2 limits
          ▪ Single Memcache server runs out of network I/O
      ◦ What then?
  • 21. CASSANDRA HAS COUNTERS
      ◦ New (at the time) feature in Cassandra 0.8
      ◦ Special column type: CounterColumnType as the validator
      ◦ Distributed 64-bit counter, with eventual consistency
      ◦ CL.ONE writes recommended to avoid implicit reads impacting performance
      ◦ Reads add up the values from the replicas to give the count
      ◦ Simple functionality: incr()/decr(), get()
  • 26. CAN CASSANDRA COUNT?
      ◦ Yes, but....
      ◦ Performance can be an issue
          ▪ Switch off replicate_on_write, tune RF & cluster size
      ◦ Not scalable for a single counter
          ▪ Scales as a function of RF up to 4 nodes
          ▪ Above that ... you’re out of luck
          ▪ Best we achieved is ~10K/s increments to a single counter value on EC2 m1.large instances
      ◦ What do you do if an operation fails?
  • 27. COUNTING AT SCALE WITH CASSANDRA
      ◦ Write throughput to a single counter is limited
          ▪ We were inside the performance limit, so writes could go to Cassandra
          ▪ No way to scale within Cassandra (yet)
      ◦ Reads have a serious performance overhead
          ▪ We used sharded counters in memcached with source of truth in Cassandra
          ▪ Few reads from Cassandra = much more predictable performance
  • 28. OPERATIONS
      ◦ Cassandra GUIs & management consoles still in their infancy
      ◦ Hard to figure out what was going wrong when performance suffered
      ◦ Analytics (and backup) still via dump to MySQL
          ▪ Flexible, well understood
      ◦ Single cluster, single AZ
  • 29. WHERE WE WERE AFTER X FACTOR
      ◦ Cassandra as a source of truth in production
      ◦ Mainly write load
      ◦ Memcached layer on top
      ◦ Simple operations
      ◦ No backups :(
  • 30. BEYOND X FACTOR
      ◦ Dancing on Ice - harder counting
      ◦ Britain’s Got Talent 2012 - more social
      ◦ Backups
      ◦ Data integrity
  • 31. DATA CONSISTENCY
      ◦ There’s no referential integrity
      ◦ So is the data in the database self-consistent?
      ◦ Or do you have a bug somewhere?
      ◦ How do you validate the data?
      ◦ Truth + 1
  • 32. BACKUPS
      ◦ Backing up a cluster isn’t easy
      ◦ Restoring can be harder...
  • 33. CONCLUSION
      ◦ Cassandra saved our bacon :)
      ◦ Scales to insane write loads
      ◦ Reads are easier to scale in memcached
      ◦ Beware of limitations on “hot” values
      ◦ Migrating functionality gradually let us learn the operational aspects
      ◦ There are lots of interesting failure scenarios at scale
  • 34. TODO
      ◦ Scale-up/Scale-down of a cluster
      ◦ Better monitoring and operations
      ◦ Analytics using Hadoop
  • 35. ANY QUESTIONS?
      ◦ We’re hiring - if you want to work on wicked scaling problems and reach millions of users, get in touch!
      ◦ malcolm@tellybug.com
      ◦ @malcolmbox
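The counter results on slide 16 can be reproduced with a small model. This is a sketch under stated assumptions, not pylibmc or memcached source code. The server side follows memcached's documented counter rules: values are unsigned 64-bit, `decr` clamps at 0, and `incr` wraps at 2**64. The client bug is an assumption that fits the numbers on the slide, namely that a negative delta was reinterpreted as an unsigned 32-bit integer before being sent. Speaker note 16 says only that the bug was in older memcached/pylibmc and is now fixed. `ModelCounterServer` and `buggy_client_incr` are hypothetical names.

```python
# Hypothetical model of the slide 16 behaviour; not real pylibmc/memcached code.
UINT32_MOD = 2**32
UINT64_MOD = 2**64


class ModelCounterServer:
    """Follows memcached's documented counter rules: values are unsigned
    64-bit, incr wraps around at 2**64, and decr never goes below 0."""

    def __init__(self):
        self.values = {}

    def set(self, key, value):
        self.values[key] = value

    def incr(self, key, delta):
        self.values[key] = (self.values[key] + delta) % UINT64_MOD
        return self.values[key]

    def decr(self, key, delta):
        self.values[key] = max(self.values[key] - delta, 0)
        return self.values[key]


def buggy_client_incr(server, key, delta):
    """Assumed client bug: a negative delta is reinterpreted as an unsigned
    32-bit integer before it reaches the server, so -1 becomes 2**32 - 1."""
    return server.incr(key, delta % UINT32_MOD)


server = ModelCounterServer()
server.set("key", 1)
print(server.decr("key", 1))                 # 0
print(server.decr("key", 1))                 # 0: clamped, not -1
print(buggy_client_incr(server, "key", -1))  # 4294967295
print(buggy_client_incr(server, "key", 1))   # 4294967296
```

In this model, the final 4294967296 shows why the value does not wrap back to 0. The delta is truncated to 32 bits, but the stored counter is 64 bits wide.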

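Slide 27 and speaker note 25 describe sharded counters in memcached with Cassandra as the source of truth. Below is a minimal sketch of that pattern. Plain dicts stand in for memcached and for a Cassandra counter column, so it runs without either. `ShardedCounter`, the `name:shard:i` key format and the `rebuild_from_truth` recovery step are hypothetical illustrations, not Tellybug's code.

```python
import random


class ShardedCounter:
    """Hypothetical sketch: one logical counter spread over several cache
    keys, with every increment also applied to a durable store that is
    treated as the source of truth. Dicts stand in for memcached and
    Cassandra so the example runs on its own."""

    def __init__(self, name, num_shards, cache, truth):
        self.name = name
        self.shard_keys = [f"{name}:shard:{i}" for i in range(num_shards)]
        self.cache = cache  # fast, lossy: stand-in for memcached
        self.truth = truth  # durable: stand-in for a Cassandra counter column

    def incr(self, delta=1, shard=None):
        # Spread writes over the shards so no single cache key is hot.
        if shard is None:
            shard = random.randrange(len(self.shard_keys))
        key = self.shard_keys[shard]
        self.cache[key] = self.cache.get(key, 0) + delta
        # Counter updates are increments, so the delta is applied, not a total.
        self.truth[self.name] = self.truth.get(self.name, 0) + delta

    def read(self):
        # Cheap read path: sum the cached shards; the durable store is not read.
        return sum(self.cache.get(key, 0) for key in self.shard_keys)

    def rebuild_from_truth(self):
        # After losing the cache, reload the durable total into the first shard.
        for key in self.shard_keys:
            self.cache[key] = 0
        self.cache[self.shard_keys[0]] = self.truth.get(self.name, 0)


cache, truth = {}, {}
votes = ShardedCounter("poll:1:choice:2", num_shards=4, cache=cache, truth=truth)
for _ in range(1000):
    votes.incr()
print(votes.read())  # 1000
cache.clear()        # simulate a memcached restart
print(votes.read())  # 0
votes.rebuild_from_truth()
print(votes.read())  # 1000
```

Reads never touch the durable store, which is the "few reads from Cassandra" point on slide 27. The cost, as note 25 puts it, is that the shards and the store can drift apart, so something has to decide which one is the truth.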
Speaker notes

  2. Who I am. Background in mobile. Not a Big Data expert.
  3. Apps that make TV more entertaining. Big shows, big audiences. Simple interaction, so we get lots of it. Small number of “results”.
  5. X Factor: over 1M installs, 260 million boos/claps.
  6. No way to scale MySQL for a single counter write. Hybrid memcache/MySQL for values. Where to write the audit trail/log of what had happened? Step forward Acunu/Cassandra.
  7. Random partitioner, UUID type. Analytics by MySQL. A write-only database.
  8. E.g. too many connections from the web tier.
  11. Counters: moving production counts from MySQL to Cassandra. Social network: a challenge if you don’t own the graph.
  12. Splaying writes is the normal solution: push everyone’s updates to all their friends. But what about friends who aren’t there yet?
  13. Cassandra as source of truth and destination for writes. Memcache as the place to read from: it holds social graphs, activity etc., updated in parallel with the Cassandra writes. A lot of logic to deal with cache misses, and horizontal scaling of the cache.
  14. BGT used a memcache-based counter with write-behind to MySQL.
  16. Bug in older versions of memcached and pylibmc, now fixed.
  17. Redis: same sort of issues. Fundamental limitation of a single value living on a single box.
  20. Looked ideal for our needs: move counts out of memcache & MySQL.
  25. Now multiple levels of inconsistency:
       - Cassandra
       - Central memcache value
       - Sharded counter values on each webserver box
      What is “the truth”?
  26. We saw crashes from too many connections, truncate behaviour, etc.
  29. We have millions of records in the DB, and then counts etc. Are the two consistent? If not, why not? We’ve seen various issues, including missing reads and inconsistent counter values.
  30. Netflix, Rackspace... everyone writes a tool. It took us a couple of weeks to be able to back up and restore our cluster successfully, and another week to figure out whether the data was the same.
  32. Bursty loads: we need to scale both ways. Monitoring: we struggle generally with monitoring/alerting/graphing. Backup & restore to smaller clusters: see Priam from Netflix. Analytics: we’ve hit the wall on the get_range() approach.
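Slide 31 and speaker note 29 ask whether millions of logged records agree with the stored counts. One way to check is to recount from the raw log and diff the result against the counters. The sketch below assumes rows shaped like the vote_log column family on slide 7 (`ipaddr`, `poll`, `choice`, `idtoken`, `count`). It also assumes that `count` is the number of votes a row represents, defaulting to 1 when absent. That interpretation, the sample data and the function names are all hypothetical.

```python
from collections import Counter


def recount_votes(vote_log_rows):
    """Rebuild per-(poll, choice) totals from raw vote_log rows.
    Assumes each row's 'count' column is the number of votes it records."""
    totals = Counter()
    for row in vote_log_rows:
        totals[(row["poll"], row["choice"])] += row.get("count", 1)
    return totals


def find_mismatches(vote_log_rows, stored_counts):
    """Return {(poll, choice): (recounted, stored)} wherever they disagree."""
    recounted = recount_votes(vote_log_rows)
    mismatches = {}
    for key in set(recounted) | set(stored_counts):
        expected, actual = recounted.get(key, 0), stored_counts.get(key, 0)
        if expected != actual:
            mismatches[key] = (expected, actual)
    return mismatches


log = [
    {"ipaddr": "10.0.0.1", "poll": 7, "choice": 1, "idtoken": "a", "count": 1},
    {"ipaddr": "10.0.0.2", "poll": 7, "choice": 1, "idtoken": "b", "count": 1},
    {"ipaddr": "10.0.0.3", "poll": 7, "choice": 2, "idtoken": "c", "count": 3},
]
stored = {(7, 1): 2, (7, 2): 4}  # hypothetical counter values
print(find_mismatches(log, stored))  # {(7, 2): (3, 4)}
```

At the scale in the notes, a check like this would more plausibly run as a batch job over an export than in the web tier. The deck mentions analytics via a dump to MySQL, with Hadoop on the TODO list.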