Hindsight is 20/20:
MySQL to Cassandra
Michael Kjellman (@mkjellman)
Barracuda Networks
What I Do
• Build and maintain “real-time” Spam
detection and Web Filter classification
• Java/Perl/C (and bits of everything else)
• Author perlcassa (Perl C* client)
• Frontend? Backend? Customer? Internal?
Broken RAID Card? Bad Disk? I touch it all.
Our C* Cluster
• In production for ~2 years since 0.8
• Running 1.2.5 + minor patches
• 24 nodes in 2 datacenters
• (2) 2TB Hard Drives (no RAID)
• (1) Small SSD for small hot CFs
• 64GB of RAM
• Puppet for management
• Cobbler for deployment
• Target max load at 600GB per node
What is “real-time” exactly?
Our Rewrite by the Numbers
                                             Cassandra Based   MySQL Based
Average Application Latency                  2.41ms            5.0ms
Elements in Database                         32,836,767        3,946,713
Elements Application Handles                 32,836,767        314,974
Element Seen Prior to Tracking               1st request       Various Thresholds
Datacenters                                  2                 1
Average Latency of Automated Classification  3 seconds         8 minutes
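To put the comparison in relative terms, here is a small sketch that derives ratios from the figures on this slide. Only the input numbers come from the slide; the variable names are made up for illustration.

```python
# Ratios derived only from the figures on the "by the Numbers" slide.
mysql_latency_ms = 5.0
cassandra_latency_ms = 2.41
latency_ratio = mysql_latency_ms / cassandra_latency_ms

mysql_handled = 314_974
cassandra_handled = 32_836_767
handled_ratio = cassandra_handled / mysql_handled

mysql_classify_s = 8 * 60   # 8 minutes
cassandra_classify_s = 3    # 3 seconds
classify_ratio = mysql_classify_s / cassandra_classify_s

print(f"application latency:      {latency_ratio:.2f}x lower")
print(f"elements handled:         {handled_ratio:.0f}x more")
print(f"automated classification: {classify_ratio:.0f}x faster")
```

Roughly: application latency halved, about a hundred times more elements handled by the application, and classification latency down by two orders of magnitude.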
Should you Rewrite?
• How To Survive a Ground-Up Rewrite Without Losing
Your Sanity[1] – Joel Spolsky
• Past engineering decisions preventing
implementation of new business requirements
• New threats smarter and more targeted
[1]http://onstartups.com/tabid/3339/bid/97052/How-To-Survive-a-Ground-Up-Rewrite-Without-Losing-Your-Sanity.aspx
Evolving Legacy Systems
• Even good developers can write sloppy code
• Too much duct tape
– Most layers applied around the database
Hitting the Reset Button
• Plan for continuous failure
• Easily Scalable
• No Single Point of Failure – that you know of
• Many smaller boxes vs. one monolithic box
Whiteboard to Reality
• Get technical buy-in from all parties
• Migrate and rewrite in stages
– Business requirements forced a hybrid period with
the old and new systems operating in parallel
Cassandra is Not…
1. Direct MySQL replacement
2. Magic bullet to solve everything
Migrating
• Painful
• Painful
• Painful
• Tons of rewriting
• Tons of regressions
• Did I say painful?
So Why Migrate?
• C* is the best option for persistence tier
• Business success motivation
• Don’t let your database hold you back
Lessons Learned (the good)
• Carefully defining data model up front
• Creating a flexible systems architecture that
adapts well to changes during implementation
• Seriously – “Measure twice, cut once.”
Lessons Learned (the bad)
• Consider migration and delivery requirements
from the very beginning
• Adjust expectations – we didn’t expect to rely on
legacy systems for so long
• Make syncing data between systems a priority
Tips
1. Start with the queries
2. Think differently regarding reads
3. Syncing and migrating data
4. Don’t use C* as a queue
5. Estimate capacity
6. Automate, Automate, Automate
7. Some maintenance required
1. Start with the Queries
• C* != “#dontneedtothinkaboutmyschema”
• Counters and Composites
• Optimize for use case
– Don’t be afraid of writes. Storage is cheap.
– Optimize to reduce the number of tombstones
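A minimal sketch of what "start with the queries" can look like, using plain Python dictionaries as a stand-in for column families. The two queries, the names, and the sample data are all hypothetical and not taken from the Barracuda schema. One structure is kept per query, each write updates both, and the read path never has to scan or join.

```python
from bisect import insort
from collections import defaultdict

# Toy stand-ins for two query-specific column families (hypothetical names).
# Query 1: "latest sightings of a domain" -> partition by domain,
#          ordered by timestamp (a composite of domain + timestamp).
# Query 2: "how often has this domain been seen?" -> one counter per domain.
sightings_by_domain = defaultdict(list)  # domain -> sorted [(timestamp, source_ip)]
seen_count_by_domain = defaultdict(int)  # domain -> counter

def record_sighting(domain, timestamp, source_ip):
    """Write once per query the application needs; writes are cheap."""
    insort(sightings_by_domain[domain], (timestamp, source_ip))
    seen_count_by_domain[domain] += 1

def latest_sightings(domain, limit):
    """Answered from a single partition that is already in order."""
    return sightings_by_domain[domain][-limit:][::-1]

record_sighting("example.com", 100, "10.0.0.1")
record_sighting("example.com", 300, "10.0.0.3")
record_sighting("example.com", 200, "10.0.0.2")
record_sighting("example.org", 150, "10.0.0.9")

print(latest_sightings("example.com", 2))
print(seen_count_by_domain["example.com"])
```

The trade-off is deliberate: more data written and stored, in exchange for reads that touch exactly one partition.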
2. Think Differently Regarding Reads
• Do you really need all that data at once?
• mysql> SELECT * FROM mysupercooltable WHERE foo = 'bar';
– Slow, but eventually will work
• cqlsh> SELECT * FROM myreallybigcf WHERE foo = 'bar';
– Won’t work. Expect RPC timeout exceptions on reads
generally after ~10,000 rows even with paging
• Our solutions:
– ElasticSearch
– Hadoop/Pig
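The first bullet ("Do you really need all that data at once?") can be illustrated with a small sketch. This is not the ElasticSearch or Hadoop/Pig approach named above, just a toy model of asking for bounded slices of one ordered partition and resuming after the last key seen. The function name and sample data are hypothetical.

```python
def read_in_slices(partition, slice_size, start_after=None):
    """Yield bounded slices of an ordered partition.

    `partition` is a sorted list of (key, value) pairs standing in for one
    wide row; each yielded slice corresponds to one small, bounded read.
    """
    while True:
        batch = [cell for cell in partition
                 if start_after is None or cell[0] > start_after][:slice_size]
        if not batch:
            return
        yield batch
        start_after = batch[-1][0]  # resume after the last key seen

rows = [(i, f"value-{i}") for i in range(25)]
slices = list(read_in_slices(rows, slice_size=10))
print([len(s) for s in slices])
```

Each request stays small and predictable, so the caller can stop early once it has what it needs instead of waiting on one enormous read.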
3. Syncing and Migrating Data
• Sync and migration scripts – take them more
seriously than production code
• Design sync to be continuous with both
systems running in parallel during migration
• Prioritize the sync
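A minimal sketch of a sync pass that is safe to run continuously while both systems are live. It assumes every row carries a last-updated timestamp and that the newest write wins. That assumption, the dict-based stores, and the names are all hypothetical, not details from the talk.

```python
def sync_once(source, target):
    """Copy rows that are missing or stale in `target`.

    Both stores are dicts of key -> (value, updated_at). The pass is
    idempotent: running it again without new writes copies nothing.
    Returns the sorted keys that were copied.
    """
    copied = []
    for key, (value, updated_at) in source.items():
        current = target.get(key)
        if current is None or current[1] < updated_at:
            target[key] = (value, updated_at)
            copied.append(key)
    return sorted(copied)

legacy = {"a": ("spam", 10), "b": ("ham", 20), "c": ("spam", 30)}
new_store = {"a": ("spam", 10), "b": ("spam", 5)}

first_pass = sync_once(legacy, new_store)   # copies the stale and missing rows
second_pass = sync_once(legacy, new_store)  # nothing left to do
print(first_pass, second_pass)
```

Because a repeated pass is a no-op, the same script can run on a schedule for the whole hybrid period rather than as a one-off migration step.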
4. Don’t use C* as a Queue
• Cassandra anti-patterns: Queues and queue-like
datasets[2] – Aleksey Yeschenko
• Tombstones + read performance
• Our solution:
– Kafka (multiple publisher, multiple consumer
durable queue)
[2]http://www.datastax.com/dev/blog/cassandra-anti-patterns-queues-and-queue-like-datasets
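Why tombstones hurt queue-like reads can be shown with a toy model. This is a deliberate simplification, not Cassandra's storage engine: a delete leaves a marker behind, and a read of the queue head has to step over every marker before it reaches live data. It assumes nothing is purged during the run.

```python
class QueueLikePartition:
    """Toy model of a single partition used as a queue.

    Assumption: no compaction happens during the example, so tombstones
    are never purged and every read has to step over them.
    """

    def __init__(self):
        self.cells = []  # [message, is_tombstone], in insertion order

    def enqueue(self, message):
        self.cells.append([message, False])

    def dequeue(self):
        """Return (message, cells_scanned) for the first live cell."""
        scanned = 0
        for cell in self.cells:
            scanned += 1
            if not cell[1]:
                cell[1] = True  # a delete writes a tombstone
                return cell[0], scanned
        return None, scanned

queue = QueueLikePartition()
for i in range(1000):
    queue.enqueue(f"msg-{i}")

scan_costs = [queue.dequeue()[1] for _ in range(1000)]
print(scan_costs[0], scan_costs[-1], sum(scan_costs))
```

In this model the cost of each dequeue grows with the number of messages already consumed, which is the pattern the referenced article warns about.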
5. Estimate Capacity
• Don’t forget the Java heap (8GB Max)
• Plan capacity – today and future
• Stress Tool – profile node and multiply
• MySQL hardware != Cassandra hardware
• New bottlenecks thanks to C* being so
awesome?
• I/O still an important concern with C*
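A back-of-the-envelope sketch of the capacity arithmetic. The 600GB-per-node target and the 24-node cluster size come from the cluster slide; the data sizes and the replication factor are hypothetical. The formula ignores compaction headroom and heap limits, so treat the result as a floor and confirm it with the stress tool.

```python
import math

def nodes_needed(raw_data_gb, replication_factor, target_load_gb_per_node):
    """Nodes required so that no node exceeds its target load."""
    total_on_disk_gb = raw_data_gb * replication_factor
    return math.ceil(total_on_disk_gb / target_load_gb_per_node)

TARGET_LOAD_GB = 600                     # from the cluster slide
cluster_ceiling_gb = 24 * TARGET_LOAD_GB  # 24 nodes at the target load

# Hypothetical data sizes and replication factor, for illustration only.
today = nodes_needed(raw_data_gb=2_000, replication_factor=3,
                     target_load_gb_per_node=TARGET_LOAD_GB)
next_year = nodes_needed(raw_data_gb=4_500, replication_factor=3,
                         target_load_gb_per_node=TARGET_LOAD_GB)

print(cluster_ceiling_gb, today, next_year)
```

Running the numbers for both "today" and "future" makes it obvious when growth will outrun the current node count.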
6. Automate, Automate, Automate
• Love your inner Ops self. Distributed systems
move complexity to operations.
• Puppet or something similar (really)
• Learn CCM earlier rather than later
– www.github.com/pcmanus/ccm
7. Some Maintenance Required
• Repairs & Cleanup ops
– automate and run frequently
• Rolling restart, meet rolling repair
• Learn jconsole
• Solution:
– Jolokia (JMX via HTTP)
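One way to picture "automate and run frequently" for repairs is a rolling schedule that repairs one node at a time and gets through the whole cluster inside a fixed window. This sketch assumes a 10-day window, matching Cassandra's default gc_grace_seconds of 864000. The node names are hypothetical, and the actual repair invocation is left out.

```python
def rolling_repair_schedule(nodes, window_hours):
    """Spread one repair per node evenly across the window, in order."""
    interval = window_hours / len(nodes)
    return [(node, round(i * interval, 2)) for i, node in enumerate(nodes)]

# Hypothetical node names; 24 matches the cluster size mentioned earlier.
nodes = [f"node{n:02d}" for n in range(1, 25)]

# Assumption: finish a full pass within 10 days (default gc_grace_seconds).
schedule = rolling_repair_schedule(nodes, window_hours=10 * 24)

for node, start_hour in schedule[:3]:
    print(f"{node}: start repair at hour {start_hour}")
```

In practice the window should leave margin for slow or failed repairs; the point is only that the schedule is computed and enforced by tooling, not by memory.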
Where is Barracuda Today?
• 2 years in production with Cassandra
• Definitely the right choice for our persistence
tier
• 2 product lines on C* based system and
another major product in beta
• Achieved “real-time” response
2.0 and Beyond
• Thrift -> CQL
• CQL helps the MySQL to C* migration
– Easier to comprehend / grasp
• Everyone understands SELECT * FROM cf WHERE
key = 'foo';
• CAS and other 2.0 features make C* an even
better replacement option for MySQL
C* Community
• Supercalifragilisticexpialidocious community!
• Riak, HBase, Oracle are other options. How is
their dev community?
• Great client support. Great people. Great
motivated developers.
• IRC: #cassandra on freenode
• Mailing List: user@cassandra.apache.org

Cassandra Community Webinar: MySQL to Cassandra - What I Wish I'd Known

  • 1. Hindsight is 20/20: MySQL to Cassandra Michael Kjellman (@mkjellman) Barracuda Networks
  • 2. What I Do • Build and maintain “real-time” Spam detection and Web Filter classification • Java/Perl/C (and bits of everything else) • Author perlcassa (Perl C* client) • Frontend? Backend? Customer? Internal? Broken RAID Card? Bad Disk? I touch it all.
  • 3. Our C* Cluster • In production for ~2 years since 0.8 • Running 1.2.5 + minor patches • 24 nodes in 2 datacenters • (2) 2TB Hard Drives (no RAID) • (1) Small SSD for small hot CFs • 64GB of RAM • Puppet for management • Cobbler for deployment • Target max load at 600GB per node
  • 5.
  • 6. Our Rewrite by the Numbers Cassandra Based MySQL Based Average Application Latency 2.41ms 5.0ms Elements in Database 32,836,767 3,946,713 Elements Application Handles 32,836,767 314,974 Element Seen Prior to Tracking 1st request Various Thresholds Datacenters 2 1 Average Latency of Automated Classification 3 seconds 8 minutes
  • 7. Should you Rewrite? • How To Survive a Ground-Up Rewrite Without Losing Your Sanity[1] – Joel Spolsky • Past engineering decisions preventing implementation of new business requirements • New threats smarter and more targeted [1] http://onstartups.com/tabid/3339/bid/97052/How-To-Survive-a-Ground-Up-Rewrite-Without-Losing-Your-Sanity.aspx
  • 8. Evolving Legacy Systems • Even good developers can write sloppy code • Too much duct tape – Most layers applied around the database
  • 9. Hitting the Reset Button • Plan for continuous failure • Easily Scalable • No Single Point of Failure – that you know of • Many smaller boxes vs. one monolithic box
  • 10. Whiteboard to Reality • Get technical buy-in from all parties • Migrate and rewrite in stages – Business requirements forced hybrid period with the old and new systems operated in parallel
  • 11.
  • 12. Cassandra is Not… 1. Direct MySQL replacement 2. Magic bullet to solve everything
  • 13. Migrating • Painful • Painful • Painful • Tons of rewriting • Tons of regressions • Did I say painful?
  • 14. So Why Migrate? • C* is the best option for persistence tier • Business success motivation • Don’t let your database hold you back
  • 15. Lessons Learned (the good) • Carefully defining data model up front • Creating a flexible systems architecture that adapts well to changes during implementation • Seriously – “Measure twice, cut once.”
  • 16. Lessons Learned (the bad) • Consider migration and delivery requirements from the very beginning • Adjust expectations – didn’t expect relying on legacy systems for so long • Make syncing data between systems a priority
  • 17. Tips 1. Start with the queries 2. Think differently regarding reads 3. Syncing and migrating data 4. Don’t use C* as a queue 5. Estimate capacity 6. Automate, Automate, Automate 7. Some maintenance required
  • 18. 1. Start with the Queries • C* != “#dontneedtothinkaboutmyschema” • Counters and Composites • Optimize for use case – Don’t be afraid of writes. Storage is cheap. – Optimize to reduce the number of tombstones
  • 19. 2. Think Differently Regarding Reads • Do you really need all that data at once? • mysql> SELECT * FROM mysupercooltable WHERE foo = 'bar'; – Slow, but eventually will work • cqlsh> SELECT * FROM myreallybigcf WHERE foo = 'bar'; – Won’t work. Expect RPC timeout exceptions on reads generally after ~10,000 rows even with paging • Our solutions: – ElasticSearch – Hadoop/Pig
  • 20. 3. Syncing and Migrating Data • Sync and migration scripts – take more seriously than production code • Design sync to be continuous with both systems running in parallel during migration • Prioritize the sync
  • 21. 4. Don’t use C* as a Queue • Cassandra anti-patterns: Queues and queue-like datasets[2] – Aleksey Yeschenko • Tombstones + read performance • Our solution: – Kafka (multiple publisher, multiple consumer durable queue) [2] http://www.datastax.com/dev/blog/cassandra-anti-patterns-queues-and-queue-like-datasets
  • 22. 5. Estimate Capacity • Don’t forget the Java heap (8GB Max) • Plan capacity – today and future • Stress Tool – profile node and multiply • MySQL hardware != Cassandra hardware • New bottlenecks thanks to C* being so awesome? • I/O still an important concern with C*
  • 23. 6. Automate, Automate, Automate • Love your inner Ops self. Distributed systems move complexity to operations. • Puppet or something similar (really) • Learn CCM earlier rather than later – www.github.com/pcmanus/ccm
  • 24. 7. Some Maintenance Required • Repairs & Cleanup ops – automate and run frequently • Rolling restart meet rolling repair • Learn jconsole • Solution: – Jolokia (JMX via HTTP)
  • 25. Where is Barracuda Today? • 2 years in production with Cassandra • Definitely the right choice for our persistence tier • 2 product lines on C* based system and another major product in beta • Achieved “real-time” response
  • 26. 2.0 and Beyond • Thrift -> CQL • CQL helps the MySQL to C* migration – Easier to comprehend / grasp • Everyone understands SELECT * FROM cf WHERE key = 'foo'; • CAS and other 2.0 features make C* an even better replacement option for MySQL
  • 27. C* Community • Supercalifragilisticexpialidocious community! • Riak, HBase, Oracle are other options. How is their dev community? • Great client support. Great people. Great motivated developers. • IRC: #cassandra on freenode • Mailing List: user@cassandra.apache.org
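
Tip 4 ("Tombstones + read performance") can be illustrated with a toy model. The sketch below is not Cassandra code: `Cell` and `QueueLikeRow` are hypothetical names, and the model assumes that a delete only marks a cell while the marker stays in the row, so every later read from the head of the queue has to step over it. In a real cluster compaction eventually purges tombstones (after `gc_grace_seconds`), which this sketch leaves out.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Cell:
    value: str
    tombstone: bool = False  # True once the cell has been deleted


class QueueLikeRow:
    """Toy model of a queue stored as one wide row, oldest cell first."""

    def __init__(self) -> None:
        self.cells: List[Cell] = []

    def push(self, value: str) -> None:
        self.cells.append(Cell(value))

    def pop(self) -> Tuple[Optional[str], int]:
        """Return (value, cells_scanned) for the oldest live cell.

        Deleting only marks the cell, so the marker is scanned again
        by every later read that starts from the head of the row.
        """
        scanned = 0
        for cell in self.cells:
            scanned += 1
            if not cell.tombstone:
                cell.tombstone = True
                return cell.value, scanned
        return None, scanned


row = QueueLikeRow()
for i in range(1000):
    row.push(f"msg-{i}")

first_value, first_cost = row.pop()   # nothing deleted yet
for _ in range(998):
    row.pop()                         # consume most of the queue
last_value, last_cost = row.pop()     # has to skip 999 tombstones first

print(first_cost, last_cost)
```

The cost of each read grows with the number of items already consumed, which is the pattern the slide warns about and one reason a purpose-built queue such as Kafka fits this workload better.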

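Tip 5 ("Stress Tool – profile node and multiply") comes down to simple arithmetic. The sketch below is a rough estimate under stated assumptions, not a sizing method from the talk: only the 24 nodes and the 600GB per-node target come from the deck, while the dataset size, replication factor, and throughput figures are hypothetical, and the throughput estimate assumes capacity scales linearly with node count.

```python
import math


def cluster_capacity_gb(nodes: int, target_gb_per_node: float) -> float:
    """Total data held when every node sits at its target load."""
    return nodes * target_gb_per_node


def nodes_for_storage(unique_data_gb: float, replication_factor: int,
                      target_gb_per_node: float) -> int:
    """Nodes needed to hold every replica within the per-node target."""
    replicated_gb = unique_data_gb * replication_factor
    return math.ceil(replicated_gb / target_gb_per_node)


def nodes_for_throughput(required_ops_per_sec: float,
                         measured_ops_per_node: float) -> int:
    """Scale one profiled node's measured rate up to the required rate."""
    return math.ceil(required_ops_per_sec / measured_ops_per_node)


# Figures from the deck: 24 nodes, 600GB target load per node.
print(cluster_capacity_gb(24, 600))

# Hypothetical inputs: 4,800GB of unique data at replication factor 3,
# and a stress run measuring 5,000 ops/s per node against a 90,000 ops/s goal.
storage_nodes = nodes_for_storage(4800, 3, 600)
throughput_nodes = nodes_for_throughput(90_000, 5_000)
print(max(storage_nodes, throughput_nodes))
```

Taking the larger of the two estimates reflects that either storage or throughput can be the binding constraint. The Java heap and I/O limits mentioned on the slide are not modeled here.
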
Editor's notes

  1. Usage changed and significantly increased.
  2. It’s never really real time. Is it 1 second? 3 seconds? 1 hour? When do you have a business problem because you are not “real-time” enough?
  3. We had a technical “real-time” issue that translated (more importantly) into a business problem: we weren’t catching spam fast enough. Example: vimaseg.com.br took 8 minutes from the first hit to classification, which translated into 180 messages in customers’ inboxes. How do we close that gap to near zero? The new system classified the same domain in 3 seconds from the first hit, with 0 messages in customers’ inboxes.
  4. Our Rewrite by the numbers
  5. The data grows as the business continues to grow, and there is a need to consolidate and aggregate data across products and systems.
  6. What does “legacy” bring to mind at most companies? Ops team duct tape (the data has a life of its own). Over time, the various layers of duct tape make operations harder and harder. Systems are built with good intentions but frequently hit an inflection point where the underlying database problem can’t be fixed anymore. Duct tape isn’t good enough anymore: add a slave, add memcache, attempt to better batch queries.
  7. If the legacy system is preventing implementation, then a new system design is required. Our inflection point: throwing away valuable data to keep the system stable. Five years ago, continuous failure in your persistence tier was virtually unthinkable.
  8. Getting technical buy-in from all parties that C* and other tools were the “right” tools going forward. Had to engineer our migration and rewrite in stages to provide tangible business value earlier. Couldn’t just “go away” for a year and promise a perfect solution sometime down the road. Business requirements forced a hybrid period with the old and new systems operated in parallel.
  9. The up-front costs are high, but the ability to implement anything going forward is a powerful proposition.
  10. The old problems won’t go away during the migration. Prepare to manage expectations that things might get worse before they get better.
  11. C* != “#dontneedtothinkaboutmyschema”. Counters and Composites. Optimize for use case. Don’t be afraid of writes. Storage is cheap. If multiple writes make for a cleaner, simpler read path, do it. Optimize to reduce the number of tombstones.
  12. Talk about the first iteration, where I also tried the select * approach to prefill our cache. Not necessary and, more importantly, bad design. MySQL / relational database mentality of batch retrieval. Possible to get the same result, but it required different thinking and logic.
  13. Almost impossible to get it right the first time. Give example of elements that were in MySQL incorrectly with a timestamp of 0 for the epoch. I incorrectly assumed that only timestamps > 0 would be valid, so our initial sync missed all elements with the incorrect timestamp of 0. How we had to split up our sync code into pieces. How important is the speed of your syncing?
  14. Give example of bcd: to remove entries and make external changes in the hashtable, bcd would read from MySQL every n seconds (select *) and then delete all the rows after retrieving the records. Goes back to article number 2.
  15. If MySQL was the bottleneck before, after migrating to C* other elements might now become the bottleneck.
  16. Deploying changes to distributed systems is more complicated and more prone to human error. Give example of a person who tried to manually upgrade a 30+ node cluster and made a human error that resulted in the app being down. With distributed systems comes more complication, and minor mistakes can lead to cascading failures.
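
The timestamp-of-0 problem from note 13 is easy to reproduce in miniature. The sketch below is an illustration only, not the actual sync code: `legacy_rows`, `select_changed`, and the `updated_at` field are hypothetical names, and comparing key sets between source and target is just one possible guard.

```python
from typing import Dict, List, Optional

# Hypothetical legacy rows; two carry the bad epoch timestamp of 0.
legacy_rows: List[Dict[str, int]] = [
    {"id": 1, "updated_at": 0},
    {"id": 2, "updated_at": 1370000000},
    {"id": 3, "updated_at": 0},
    {"id": 4, "updated_at": 1370000500},
]


def select_changed(rows: List[Dict[str, int]],
                   since: Optional[int]) -> List[Dict[str, int]]:
    """Return rows changed after `since`; None means no lower bound."""
    if since is None:
        return list(rows)
    return [r for r in rows if r["updated_at"] > since]


# Buggy initial sync: assumes every valid row has a timestamp > 0.
buggy_ids = {r["id"] for r in select_changed(legacy_rows, since=0)}

# Safer initial sync: no lower bound on the first pass.
full_ids = {r["id"] for r in select_changed(legacy_rows, since=None)}

# Cheap guard: compare what was copied against everything in the source.
source_ids = {r["id"] for r in legacy_rows}
missed_ids = source_ids - buggy_ids
print(sorted(missed_ids))
```

A check like the last one, run after every sync pass, would have flagged the missing elements without anyone having to guess which assumption about the legacy data was wrong.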