Cassandra On EC2



Matthew F. Dennis // @mdennis


Instance Sizes
●   m1.xlarge is by far the most common size
●   m1.large is ok for many use cases
●   m2.4xlarge in some cases
    ●   keep the entire dataset in memory
●   c1.xlarge / cc1.4xlarge
    ●   Smallish but very hot set of data
        –   regardless of how much data is on disk
    ●   Extremely high request rate
    ●   Encrypted node-node communications and high traffic
    ●   Usually better off with many m1.xlarge instances because of
        the extra memory, but not always

Configuration
●   Stripe All Ephemeral Drives
●   data directory and commit log on same volume
    ●   Only applies to EC2 and SSDs, not physical HW
    ●   Why?
●   6-8 GB heap on m1.xlarge
●   3-4 GB heap on m1.large
●   Phi Convict Threshold? Maybe ...



EBS versus Ephemeral
●   Ephemeral drives are:
    ●   Generally faster for C*
    ●   More stable (no pauses/freezes; outages?)
    ●   Cheaper
    ●   Easier to initially configure
●   Striped EBS?
    ●   yeah, about that …
●   TL;DL don't use EBS for C* on EC2


Multi-Zone
●   Alternate zones in your token topology
    ●   No really, this is important, alternate zones
        –   We should probably fix this ...
●   “complicated, but possible” to add new zones
    after initial deployment
●   Never move a *token* to a different region or
    zone
    ●   If you think that is what you want to do, what you really
        want is to bootstrap a new node at token-1 in the new
        region/zone and then decom the old one
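
The token advice above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it assumes RandomPartitioner (a token space of 0 to 2**127 - 1), the zone names are examples, and the helper names are hypothetical rather than part of any Cassandra tool. It computes evenly spaced tokens, cycles zones around the ring, and derives the token-1 value for a replacement node.

```python
# Assumption: RandomPartitioner, whose tokens range over 0 .. 2**127 - 1.
RING_SIZE = 2 ** 127


def balanced_tokens(node_count):
    """Return evenly spaced initial tokens for node_count nodes."""
    return [i * RING_SIZE // node_count for i in range(node_count)]


def alternate_zones(tokens, zones):
    """Pair each token, in ring order, with a zone, cycling through the zones."""
    ordered = sorted(tokens)
    return [(token, zones[i % len(zones)]) for i, token in enumerate(ordered)]


def replacement_token(old_token):
    """Token for a replacement node: one below the old token, wrapping at zero."""
    return (old_token - 1) % RING_SIZE


if __name__ == "__main__":
    # Example zone names; substitute the zones of your own deployment.
    zones = ["us-east-1a", "us-east-1b", "us-east-1c"]
    for token, zone in alternate_zones(balanced_tokens(6), zones):
        print(f"{zone}  {token}")
```

Neighbouring tokens only end up in different zones all the way around the ring when the node count is a multiple of the zone count, so it is worth sizing the cluster with that in mind.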

Multi-Region C* on EC2
●   Connectivity is the complicated part
    ●   Ec2MultiRegionSnitch is not the entire answer
        –   https://issues.apache.org/jira/browse/CASSANDRA-2452
●   Don't try to make a “fail over” DC, just go with active-active
    ●   If you insist, then do the fail over in your application and configure C* the
        same as you would active-active
●   Generally requires a lot more storage
    ●   Doesn't matter though because you're using ephemeral drives (right?)
        and don't want a TB of data on each node anyway




Multi-Region Connectivity Options
●   VPN
●   Encrypted node-node communication
    ●   CPU utilization is often a downside
●   VPNCubed / VPCPlus
    ●   I've never deployed it, heard good things about it though
●   Amazon VPC
    ●   anyone know if a single VPC can span regions yet?
●   SSH Tunnels
●   EC2 security groups
●   IPTables
●   Encrypted node-node + public IP binding + AWS security groups +
    IPTables (EIPs may simplify this, never actually tried it)

Recovery From Failures
●   Don't “fix” EC2 nodes, replace them
    ●   bootstrap at token-1, remove old token
        –   bootstrap can be slow, but will get better

●   Other than that it's the same in EC2 as not ...




Node Maintenance
●   “Maintenance” On EC2?
●   Usually not required (just replace the node)
●   If it is, just stop C*; CL + HH / repair / RR (consistency level,
    hinted handoff, repair, read repair) will fix it
    ●   Same as physical HW
    ●   https://issues.apache.org/jira/browse/CASSANDRA-2034


●   Stop Trying To Decom Nodes Just To Replace a Disk !!!




Backups
●   C* snapshots and push to S3
●   Directory Watcher that pushes new files to S3
    ●   SimpleGeo: https://github.com/simplegeo/tablesnap
●   Netflix: http://slidesha.re/NFOnCassBkup
●   Keep a log of all incoming writes
    ●   Not specific to S3
    ●   Can be coupled with snapshots / S3
    ●   Useful for other reasons as well
●   Compression in transit to S3 (or wherever) can be done on
    a separate EC2 instance to avoid burning CPU
    ●   Usually not worth the extra complexity / cost
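
The directory-watcher approach can be sketched as a simple polling loop. This is a minimal illustration, not how tablesnap itself is implemented: the `upload` callable is a placeholder you would back with your own S3 client, the function names are hypothetical, and skipping files with "tmp" in the name is an assumed convention for in-progress SSTables.

```python
import os
import time


def find_new_files(data_dir, seen):
    """Return paths under data_dir that are not yet in the `seen` set."""
    new_files = []
    for root, dirs, files in os.walk(data_dir):
        dirs.sort()  # walk subdirectories in a stable order
        for name in sorted(files):
            path = os.path.join(root, name)
            # Assumption: files still being written carry "tmp" in their name.
            if "tmp" in name or path in seen:
                continue
            new_files.append(path)
    return new_files


def watch(data_dir, upload, poll_seconds=10.0, max_polls=None):
    """Poll data_dir and hand each new file to upload(path)."""
    seen = set()
    polls = 0
    while max_polls is None or polls < max_polls:
        for path in find_new_files(data_dir, seen):
            upload(path)
            seen.add(path)  # recorded only after upload returns without raising
        polls += 1
        if max_polls is None or polls < max_polls:
            time.sleep(poll_seconds)
```

A file is marked as seen only after `upload` returns, so an upload that raises is retried the next time the watcher runs. The `seen` set lives in memory, which means a restarted watcher would push everything again unless the uploader skips objects that already exist.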

Changing Node Sizes
●   Start a new instance
●   rsync data from original node to new node
●   Shut down C* on original node
●   rsync data from original node to new node again
    (only picks up what changed since the first pass)
●   Start C* on new node
●   Shut down original instance
●   NB: Assumes same token, region, zone, etc
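
The steps above can be written down as an ordered command plan. This is a sketch under assumptions: the data path `/var/lib/cassandra/` and the `service cassandra stop/start` commands are illustrative defaults that vary by install, `resize_plan` is a hypothetical helper, and launching the new instance and terminating the old one are left out because they happen through the EC2 API.

```python
import shlex


def resize_plan(old_host, new_host,
                data_dir="/var/lib/cassandra/",
                stop_cmd="service cassandra stop",
                start_cmd="service cassandra start"):
    """Return (host, command) pairs in the order they should be run."""
    # The trailing slash on data_dir makes rsync copy the directory contents.
    dest = f"{new_host}:{data_dir}"
    rsync = f"rsync -a --delete {shlex.quote(data_dir)} {shlex.quote(dest)}"
    return [
        (old_host, rsync),      # bulk copy while the old node still serves traffic
        (old_host, stop_cmd),   # stop C* so the files stop changing
        (old_host, rsync),      # second pass copies only what changed meanwhile
        (new_host, start_cmd),  # new node starts with the same token and data
    ]


if __name__ == "__main__":
    for host, command in resize_plan("old-node", "new-node"):
        print(f"[{host}] {command}")
```

Running rsync twice is the point of the procedure: the first pass moves the bulk of the data while the node is still up, so the downtime is limited to the much smaller second pass.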


Elastic Load Balancers
●   They're awesome, use them
    ●   Could be more awesome (e.g. better integration with Route 53)
    ●   What I really want is TCP anycast for ELB across regions (AWS could
        make it work)
●   Balance across regions with GeoIP / GeoDNS
    ●   Zerigo, TZOHA, Neustar, “homegrown”, etc
    ●   Route 53? You wish (though Route 53 itself is run over anycast)
        –   “in the future we plan for Route 53 to also give you greater control over
            … the route your users take to reach an endpoint” --Werner Vogels
●   Put them in front of your app servers, not your C* instances
●   Keep your app servers stateless or at least “weakly” stateless (e.g. no sticky
    sessions required)




AMIs versus Scripted Setup
●   DataStax publishes C* AMIs
●   Chef Recipes as well
●   Or roll your own …
●   Whatever you do, just make sure it's automated
    and repeatable
●   *personally* I prefer scripting the setup
    remotely, but this is … “less than ideal”
●   PSSH is, in general, awesome

WTF?!
●   Your zone X is not the same as my zone X
    ●   Consistent within an EC2 account
    ●   Problematic across accounts
    ●   Does not apply to regions (i.e. your region X is my region X)
●   EIPs resolve to private IPs from within AWS
●   EBS volumes sometimes just “freeze”
    ●   AWS: “yeah, that happens sometimes under load”
●   steal% sometimes 20% or more (1%-3% is “normal”)
    ●   This is AWS literally stealing your money
    ●   Thankfully not all that common, but watch out for it
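
One way to watch for high steal% is to sample the aggregate `cpu` line of `/proc/stat` twice and compare. The sketch below assumes a Linux kernel that reports the steal column (the eighth value on that line); the function names are illustrative.

```python
def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into a list of ints."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(value) for value in fields[1:]]


def read_cpu_sample(path="/proc/stat"):
    """Read the current aggregate CPU counters (Linux only)."""
    with open(path) as stat_file:
        return parse_cpu_line(stat_file.readline())


def steal_percent(before, after):
    """Steal time as a percentage of all CPU time between two samples."""
    # Column order: user nice system idle iowait irq softirq steal [guest ...]
    # Guest time is already included in user/nice, so only the first eight count.
    deltas = [b - a for a, b in zip(before[:8], after[:8])]
    total = sum(deltas)
    return 100.0 * deltas[7] / total if total else 0.0
```

Calling `read_cpu_sample()` twice a few seconds apart and passing both results to `steal_percent` gives a figure comparable to the steal column in `top` or `vmstat`.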

Missing AWS Features
●   ELB over anycast
    ●   Probably doable by AWS, but not others ...
●   GeoDNS from Route53
    ●   No really, WTF Doesn't Route53 Do GeoDNS ?!?!
●   Multi-Region VPC
●   Local SSDs




We're Hiring !
●   Developers
●   QA
●   Community Manager
●   Sales / SE
●   Interns
       –   Dev
       –   Support
       –   QA
●   Smart People Interested In Cassandra


Cassandra On EC2



    Q?
 (yes, I'll post the slides on slideshare)




Más contenido relacionado

La actualidad más candente

ops300 Week5 storage (1)
ops300 Week5 storage (1)ops300 Week5 storage (1)
ops300 Week5 storage (1)trayyoo
 
Speeding up Page Load Times by Using Starling
Speeding up Page Load Times by Using StarlingSpeeding up Page Load Times by Using Starling
Speeding up Page Load Times by Using StarlingErik Osterman
 
OOPs, OOMs, oh my! Containerizing JVM apps
OOPs, OOMs, oh my! Containerizing JVM appsOOPs, OOMs, oh my! Containerizing JVM apps
OOPs, OOMs, oh my! Containerizing JVM appsSematext Group, Inc.
 
Cassandra Day Atlanta 2015: Recording the Web: High-Fidelity Storage and Play...
Cassandra Day Atlanta 2015: Recording the Web: High-Fidelity Storage and Play...Cassandra Day Atlanta 2015: Recording the Web: High-Fidelity Storage and Play...
Cassandra Day Atlanta 2015: Recording the Web: High-Fidelity Storage and Play...DataStax Academy
 
G1: To Infinity and Beyond
G1: To Infinity and BeyondG1: To Infinity and Beyond
G1: To Infinity and BeyondScyllaDB
 
HighLoad Solutions On MySQL / Xiaobin Lin (Alibaba)
HighLoad Solutions On MySQL / Xiaobin Lin (Alibaba)HighLoad Solutions On MySQL / Xiaobin Lin (Alibaba)
HighLoad Solutions On MySQL / Xiaobin Lin (Alibaba)Ontico
 
Tuning Linux for Databases.
Tuning Linux for Databases.Tuning Linux for Databases.
Tuning Linux for Databases.Alexey Lesovsky
 
Storage based snapshots for KVM VMs in CloudStack
Storage based snapshots for KVM VMs in CloudStackStorage based snapshots for KVM VMs in CloudStack
Storage based snapshots for KVM VMs in CloudStackShapeBlue
 
Cassandra NYC 2011 Data Modeling
Cassandra NYC 2011 Data ModelingCassandra NYC 2011 Data Modeling
Cassandra NYC 2011 Data ModelingMatthew Dennis
 
Using Ceph in OStack.de - Ceph Day Frankfurt
Using Ceph in OStack.de - Ceph Day Frankfurt Using Ceph in OStack.de - Ceph Day Frankfurt
Using Ceph in OStack.de - Ceph Day Frankfurt Ceph Community
 
Crimson: Ceph for the Age of NVMe and Persistent Memory
Crimson: Ceph for the Age of NVMe and Persistent MemoryCrimson: Ceph for the Age of NVMe and Persistent Memory
Crimson: Ceph for the Age of NVMe and Persistent MemoryScyllaDB
 

La actualidad más candente (18)

Shootout at the AWS Corral
Shootout at the AWS CorralShootout at the AWS Corral
Shootout at the AWS Corral
 
92 grand prix_2013
92 grand prix_201392 grand prix_2013
92 grand prix_2013
 
ops300 Week5 storage (1)
ops300 Week5 storage (1)ops300 Week5 storage (1)
ops300 Week5 storage (1)
 
Speeding up Page Load Times by Using Starling
Speeding up Page Load Times by Using StarlingSpeeding up Page Load Times by Using Starling
Speeding up Page Load Times by Using Starling
 
Long Term Road Test of C*
Long Term Road Test of C*Long Term Road Test of C*
Long Term Road Test of C*
 
Shootout at the PAAS Corral
Shootout at the PAAS CorralShootout at the PAAS Corral
Shootout at the PAAS Corral
 
OOPs, OOMs, oh my! Containerizing JVM apps
OOPs, OOMs, oh my! Containerizing JVM appsOOPs, OOMs, oh my! Containerizing JVM apps
OOPs, OOMs, oh my! Containerizing JVM apps
 
Cassandra Day Atlanta 2015: Recording the Web: High-Fidelity Storage and Play...
Cassandra Day Atlanta 2015: Recording the Web: High-Fidelity Storage and Play...Cassandra Day Atlanta 2015: Recording the Web: High-Fidelity Storage and Play...
Cassandra Day Atlanta 2015: Recording the Web: High-Fidelity Storage and Play...
 
G1: To Infinity and Beyond
G1: To Infinity and BeyondG1: To Infinity and Beyond
G1: To Infinity and Beyond
 
Redis ndc2013
Redis ndc2013Redis ndc2013
Redis ndc2013
 
7 Ways To Crash Postgres
7 Ways To Crash Postgres7 Ways To Crash Postgres
7 Ways To Crash Postgres
 
HighLoad Solutions On MySQL / Xiaobin Lin (Alibaba)
HighLoad Solutions On MySQL / Xiaobin Lin (Alibaba)HighLoad Solutions On MySQL / Xiaobin Lin (Alibaba)
HighLoad Solutions On MySQL / Xiaobin Lin (Alibaba)
 
Tuning Linux for Databases.
Tuning Linux for Databases.Tuning Linux for Databases.
Tuning Linux for Databases.
 
Storage based snapshots for KVM VMs in CloudStack
Storage based snapshots for KVM VMs in CloudStackStorage based snapshots for KVM VMs in CloudStack
Storage based snapshots for KVM VMs in CloudStack
 
Cassandra NYC 2011 Data Modeling
Cassandra NYC 2011 Data ModelingCassandra NYC 2011 Data Modeling
Cassandra NYC 2011 Data Modeling
 
Using Ceph in OStack.de - Ceph Day Frankfurt
Using Ceph in OStack.de - Ceph Day Frankfurt Using Ceph in OStack.de - Ceph Day Frankfurt
Using Ceph in OStack.de - Ceph Day Frankfurt
 
Crimson: Ceph for the Age of NVMe and Persistent Memory
Crimson: Ceph for the Age of NVMe and Persistent MemoryCrimson: Ceph for the Age of NVMe and Persistent Memory
Crimson: Ceph for the Age of NVMe and Persistent Memory
 
Fail over fail_back
Fail over fail_backFail over fail_back
Fail over fail_back
 

Similar a Cassandra On EC2

Distributed systems and consistency
Distributed systems and consistencyDistributed systems and consistency
Distributed systems and consistencyseldo
 
Ceph Day London 2014 - Deploying ceph in the wild
Ceph Day London 2014 - Deploying ceph in the wildCeph Day London 2014 - Deploying ceph in the wild
Ceph Day London 2014 - Deploying ceph in the wildCeph Community
 
Efficient Buffer Management
Efficient Buffer ManagementEfficient Buffer Management
Efficient Buffer Managementbasisspace
 
Lightweight Virtualization with Linux Containers and Docker | YaC 2013
Lightweight Virtualization with Linux Containers and Docker | YaC 2013Lightweight Virtualization with Linux Containers and Docker | YaC 2013
Lightweight Virtualization with Linux Containers and Docker | YaC 2013dotCloud
 
Lightweight Virtualization with Linux Containers and Docker I YaC 2013
Lightweight Virtualization with Linux Containers and Docker I YaC 2013Lightweight Virtualization with Linux Containers and Docker I YaC 2013
Lightweight Virtualization with Linux Containers and Docker I YaC 2013Docker, Inc.
 
Introduction to AWS Big Data
Introduction to AWS Big Data Introduction to AWS Big Data
Introduction to AWS Big Data Omid Vahdaty
 
kranonit S06E01 Игорь Цинько: High load
kranonit S06E01 Игорь Цинько: High loadkranonit S06E01 Игорь Цинько: High load
kranonit S06E01 Игорь Цинько: High loadKrivoy Rog IT Community
 
A vision of persistence
A vision of persistenceA vision of persistence
A vision of persistenceDocker, Inc.
 
"Lightweight Virtualization with Linux Containers and Docker". Jerome Petazzo...
"Lightweight Virtualization with Linux Containers and Docker". Jerome Petazzo..."Lightweight Virtualization with Linux Containers and Docker". Jerome Petazzo...
"Lightweight Virtualization with Linux Containers and Docker". Jerome Petazzo...Yandex
 
Truemotion Adventures in Containerization
Truemotion Adventures in ContainerizationTruemotion Adventures in Containerization
Truemotion Adventures in ContainerizationRyan Hunter
 
Introduction to Docker (as presented at December 2013 Global Hackathon)
Introduction to Docker (as presented at December 2013 Global Hackathon)Introduction to Docker (as presented at December 2013 Global Hackathon)
Introduction to Docker (as presented at December 2013 Global Hackathon)Jérôme Petazzoni
 
Taking Docker to Production: What You Need to Know and Decide
Taking Docker to Production: What You Need to Know and DecideTaking Docker to Production: What You Need to Know and Decide
Taking Docker to Production: What You Need to Know and DecideBret Fisher
 
Taking Docker to Production: What You Need to Know and Decide
Taking Docker to Production: What You Need to Know and DecideTaking Docker to Production: What You Need to Know and Decide
Taking Docker to Production: What You Need to Know and DecideDocker, Inc.
 
Coredns nodecache - A highly-available Node-cache DNS server
Coredns nodecache - A highly-available Node-cache DNS serverCoredns nodecache - A highly-available Node-cache DNS server
Coredns nodecache - A highly-available Node-cache DNS serverYann Hamon
 
Cassandra from tarball to production
Cassandra   from tarball to productionCassandra   from tarball to production
Cassandra from tarball to productionRon Kuris
 
Kubernetes lessons learned
Kubernetes lessons learnedKubernetes lessons learned
Kubernetes lessons learnedPaul Guth
 
Docker and-containers-for-development-and-deployment-scale12x
Docker and-containers-for-development-and-deployment-scale12xDocker and-containers-for-development-and-deployment-scale12x
Docker and-containers-for-development-and-deployment-scale12xrkr10
 
Solr on Windows: Does it Work? Does it Scale? - Teun Duynstee
Solr on Windows: Does it Work? Does it Scale? - Teun DuynsteeSolr on Windows: Does it Work? Does it Scale? - Teun Duynstee
Solr on Windows: Does it Work? Does it Scale? - Teun Duynsteelucenerevolution
 
Kernel Recipes 2016 - Speeding up development by setting up a kernel build farm
Kernel Recipes 2016 - Speeding up development by setting up a kernel build farmKernel Recipes 2016 - Speeding up development by setting up a kernel build farm
Kernel Recipes 2016 - Speeding up development by setting up a kernel build farmAnne Nicolas
 
Solr on Docker - the Good, the Bad and the Ugly
Solr on Docker - the Good, the Bad and the UglySolr on Docker - the Good, the Bad and the Ugly
Solr on Docker - the Good, the Bad and the UglySematext Group, Inc.
 

Similar a Cassandra On EC2 (20)

Distributed systems and consistency
Distributed systems and consistencyDistributed systems and consistency
Distributed systems and consistency
 
Ceph Day London 2014 - Deploying ceph in the wild
Ceph Day London 2014 - Deploying ceph in the wildCeph Day London 2014 - Deploying ceph in the wild
Ceph Day London 2014 - Deploying ceph in the wild
 
Efficient Buffer Management
Efficient Buffer ManagementEfficient Buffer Management
Efficient Buffer Management
 
Lightweight Virtualization with Linux Containers and Docker | YaC 2013
Lightweight Virtualization with Linux Containers and Docker | YaC 2013Lightweight Virtualization with Linux Containers and Docker | YaC 2013
Lightweight Virtualization with Linux Containers and Docker | YaC 2013
 
Lightweight Virtualization with Linux Containers and Docker I YaC 2013
Lightweight Virtualization with Linux Containers and Docker I YaC 2013Lightweight Virtualization with Linux Containers and Docker I YaC 2013
Lightweight Virtualization with Linux Containers and Docker I YaC 2013
 
Introduction to AWS Big Data
Introduction to AWS Big Data Introduction to AWS Big Data
Introduction to AWS Big Data
 
kranonit S06E01 Игорь Цинько: High load
kranonit S06E01 Игорь Цинько: High loadkranonit S06E01 Игорь Цинько: High load
kranonit S06E01 Игорь Цинько: High load
 
A vision of persistence
A vision of persistenceA vision of persistence
A vision of persistence
 
"Lightweight Virtualization with Linux Containers and Docker". Jerome Petazzo...
"Lightweight Virtualization with Linux Containers and Docker". Jerome Petazzo..."Lightweight Virtualization with Linux Containers and Docker". Jerome Petazzo...
"Lightweight Virtualization with Linux Containers and Docker". Jerome Petazzo...
 
Truemotion Adventures in Containerization
Truemotion Adventures in ContainerizationTruemotion Adventures in Containerization
Truemotion Adventures in Containerization
 
Introduction to Docker (as presented at December 2013 Global Hackathon)
Introduction to Docker (as presented at December 2013 Global Hackathon)Introduction to Docker (as presented at December 2013 Global Hackathon)
Introduction to Docker (as presented at December 2013 Global Hackathon)
 
Taking Docker to Production: What You Need to Know and Decide
Taking Docker to Production: What You Need to Know and DecideTaking Docker to Production: What You Need to Know and Decide
Taking Docker to Production: What You Need to Know and Decide
 
Taking Docker to Production: What You Need to Know and Decide
Taking Docker to Production: What You Need to Know and DecideTaking Docker to Production: What You Need to Know and Decide
Taking Docker to Production: What You Need to Know and Decide
 
Coredns nodecache - A highly-available Node-cache DNS server
Coredns nodecache - A highly-available Node-cache DNS serverCoredns nodecache - A highly-available Node-cache DNS server
Coredns nodecache - A highly-available Node-cache DNS server
 
Cassandra from tarball to production
Cassandra   from tarball to productionCassandra   from tarball to production
Cassandra from tarball to production
 
Kubernetes lessons learned
Kubernetes lessons learnedKubernetes lessons learned
Kubernetes lessons learned
 
Docker and-containers-for-development-and-deployment-scale12x
Docker and-containers-for-development-and-deployment-scale12xDocker and-containers-for-development-and-deployment-scale12x
Docker and-containers-for-development-and-deployment-scale12x
 
Solr on Windows: Does it Work? Does it Scale? - Teun Duynstee
Solr on Windows: Does it Work? Does it Scale? - Teun DuynsteeSolr on Windows: Does it Work? Does it Scale? - Teun Duynstee
Solr on Windows: Does it Work? Does it Scale? - Teun Duynstee
 
Kernel Recipes 2016 - Speeding up development by setting up a kernel build farm
Kernel Recipes 2016 - Speeding up development by setting up a kernel build farmKernel Recipes 2016 - Speeding up development by setting up a kernel build farm
Kernel Recipes 2016 - Speeding up development by setting up a kernel build farm
 
Solr on Docker - the Good, the Bad and the Ugly
Solr on Docker - the Good, the Bad and the UglySolr on Docker - the Good, the Bad and the Ugly
Solr on Docker - the Good, the Bad and the Ugly
 

Último

04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGSujit Pal
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 

Último (20)

04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAG
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 

Cassandra On EC2

  • 1. Cassandra On EC2 Matthew F. Dennis // @mdennis @mdennis
  • 2. Instance Sizes ● m1.xlarge is by far the most common size ● m1.large is ok for many use cases ● m2.4xlarge in some cases ● keep the entire dataset in memory ● c1.xlarge / cc1.4xlarge ● Smallish but very hot set of data – regardless of how much data is on disk ● Extremely high request rate ● Encrypted node-node communications and high traffic ● Usually better off with many m1.xlarge instances because of the extra memory, but not always @mdennis
  • 3. Configuration ● Stripe All Ephemeral Drives ● data directory and commit log on same volume ● Only applies to EC2 and SSDs, not physical HW ● Why? ● 6-8 GB heap on m1.xlarge ● 3-4 GB heap on m1.large ● Phi Convict Threshold? Maybe ... @mdennis
  • 4. EBS versus Ephemeral ● Ephemeral drives are: ● Generally faster for C* ● More stable (no pauses/freezes; outages?) ● Cheaper ● Easier to initially configure ● Striped EBS? ● yeah, about that … ● TL;DL don't use EBS for C* on EC2 @mdennis
  • 5. Multi-Zone ● Alternate zones in your token topology ● No really, this is important, alternate zones – We should probably fix this ... ● “complicated, but possible” to add new zones after initial deployment ● Never move a *token* to a different region or zone ● If you think that is what you want to do, really you want to bootstrap new one at token-1 in the new region/zone and then decom the old one @mdennis
  • 6. Multi-Region C* on EC2
    ● Connectivity is the complicated part
        ● Ec2MultiRegionSnitch is not the entire answer
            – https://issues.apache.org/jira/browse/CASSANDRA-2452
    ● Don't try to make a “fail over” DC, just go with active-active
        ● If you insist, then do the fail over in your application and configure C* the same as you would active-active
    ● Generally requires a lot more storage
        ● Doesn't matter though because you're using ephemeral drives (right?) and don't want a TB of data on each node anyway
  • 7. Multi-Region Connectivity Options
    ● VPN
    ● Encrypted node-node communication
        ● CPU utilization is often a downside
    ● VPNCubed / VPCPlus
        ● I've never deployed it, heard good things about it though
    ● Amazon VPC
        ● anyone know if a single VPC can span regions yet?
    ● SSH Tunnels
    ● EC2 security groups
    ● IPTables
    ● Encrypted node-node + public IP binding + AWS security groups + IPTables (EIPs may simplify this, never actually tried it)
  • 8. Recovery From Failures
    ● Don't “fix” EC2 nodes, replace them
        ● bootstrap at token-1, remove old token
            – bootstrap can be slow, but will get better
    ● Other than that it's the same in EC2 as not ...
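The "bootstrap at token-1" step is just arithmetic on the ring: the replacement takes the slot immediately before the dead node's token, which leaves the old token owning a single-token sliver until it is removed. A minimal sketch, assuming RandomPartitioner's 0..2^127 range; the function name and the in-use check are illustrative additions, not something from the slides.

```python
RING_SIZE = 2 ** 127  # assumption: RandomPartitioner token range


def replacement_token(old_token, tokens_in_use):
    """Return old_token - 1, wrapping around the ring, for the replacement node."""
    candidate = (old_token - 1) % RING_SIZE
    if candidate in tokens_in_use:
        # Illustrative safety check: two nodes cannot share a token.
        raise ValueError("token-1 is already taken; pick the replacement token by hand")
    return candidate


in_use = {0, 2 ** 126}
print(replacement_token(2 ** 126, in_use))
```

The resulting value is what you would give the new node as its initial token before it bootstraps; the old token is removed once the new node has joined.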
  • 9. Node Maintenance
    ● “Maintenance” On EC2?
        ● Usually not required (just replace the node)
        ● If it is, just stop C*, CL+HH/repair/RR will fix it
    ● Same as physical HW
        ● https://issues.apache.org/jira/browse/CASSANDRA-2034
    ● Stop Trying To Decom Nodes Just To Replace a Disk !!!
  • 10. Backups
    ● C* snapshots and push to S3
    ● Directory Watcher that pushes new files to S3
        ● SimpleGeo: https://github.com/simplegeo/tablesnap
        ● Netflix: http://slidesha.re/NFOnCassBkup
    ● Keep a log of all incoming writes
        ● Not specific to S3
        ● Can be coupled with snapshots / S3
        ● Useful for other reasons as well
    ● Compression in transit to S3 (or wherever) can be done on a separate EC2 instance to avoid burning CPU
        ● Usually not worth the extra complexity / cost
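The "directory watcher" idea can be sketched as a loop that remembers which files it has already seen and hands each new one to an upload step exactly once. This is a polling sketch under stated assumptions, not how tablesnap or the Netflix tooling is implemented: the function names are hypothetical, and `upload` is a stand-in callback where a real S3 client call would go.

```python
import os
import tempfile


def find_new_files(directory, already_seen):
    """Return sorted paths of files under directory that have not been seen yet."""
    current = set()
    for root, _dirs, files in os.walk(directory):
        for name in files:
            current.add(os.path.join(root, name))
    return sorted(current - already_seen)


def push_new_files(directory, already_seen, upload):
    """Call upload(path) once per new file and record the path as seen."""
    pushed = []
    for path in find_new_files(directory, already_seen):
        upload(path)  # stand-in for the real S3 push
        already_seen.add(path)
        pushed.append(path)
    return pushed


# Demo against a temporary directory, with a list standing in for S3.
with tempfile.TemporaryDirectory() as data_dir:
    seen = set()
    uploaded = []
    open(os.path.join(data_dir, "a-Data.db"), "w").close()
    first = push_new_files(data_dir, seen, uploaded.append)
    open(os.path.join(data_dir, "b-Data.db"), "w").close()
    second = push_new_files(data_dir, seen, uploaded.append)
    print(len(first), len(second), len(uploaded))
```

A real watcher would also need to skip files that are still being written and persist the seen set across restarts; both are left out here to keep the sketch short.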
  • 11. Changing Node Sizes
    ● Start a new instance
    ● rsync data from original node to new node
    ● Shut down C* on original node
    ● rsync data from original node to new node (again, to catch whatever changed since the first pass)
    ● Start C* on new node
    ● Shut down original instance
    ● NB: Assumes same token, region, zone, etc
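The ordering of the middle steps is what matters: a bulk rsync while C* is still serving, then stop C*, then a second rsync to pick up the difference. The sketch below only builds that plan as data and executes nothing. Host names, the data directory, and the stop/start commands are placeholders supplied by the caller; the use of `--delete` (so files removed on the source since the first pass also disappear from the copy) is my assumption rather than something the slides specify. Starting the new instance and terminating the old one are EC2 operations and are left out.

```python
def resize_plan(old_host, new_host, data_dir, stop_cmd, start_cmd):
    """Return ordered (host, argv) steps for moving a node's data; nothing is run."""
    src = data_dir.rstrip("/") + "/"
    # Run from the original node, pushing to the new one.
    sync = ["rsync", "-a", "--delete", src, f"{new_host}:{src}"]
    return [
        (old_host, list(sync)),       # first pass while C* is still serving
        (old_host, list(stop_cmd)),   # stop C* on the original node
        (old_host, list(sync)),       # second pass copies only what changed
        (new_host, list(start_cmd)),  # start C* on the new node
    ]


# All values below are placeholders for illustration.
plan = resize_plan(
    "old-node",
    "new-node",
    "/var/lib/cassandra",
    stop_cmd=["service", "cassandra", "stop"],
    start_cmd=["service", "cassandra", "start"],
)
for host, argv in plan:
    print(host, " ".join(argv))
```

Because the second pass runs with C* stopped, it should be short, which keeps the node's downtime close to the time it takes to copy the delta.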
  • 12. Elastic Load Balancers
    ● They're awesome, use them
        ● Could be more awesome (e.g. better integration with Route 53)
        ● What I really want is TCP anycast for ELB across regions (AWS could make it work)
    ● Balance across regions with GeoIP / GeoDNS
        ● Zerigo, TZOHA, Neustar, “homegrown”, etc
        ● Route 53? You wish (though Route 53 itself is run over anycast)
            – “in the future we plan for Route 53 to also give you greater control over … the route your users take to reach an endpoint” --Werner Vogels
    ● Put them in front of your app servers, not your C* instances
    ● Keep your app servers stateless or at least “weakly” stateless (e.g. no sticky sessions required)
  • 13. AMIs versus Scripted Setup
    ● DataStax publishes C* AMIs
        ● Chef Recipes as well
    ● Or roll your own …
    ● Whatever you do, just make sure it's automated and repeatable
    ● *personally* I prefer scripting the setup remotely, but this is … “less than ideal”
        ● PSSH is, in general, awesome
  • 14. WTF?!
    ● Your zone X is not the same as my zone X
        ● Consistent within an EC2 account
        ● Problematic across accounts
        ● Does not apply to regions (i.e. your region X is my region X)
    ● EIPs resolve to private IPs from within AWS
    ● EBS volumes sometimes just “freeze”
        ● AWS: “yeah, that happens sometimes under load”
    ● steal% sometimes 20% or more (1%-3% is “normal”)
        ● This is AWS literally stealing your money
        ● Thankfully not all that common, but watch out for it
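For watching steal%, one option on Linux is to diff two samples of the aggregate `cpu` line in `/proc/stat`, where steal is the eighth counter. The sketch below works on the text of two samples so it can be tried anywhere; the function names and the sample numbers are made up for illustration.

```python
def cpu_times(proc_stat_text):
    """Parse the aggregate 'cpu' line of /proc/stat text into a list of ints."""
    for line in proc_stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(value) for value in fields[1:]]
    raise ValueError("no aggregate cpu line found")


def steal_percent(before_text, after_text):
    """Percentage of CPU time reported as steal between two /proc/stat samples."""
    before = cpu_times(before_text)
    after = cpu_times(after_text)
    if len(before) < 8 or len(after) < 8:
        raise ValueError("this kernel does not report a steal counter")
    deltas = [a - b for a, b in zip(after, before)]
    # Counters: user nice system idle iowait irq softirq steal guest guest_nice.
    # guest and guest_nice are already included in user and nice, so sum only 8.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    return 100.0 * deltas[7] / total


# Made-up samples; on a real node, read /proc/stat twice a few seconds apart.
sample_before = "cpu  100 0 50 800 10 0 5 35 0 0\ncpu0 100 0 50 800 10 0 5 35 0 0"
sample_after = "cpu  200 0 100 1500 20 0 10 170 0 0\ncpu0 200 0 100 1500 20 0 10 170 0 0"
print(steal_percent(sample_before, sample_after))
```

Alerting when this stays well above the 1%-3% the slide calls “normal” is one way to notice a noisy host early enough to replace the instance.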
  • 15. Missing AWS Features
    ● ELB over anycast
        ● Probably doable by AWS, but not others ...
    ● GeoDNS from Route53
        ● No really, WTF Doesn't Route53 Do GeoDNS ?!?!
    ● Multi-Region VPC
    ● Local SSDs
  • 16. We're Hiring !
    ● Developers
    ● QA
    ● Community Manager
    ● Sales / SE
    ● Interns
        – Dev
        – Support
        – QA
    ● Smart People Interested In Cassandra
  • 17. Cassandra On EC2
    Q? (yes, I'll post the slides on slideshare)