Cassandra In Operation
Niall Milton, CTO, DigBigData
Overview

1. Physical Infrastructure
2. Deployment
3. Management
4. Monitoring
5. Troubleshooting
6. Tooling
7. Questions?
Physical Infrastructure

Hardware
Topology
Surety vs. Cost
Sizing
When to add nodes?
Hardware : Memory
More is better up to a point.
From 1.2, bloom filters and compression metadata are off-heap.
This means greater data density per node and less on-heap memory
used, but more system memory is needed
Respect Java heap limits. 8GB as an upper bound. Java 7 has
improved garbage collection, so you may be able to push it higher.
Be aware of perm gen and native memory also.
Keep an eye on system memory. If memory is nearly exhausted,
system performance degrades across the board
Anecdotal: 20MB RAM per TB
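The heap guidance above can be sketched as a small calculation. This is only an illustration: the formula (half of RAM capped at 1GB, or a quarter of RAM capped at 8GB, whichever is larger) follows the default heuristic as I understand it from cassandra-env.sh of this era, and the function name is hypothetical. Check the script shipped with your own version before relying on it.

```python
def suggested_max_heap_mb(system_ram_mb: int) -> int:
    """Illustrative heap size: max(min(RAM/2, 1024MB), min(RAM/4, 8192MB)).

    Assumption: this mirrors the default heuristic in cassandra-env.sh;
    it is not taken from the slides themselves.
    """
    half_capped = min(system_ram_mb // 2, 1024)
    quarter_capped = min(system_ram_mb // 4, 8192)
    return max(half_capped, quarter_capped)


for ram_gb in (2, 8, 32, 64):
    heap = suggested_max_heap_mb(ram_gb * 1024)
    print(f"{ram_gb}GB RAM -> {heap}MB heap")
```

Note how the result stops growing at 8GB: anything beyond that is left to the operating system for off-heap structures and the page cache.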
Hardware : CPU

8 - 12 core is recommended
Given efficient i/o, Cassandra is more likely to be CPU
bound under most circumstances
For hosted environments, CPU bursting is useful.
The Big Picture : consider total core availability across
the cluster when sizing
Include cost per core in sizing estimates as cost
impact is higher than disk or memory
Hardware : Disk

DataStax recommendation is to use SSD
Commodity SSD is cheaper, but know the limitations. Not
apples and apples. Manufacturers are now producing
“light-enterprise” products. Enterprise SSD is designed for mixed
workloads.
Think 1 disk for the commit log, many for data. Anything else
and you risk I/O bottlenecks
JBOD support (1.2 and later) distributes data over all available
disks and has sensible failure management. No longer need
to use RAID10 or similar
SSD requires an update to device queue settings
Topology

Split local DC nodes across multiple racks for rack redundancy
With multi-DC, use LOCAL_QUORUM or a lower consistency level for
most operations
If inter-DC network latency is high, you will need to tweak
phi_convict_threshold from its default. The same applies on EC2
Be aware of single node bandwidth limitations. Buying amazing h/w
means you need available bandwidth to exercise it
Know what your latencies are at each network stage. Be aware of
inter-DC firewalls and traffic contention with other resources. (Are
there devices you have no permission to change settings for that
your cluster is impacted by?)
Plan as best you can for all eventualities
Rapid Read Protection feature (2.x) will help cluster performance
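To see why LOCAL_QUORUM keeps most operations off the inter-DC link, it helps to count how many replicas must acknowledge a request at each consistency level. The sketch below uses the standard quorum rule (floor(RF / 2) + 1); the function names and the two-DC layout are made up for illustration.

```python
def quorum(replication_factor: int) -> int:
    """A quorum is a strict majority of the replicas."""
    return replication_factor // 2 + 1


def replicas_required(consistency_level: str, rf_per_dc: dict, local_dc: str) -> int:
    """Number of replica acknowledgements needed for one request."""
    if consistency_level == "ONE":
        return 1
    if consistency_level == "LOCAL_QUORUM":
        # Only replicas in the coordinator's own data centre are counted.
        return quorum(rf_per_dc[local_dc])
    if consistency_level == "QUORUM":
        # Majority of all replicas across every data centre.
        return quorum(sum(rf_per_dc.values()))
    if consistency_level == "EACH_QUORUM":
        # A majority in every data centre.
        return sum(quorum(rf) for rf in rf_per_dc.values())
    raise ValueError(f"unsupported consistency level: {consistency_level}")


rf = {"dc1": 3, "dc2": 3}  # hypothetical replication settings
for level in ("ONE", "LOCAL_QUORUM", "QUORUM", "EACH_QUORUM"):
    print(level, replicas_required(level, rf, local_dc="dc1"))
```

With three replicas per DC, LOCAL_QUORUM waits on 2 local replicas, while QUORUM needs 4 and therefore always waits on at least one remote replica.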
Surety vs. Cost

Have a 24x7 ethos. Don’t let customers suffer because of
caution or tight pockets.
Know your growth rate.
Add nodes pre-emptively even if you are not at your outer
limit.
More nodes spread load across nodes and increase
number of concurrent requests to cluster. Think in terms
of reducing current load per node as well as adding
storage
EC2 has more flexibility but the same rules apply.
Sizing

A single node can comfortably store 3 - 5TB.
SATA has higher density (3TB/disk available) but
you still hit the per-node limit above.
With both, calculate theoretical max IOPS
Allow extra for indexes
Run a simulation if possible with dummy data
Allow 10% free space for LCS and worst case 50% for
STCS
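The free-space rule above can be turned into a rough node-count estimate. This is a sketch under stated assumptions: the headroom figures (10% for LCS, 50% worst case for STCS) come from the slide, while the function names, the dataset size, and the replication factor are invented for the example.

```python
import math

# Fraction of each node's disk to keep free for compaction (from the slide).
COMPACTION_HEADROOM = {"LCS": 0.10, "STCS": 0.50}


def usable_tb(raw_tb_per_node: float, strategy: str) -> float:
    """Disk space left for live data once compaction headroom is reserved."""
    return raw_tb_per_node * (1 - COMPACTION_HEADROOM[strategy])


def nodes_needed(dataset_tb: float, replication_factor: int,
                 raw_tb_per_node: float, strategy: str) -> int:
    """Minimum node count to hold every replica with headroom intact."""
    total_tb = dataset_tb * replication_factor
    return math.ceil(total_tb / usable_tb(raw_tb_per_node, strategy))


# Hypothetical cluster: 10TB of unique data, RF=3, 4TB of disk per node.
for strategy in ("LCS", "STCS"):
    print(strategy, nodes_needed(10, 3, 4, strategy), "nodes")
```

The estimate covers storage only; index overhead and IOPS limits still need to be checked separately.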
Sizing

Cassandra has its own compression, no need to do it
yourself (LZ4 performs better than Snappy)
Don’t serialize objects you may want to run analytics
on in the future
Mixed workloads make projections more difficult due
to compaction and SSTable immutability
Review column data for constants and strings that
don’t need to be stored in long form. (Common
Oversight)
When do we Add Nodes?

Think GROWTH RATE. It’s OK to be a pessimist, people
will thank you later for it.
Be as accurate as possible in your projections
Use multi-variate projections, e.g. storage, request
threads, memory, network I/O, CPU usage
Update projections on a regular basis, they help
convince stakeholders to spend money
Don’t wait till the last minute, provisioning new
hardware nearly always takes longer than planned.
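A multi-variate projection can start as simply as a linear extrapolation per resource, then taking whichever limit arrives first. The sketch below assumes linear growth and a 75% "order more hardware" threshold; both are illustrative choices, as are the metric names and numbers.

```python
def days_until_limit(current: float, capacity: float,
                     growth_per_day: float, threshold: float = 0.75) -> float:
    """Days until usage reaches threshold * capacity, assuming linear growth."""
    if growth_per_day <= 0:
        return float("inf")
    remaining = capacity * threshold - current
    return max(remaining, 0.0) / growth_per_day


def first_bottleneck(metrics: dict, threshold: float = 0.75):
    """Return (resource name, days) for the resource that runs out first."""
    projections = {
        name: days_until_limit(current, capacity, growth, threshold)
        for name, (current, capacity, growth) in metrics.items()
    }
    name = min(projections, key=projections.get)
    return name, projections[name]


# Hypothetical per-node figures: (current usage, capacity, growth per day).
metrics = {
    "storage_tb": (2.0, 4.0, 0.01),
    "cpu_percent": (40.0, 100.0, 0.1),
    "network_mbps": (300.0, 1000.0, 1.0),
}
resource, days = first_bottleneck(metrics)
print(f"{resource} reaches its threshold in about {days:.0f} days")
```

Subtract the hardware lead time from the result to get the date by which the order has to be placed.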
Compaction

Can be a serious thorn in your side if not understood (can
be i/o intensive).
Every write is immutable, so highly volatile columns will be
distributed across many SSTables. This affects read
performance
Compaction leads to fewer places to look for data
Compaction also removes deleted or TTL’d columns
Don’t forget gc_grace_seconds when using TTL.
New Hybrid compaction algorithm uses STCS in LCS level 0.
Worth checking out.
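The interaction between TTL and gc_grace_seconds is easy to misjudge: an expired column is not reclaimed at expiry, it only becomes eligible for removal by compaction once the grace period has also passed. The sketch below computes that conservative upper bound. The 864000-second value is the usual gc_grace_seconds default (10 days); exact purge behaviour varies by Cassandra version, so treat this as an estimate, and the function name as hypothetical.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_GC_GRACE_SECONDS = 864000  # 10 days, the usual table default


def earliest_purge_time(written_at: datetime, ttl_seconds: int,
                        gc_grace_seconds: int = DEFAULT_GC_GRACE_SECONDS) -> datetime:
    """Conservative time after which compaction may drop a TTL'd column."""
    expires_at = written_at + timedelta(seconds=ttl_seconds)
    return expires_at + timedelta(seconds=gc_grace_seconds)


written = datetime(2014, 2, 1, tzinfo=timezone.utc)
purge = earliest_purge_time(written, ttl_seconds=86400)
print("written:", written.isoformat())
print("may be purged from:", purge.isoformat())
```

A one-day TTL therefore still occupies disk for roughly eleven days with default settings, which matters when sizing for high-churn data.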
Virtual Nodes

Since Cassandra 1.2 virtual nodes are the default
Split token ranges from 1 per node to many (default 256)
Ranges are “shuffled” to be random and non-contiguous
Hot token ranges are unlikely to affect performance as
much as previously
Cluster expansion is more manageable, spreads load
evenly
No more node obesity…
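The effect of many random tokens per node can be illustrated with a toy simulation. This is not how Cassandra assigns tokens internally: the ring is simplified to the range 0 to 2^64, tokens are drawn from Python's random module, and all names are made up. It only shows that ownership evens out as the token count per node rises.

```python
import random

RING_SIZE = 2 ** 64  # simplified ring; the real partitioner range differs


def ownership(num_nodes: int, tokens_per_node: int, seed: int = 42) -> list:
    """Fraction of the ring owned by each node with random token placement."""
    rng = random.Random(seed)
    tokens = sorted(
        (rng.randrange(RING_SIZE), node)
        for node in range(num_nodes)
        for _ in range(tokens_per_node)
    )
    owned = [0] * num_nodes
    # Each token owns the range from the previous token up to itself;
    # the first token wraps around from the last one.
    previous = tokens[-1][0] - RING_SIZE
    for token, node in tokens:
        owned[node] += token - previous
        previous = token
    return [share / RING_SIZE for share in owned]


def spread(shares: list) -> float:
    """Gap between the most and least loaded node."""
    return max(shares) - min(shares)


for tokens_per_node in (1, 256):
    shares = ownership(num_nodes=6, tokens_per_node=tokens_per_node)
    print(tokens_per_node, "token(s) per node, spread:", round(spread(shares), 4))
```

With one token per node the gap between the largest and smallest share is typically large; with 256 it shrinks to a few percent of the ring.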
Management

OpsCenter
Chef
Fabric
OpsCenter
Produced by DataStax
Community & Enterprise Flavours
Runs agents on each node
Enterprise version allows addition & removal of nodes
Complete Integration with DataStax Community &
Enterprise
Chef

Very popular in industry
Open Source recipes for different flavours of
Cassandra: vanilla Apache, dce.
Chef Server provides visual management
Knife provides comprehensive CLI
Enterprise features like access control and plugin
architecture
Lots of other recipes!
Fabric

Python Library for executing scripts across many
machines at once.
Code as if you were local to the machine; Fabric takes
care of remote execution
Endless possibilities for templating and management
tasks
Requires scripting skills but can be used for highly
customised deployments without relying on
chef/machine images.
Monitoring

Monitoring Tools
What to monitor?
Monitoring Tools

OpsCenter
Sensu
Zabbix
Nagios
APMs
New Relic
AppDynamics
Etc…
What to Monitor?

All the usual stuff (CPU, Memory, I/O, Disk Space)
Cassandra specific:
R / W latencies
Pending compactions
nodetool tpstats
Cassandra log for exceptions / warnings
Use netstat to monitor concurrent connections
Client
Inter-node
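Counting client and inter-node connections from netstat can be scripted. The sketch below parses the text of `netstat -tn` and buckets ESTABLISHED connections by port. It assumes the default ports (9042 native and 9160 Thrift for clients, 7000/7001 for inter-node traffic); adjust them if your cassandra.yaml differs. The sample output is invented for the example, not captured from a real node.

```python
from collections import Counter

CLIENT_PORTS = {9042, 9160}      # assumed defaults: native transport, Thrift
INTERNODE_PORTS = {7000, 7001}   # assumed defaults: storage port, SSL storage port


def port_of(address: str) -> int:
    """Extract the port from an 'ip:port' column."""
    return int(address.rsplit(":", 1)[1])


def count_connections(netstat_output: str) -> Counter:
    """Count ESTABLISHED client and inter-node TCP connections."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue  # skip headers and non-TCP rows
        local, foreign, state = fields[3], fields[4], fields[5]
        if state != "ESTABLISHED":
            continue
        if port_of(local) in CLIENT_PORTS:
            counts["client"] += 1
        elif port_of(local) in INTERNODE_PORTS or port_of(foreign) in INTERNODE_PORTS:
            # Inter-node links appear on either side, depending on who dialled.
            counts["inter-node"] += 1
    return counts


SAMPLE = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 10.0.0.1:9042           10.0.1.15:53412         ESTABLISHED
tcp        0      0 10.0.0.1:9042           10.0.1.16:40110         ESTABLISHED
tcp        0      0 10.0.0.1:7000           10.0.0.2:51234          ESTABLISHED
tcp        0      0 10.0.0.1:48822          10.0.0.3:7000           ESTABLISHED
tcp        0      0 10.0.0.1:9042           10.0.1.17:40222         TIME_WAIT
"""
print(dict(count_connections(SAMPLE)))
```

On a live node the same function could be fed the captured output of the netstat command and the counts shipped to whichever monitoring tool is in use.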
What to Monitor?
[Figure not reproduced. Courtesy: Brendan Gregg]
Troubleshooting
Garbage Collection / Compaction
Can cause system pauses and affect node
performance
Look in logs for Heap nearly full warnings
Enable GC logging in cassandra-env.sh
Heap dumps
OpsCenter GC Graphs
Troubleshooting
Query Performance
Day 0 performance is one thing, day N is another
Again, try and run simulations
Can degrade over time if schema design is not optimal
Use query tracing periodically to monitor
performance of most common queries
Can be automated as part of monitoring & SLAs
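One way to automate the check is to collect latency samples per query (from tracing or client-side timers) and compare a percentile against the SLA. The sketch below uses the nearest-rank percentile method; the query names, sample values, and 50ms limit are all hypothetical.

```python
import math


def percentile(samples: list, pct: int) -> float:
    """Nearest-rank percentile of a non-empty list of samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def sla_breaches(latencies_by_query: dict, pct: int, limit_ms: float) -> dict:
    """Queries whose pct-th percentile latency exceeds the SLA limit."""
    breaches = {}
    for query, samples in latencies_by_query.items():
        observed = percentile(samples, pct)
        if observed > limit_ms:
            breaches[query] = observed
    return breaches


# Hypothetical latency samples in milliseconds for two common queries.
latencies = {
    "get_user": [2, 3, 3, 4, 5, 4, 3, 2, 3, 40],
    "list_orders": [10, 12, 11, 15, 90, 14, 13, 12, 11, 120],
}
print(sla_breaches(latencies, pct=90, limit_ms=50))
```

Running the same check on day 0 and again on day N gives a simple record of how query performance drifts as data accumulates.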
Conclusion

Automate deployment as much as possible
Don’t mix high throughput workloads, have few data
classes per cluster
Invest time in developing an effective monitoring
infrastructure
Update projections frequently
Run simulations (including failure scenarios)
Curiosity is good, experiment with different settings
Questions?

Confused Face?
Training!

Book a Developer or Administrator Cassandra training
course for 17th or 19th February and receive a 33%
discount.
Offer extended to all Meetup Group members on
individual rates only.
Limited places remaining!

Promo Code: MEETUPDUB

Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 

Cassandra in Operation

  • 1. Cassandra In Operation. Niall Milton, CTO, DigBigData
  • 4. Hardware : Memory. More is better, up to a point. From 1.2, off-heap bloom filters and compression metadata mean greater data density per node: less on-heap memory is used, but more system memory is needed. Respect Java heap limits, with 8GB as an upper bound; Java 7 has improved garbage collection, so it may be possible to push it higher. Be aware of perm gen and native memory also. Keep an eye on system memory: if memory is nearly exhausted, system performance degrades across the board. Anecdotal: 20MB RAM per TB.
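The 8GB ceiling on slide 4 matches the default heap calculation described in the comments of the stock cassandra-env.sh of this era, which (recalled from that script, so verify against your own version) is max(min(1/2 RAM, 1024MB), min(1/4 RAM, 8GB)). A minimal sketch of that arithmetic; the function name is hypothetical:

```python
def default_max_heap_mb(system_memory_mb: int) -> int:
    """Default max heap as described in cassandra-env.sh comments (assumed):
    max(min(1/2 RAM, 1024MB), min(1/4 RAM, 8GB))."""
    half_capped = min(system_memory_mb // 2, 1024)
    quarter_capped = min(system_memory_mb // 4, 8192)
    return max(half_capped, quarter_capped)


# Small, medium and large nodes (memory in MB).
for ram_mb in (2048, 16384, 65536):
    print(ram_mb, default_max_heap_mb(ram_mb))
```

Whatever the heap ends up at, the rest of system memory is left for the off-heap structures and the OS page cache mentioned on the slide.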
  • 5. Hardware : CPU. 8 - 12 cores are recommended. Given efficient I/O, Cassandra is more likely to be CPU bound under most circumstances. For hosted environments, CPU bursting is useful. The Big Picture: consider total core availability across the cluster when sizing. Include cost per core in sizing estimates, as its cost impact is higher than that of disk or memory.
  • 6. Hardware : Disk. The DataStax recommendation is to use SSD. Commodity SSD is cheaper, but know the limitations; it is not an apples-to-apples comparison. Manufacturers are now producing “light enterprise” products. Enterprise SSD is designed for mixed workloads. Think 1 disk for the commit log, many for data. Anything else and you risk I/O bottlenecking. With JBOD support (1.2 and later), Cassandra distributes data over all available disks and has sensible failure management, so there is no longer a need to use RAID10 or similar. SSD requires an update to device queue settings.
  • 7. Topology. Split local DC nodes across multiple racks for rack redundancy. With multi-DC, use LOCAL_QUORUM or a lower consistency level for most operations. If inter-DC network latency is high, you will need to tweak phi_convict_threshold from its default; the same applies on EC2. Be aware of single-node bandwidth limitations: buying amazing hardware means you need available bandwidth to exercise it. Know what your latencies are at each network stage. Be aware of inter-DC firewalls and traffic contention with other resources. (Are there devices that affect your cluster but whose settings you have no permission to change?) Plan as best you can for all eventualities. The Rapid Read Protection feature (2.x) will help cluster performance.
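The LOCAL_QUORUM advice on slide 7 comes down to how many replicas a request waits for. A quorum is floor(RF / 2) + 1 replicas; LOCAL_QUORUM counts only replicas in the coordinator's data centre, while QUORUM counts replicas across all of them. A small sketch, assuming a hypothetical two-DC keyspace (the DC names and replication factors are made up for illustration):

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond for a quorum: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


# Hypothetical keyspace: replication factor 3 in each of two data centres.
replication = {"dc_dublin": 3, "dc_london": 3}

# LOCAL_QUORUM waits only on replicas in the local data centre.
local_quorum = quorum(replication["dc_dublin"])

# QUORUM is computed over every replica, so it needs remote-DC responses.
global_quorum = quorum(sum(replication.values()))

print("LOCAL_QUORUM waits for", local_quorum, "local replicas")
print("QUORUM waits for", global_quorum, "replicas across all DCs")
```

With these numbers a QUORUM request cannot be satisfied by one data centre alone, so every such request pays the inter-DC latency.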
  • 8. Surety vs. Cost. Have a 24x7 ethos. Don’t let customers suffer because of caution or tight pockets. Know your growth rate. Add nodes pre-emptively, even if you are not at your outer limit. More nodes spread load across the cluster and increase the number of concurrent requests it can serve. Think in terms of reducing current load per node as well as adding storage. EC2 has more flexibility, but the same rules apply.
  • 9. Sizing. A single node can comfortably store 3 - 5TB. SATA has higher density (3TB/disk available), but limits are still hit due to the per-node limit above. With both, calculate theoretical max IOPS. Allow extra for indexes. Run a simulation with dummy data if possible. Allow 10% free space for LCS and, in the worst case, 50% for STCS.
  • 10. Sizing. Cassandra has its own compression, so there is no need to do it yourself (LZ4 performs better than Snappy). Don’t serialize objects you may want to run analytics on in the future. Mixed workloads make projections more difficult due to compaction and SSTable immutability. Review column data for constants and strings that don’t need to be stored in long form (a common oversight).
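The free-space rule on slide 9 (10% for LCS, worst case 50% for STCS) can be turned into a rough node count. This is only a sketch: the dataset size, replication factor and per-node disk figure are assumed inputs, and it ignores index overhead, compression and IOPS limits, which the slides say must also be considered.

```python
import math

# Free-space headroom quoted on slide 9.
COMPACTION_HEADROOM = {"LCS": 0.10, "STCS": 0.50}


def usable_tb(raw_tb_per_node: float, strategy: str) -> float:
    """Disk left for data once compaction headroom is reserved."""
    return raw_tb_per_node * (1.0 - COMPACTION_HEADROOM[strategy])


def nodes_needed(dataset_tb: float, replication_factor: int,
                 raw_tb_per_node: float, strategy: str) -> int:
    """Nodes required to hold every replica of the dataset."""
    total_tb = dataset_tb * replication_factor
    return math.ceil(total_tb / usable_tb(raw_tb_per_node, strategy))


# Hypothetical cluster: 20TB of data, RF 3, 4TB of raw disk per node.
for strategy in ("LCS", "STCS"):
    print(strategy, nodes_needed(20, 3, 4, strategy))
```

The gap between the two results shows why the compaction strategy belongs in the sizing estimate and not only in the schema discussion.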
  • 11. When do we Add Nodes? Think GROWTH RATE. It’s OK to be a pessimist; people will thank you later for it. Be as accurate as possible in your projections. Use multi-variate projections, e.g. storage, request threads, memory, network I/O, CPU usage. Update projections on a regular basis; they help convince stakeholders to spend money. Don’t wait till the last minute: provisioning new hardware nearly always takes longer than planned.
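A multi-variate projection of the kind slide 11 describes can start as simply as estimating, per metric, how long until a limit is reached and acting on the earliest one. The sketch below assumes linear growth, and every metric name, value and limit in it is invented for illustration; real figures would come from your monitoring history.

```python
def days_until(current: float, daily_growth: float, limit: float) -> float:
    """Days before `current` reaches `limit`, assuming linear daily growth."""
    if current >= limit:
        return 0.0
    if daily_growth <= 0:
        return float("inf")
    return (limit - current) / daily_growth


# Hypothetical per-node metrics: (current value, growth per day, limit).
metrics = {
    "disk_tb": (2.0, 0.02, 3.0),
    "cpu_pct": (45.0, 0.25, 70.0),
    "net_mbps": (300.0, 1.0, 800.0),
}

projections = {name: days_until(*values) for name, values in metrics.items()}

# The metric that runs out first sets the deadline for adding nodes.
first_to_hit = min(projections, key=projections.get)
print(first_to_hit, round(projections[first_to_hit]))
```

Subtracting the hardware provisioning lead time from the earliest deadline gives the date by which the order has to be placed.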
  • 12.
  • 13. Compaction. Can be a serious thorn in your side if not understood (it can be I/O intensive). Every write is immutable, so highly volatile columns will be distributed across many SSTables, which affects read performance. Compaction leads to fewer places to look for data. Compaction also removes deleted or TTL’d columns. Don’t forget gc_grace_seconds when using TTL. The new hybrid compaction algorithm uses STCS in LCS level 0. Worth checking out.
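The gc_grace_seconds reminder on slide 13 matters because an expired column is not reclaimed at the moment its TTL runs out: it is treated like a tombstone and only becomes eligible for removal by compaction once gc_grace_seconds has also elapsed. A rough sketch of that timeline, assuming the default gc_grace_seconds of 864000 (10 days); the function name is hypothetical, and the actual purge also depends on when compaction next touches the SSTables involved.

```python
from datetime import datetime, timedelta

DEFAULT_GC_GRACE_SECONDS = 864000  # 10 days


def earliest_purge(written_at: datetime, ttl_seconds: int,
                   gc_grace_seconds: int = DEFAULT_GC_GRACE_SECONDS) -> datetime:
    """Earliest time compaction may drop a TTL'd column from disk."""
    expires_at = written_at + timedelta(seconds=ttl_seconds)
    return expires_at + timedelta(seconds=gc_grace_seconds)


# A column written with a one-day TTL still occupies disk for about 11 days.
written = datetime(2014, 1, 1)
print(earliest_purge(written, ttl_seconds=86400))
```

For short-TTL data this means disk usage is driven more by gc_grace_seconds than by the TTL itself.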
  • 14. Virtual Nodes. Since Cassandra 1.2, virtual nodes are the default. Token ranges are split from 1 per node to many (default 256). Ranges are “shuffled” to be random and non-contiguous. Hot token ranges are unlikely to affect performance as much as previously. Cluster expansion is more manageable and spreads load evenly. No more node obesity…
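The load-spreading claim on slide 14 can be illustrated with a toy simulation: give each node either 1 or 256 random tokens on a ring and compare how much of the ring each node ends up owning. This is only a sketch of the averaging effect, not Cassandra's token allocation code; the ring size matches the 64-bit Murmur3 token space, while the node count and seed are arbitrary.

```python
import random

RING_SIZE = 2 ** 64  # size of a 64-bit token space


def ownership(num_nodes: int, tokens_per_node: int, seed: int = 42) -> list:
    """Fraction of the ring owned by each node under random token placement."""
    rng = random.Random(seed)
    ring = sorted(
        (rng.randrange(RING_SIZE), node)
        for node in range(num_nodes)
        for _ in range(tokens_per_node)
    )
    owned = [0] * num_nodes
    for i, (token, node) in enumerate(ring):
        # Each token owns the range back to the previous token; index -1
        # wraps around the ring for the first entry.
        previous_token = ring[i - 1][0]
        owned[node] += (token - previous_token) % RING_SIZE
    return [share / RING_SIZE for share in owned]


single = ownership(num_nodes=6, tokens_per_node=1)
vnodes = ownership(num_nodes=6, tokens_per_node=256)

print("1 token/node spread:   ", max(single) - min(single))
print("256 tokens/node spread:", max(vnodes) - min(vnodes))
```

In practice single-token clusters were balanced by hand-assigning evenly spaced tokens; the point here is only that many small random ranges average out without that manual step.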
  • 15.
  • 17. OpsCenter. Produced by DataStax. Community & Enterprise flavours. Runs agents on each node. Enterprise version allows addition & removal of nodes. Complete integration with DataStax Community & Enterprise.
  • 18. Chef. Very popular in industry. Open source recipes for different flavours of Cassandra: vanilla Apache, dce. Chef Server provides visual management. Knife provides a comprehensive CLI. Enterprise features like access control and a plugin architecture. Lots of other recipes!
  • 19. Fabric. Python library for executing scripts across many machines at once. Code as if you were local to the machine; Fabric takes care of remote execution. Endless possibilities for templating and management tasks. Requires scripting skills, but can be used for highly customised deployments without relying on Chef or machine images.
  • 22. What to Monitor? All the usual stuff (CPU, memory, I/O, disk space). Cassandra specific: R/W latencies; pending compactions; nodetool tpstats; Cassandra log for exceptions / warnings. Use netstat to monitor concurrent connections: client and inter-node.
  • 24. Troubleshooting: Garbage Collection / Compaction. Can cause system pauses and affect node performance. Look in the logs for “heap nearly full” warnings. Enable GC logging in cassandra-env.sh. Heap dumps. OpsCenter GC graphs.
  • 25. Troubleshooting: Query Performance. Day 0 performance is one thing, day N is another. Again, try to run simulations. Performance can degrade over time if schema design is not optimal. Use query tracing periodically to monitor the performance of the most common queries. This can be automated as part of monitoring & SLAs.
  • 26. Conclusion. Automate deployment as much as possible. Don’t mix high-throughput workloads; have few data classes per cluster. Invest time in developing an effective monitoring infrastructure. Update projections frequently. Run simulations (including failure scenarios). Curiosity is good: experiment with different settings.
  • 28. Training! Book a Developer or Administrator Cassandra training course for 17th or 19th February and receive a 33% discount. Offer extended to all Meetup Group members on individual rates only. Limited places remaining! Promo Code: MEETUPDUB

Editor's notes

  1. Further Reading : http://www.slideshare.net/planetcassandra/c-summit-2013-practice-makes-perfect-extreme-cassandra-optimization-by-albert-tobey