SlideShare a Scribd company logo
1 of 38
Cassandra tuning - above and beyond
Matija Gobec
Co-founder & Senior Consultant @ SmartCat.io
© DataStax, All Rights Reserved.
Why this talk
We were challenged with an interesting requirement…
“99.999%”
2
© DataStax, All Rights Reserved.
1 Initial investigation and setup
2 Metrics and reporting
3 Test setup
4 AWS deployment
5 Did we make it?
3
© DataStax, All Rights Reserved.
What makes a distributed system?
A bunch of stuff that magically works together
4
© DataStax, All Rights Reserved.
How to start?
Investigate the current setup (if any)
Understand your use case
Understand your data
Set a base configuration
Define target performance (goal)
5
© DataStax, All Rights Reserved.
Initial investigation
• What type of deployment are you working with?
• What is the available hardware?
• CPU cores and threads
• Memory amount and type
• Storage size and type
• Network interfaces amount and type
• Limitations
6
Hardware and setup
© DataStax, All Rights Reserved.
Hardware configuration
8-16 cores
32GB ram
Commit log SSD
Data drive SSD
10GbE
Placement groups
Availability zones
Enhanced networking
8
© DataStax, All Rights Reserved.
OS - Swap, storage, cpu
1. Swap is bad
• remove swap from stab
• disable swap: swapoff -a
2. Optimize block layer
• echo 1 > /sys/block/XXX/queue/nomerges
• echo 8 > /sys/block/XXX/queue/read_ahead_kb
• echo deadline > /sys/block/XXX/queue/scheduler
3. Disable cpu scaling
9
© DataStax, All Rights Reserved.
sysctl.d - network
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
net.ipv4.tcp_ecn = 0
net.ipv4.tcp_window_scaling = 1
net.ipv4.ip_local_port_range = 10000 65535
net.ipv4.tcp_tw_recycle = 1
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.core.somaxconn = 4096
net.core.netdev_max_backlog = 16384
10
# read buffer space allocatable in units of pages
# write buffer space allocatable in units of pages
# disable explicit congestion notification
# enable window scaling (higher throughput)
# allowed local port range
# enable fast time-wait recycle
# max socket receive buffer in bytes
# max socket send buffer in bytes
# number of incoming connections
# incoming connections backlog
© DataStax, All Rights Reserved.
sysctl.d - vm and fs
11
vm.swappiness = 1
vm.max_map_count = 1073741824
vm.dirty_background_bytes = 10485760
vm.dirty_bytes = 1073741824
fs.file-max = 1073741824
vm.min_free_kbytes = 1048576
# memory swapping threshold
# max memory map areas a process can have
# dirty memory amount threshold (kernel)
# dirty memory amount threshold (process)
# max number of open files
# min number of VM free kilobytes
© DataStax, All Rights Reserved.
JVM - CMS
MAX_HEAP_SIZE=“8G" # Good starting point
HEAP_NEWSIZE=“2G" # Good starting point
JVM_OPTS="$JVM_OPTS -XX:+PerfDisableSharedMem"
JVM_OPTS="$JVM_OPTS -XX:-UseBiasedLocking”
# Tunable settings
JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=2"
JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=16"
JVM_OPTS="$JVM_OPTS -XX:+UnlockDiagnosticVMOptions"
JVM_OPTS="$JVM_OPTS -XX:ParGCCardsPerStrideChunk=4096”
# Instagram settings
JVM_OPTS="$JVM_OPTS -XX:+CMSScavengeBeforeRemark"
JVM_OPTS="$JVM_OPTS -XX:CMSMaxAbortablePrecleanTime=60000"
JVM_OPTS="$JVM_OPTS -XX:CMSWaitDuration=30000"
12
© DataStax, All Rights Reserved.
JVM - G1GC
JVM_OPTS="$JVM_OPTS -XX:+UseG1GC"
JVM_OPTS="$JVM_OPTS -XX:MaxGCPauseMillis=500"
JVM_OPTS="$JVM_OPTS -XX:G1RSetUpdatingPauseTimePercent=5"
JVM_OPTS="$JVM_OPTS -XX:InitiatingHeapOccupancyPercent=25”
JVM_OPTS="$JVM_OPTS -XX:ParallelGCThreads=16” # Set to number of full cores
JVM_OPTS="$JVM_OPTS -XX:ConcGCThreads=16” # Set to number of full cores
13
© DataStax, All Rights Reserved.
Cassandra
concurrent_reads: 128
concurrent_writes: 128
concurrent_counter_writes: 128
memtable_allocation_type: heap_buffers
memtable_flush_writers: 8
memtable_cleanup_threshold: 0.15
memtable_heap_space_in_mb: 2048
memtable_offheap_space_in_mb: 2048
trickle_fsync: true
trickle_fsync_interval_in_kb: 1024
internode_compression: dc
14
Data model and compaction strategy
© DataStax, All Rights Reserved.
Data model
Data model impacts performance a lot
Optimize so that you read from one partition
Make sure your data can be distributed
SSTable compression depending on the use case
16
© DataStax, All Rights Reserved.
Compaction strategy
1. Size tiered compaction strategy
• Good as a default
• Performance and size constraints
2. Leveled compaction strategy
• Great for low latency read requirements
• Constant compactions
3. Date tiered / Time window compaction strategy
• Good fit for time series use cases
17
© DataStax, All Rights Reserved.
Ok, what now?
After we set the base configuration it’s time for testing and observing
18
Metrics and reporting stack
© DataStax, All Rights Reserved.
Metrics and reporting stack
OS metrics (SmartCat)
Metrics reporter config (AddThis)
Cassandra diagnostics (SmartCat)
Filebeat
Riemann
InfluxDB
Grafana
Elasticsearch
Logstash
Kibana
20
© DataStax, All Rights Reserved.
Grafana
21
© DataStax, All Rights Reserved.
Kibana
22
© DataStax, All Rights Reserved.
Slow queries
Track query execution times above some threshold
Gain insights into the long processing queries
Relate that to what’s going on on the node
Compare app and cluster slow queries
https://github.com/smartcat-labs/cassandra-diagnostics
23
© DataStax, All Rights Reserved.
Slow queries - cluster
24
© DataStax, All Rights Reserved.
Slow queries - cluster vs app
25
© DataStax, All Rights Reserved.
Ops center
Pros:
Great when starting out
Everything you need in a nice GUI
Cluster metrics
Cons:
Metrics stored in the same cluster
Issues with some of the services (repair, slow query,...)
Additional agents on the nodes
26
Test setup
© DataStax, All Rights Reserved.
Test setup
Make sure you have repeatable tests
Fixed rate tests
Variable rate tests
Production like tests
Cassandra Stress
Various loadgen tools (gatling, wrk, loader,...)
28
© DataStax, All Rights Reserved.
Coordinated omission
29
© DataStax, All Rights Reserved.
Tuning methodology
30
AWS
© DataStax, All Rights Reserved.
AWS deployment
Choose your instance based on calculations
Use placement groups and availability zones
Don’t overdo it just because you can ($$$)
Are you sure you need ephemeral storage?
Go for EBS volumes (gp2)
32
© DataStax, All Rights Reserved.
EBS volumes
Pros:
3.4TB+ volume has 10.000 IOPs
Average latency is ~0.38ms
Durable across reboots
AWS snapshots
Can be attached/detached
Easy to recreate
33
Cons:
Rare latency spikes
Average latency is ~0.38ms
Degrading factor
© DataStax, All Rights Reserved.
EBS volumes - problems
34
© DataStax, All Rights Reserved.
End result
Did we meet our goal?
Can we go any further?
35
© DataStax, All Rights Reserved.
Whats next?
Torture testing
Failure scenarios
Latency and delay inducers
Automate everything
36
Q&A
Thank you
Matija Gobec
matija@smartcat.io
@mad_max0204
smartcat-labs.github.io
smartcat.io

More Related Content

What's hot

Cassandra Day SV 2014: Apache Cassandra at Equinix for High Performance, Scal...
Cassandra Day SV 2014: Apache Cassandra at Equinix for High Performance, Scal...Cassandra Day SV 2014: Apache Cassandra at Equinix for High Performance, Scal...
Cassandra Day SV 2014: Apache Cassandra at Equinix for High Performance, Scal...DataStax Academy
 
Data Pipelines with Spark & DataStax Enterprise
Data Pipelines with Spark & DataStax EnterpriseData Pipelines with Spark & DataStax Enterprise
Data Pipelines with Spark & DataStax EnterpriseDataStax
 
DataStax | Adversarial Modeling: Graph, ML, and Analytics for Identity Fraud ...
DataStax | Adversarial Modeling: Graph, ML, and Analytics for Identity Fraud ...DataStax | Adversarial Modeling: Graph, ML, and Analytics for Identity Fraud ...
DataStax | Adversarial Modeling: Graph, ML, and Analytics for Identity Fraud ...DataStax
 
Aleksejs Nemirovskis - Manage your data using oracle BDA
Aleksejs Nemirovskis - Manage your data using oracle BDAAleksejs Nemirovskis - Manage your data using oracle BDA
Aleksejs Nemirovskis - Manage your data using oracle BDAAndrejs Vorobjovs
 
Can My Inventory Survive Eventual Consistency?
Can My Inventory Survive Eventual Consistency?Can My Inventory Survive Eventual Consistency?
Can My Inventory Survive Eventual Consistency?DataStax
 
Encryption and Masking for Sensitive Apache Spark Analytics Addressing CCPA a...
Encryption and Masking for Sensitive Apache Spark Analytics Addressing CCPA a...Encryption and Masking for Sensitive Apache Spark Analytics Addressing CCPA a...
Encryption and Masking for Sensitive Apache Spark Analytics Addressing CCPA a...Databricks
 
Unleash the power of Azure Data Factory
Unleash the power of Azure Data Factory Unleash the power of Azure Data Factory
Unleash the power of Azure Data Factory Sergio Zenatti Filho
 
The new big data
The new big dataThe new big data
The new big dataAdam Doyle
 
How jKool Analyzes Streaming Data in Real Time with DataStax
How jKool Analyzes Streaming Data in Real Time with DataStaxHow jKool Analyzes Streaming Data in Real Time with DataStax
How jKool Analyzes Streaming Data in Real Time with DataStaxDataStax
 
DataStax | Building a Spark Streaming App with DSE File System (Rocco Varela)...
DataStax | Building a Spark Streaming App with DSE File System (Rocco Varela)...DataStax | Building a Spark Streaming App with DSE File System (Rocco Varela)...
DataStax | Building a Spark Streaming App with DSE File System (Rocco Varela)...DataStax
 
Speeding Up Atlas Deep Learning Platform with Alluxio + Fluid
Speeding Up Atlas Deep Learning Platform with Alluxio + FluidSpeeding Up Atlas Deep Learning Platform with Alluxio + Fluid
Speeding Up Atlas Deep Learning Platform with Alluxio + FluidAlluxio, Inc.
 
Apache Iceberg Presentation for the St. Louis Big Data IDEA
Apache Iceberg Presentation for the St. Louis Big Data IDEAApache Iceberg Presentation for the St. Louis Big Data IDEA
Apache Iceberg Presentation for the St. Louis Big Data IDEAAdam Doyle
 
Cloudian HyperStore Operating Environment
Cloudian HyperStore Operating EnvironmentCloudian HyperStore Operating Environment
Cloudian HyperStore Operating EnvironmentCloudian
 
Built-In Security for the Cloud
Built-In Security for the CloudBuilt-In Security for the Cloud
Built-In Security for the CloudDataWorks Summit
 
Querying Druid in SQL with Superset
Querying Druid in SQL with SupersetQuerying Druid in SQL with Superset
Querying Druid in SQL with SupersetDataWorks Summit
 
Presto: Fast SQL-on-Anything Across Data Lakes, DBMS, and NoSQL Data Stores
Presto: Fast SQL-on-Anything Across Data Lakes, DBMS, and NoSQL Data StoresPresto: Fast SQL-on-Anything Across Data Lakes, DBMS, and NoSQL Data Stores
Presto: Fast SQL-on-Anything Across Data Lakes, DBMS, and NoSQL Data StoresAlluxio, Inc.
 
Improving Apache Spark™ In-Memory Computing with Apache Ignite™
 Improving Apache Spark™ In-Memory Computing with Apache Ignite™ Improving Apache Spark™ In-Memory Computing with Apache Ignite™
Improving Apache Spark™ In-Memory Computing with Apache Ignite™Tom Diederich
 
Webinar - The Agility Challenge - Powering Cloud Apps with Multi-Model & Mixe...
Webinar - The Agility Challenge - Powering Cloud Apps with Multi-Model & Mixe...Webinar - The Agility Challenge - Powering Cloud Apps with Multi-Model & Mixe...
Webinar - The Agility Challenge - Powering Cloud Apps with Multi-Model & Mixe...DataStax
 

What's hot (20)

Cassandra Day SV 2014: Apache Cassandra at Equinix for High Performance, Scal...
Cassandra Day SV 2014: Apache Cassandra at Equinix for High Performance, Scal...Cassandra Day SV 2014: Apache Cassandra at Equinix for High Performance, Scal...
Cassandra Day SV 2014: Apache Cassandra at Equinix for High Performance, Scal...
 
Data Pipelines with Spark & DataStax Enterprise
Data Pipelines with Spark & DataStax EnterpriseData Pipelines with Spark & DataStax Enterprise
Data Pipelines with Spark & DataStax Enterprise
 
DataStax | Adversarial Modeling: Graph, ML, and Analytics for Identity Fraud ...
DataStax | Adversarial Modeling: Graph, ML, and Analytics for Identity Fraud ...DataStax | Adversarial Modeling: Graph, ML, and Analytics for Identity Fraud ...
DataStax | Adversarial Modeling: Graph, ML, and Analytics for Identity Fraud ...
 
Aleksejs Nemirovskis - Manage your data using oracle BDA
Aleksejs Nemirovskis - Manage your data using oracle BDAAleksejs Nemirovskis - Manage your data using oracle BDA
Aleksejs Nemirovskis - Manage your data using oracle BDA
 
Can My Inventory Survive Eventual Consistency?
Can My Inventory Survive Eventual Consistency?Can My Inventory Survive Eventual Consistency?
Can My Inventory Survive Eventual Consistency?
 
Encryption and Masking for Sensitive Apache Spark Analytics Addressing CCPA a...
Encryption and Masking for Sensitive Apache Spark Analytics Addressing CCPA a...Encryption and Masking for Sensitive Apache Spark Analytics Addressing CCPA a...
Encryption and Masking for Sensitive Apache Spark Analytics Addressing CCPA a...
 
Unleash the power of Azure Data Factory
Unleash the power of Azure Data Factory Unleash the power of Azure Data Factory
Unleash the power of Azure Data Factory
 
The new big data
The new big dataThe new big data
The new big data
 
Apache Hadoop 3
Apache Hadoop 3Apache Hadoop 3
Apache Hadoop 3
 
How jKool Analyzes Streaming Data in Real Time with DataStax
How jKool Analyzes Streaming Data in Real Time with DataStaxHow jKool Analyzes Streaming Data in Real Time with DataStax
How jKool Analyzes Streaming Data in Real Time with DataStax
 
DataStax | Building a Spark Streaming App with DSE File System (Rocco Varela)...
DataStax | Building a Spark Streaming App with DSE File System (Rocco Varela)...DataStax | Building a Spark Streaming App with DSE File System (Rocco Varela)...
DataStax | Building a Spark Streaming App with DSE File System (Rocco Varela)...
 
Speeding Up Atlas Deep Learning Platform with Alluxio + Fluid
Speeding Up Atlas Deep Learning Platform with Alluxio + FluidSpeeding Up Atlas Deep Learning Platform with Alluxio + Fluid
Speeding Up Atlas Deep Learning Platform with Alluxio + Fluid
 
Azure Data Factory v2
Azure Data Factory v2Azure Data Factory v2
Azure Data Factory v2
 
Apache Iceberg Presentation for the St. Louis Big Data IDEA
Apache Iceberg Presentation for the St. Louis Big Data IDEAApache Iceberg Presentation for the St. Louis Big Data IDEA
Apache Iceberg Presentation for the St. Louis Big Data IDEA
 
Cloudian HyperStore Operating Environment
Cloudian HyperStore Operating EnvironmentCloudian HyperStore Operating Environment
Cloudian HyperStore Operating Environment
 
Built-In Security for the Cloud
Built-In Security for the CloudBuilt-In Security for the Cloud
Built-In Security for the Cloud
 
Querying Druid in SQL with Superset
Querying Druid in SQL with SupersetQuerying Druid in SQL with Superset
Querying Druid in SQL with Superset
 
Presto: Fast SQL-on-Anything Across Data Lakes, DBMS, and NoSQL Data Stores
Presto: Fast SQL-on-Anything Across Data Lakes, DBMS, and NoSQL Data StoresPresto: Fast SQL-on-Anything Across Data Lakes, DBMS, and NoSQL Data Stores
Presto: Fast SQL-on-Anything Across Data Lakes, DBMS, and NoSQL Data Stores
 
Improving Apache Spark™ In-Memory Computing with Apache Ignite™
 Improving Apache Spark™ In-Memory Computing with Apache Ignite™ Improving Apache Spark™ In-Memory Computing with Apache Ignite™
Improving Apache Spark™ In-Memory Computing with Apache Ignite™
 
Webinar - The Agility Challenge - Powering Cloud Apps with Multi-Model & Mixe...
Webinar - The Agility Challenge - Powering Cloud Apps with Multi-Model & Mixe...Webinar - The Agility Challenge - Powering Cloud Apps with Multi-Model & Mixe...
Webinar - The Agility Challenge - Powering Cloud Apps with Multi-Model & Mixe...
 

Similar to Cassandra Tuning - above and beyond

Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...
Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...
Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...DataStax
 
GE IOT Predix Time Series & Data Ingestion Service using Apache Apex (Hadoop)
GE IOT Predix Time Series & Data Ingestion Service using Apache Apex (Hadoop)GE IOT Predix Time Series & Data Ingestion Service using Apache Apex (Hadoop)
GE IOT Predix Time Series & Data Ingestion Service using Apache Apex (Hadoop)Apache Apex
 
A Dataflow Processing Chip for Training Deep Neural Networks
A Dataflow Processing Chip for Training Deep Neural NetworksA Dataflow Processing Chip for Training Deep Neural Networks
A Dataflow Processing Chip for Training Deep Neural Networksinside-BigData.com
 
HPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journeyHPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journeyPeter Clapham
 
Live traffic capture and replay in cassandra 4.0
Live traffic capture and replay in cassandra 4.0Live traffic capture and replay in cassandra 4.0
Live traffic capture and replay in cassandra 4.0Vinay Kumar Chella
 
Predictable Big Data Performance in Real-time
Predictable Big Data Performance in Real-timePredictable Big Data Performance in Real-time
Predictable Big Data Performance in Real-timeAerospike, Inc.
 
Load Testing Cassandra Applications (Ben Slater, Instaclustr) | C* Summit 2016
Load Testing Cassandra Applications (Ben Slater, Instaclustr) | C* Summit 2016Load Testing Cassandra Applications (Ben Slater, Instaclustr) | C* Summit 2016
Load Testing Cassandra Applications (Ben Slater, Instaclustr) | C* Summit 2016DataStax
 
Load testing Cassandra applications
Load testing Cassandra applicationsLoad testing Cassandra applications
Load testing Cassandra applicationsBen Slater
 
Load Testing Cassandra Applications
Load Testing Cassandra Applications Load Testing Cassandra Applications
Load Testing Cassandra Applications Instaclustr
 
Optimizing elastic search on google compute engine
Optimizing elastic search on google compute engineOptimizing elastic search on google compute engine
Optimizing elastic search on google compute engineBhuvaneshwaran R
 
Running ElasticSearch on Google Compute Engine in Production
Running ElasticSearch on Google Compute Engine in ProductionRunning ElasticSearch on Google Compute Engine in Production
Running ElasticSearch on Google Compute Engine in ProductionSearce Inc
 
Amazon EC2 deepdive and a sprinkel of AWS Compute | AWS Floor28
Amazon EC2 deepdive and a sprinkel of AWS Compute | AWS Floor28Amazon EC2 deepdive and a sprinkel of AWS Compute | AWS Floor28
Amazon EC2 deepdive and a sprinkel of AWS Compute | AWS Floor28Amazon Web Services
 
【旧版】Oracle Exadata Cloud Service:サービス概要のご紹介
【旧版】Oracle Exadata Cloud Service:サービス概要のご紹介【旧版】Oracle Exadata Cloud Service:サービス概要のご紹介
【旧版】Oracle Exadata Cloud Service:サービス概要のご紹介オラクルエンジニア通信
 
Ceph Day Beijing - Ceph all-flash array design based on NUMA architecture
Ceph Day Beijing - Ceph all-flash array design based on NUMA architectureCeph Day Beijing - Ceph all-flash array design based on NUMA architecture
Ceph Day Beijing - Ceph all-flash array design based on NUMA architectureCeph Community
 
Ceph Day Beijing - Ceph All-Flash Array Design Based on NUMA Architecture
Ceph Day Beijing - Ceph All-Flash Array Design Based on NUMA ArchitectureCeph Day Beijing - Ceph All-Flash Array Design Based on NUMA Architecture
Ceph Day Beijing - Ceph All-Flash Array Design Based on NUMA ArchitectureDanielle Womboldt
 
Scaling Security on 100s of Millions of Mobile Devices Using Apache Kafka® an...
Scaling Security on 100s of Millions of Mobile Devices Using Apache Kafka® an...Scaling Security on 100s of Millions of Mobile Devices Using Apache Kafka® an...
Scaling Security on 100s of Millions of Mobile Devices Using Apache Kafka® an...confluent
 
Accelerate and Scale Big Data Analytics with Disaggregated Compute and Storage
Accelerate and Scale Big Data Analytics with Disaggregated Compute and StorageAccelerate and Scale Big Data Analytics with Disaggregated Compute and Storage
Accelerate and Scale Big Data Analytics with Disaggregated Compute and StorageAlluxio, Inc.
 

Similar to Cassandra Tuning - above and beyond (20)

Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...
Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...
Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...
 
What's New in Apache Hive
What's New in Apache HiveWhat's New in Apache Hive
What's New in Apache Hive
 
GE IOT Predix Time Series & Data Ingestion Service using Apache Apex (Hadoop)
GE IOT Predix Time Series & Data Ingestion Service using Apache Apex (Hadoop)GE IOT Predix Time Series & Data Ingestion Service using Apache Apex (Hadoop)
GE IOT Predix Time Series & Data Ingestion Service using Apache Apex (Hadoop)
 
A Dataflow Processing Chip for Training Deep Neural Networks
A Dataflow Processing Chip for Training Deep Neural NetworksA Dataflow Processing Chip for Training Deep Neural Networks
A Dataflow Processing Chip for Training Deep Neural Networks
 
HPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journeyHPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journey
 
Live traffic capture and replay in cassandra 4.0
Live traffic capture and replay in cassandra 4.0Live traffic capture and replay in cassandra 4.0
Live traffic capture and replay in cassandra 4.0
 
Predictable Big Data Performance in Real-time
Predictable Big Data Performance in Real-timePredictable Big Data Performance in Real-time
Predictable Big Data Performance in Real-time
 
Load Testing Cassandra Applications (Ben Slater, Instaclustr) | C* Summit 2016
Load Testing Cassandra Applications (Ben Slater, Instaclustr) | C* Summit 2016Load Testing Cassandra Applications (Ben Slater, Instaclustr) | C* Summit 2016
Load Testing Cassandra Applications (Ben Slater, Instaclustr) | C* Summit 2016
 
Load testing Cassandra applications
Load testing Cassandra applicationsLoad testing Cassandra applications
Load testing Cassandra applications
 
Load Testing Cassandra Applications
Load Testing Cassandra Applications Load Testing Cassandra Applications
Load Testing Cassandra Applications
 
Optimizing elastic search on google compute engine
Optimizing elastic search on google compute engineOptimizing elastic search on google compute engine
Optimizing elastic search on google compute engine
 
Running ElasticSearch on Google Compute Engine in Production
Running ElasticSearch on Google Compute Engine in ProductionRunning ElasticSearch on Google Compute Engine in Production
Running ElasticSearch on Google Compute Engine in Production
 
Amazon EC2 deepdive and a sprinkel of AWS Compute | AWS Floor28
Amazon EC2 deepdive and a sprinkel of AWS Compute | AWS Floor28Amazon EC2 deepdive and a sprinkel of AWS Compute | AWS Floor28
Amazon EC2 deepdive and a sprinkel of AWS Compute | AWS Floor28
 
【旧版】Oracle Exadata Cloud Service:サービス概要のご紹介
【旧版】Oracle Exadata Cloud Service:サービス概要のご紹介【旧版】Oracle Exadata Cloud Service:サービス概要のご紹介
【旧版】Oracle Exadata Cloud Service:サービス概要のご紹介
 
Ceph Day Beijing - Ceph all-flash array design based on NUMA architecture
Ceph Day Beijing - Ceph all-flash array design based on NUMA architectureCeph Day Beijing - Ceph all-flash array design based on NUMA architecture
Ceph Day Beijing - Ceph all-flash array design based on NUMA architecture
 
Ceph Day Beijing - Ceph All-Flash Array Design Based on NUMA Architecture
Ceph Day Beijing - Ceph All-Flash Array Design Based on NUMA ArchitectureCeph Day Beijing - Ceph All-Flash Array Design Based on NUMA Architecture
Ceph Day Beijing - Ceph All-Flash Array Design Based on NUMA Architecture
 
SFHUG Kudu Talk
SFHUG Kudu TalkSFHUG Kudu Talk
SFHUG Kudu Talk
 
Scaling Security on 100s of Millions of Mobile Devices Using Apache Kafka® an...
Scaling Security on 100s of Millions of Mobile Devices Using Apache Kafka® an...Scaling Security on 100s of Millions of Mobile Devices Using Apache Kafka® an...
Scaling Security on 100s of Millions of Mobile Devices Using Apache Kafka® an...
 
Accelerate and Scale Big Data Analytics with Disaggregated Compute and Storage
Accelerate and Scale Big Data Analytics with Disaggregated Compute and StorageAccelerate and Scale Big Data Analytics with Disaggregated Compute and Storage
Accelerate and Scale Big Data Analytics with Disaggregated Compute and Storage
 
Apache cassandra v4.0
Apache cassandra v4.0Apache cassandra v4.0
Apache cassandra v4.0
 

Recently uploaded

data_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfdata_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfJiananWang21
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdfankushspencer015
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performancesivaprakash250
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Bookingdharasingh5698
 
Vivazz, Mieres Social Housing Design Spain
Vivazz, Mieres Social Housing Design SpainVivazz, Mieres Social Housing Design Spain
Vivazz, Mieres Social Housing Design Spaintimesproduction05
 
Thermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptThermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptDineshKumar4165
 
Intze Overhead Water Tank Design by Working Stress - IS Method.pdf
Intze Overhead Water Tank  Design by Working Stress - IS Method.pdfIntze Overhead Water Tank  Design by Working Stress - IS Method.pdf
Intze Overhead Water Tank Design by Working Stress - IS Method.pdfSuman Jyoti
 
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptxBSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptxfenichawla
 
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfKamal Acharya
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Dr.Costas Sachpazis
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSISrknatarajan
 
chapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineeringchapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineeringmulugeta48
 
UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduitsrknatarajan
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...ranjana rawat
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Call Girls in Nagpur High Profile
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations120cr0395
 
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756dollysharma2066
 
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELLPVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELLManishPatel169454
 
Unit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdfUnit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdfRagavanV2
 

Recently uploaded (20)

data_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfdata_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdf
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdf
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performance
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
 
Vivazz, Mieres Social Housing Design Spain
Vivazz, Mieres Social Housing Design SpainVivazz, Mieres Social Housing Design Spain
Vivazz, Mieres Social Housing Design Spain
 
Thermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptThermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.ppt
 
Intze Overhead Water Tank Design by Working Stress - IS Method.pdf
Intze Overhead Water Tank  Design by Working Stress - IS Method.pdfIntze Overhead Water Tank  Design by Working Stress - IS Method.pdf
Intze Overhead Water Tank Design by Working Stress - IS Method.pdf
 
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptxBSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
 
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSIS
 
chapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineeringchapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineering
 
UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduits
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
 
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations
 
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
 
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELLPVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
 
Unit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdfUnit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdf
 

Cassandra Tuning - above and beyond

  • 1. Cassandra tuning - above and beyond Matija Gobec Co-founder & Senior Consultant @ SmartCat.io
  • 2. © DataStax, All Rights Reserved. Why this talk We were challenged with an interesting requirement… “99.999%” 2
  • 3. © DataStax, All Rights Reserved. 1 Initial investigation and setup 2 Metrics and reporting 3 Test setup 4 AWS deployment 5 Did we make it? 3
  • 4. © DataStax, All Rights Reserved. What makes a distributed system? A bunch of stuff that magically works together 4
  • 5. © DataStax, All Rights Reserved. How to start? Investigate the current setup (if any) Understand your use case Understand your data Set a base configuration Define target performance (goal) 5
  • 6. © DataStax, All Rights Reserved. Initial investigation • What type of deployment are you working with? • What is the available hardware? • CPU cores and threads • Memory amount and type • Storage size and type • Network interfaces amount and type • Limitations 6
  • 8. © DataStax, All Rights Reserved. Hardware configuration 8-16 cores 32GB ram Commit log SSD Data drive SSD 10GbE Placement groups Availability zones Enhanced networking 8
  • 9. © DataStax, All Rights Reserved. OS - Swap, storage, cpu 1. Swap is bad • remove swap from stab • disable swap: swapoff -a 2. Optimize block layer • echo 1 > /sys/block/XXX/queue/nomerges • echo 8 > /sys/block/XXX/queue/read_ahead_kb • echo deadline > /sys/block/XXX/queue/scheduler 3. Disable cpu scaling 9
  • 10. © DataStax, All Rights Reserved. sysctl.d - network net.ipv4.tcp_rmem = 4096 87380 16777216 net.ipv4.tcp_wmem = 4096 65536 16777216 net.ipv4.tcp_ecn = 0 net.ipv4.tcp_window_scaling = 1 net.ipv4.ip_local_port_range = 10000 65535 net.ipv4.tcp_tw_recycle = 1 net.core.rmem_max = 16777216 net.core.wmem_max = 16777216 net.core.somaxconn = 4096 net.core.netdev_max_backlog = 16384 10 # read buffer space allocatable in units of pages # write buffer space allocatable in units of pages # disable explicit congestion notification # enable window scaling (higher throughput) # allowed local port range # enable fast time-wait recycle # max socket receive buffer in bytes # max socket send buffer in bytes # number of incoming connections # incoming connections backlog
  • 11. © DataStax, All Rights Reserved. sysctl.d - vm and fs 11 vm.swappiness = 1 vm.max_map_count = 1073741824 vm.dirty_background_bytes = 10485760 vm.dirty_bytes = 1073741824 fs.file-max = 1073741824 vm.min_free_kbytes = 1048576 # memory swapping threshold # max memory map areas a process can have # dirty memory amount threshold (kernel) # dirty memory amount threshold (process) # max number of open files # min number of VM free kilobytes
  • 12. © DataStax, All Rights Reserved. JVM - CMS MAX_HEAP_SIZE=“8G" # Good starting point HEAP_NEWSIZE=“2G" # Good starting point JVM_OPTS="$JVM_OPTS -XX:+PerfDisableSharedMem" JVM_OPTS="$JVM_OPTS -XX:-UseBiasedLocking” # Tunable settings JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=2" JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=16" JVM_OPTS="$JVM_OPTS -XX:+UnlockDiagnosticVMOptions" JVM_OPTS="$JVM_OPTS -XX:ParGCCardsPerStrideChunk=4096” # Instagram settings JVM_OPTS="$JVM_OPTS -XX:+CMSScavengeBeforeRemark" JVM_OPTS="$JVM_OPTS -XX:CMSMaxAbortablePrecleanTime=60000" JVM_OPTS="$JVM_OPTS -XX:CMSWaitDuration=30000" 12
  • 13. © DataStax, All Rights Reserved. JVM - G1GC JVM_OPTS="$JVM_OPTS -XX:+UseG1GC" JVM_OPTS="$JVM_OPTS -XX:MaxGCPauseMillis=500" JVM_OPTS="$JVM_OPTS -XX:G1RSetUpdatingPauseTimePercent=5" JVM_OPTS="$JVM_OPTS -XX:InitiatingHeapOccupancyPercent=25” JVM_OPTS="$JVM_OPTS -XX:ParallelGCThreads=16” # Set to number of full cores JVM_OPTS="$JVM_OPTS -XX:ConcGCThreads=16” # Set to number of full cores 13
  • 14. © DataStax, All Rights Reserved. Cassandra concurrent_reads: 128 concurrent_writes: 128 concurrent_counter_writes: 128 memtable_allocation_type: heap_buffers memtable_flush_writers: 8 memtable_cleanup_threshold: 0.15 memtable_heap_space_in_mb: 2048 memtable_offheap_space_in_mb: 2048 trickle_fsync: true trickle_fsync_interval_in_kb: 1024 internode_compression: dc 14
  • 15. Data model and compaction strategy
  • 16. © DataStax, All Rights Reserved. Data model Data model impacts performance a lot Optimize so that you read from one partition Make sure your data can be distributed SSTable compression depending on the use case 16
  • 17. © DataStax, All Rights Reserved. Compaction strategy 1. Size tiered compaction strategy • Good as a default • Performance and size constraints 2. Leveled compaction strategy • Great for low latency read requirements • Constant compactions 3. Date tiered / Time window compaction strategy • Good fit for time series use cases 17
  • 18. © DataStax, All Rights Reserved. Ok, what now? After we set the base configuration it’s time for testing and observing 18
  • 20. © DataStax, All Rights Reserved. Metrics and reporting stack OS metrics (SmartCat) Metrics reporter config (AddThis) Cassandra diagnostics (SmartCat) Filebeat Riemann InfluxDB Grafana Elasticsearch Logstash Kibana 20
  • 21. © DataStax, All Rights Reserved. Grafana 21
  • 22. © DataStax, All Rights Reserved. Kibana 22
  • 23. © DataStax, All Rights Reserved. Slow queries Track query execution times above some threshold Gain insights into the long processing queries Relate that to what’s going on on the node Compare app and cluster slow queries https://github.com/smartcat-labs/cassandra-diagnostics 23
  • 24. © DataStax, All Rights Reserved. Slow queries - cluster 24
  • 25. © DataStax, All Rights Reserved. Slow queries - cluster vs app 25
  • 26. © DataStax, All Rights Reserved. Ops center Pros: Great when starting out Everything you need in a nice GUI Cluster metrics Cons: Metrics stored in the same cluster Issues with some of the services (repair, slow query,...) Additional agents on the nodes 26
  • 28. © DataStax, All Rights Reserved. Test setup Make sure you have repeatable tests Fixed rate tests Variable rate tests Production like tests Cassandra Stress Various loadgen tools (gatling, wrk, loader,...) 28
  • 29. © DataStax, All Rights Reserved. Coordinated omission 29
  • 30. © DataStax, All Rights Reserved. Tuning methodology 30
  • 31. AWS
  • 32. © DataStax, All Rights Reserved. AWS deployment Choose your instance based on calculations Use placement groups and availability zones Don’t overdo it just because you can ($$$) Are you sure you need ephemeral storage? Go for EBS volumes (gp2) 32
  • 33. © DataStax, All Rights Reserved. EBS volumes Pros: 3.4TB+ volume has 10.000 IOPs Average latency is ~0.38ms Durable across reboots AWS snapshots Can be attached/detached Easy to recreate 33 Cons: Rare latency spikes Average latency is ~0.38ms Degrading factor
  • 34. © DataStax, All Rights Reserved. EBS volumes - problems 34
  • 35. © DataStax, All Rights Reserved. End result Did we meet our goal? Can we go any further? 35
  • 36. © DataStax, All Rights Reserved. Whats next? Torture testing Failure scenarios Latency and delay inducers Automate everything 36
  • 37. Q&A