© DataStax, All Rights Reserved.Confidential
Understand Apache Cassandra Performance Through Metrics: A Beginner’s Guide
MAY 21 - 23, 2019
Gaylord National Resort & Convention Center Maryland
Why ?
Agenda
● Basic Concepts in Cassandra Architecture
● How Do You Begin to Understand the Performance of a Real-Time Database
● What Tools are Available
● What are the Most Important Metrics
Cassandra Concepts
Masterless / Peer-to-Peer Architecture
● All nodes are the same, owning a piece of data
● Availability
− No special “master”, “leader”, etc
− No fragility; no single-point-of-failure
− No “failover”
● Scalability
− All nodes host data, but also serve queries
− More data? More nodes.
− More queries? More nodes.
Coordinator, Replica and Client
● No single point of failure
● All data replicated
− Replication automatically handled
− All replicas are equal
● Any client can connect to any node and read/write the data it needs
● Any node can be:
− Coordinator
− Storage/Replica Node
Key Concepts in
Real-time Database Performance
Throughput and Latency
● Throughput: rate of operations
● Latency: the time it takes to complete one operation
● Sustainable Throughput
− “achieving throughput while safely maintaining SLA” – Gil Tene
− Don’t measure latency at saturation
● System Resources
− Utilization
− Saturation
− Errors
− Availability
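As a minimal sketch of the two definitions above, the snippet below derives throughput and per-operation latency from recorded start and end timestamps. The function name `summarize` and the sample timestamps are illustrative assumptions, not something taken from the talk.

```python
def summarize(ops):
    """ops: list of (start_seconds, end_seconds) pairs, one per operation."""
    # Latency is measured per operation.
    latencies = [end - start for start, end in ops]
    # Throughput is measured over the whole observation window.
    window = max(end for _, end in ops) - min(start for start, _ in ops)
    throughput = len(ops) / window  # operations per second
    return throughput, latencies


# Hypothetical sample: four operations observed over a 2-second window.
sample_ops = [(0.0, 0.5), (0.5, 0.75), (1.0, 1.5), (1.5, 2.0)]
throughput, latencies = summarize(sample_ops)
```

Throughput is a single number for the window, while latency is one number per operation, which is why the following slides turn to summarizing many latency values.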
How to Measure Latency
● Single latency: capture the time it takes for one operation
● What if you have millions of operations per second?
● What if you have millions in one hour? How do you answer “how did the million operations in the last hour go”?
● How do you effectively plot the latency numbers?
Let’s look at a small example:
● Assume we recorded 12 latency values:
− 11ms, 19ms, 13ms, 12ms, 85ms, 43ms, 720ms, 17ms, 22ms, 25ms, 31ms, 2ms
● If we list out these raw values,
− It will take a lot of space: 12 x 8 bytes = 96 bytes.
− It won’t be scalable: if you have 1 million raw latency values, storage and transfer will be very costly
− It will be expensive to find the max value, because the whole raw list has to be scanned
Let’s look at a small example:
● Assume we recorded 12 latency values:
− 11ms, 19ms, 13ms, 12ms, 85ms, 43ms, 720ms, 17ms, 22ms, 25ms, 31ms, 2ms
● Average:
− avg (11, 19, 13, 12, 85, 43, 720, 17, 22, 25, 31, 2) = 83.3ms
● Downside: the average gives no idea about
− the best latency
− the worst latency
− or the distribution of these values
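To make the downside concrete, here is a small sketch using the 12 latency values from the slide. It compares the average with the median, best, and worst values; the variable names are illustrative.

```python
import statistics

# The 12 recorded latency values from the example, in milliseconds.
latencies_ms = [11, 19, 13, 12, 85, 43, 720, 17, 22, 25, 31, 2]

mean_ms = statistics.mean(latencies_ms)      # pulled up by the single 720ms outlier
median_ms = statistics.median(latencies_ms)  # middle of the distribution
best_ms, worst_ms = min(latencies_ms), max(latencies_ms)
```

The average lands at about 83.3ms even though 10 of the 12 operations finished in under 50ms, which is the kind of detail a single average hides.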
Histogram
● A histogram is an accurate representation of the distribution of numerical data.
● To construct a histogram, the first step is to "bucket" the range of values
− i.e. divide the entire range of values into a series of intervals
− and then count how many values fall into each interval
− The buckets are usually specified as consecutive, non-overlapping intervals of a variable
Image credit: CC BY 2.5, https://commons.wikimedia.org/w/index.php?curid=3483039
Go back to the previous example:
● Assume we recorded 12 latency values:
− 11ms, 19ms, 13ms, 12ms, 85ms, 43ms, 720ms, 17ms, 22ms, 25ms, 31ms, 2ms
● We sort them first:
− 2ms, 11ms, 12ms, 13ms, 17ms, 19ms, 22ms, 25ms, 31ms, 43ms, 85ms, 720ms
− Then we can put them into the following buckets:
Bucket (ms) | 1-10 | 10-100 | 100-1000
Count | 1 | 10 | 1
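The bucketing step can be sketched as follows. The helper `bucketize` is a hypothetical name, and treating each bucket's upper bound as inclusive is an assumption, since the slide does not say which bucket a value of exactly 10 or 100 belongs to.

```python
latencies_ms = [11, 19, 13, 12, 85, 43, 720, 17, 22, 25, 31, 2]
upper_bounds_ms = [10, 100, 1000]  # buckets 1-10, 10-100, 100-1000


def bucketize(values, upper_bounds):
    """Count how many values fall into each bucket (upper bound inclusive)."""
    counts = [0] * len(upper_bounds)
    for value in values:
        for i, upper in enumerate(upper_bounds):
            if value <= upper:
                counts[i] += 1
                break
        else:
            raise ValueError(f"{value} exceeds the largest bucket")
    return counts


counts = bucketize(latencies_ms, upper_bounds_ms)
```

Sorting is only a convenience for reading the values by eye; the counting itself works on the values in any order.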
Go back to the previous example:
● This will save a lot of space and is a lot more scalable
● We do lose some accuracy, because each value is reported as the upper bound of its bucket:
− Max: 1000ms (actual: 720ms)
− Min: 10ms (actual: 2ms)
− Avg: (10 x 1 + 100 x 10 + 1000 x 1) / 12 = 167.5ms (actual: 83.3ms)
● We can also estimate percentiles, for example:
− 90th percentile: at least 90% of the 12 latency values (11 of 12) fall in the 10-100 bucket or lower, so P90 = 100ms
Bucket (ms) | 1-10 | 10-100 | 100-1000
Count | 1 | 10 | 1
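The estimates above can be reproduced from the bucket counts alone. This is a sketch under the same assumption the slide makes, namely that every value is reported as its bucket's upper bound; the function name `estimate` is illustrative.

```python
import math

upper_bounds_ms = [10, 100, 1000]
counts = [1, 10, 1]


def estimate(upper_bounds, counts, quantile):
    """Return (min, max, avg, quantile) estimates from bucket counts."""
    total = sum(counts)
    populated = [u for u, c in zip(upper_bounds, counts) if c > 0]
    est_min, est_max = populated[0], populated[-1]
    est_avg = sum(u * c for u, c in zip(upper_bounds, counts)) / total
    # Walk the buckets until the cumulative count reaches the requested rank.
    rank = math.ceil(quantile * total)
    seen = 0
    for upper, count in zip(upper_bounds, counts):
        seen += count
        if seen >= rank:
            return est_min, est_max, est_avg, upper
    raise ValueError("quantile must be between 0 and 1")


est_min, est_max, est_avg, p90 = estimate(upper_bounds_ms, counts, 0.90)
```

The error of each estimate is bounded by the bucket width, which is why finer-grained buckets give better estimates.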
EstimatedHistogram
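● Bucket boundaries start at 1 and each one is about 1.2 times the previous one (rounded, and always at least 1 larger)
1, 2, 3, 4, 5, 6, 7, 8, 10, 12, 14, 17, 20, 24, 29,
…
12108970, 14530764, 17436917, 20924300, 25109160, 30130992, 36157190
● With values recorded in microseconds, this covers a range from 1 microsecond to about 36 seconds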
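The bucket series can be regenerated with a short sketch. The rule used here (multiply by 1.2, round to the nearest integer, and bump by one if rounding did not move the value) is an assumption chosen because it reproduces the numbers shown on the slide; `bucket_offsets` is a hypothetical helper name, not Cassandra's API.

```python
import math


def bucket_offsets(n):
    """Return the first n bucket boundaries of a 1.2x-growth series."""
    offsets = [1]
    while len(offsets) < n:
        last = offsets[-1]
        nxt = math.floor(last * 1.2 + 0.5)  # round half up
        if nxt == last:
            nxt = last + 1  # guarantee strictly increasing boundaries
        offsets.append(nxt)
    return offsets


first_15 = bucket_offsets(15)
```

Because the boundaries grow geometrically, a modest number of buckets spans many orders of magnitude while keeping the relative error per bucket roughly constant.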
How Histogram Shows Up in Latency Metrics
● Quantile Estimation:
− The latency that a given % of the requests are faster than
− P50
− P75
− P95
− P98
− P99
− P999 (the 99.9th percentile)
● Buckets of count/frequency
Aggregation on Histogram
● Do NOT aggregate (e.g. average) quantile numbers
● Averaging Max values can be very misleading
● Averaging a quantile number such as P90 doesn’t mean anything either
● However, if you expose the raw histogram buckets, merging the numbers is straightforward: add the counts bucket by bucket
Bucket (ms) | 1-10 | 10-100 | 100-1000
node0 | 1 | 10 | 1
node1 | 4 | 7 | 1
node2 | 2 | 9 | 1
cluster | 7 | 26 | 3
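Merging per-node histograms into a cluster-wide one is a bucket-wise sum, as in this sketch using the three node histograms from the slide. It assumes every node uses the same bucket boundaries; the names are illustrative.

```python
# Per-node counts for the buckets 1-10, 10-100, 100-1000 (ms).
node_counts = {
    "node0": [1, 10, 1],
    "node1": [4, 7, 1],
    "node2": [2, 9, 1],
}


def merge(histograms):
    """Add histograms bucket by bucket; all must share the same boundaries."""
    return [sum(column) for column in zip(*histograms)]


cluster_counts = merge(node_counts.values())
```

Cluster-wide percentiles can then be estimated from the merged counts, which is valid in a way that averaging each node's P90 is not.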
Available Metrics Tools
JMX
Java Management Extensions
● JMX is an API built into Java for managing and monitoring applications
● DataStax Enterprise uses JMX to interact with external applications and tools
● nodetool leverages JMX to communicate with the database
● Third-party clients can also interact with DSE through JMX
JMX
Accessing JMX
● JMX connects remotely to the IP address of the node
● Uses the configured JMX port for the JVM
− Default port 7199
− Subsequent RMI connection will also use the same port
● Also supports user authentication and SSL encryption
JMX
Accessing JMX
● Third-party tools for accessing JMX:
− GUI: JConsole, VisualVM
− Command-line: jmxterm, jmxsh, nodetool sjk mx (included with DSE)
● Exposed directly via non-JMX protocols:
− Jolokia – exposes via JSON over HTTP
− Dropwizard Metrics Library (built-in) - exposes via HTTP, SLF4J, Graphite, …
JMX
MBeans
● Managed Java object that represents a device, application, or resource
● Exposes an interface that contains the following:
− Set of readable and/or writeable attributes
− Set of invokable operations
● DSE metrics and information are derived by reading MBean attributes
MBean
Accessing a Managed Bean (MBean)
● The MBean name is structured as follows:
− domain – usually a package name, e.g. org.apache.cassandra.metrics or com.datastax.bdp
− key property list – list of key-value pairs
− Keys generally include a type and a name
● The full name would be domain:[key1]=[value1],[key2]=[value2],...
− The domain and the key property list are separated by a colon
− Key-value pairs are separated by commas
● MBeans may have a set of readable attributes
MBean
Example
org.apache.cassandra.metrics:type=Client,name=connectedNativeClients
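The naming structure can be illustrated by splitting the example name into its domain and key properties. This is a simplified sketch: `parse_object_name` is a hypothetical helper, and it does not handle quoted values or values that themselves contain commas.

```python
def parse_object_name(name):
    """Split 'domain:key1=value1,key2=value2' into (domain, {key: value})."""
    domain, _, properties = name.partition(":")
    keys = dict(pair.split("=", 1) for pair in properties.split(","))
    return domain, keys


domain, keys = parse_object_name(
    "org.apache.cassandra.metrics:type=Client,name=connectedNativeClients"
)
```

Here `domain` holds the package-style prefix and `keys` maps `type` to `Client` and `name` to `connectedNativeClients`.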
MBean Metric Types: Gauge and Counter
● Gauge provides an instantaneous reading of the metric value
− It has one attribute called Value
● Counter is similar, but it accumulates, so it is read by comparing against previous readings
− It has one attribute called Count
− Where applicable, the count values are cleared when the node starts or restarts
org.apache.cassandra.metrics:type=Table,keyspace=<keyspace>,scope=<Table>,name=PendingCompactions
org.apache.cassandra.metrics:type=Table,keyspace=<keyspace>,scope=<Table>,name=PendingFlushes
org.apache.cassandra.metrics:type=Table,keyspace=<keyspace>,scope=<Table>,name=BytesFlushed
MBean Metric Type: Histogram
● Histogram includes attributes for min, max, mean, and various value percentiles
− Uses forward decay to make recent values more significant
− Values from the past minute are twice as significant as all previous values
MBean Metric Type: Histogram
Histogram example
(Histogram)
org.apache.cassandra.metrics:type=ClientRequest,scope=Write,name=MutationSizeHistogram
MBean Metric Type: Meter
● Contains a count and measures mean throughput based on the rate unit
● Includes exponentially-weighted moving average throughputs
− One / five / fifteen minute rates
● The mean throughput is not affected by the moving average values
● Values reset at node start or restart
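An exponentially-weighted moving average rate can be sketched as below. This is an illustration of the general technique, not the metrics library's code: the class name, the 5-second tick interval, and the sample event counts are all assumptions.

```python
import math


class MovingAverageRate:
    """Exponentially-weighted moving average of an event rate (sketch)."""

    def __init__(self, window_minutes, tick_seconds=5.0):
        # Smoothing factor derived from the tick interval and the window length.
        self.alpha = 1.0 - math.exp(-tick_seconds / (window_minutes * 60.0))
        self.tick_seconds = tick_seconds
        self.rate = None  # events per second

    def tick(self, events_in_interval):
        instant = events_in_interval / self.tick_seconds
        if self.rate is None:
            self.rate = instant  # the first sample seeds the average
        else:
            self.rate += self.alpha * (instant - self.rate)
        return self.rate


one_minute = MovingAverageRate(window_minutes=1)
for _ in range(12):
    steady = one_minute.tick(50)    # steady 10 events/second for one minute
after_burst = one_minute.tick(500)  # a single 5-second burst at 100 events/second
```

A single burst moves the one-minute rate only part of the way toward the burst rate, and a longer window (five or fifteen minutes) would move even less, which is why the three rates are usually read together.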
MBean Metric Type: Meter
Meter example
● 20 compactions completed since server restart
● On average, one compaction completed every 152 seconds, based on the mean rate (since server restart)
● In the past fifteen minutes, compactions were completing at an average rate of one per 7 seconds
org.apache.cassandra.metrics:type=Compaction,name=TotalCompactionsCompleted
MBean Metric Types: Timer and Latency
● Timer measures the rate at which a particular piece of code is called, and also includes a histogram of how long each call takes
− Attributes include the meter (event rates over the past 1 / 5 / 15 minutes) and the histogram
● Latency is a special type that includes a timer, used for tracking latency in microseconds, and a counter that accumulates the total latency of all events
− The total is exposed through a separate TotalLatency MBean
− Calculates “correct” histograms
● Values reset at node start or restart
MBean Metric Types: Timer and Latency
Timer and Latency examples
(Latency)
org.apache.cassandra.metrics:type=ClientRequest,scope=Read,name=Latency
(Latency)
org.apache.cassandra.metrics:type=ClientRequest,scope=Read,name=TotalLatency
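One way the two MBeans can be combined is to compute the mean latency over an interval from two successive readings. This is a sketch under assumptions: the dictionary keys and the sample readings are hypothetical, and it presumes the operation count (from the Latency timer) and the accumulated latency (from TotalLatency) were sampled at the same two instants.

```python
def mean_latency_us(prev, curr):
    """Mean latency in microseconds between two readings, or None if idle."""
    ops = curr["count"] - prev["count"]
    if ops <= 0:
        return None  # no operations, or the counters were reset by a restart
    return (curr["total_latency_us"] - prev["total_latency_us"]) / ops


# Hypothetical readings taken one polling interval apart.
earlier = {"total_latency_us": 1_000_000, "count": 2_000}
later = {"total_latency_us": 1_600_000, "count": 3_000}
interval_mean = mean_latency_us(earlier, later)
```

Because both inputs are plain counters, these deltas can be summed across nodes before dividing, unlike percentile values.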
DDAC or OSS C* Metrics Tools
● nodetool
● JMX tools
− JConsole, VisualVM, sjkplus, jmxterm, jmxsh
● DropWizard Metrics Library Metrics Reporter https://github.com/addthis/metrics-reporter-config
● Graphite_Exporter https://github.com/prometheus/graphite_exporter
● Prometheus https://prometheus.io/docs/introduction/overview/
● Grafana https://grafana.com/docs/guides/getting_started/
● cassandra_exporter https://github.com/criteo/cassandra_exporter
● cassandra-monitoring https://github.com/soccerties/cassandra-monitoring
● Prometheus jmx_exporter https://github.com/prometheus/jmx_exporter
● Prometheus node_exporter https://github.com/prometheus/node_exporter
DSE Metrics Tools
● DSE Metrics Collector
● DSE Metrics Collector Dashboard https://github.com/datastax/dse-metric-reporter-dashboards
● Prometheus
● Grafana
● Graphite_Exporter
● nodetool
● JMX tools
− JConsole, VisualVM, sjkplus, jmxterm, jmxsh
● OpsCenter
● DropWizard Metrics Library Metrics Reporter
● cassandra_exporter
● cassandra-monitoring
● Prometheus jmx_exporter and node_exporter
DSE Metrics Collector (DSE)
● Part of DSE Server Foundation
● Collects DSE and OS Metrics
● Easily integrated with enterprise monitoring stack
● Introduced in DSE 6.7 (enabled by default), but backported to DSE 6.0.5+ and DSE 5.1.14+ as well (disabled by default)
● Based on collectd (with local temporary storage) that can export/expose metrics to different monitoring systems: Prometheus, Graphite, …
● collectd runs as a sub-process spawned by the DSE JVM, with its life cycle managed by DSE
DSE Metrics Collector Architecture
[Diagram: a Customer Landscape containing a DataStax Enterprise Cluster (DataStax Metrics Collector with Collectd, DSE and OS Metrics, Exporter Plugin), a Prometheus Monitoring Server, and Grafana Dashboards]
DataStax Enterprise Metrics Dashboard (DSE)
● Freely available from the DataStax GitHub repo as an example
https://github.com/datastax/dse-metric-reporter-dashboards
https://docs.datastax.com/en/dse/6.7/dse-dev/datastax_enterprise/tools/metricsCollector/mcExportMetricsDocker.html
● Built using docker-compose
● Push button setup of a dashboard environment that can be used as your template
What Metrics to Monitor
MBeans
Table Metrics
● Contains metrics affecting all tables on the node
● MBeans used for table-specific metrics
● Similar to metrics provided by nodetool tablestats
org.apache.cassandra.metrics:type=Table
org.apache.cassandra.metrics:type=Table,keyspace=<keyspace>,scope=<Table>,name=<MetricName>
MBeans
Keyspace Metrics
● Same metric MBeans as the table metrics, aggregated to the keyspace
● Similar to metrics provided by nodetool tablestats
org.apache.cassandra.metrics:type=Keyspace,scope=<Keyspace>,name=<MetricName>
MBeans
ThreadPool Metrics
● The path key divides the thread pools into internal, request, and transport
● Same set of MBeans for each thread pool
− Active Tasks
− Pending Tasks
− Completed Tasks
− Total Blocked Tasks
− Currently Blocked Tasks
− Max Pool Size
● Similar to metrics provided by nodetool tpstats
org.apache.cassandra.metrics:type=ThreadPools,path=<Path>,scope=<ThreadPoolName>,name=<MetricName>
MBeans
Client Request Metrics
● Metrics that encapsulate work taking place at the coordinator level
● Request types:
− CASRead
− CASWrite
− RangeSlice
− Read
− Write
− ViewWrite
● Similar to metrics provided by nodetool proxyhistograms
org.apache.cassandra.metrics:type=ClientRequest,scope=<RequestType>,name=<MetricName>
MBeans
Compaction Metrics
● Metrics specific to compaction work
● Attributes
− BytesCompacted
− PendingTasks
− CompletedTasks
− TotalCompactionsCompleted
− PendingTasksByTableName
● Similar to metrics provided by nodetool compactionstats
org.apache.cassandra.metrics:type=Compaction,name=<MetricName>
MBeans
Other database metrics
CQL Metrics org.apache.cassandra.metrics:type=CQL,name=<MetricName>
DroppedMessage Metrics org.apache.cassandra.metrics:type=DroppedMessage,scope=<Type>,name=<MetricName>
Streaming Metrics org.apache.cassandra.metrics:type=Streaming,scope=<PeerIP>,name=<MetricName>
CommitLog Metrics org.apache.cassandra.metrics:type=CommitLog,name=<MetricName>
Storage Metrics org.apache.cassandra.metrics:type=Storage,name=<MetricName>
Hinted Handoff Metrics org.apache.cassandra.metrics:type=HintedHandoffManager,name=<MetricName>
Hints Service Metrics org.apache.cassandra.metrics:type=HintsService,name=<MetricName>
SSTable Index Metrics org.apache.cassandra.metrics:type=Index,scope=RowIndexEntry,name=<MetricName>
BufferPool Metrics org.apache.cassandra.metrics:type=BufferPool,name=<MetricName>
Client Metrics org.apache.cassandra.metrics:type=Client,name=<MetricName>
Batch Metrics org.apache.cassandra.metrics:type=Batch,name=<MetricName>
MBeans
JVM Metrics
BufferPool java.nio:type=BufferPool,name=<direct|mapped>
FileDescriptorRatio java.lang:type=OperatingSystem,name=<OpenFileDescriptorCount|MaxFileDescriptorCount>
GarbageCollector java.lang:type=GarbageCollector,name=<gc_type>
Memory java.lang:type=Memory
MemoryPool java.lang:type=MemoryPool,name=<memory_pool>
http://cassandra.apache.org/doc/latest/operating/metrics.html
Most Important Performance Metrics
Most important metrics to monitor
Metric description | Threshold
Read and write latencies (client scope, table scope) | P99 > 200ms for more than 1 minute
Dropped mutations | Value greater than 0
Pending compactions | More than 30 for more than 15 minutes
Aborted compactions | Value greater than 0
Total timeouts, and timeouts per host (could be a sign of network problems, etc.) | Value greater than 0
Maximum partition size | Partitions bigger than 100 MB are a sign of problems with the data model
Number of SSTables on host and per table | > 500 per individual table (non-LCS)
Blocked allocations of memtable pool | Value greater than 0
Total hints on a specific node | Value greater than 0
Hints replay (failed, succeeded, timed out) | Value greater than 0 for failed and timed out
Blocked tasks for compaction executor, memtable flush writer | Value greater than 0
Cross-data center latency | Too high values (> 100ms)
Number of segments waiting on commit | High count during last minute, high 99th percentile of time waiting…
Data about Java’s garbage collection | Max GC Elapsed (ms) is greater than 500ms
Pending flushes | More than or near the value of memtable_flush_writers
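A few of the simple numeric thresholds listed above can be turned into a check like the following sketch. The metric keys and the sample reading are hypothetical names chosen for illustration, and the duration conditions ("for more than 1 minute", "for more than 15 minutes") are deliberately not modeled here.

```python
# Upper limits taken from the threshold list; a reading above the limit is flagged.
THRESHOLDS = {
    "dropped_mutations": 0,
    "pending_compactions": 30,
    "max_partition_size_mb": 100,
    "sstables_per_table": 500,
    "cross_dc_latency_ms": 100,
    "max_gc_elapsed_ms": 500,
}


def breached(sample):
    """Return the sorted names of metrics whose reading exceeds its limit."""
    return sorted(
        name for name, limit in THRESHOLDS.items() if sample.get(name, 0) > limit
    )


# Hypothetical reading from one node.
sample = {"dropped_mutations": 3, "pending_compactions": 12, "max_gc_elapsed_ms": 750}
alerts = breached(sample)
```

In a real monitoring stack these rules would more likely live in the alerting system, where sustained-duration conditions are easier to express.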
Resources
Learning Resources
● Official documentation of Cassandra’s metrics: http://cassandra.apache.org/doc/latest/operating/metrics.html
● DSE Metrics Collector documentation: https://docs.datastax.com/en/dse/6.7/dse-dev/datastax_enterprise/tools/metricsCollector/mcIntroduction.html
● DSE Metrics Dashboard GitHub repo: https://github.com/datastax/dse-metric-reporter-dashboards
● Prometheus relabeling configuration in DSE Metrics Dashboard: https://tinyurl.com/y4u3y2zf
● Gil Tene’s Latency Tip of The Day: http://latencytipoftheday.blogspot.com/
● Nitsan Wakart’s blog: http://psy-lob-saw.blogspot.com/2016/07/fixing-co-in-cstress.html
https://tinyurl.com/y62r4uw4
Thank you
Q & A

Vector Search -An Introduction in Oracle Database 23ai.pptx
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 

Webinar | How to Understand Apache Cassandra™ Performance Through Read/Write Metrics: A Beginner's Guide

  • 1. Understand Apache Cassandra Performance Through Metrics: A Beginner’s Guide. © DataStax, All Rights Reserved. Confidential
  • 2. May 21 - 23, 2019, Gaylord National Resort & Convention Center, Maryland. Why?
  • 3. Agenda ● Basic Concepts in Cassandra Architecture ● How Do You Begin To Understand Performance of a Real-time Database ● What Tools are Available ● What are the Most Important Metrics
  • 4. Cassandra Concepts
  • 5. Masterless / Peer-to-Peer Architecture ● All nodes are the same, each owning a piece of the data ● Availability − No special “master”, “leader”, etc. − No fragility; no single point of failure − No “failover” ● Scalability − All nodes host data and also serve queries − More data? More nodes. − More queries? More nodes.
  • 6. Coordinator, Replica and Client ● No single point of failure ● All data is replicated − Replication is handled automatically − All replicas are equal ● Any client can connect to any node and read/write the data it needs ● Any node can be: − Coordinator − Storage/Replica node
  • 7. Key Concepts in Real-time Database Performance
  • 8. Throughput and Latency ● Throughput: rate of operations ● Latency: time it takes for one operation ● Sustainable Throughput − “achieving throughput while safely maintaining SLA” – Gil Tene − Don’t measure latency at saturation ● System Resources − Utilization − Saturation − Error − Availability
  • 9. How to Measure Latency ● Single latency: capture the time it takes for one operation ● What if you have millions of operations per second? ● What if you have millions in one hour? How do you answer “how did the million operations in the last hour go”? ● How do you effectively plot the latency numbers?
  • 10. Let’s look at a small example: ● Assume we recorded 12 latency values: − 11ms, 19ms, 13ms, 12ms, 85ms, 43ms, 720ms, 17ms, 22ms, 25ms, 31ms, 2ms ● If we list out these raw values: − It takes a lot of space: 12 x 8 bytes = 96 bytes − It doesn’t scale: with 1 million raw latency values, storage and transfer become very costly − It is expensive to find the max value from the raw list
  • 13. Let’s look at a small example: ● Assume we recorded 12 latency values: − 11ms, 19ms, 13ms, 12ms, 85ms, 43ms, 720ms, 17ms, 22ms, 25ms, 31ms, 2ms ● Average: − avg (11, 19, 13, 12, 85, 43, 720, 17, 22, 25, 31, 2) = 83.3ms − Downside: the average tells us nothing about − the best latency − the worst latency − or the distribution of these values
  • 14. Histogram ● A histogram is an accurate representation of the distribution of numerical data. ● To construct a histogram, the first step is to "bucket" the range of values − i.e. divide the entire range of values into a series of intervals − and then count how many values fall into each interval − The buckets are usually specified as consecutive, non-overlapping intervals of a variable (Image credit: CC BY 2.5, https://commons.wikimedia.org/w/index.php?curid=3483039)
  • 15. Go back to the previous example: ● Assume we recorded 12 latency values: − 11ms, 19ms, 13ms, 12ms, 85ms, 43ms, 720ms, 17ms, 22ms, 25ms, 31ms, 2ms ● We sort them first: − 2ms, 11ms, 12ms, 13ms, 17ms, 19ms, 22ms, 25ms, 31ms, 43ms, 85ms, 720ms − Then we can put them into the following buckets: 1-10ms: 1 value; 10-100ms: 10 values; 100-1000ms: 1 value
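The downside of the average can be made concrete with a short Python sketch over the same 12 values. The variable names are illustrative and not part of the original slides; the point is only that one outlier (720ms) pulls the mean far away from what a typical request experienced.

```python
from statistics import mean, median

# The 12 latency samples from the slide, in milliseconds.
latencies_ms = [11, 19, 13, 12, 85, 43, 720, 17, 22, 25, 31, 2]

avg_ms = mean(latencies_ms)       # dominated by the single 720ms outlier
median_ms = median(latencies_ms)  # what a "typical" request saw
best_ms = min(latencies_ms)
worst_ms = max(latencies_ms)

print(f"avg={avg_ms:.1f}ms median={median_ms}ms best={best_ms}ms worst={worst_ms}ms")
```

The average alone would suggest requests take about 83ms, although most of them finished in well under 50ms and one took 720ms.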
  • 16. Go back to the previous example: ● Bucketing saves a lot of space and is a lot more scalable ● We do lose some accuracy, because each value is reported as the upper bound of its bucket: − Max: 1000ms (actual: 720ms) − Min: 10ms (actual: 2ms) − Avg: (10 x 1 + 100 x 10 + 1000 x 1) / 12 = 167.5ms (actual: 83.3ms) − We can also calculate percentiles, for example: − 90th percentile: 90% of the 12 latency values (11 of them) fall in the 10-100 bucket or lower − so P90 = 100ms. Buckets: 1-10: 1; 10-100: 10; 100-1000: 1
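The bucket arithmetic above can be reproduced with a minimal Python sketch. It assumes, as the slide does, that every value is reported as the upper bound of its bucket; the function name and the tuple layout are illustrative only.

```python
import math

# (lower bound, upper bound, count) for each bucket, in milliseconds.
buckets = [(1, 10, 1), (10, 100, 10), (100, 1000, 1)]


def estimate_from_buckets(buckets, quantile):
    """Estimate min, max, average and one quantile from bucket counts.

    Each value is approximated by the upper bound of its bucket.
    """
    total = sum(count for _, _, count in buckets)
    est_min = next(upper for _, upper, count in buckets if count > 0)
    est_max = next(upper for _, upper, count in reversed(buckets) if count > 0)
    est_avg = sum(upper * count for _, upper, count in buckets) / total

    # The quantile is the upper bound of the first bucket whose cumulative
    # count reaches the required rank.
    rank = math.ceil(quantile * total)
    est_quantile = buckets[-1][1]
    seen = 0
    for _, upper, count in buckets:
        seen += count
        if seen >= rank:
            est_quantile = upper
            break
    return est_min, est_max, est_avg, est_quantile


est_min, est_max, est_avg, p90 = estimate_from_buckets(buckets, 0.90)
print(est_min, est_max, est_avg, p90)
```

Only three counters are stored instead of 12 raw values, at the cost of the rounding error shown on the slide.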
  • 17. EstimatedHistogram ● The series of bucket boundaries starts at 1 and grows by a factor of 1.2 each time (rounded, and always by at least 1): 1, 2, 3, 4, 5, 6, 7, 8, 10, 12, 14, 17, 20, 24, 29, … 12108970, 14530764, 17436917, 20924300, 25109160, 30130992, 36157190 ● Time resolution from 1 microsecond to 36 seconds
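A minimal Python sketch of how such a series can be generated. The rule used here (multiply the previous boundary by 1.2, round to the nearest integer, and step by at least 1) is an assumption inferred from the numbers on the slide, and the function name is hypothetical; it is not copied from the Cassandra source.

```python
def bucket_offsets(count):
    """Return the first `count` bucket boundaries of the series."""
    offsets = [1]
    while len(offsets) < count:
        last = offsets[-1]
        # last * 1.2 rounded to the nearest integer, in integer arithmetic.
        nxt = (last * 12 + 5) // 10
        if nxt == last:
            # Small values would not grow after rounding, so step by 1.
            nxt = last + 1
        offsets.append(nxt)
    return offsets


print(bucket_offsets(15))
```

Because the boundaries grow geometrically, a small fixed number of buckets covers many orders of magnitude while keeping the relative error per bucket at roughly 20%.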
  • 18. How Histogram Shows Up in Latency Metrics ● Quantile estimation: − the given % of requests should be faster than the reported latency − P50 − P75 − P95 − P98 − P99 − P999 ● Buckets of count/frequency
  • 20. Aggregation on Histogram ● Do NOT aggregate (e.g. average) quantile numbers ● Averaging Max values can be very misleading ● Averaging P90 numbers also doesn’t mean anything ● However, if you expose the raw histogram buckets, merging them is straightforward: add the counts bucket by bucket. Buckets 1-10 / 10-100 / 100-1000: node0: 1 / 10 / 1; node1: 4 / 7 / 1; node2: 2 / 9 / 1; cluster: 7 / 26 / 3
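The per-node merge can be sketched in a few lines of Python. The dictionary layout and function name are illustrative; the only requirement is that all nodes use the same bucket boundaries.

```python
from collections import Counter

# Raw bucket counts per node, as on the slide.
node_histograms = {
    "node0": {"1-10": 1, "10-100": 10, "100-1000": 1},
    "node1": {"1-10": 4, "10-100": 7, "100-1000": 1},
    "node2": {"1-10": 2, "10-100": 9, "100-1000": 1},
}


def merge_histograms(histograms):
    """Sum the counts of identical buckets across several histograms."""
    merged = Counter()
    for histogram in histograms:
        merged.update(histogram)  # Counter.update adds counts
    return dict(merged)


cluster = merge_histograms(node_histograms.values())
print(cluster)
```

Cluster-wide quantiles should then be computed from the merged counts, not by averaging the per-node quantiles.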
  • 21. Available Metrics Tools
  • 22. JMX: Java Management Extensions ● JMX is an API built into Java for managing and monitoring applications ● DataStax Enterprise uses JMX to interact with external applications and tools ● nodetool leverages JMX to communicate with the database ● Third-party clients can also interact with DSE through JMX
  • 23. Accessing JMX ● JMX connects remotely to the IP address of the node ● Uses the configured JMX port for the JVM − Default port 7199 − The subsequent RMI connection also uses the same port ● Also supports user authentication and SSL encryption
  • 24. Accessing JMX ● Third-party tools for accessing JMX: − GUI: JConsole, VisualVM − Command-line: jmxterm, jmxsh, nodetool sjk mx (included with DSE) ● Exposed directly via non-JMX protocols: − Jolokia: exposes via JSON over HTTP − Dropwizard Metrics Library (built-in): exposes via HTTP, SLF4J, Graphite, …
  • 25. JMX MBeans ● An MBean is a managed Java object that represents a device, application, or resource ● It exposes an interface that contains the following: − Set of readable and/or writeable attributes − Set of invokable operations ● DSE metrics and information are derived from reading MBean attributes
  • 26. Accessing a Managed Bean (MBean) ● The MBean name is structured as follows: − domain: usually a package name, e.g. org.apache.cassandra.metrics or com.datastax.bdp − key property list: list of key-value pairs − Keys generally include a type and a name ● The full name is domain:[key1]=[value1],[key2]=[value2],... − Domain and key property list are separated by a colon − Key-value pairs are separated by commas ● MBeans may have a set of readable attributes
  • 27. MBean Example: org.apache.cassandra.metrics:type=Client,name=connectedNativeClients
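To illustrate the naming structure, here is a small Python sketch that splits an MBean name into its domain and key properties. The helper name is hypothetical, and the sketch assumes simple names without quoted values or wildcard patterns.

```python
def parse_mbean_name(name):
    """Split 'domain:key1=value1,key2=value2' into (domain, properties)."""
    # The first colon separates the domain from the key property list.
    domain, _, property_list = name.partition(":")
    # Commas separate the pairs; the first '=' separates key from value.
    properties = dict(item.split("=", 1) for item in property_list.split(","))
    return domain, properties


domain, properties = parse_mbean_name(
    "org.apache.cassandra.metrics:type=Client,name=connectedNativeClients"
)
print(domain)
print(properties)
```

For the example above, the domain is the metrics package and the properties identify the metric group (type) and the metric itself (name).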
  • 28. MBean Metric Types: Gauge and Counter ● Gauge provides an instantaneous reading of the metric value − It has one attribute called value ● Counter is similar, but is used to compare against previous readings − It has one attribute called count − Where applicable, the count values are cleared when the node starts or restarts. Examples: org.apache.cassandra.metrics:type=Table,keyspace=<keyspace>,scope=<Table>,name=PendingCompactions; org.apache.cassandra.metrics:type=Table,keyspace=<keyspace>,scope=<Table>,name=PendingFlushes; org.apache.cassandra.metrics:type=Table,keyspace=<keyspace>,scope=<Table>,name=BytesFlushed
  • 29. MBean Metric Type: Histogram ● Histogram includes attributes for min, max, mean, and various value percentiles − Uses forward decay to make recent values more significant − Values from the past minute are twice as significant as all previous values
  • 30. MBean Metric Type: Histogram. Histogram example: org.apache.cassandra.metrics:type=ClientRequest,scope=Write,name=MutationSizeHistogram
  • 31. MBean Metric Type: Meter ● Contains a count and measures mean throughput based on the rate unit ● Includes exponentially-weighted moving average throughputs − One / five / fifteen minute rates ● Mean throughput is not affected by the moving average values ● Values reset at node start or restart
  • 32. MBean Metric Type: Meter. Meter example ● 20 compactions completed since server restart ● On average, one compaction completed every 152 seconds, based on the mean rate (since server restart) ● In the past fifteen minutes, compactions completed at an average rate of one per 7 seconds. MBean: org.apache.cassandra.metrics:type=Compaction,name=TotalCompactionsCompleted
  • 33. MBean Metric Types: Timer and Latency ● Timer measures the rate at which a particular piece of code is called, and also includes the time-cost histogram − Attributes include meter (the number of events in the past 1 / 5 / 15 minutes) and histogram ● Latency is a special type that includes a timer, used for tracking latency in microseconds, and a counter which counts the total latency for all events − A separate TotalLatency MBean counts the total latency for all events − Calculates “correct” histograms ● Values reset at node start or restart
  • 34. MBean Metric Types: Timer and Latency. Timer and Latency examples: (Latency) org.apache.cassandra.metrics:type=ClientRequest,scope=Read,name=Latency; (Latency) org.apache.cassandra.metrics:type=ClientRequest,scope=Read,name=TotalLatency
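The difference between the mean rate and the one / five / fifteen minute rates can be sketched with a simplified exponentially weighted moving average in Python. This is an illustration of the general technique only: the class name, the 5-second tick interval, and the smoothing formula are assumptions for the sketch, not a description of how the metrics library inside Cassandra is implemented.

```python
import math


class EwmaRate:
    """Exponentially weighted moving average of an event rate (events/second)."""

    def __init__(self, window_seconds, tick_seconds=5.0):
        self.tick_seconds = tick_seconds
        # Weight given to the newest interval; older intervals decay away.
        self.alpha = 1.0 - math.exp(-tick_seconds / window_seconds)
        self.uncounted = 0
        self.rate = None

    def mark(self, n=1):
        """Record n events since the last tick."""
        self.uncounted += n

    def tick(self):
        """Fold the events of the last interval into the moving average."""
        instant_rate = self.uncounted / self.tick_seconds
        self.uncounted = 0
        if self.rate is None:
            self.rate = instant_rate
        else:
            self.rate += self.alpha * (instant_rate - self.rate)
        return self.rate


one_minute = EwmaRate(window_seconds=60)
for _ in range(12):          # one minute of steady load: 10 events per tick
    one_minute.mark(10)
    busy_rate = one_minute.tick()
for _ in range(12):          # one minute with no events at all
    idle_rate = one_minute.tick()
print(busy_rate, idle_rate)
```

A mean rate computed since restart would barely move during the idle minute, while the moving average drops quickly, which is why the windowed rates are more useful for spotting recent changes.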
  • 35. DDAC or OSS C* Metrics Tools ● nodetool ● JMX tools − JConsole, VisualVM, sjkplus, jmxterm, jmxsh ● DropWizard Metrics Library Metrics Reporter: https://github.com/addthis/metrics-reporter-config ● Graphite_Exporter: https://github.com/prometheus/graphite_exporter ● Prometheus: https://prometheus.io/docs/introduction/overview/ ● Grafana: https://grafana.com/docs/guides/getting_started/ ● cassandra_exporter: https://github.com/criteo/cassandra_exporter ● cassandra-monitoring: https://github.com/soccerties/cassandra-monitoring ● Prometheus jmx_exporter: https://github.com/prometheus/jmx_exporter ● Prometheus node_exporter: https://github.com/prometheus/node_exporter
  • 36. DSE Metrics Tools ● DSE Metrics Collector ● DSE Metrics Collector Dashboard: https://github.com/datastax/dse-metric-reporter-dashboards ● Prometheus ● Grafana ● Graphite_Exporter ● nodetool ● JMX tools − JConsole, VisualVM, sjkplus, jmxterm, jmxsh ● OpsCenter ● DropWizard Metrics Library Metrics Reporter ● cassandra_exporter ● cassandra-monitoring ● Prometheus jmx_exporter and node_exporter
  • 37. DSE Metrics Collector (DSE) ● Part of DSE Server Foundation ● Collects DSE and OS metrics ● Easily integrated with an enterprise monitoring stack ● Introduced in DSE 6.7 (enabled by default), and backported to DSE 6.0.5+ and DSE 5.1.14+ (disabled by default) ● Based on collectd (with local temporary storage), which can export/expose metrics to different monitoring systems: Prometheus, Graphite, … ● collectd runs as a sub-process spawned by the DSE JVM, with its life cycle managed by DSE
  • 38. DSE Metrics Collector Architecture (diagram labels: Grafana Dashboards; Prometheus Monitoring Server; Customer Landscape; DataStax Enterprise Cluster; DataStax Metrics Collector; Collectd; DSE and OS Metrics; Exporter Plugin)
  • 39. DataStax Enterprise Metrics Dashboard (DSE) ● Freely available from the DataStax GitHub repo as an example: https://github.com/datastax/dse-metric-reporter-dashboards and https://docs.datastax.com/en/dse/6.7/dse-dev/datastax_enterprise/tools/metricsCollector/mcExportMetricsDocker.html ● Built using docker-compose ● Push-button setup of a dashboard environment that can be used as your template
  • 43. What Metrics to Monitor
  • 44. MBeans: Table Metrics ● Contains metrics affecting all tables on the node ● MBeans used for table-specific metrics ● Similar to metrics provided by nodetool tablestats. MBean names: org.apache.cassandra.metrics:type=Table and org.apache.cassandra.metrics:type=Table,keyspace=<keyspace>,scope=<Table>,name=<MetricName>
  • 45. MBeans: Keyspace Metrics ● Same metric MBeans as the table metrics, aggregated to the keyspace ● Similar to metrics provided by nodetool tablestats. MBean name: org.apache.cassandra.metrics:type=Keyspace,scope=<Keyspace>,name=<MetricName>
  • 46. MBeans: ThreadPool Metrics ● The path property divides the thread pools into internal, request, and transport ● Same set of MBeans for each thread pool − Active Tasks − Pending Tasks − Completed Tasks − Total Blocked Tasks − Currently Blocked Tasks − Max Pool Size ● Similar to metrics provided by nodetool tpstats. MBean name: org.apache.cassandra.metrics:type=ThreadPools,path=<Type>,scope=<ThreadPoolName>,name=<MetricName>
  • 47. MBeans: Client Request Metrics ● Metrics that encapsulate work taking place at the coordinator level ● Request types: − CASRead − CASWrite − RangeSlice − Read − Write − ViewWrite ● Similar to metrics provided by nodetool proxyhistograms. MBean name: org.apache.cassandra.metrics:type=ClientRequest,scope=<RequestType>,name=<MetricName>
  • 48. MBeans: Compaction Metrics ● Metrics specific to compaction work ● Attributes − BytesCompacted − PendingTasks − CompletedTasks − TotalCompactionsCompleted − PendingTasksByTableName ● Similar to metrics provided by nodetool compactionstats. MBean name: org.apache.cassandra.metrics:type=Compaction,name=<MetricName>
  • 49. MBeans: Other database metrics
      ● CQL Metrics: org.apache.cassandra.metrics:type=CQL,name=<MetricName>
      ● DroppedMessage Metrics: org.apache.cassandra.metrics:type=DroppedMessage,scope=<Type>,name=<MetricName>
      ● Streaming Metrics: org.apache.cassandra.metrics:type=Streaming,scope=<PeerIP>,name=<MetricName>
      ● CommitLog Metrics: org.apache.cassandra.metrics:type=CommitLog,name=<MetricName>
      ● Storage Metrics: org.apache.cassandra.metrics:type=Storage,name=<MetricName>
      ● Hinted Handoff Metrics: org.apache.cassandra.metrics:type=HintedHandoffManager,name=<MetricName>
      ● Hints Service Metrics: org.apache.cassandra.metrics:type=HintsService,name=<MetricName>
      ● SSTable Index Metrics: org.apache.cassandra.metrics:type=Index,scope=RowIndexEntry,name=<MetricName>
      ● BufferPool Metrics: org.apache.cassandra.metrics:type=BufferPool,name=<MetricName>
      ● Client Metrics: org.apache.cassandra.metrics:type=Client,name=<MetricName>
      ● Batch Metrics: org.apache.cassandra.metrics:type=Batch,name=<MetricName>
  • 50. MBeans: JVM Metrics
      ● BufferPool: java.nio:type=BufferPool,name=<direct|mapped>
      ● FileDescriptorRatio: java.lang:type=OperatingSystem,name=<OpenFileDescriptorCount|MaxFileDescriptorCount>
      ● GarbageCollector: java.lang:type=GarbageCollector,name=<gc_type>
      ● Memory: java.lang:type=Memory
      ● MemoryPool: java.lang:type=MemoryPool,name=<memory_pool>
      ● Reference: http://cassandra.apache.org/doc/latest/operating/metrics.html
  • 51. Most Important Performance Metrics
  • 52. Most important metrics to monitor (metric description: threshold)
      ● Read and write latencies (client scope, table scope): P99 > 200ms for more than 1 minute
      ● Dropped mutations: value greater than 0
      ● Pending compactions: more than 30 for more than 15 minutes
      ● Aborted compactions: value greater than 0
      ● Total timeouts, and timeouts per host (could be a sign of network problems, etc.): value greater than 0
      ● Maximal partition size: partition sizes larger than 100 MB are a sign of problems with the data model
  • 53. Most important metrics to monitor (metric description: threshold)
      ● Number of SSTables on host and per table: > 500 per individual table (non-LCS)
      ● Blocked allocations of memtable pool: value greater than 0
      ● Total hints on a specific node: value greater than 0
      ● Hints replay (failed, succeeded, timed out): value greater than 0 for failed and timed out
      ● Blocked tasks for compaction executor, memtable flush writer: value greater than 0
      ● Cross-data center latency: too high values (> 100ms)
      ● Number of segments waiting on commit: high count during last minute, high 99th percentile of time waiting…
  • 54. Most important metrics to monitor (metric description: threshold)
      ● Data about Java’s garbage collection: Max GC Elapsed (ms) is greater than 500ms
      ● Pending flushes: near or above the value of memtable_flush_writers
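A few of the thresholds above can be expressed as a simple check, sketched here in Python. The metric keys (`read_latency_p99_ms`, `dropped_mutations`, and so on) are hypothetical names chosen for the example, not real MBean or exporter names, and the sketch deliberately ignores the duration conditions ("for more than 1 minute", "for more than 15 minutes"), which a real alerting system would have to track over time.

```python
# Hypothetical metric keys mapped to a predicate that returns True when the
# threshold from the slides is breached.
THRESHOLDS = {
    "read_latency_p99_ms": lambda v: v > 200,
    "dropped_mutations": lambda v: v > 0,
    "pending_compactions": lambda v: v > 30,
    "max_partition_size_mb": lambda v: v > 100,
    "sstables_per_table": lambda v: v > 500,
    "max_gc_elapsed_ms": lambda v: v > 500,
}


def breached(sample):
    """Return the sorted names of metrics in `sample` that exceed their threshold."""
    return sorted(
        name
        for name, is_breached in THRESHOLDS.items()
        if name in sample and is_breached(sample[name])
    )


# One made-up snapshot of a node's metrics.
snapshot = {
    "read_latency_p99_ms": 350,
    "dropped_mutations": 0,
    "pending_compactions": 12,
    "max_gc_elapsed_ms": 900,
}
alerts = breached(snapshot)
print(alerts)
```

Keeping the thresholds in one table makes it easy to tune them per cluster, since the values on the slides are starting points rather than universal limits.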
  • 55. Resources
  • 56. Learning Resources ● Official documentation of Cassandra’s metrics: http://cassandra.apache.org/doc/latest/operating/metrics.html ● DSE Metrics Collector documentation: https://docs.datastax.com/en/dse/6.7/dse-dev/datastax_enterprise/tools/metricsCollector/mcIntroduction.html ● DSE Metrics Dashboard GitHub repo: https://github.com/datastax/dse-metric-reporter-dashboards ● Prometheus relabeling configuration in DSE Metrics Dashboard: https://tinyurl.com/y4u3y2zf ● Gil Tene’s Latency Tip of The Day: http://latencytipoftheday.blogspot.com/ ● Nitsan Wakart’s blog: http://psy-lob-saw.blogspot.com/2016/07/fixing-co-in-cstress.html
  • 57. https://tinyurl.com/y62r4uw4
  • 58. Thank you
  • 59. Q & A