What's New in Scylla Monitoring 3.0

What’s new in Scylla
Monitoring 3.0
Amnon Heiman, Software Developer
Presenter
Amnon Heiman, Software Developer at ScyllaDB
Over 15 years of experience in software development of large-scale
systems.
Previously worked at Convergin, which was acquired by Oracle.
Holds a BA and MSc in Computer Science from the Technion-
Machon Technologi Le’ Israel and an MBA from Tel Aviv
University.
What is New
■ Stack Overview
■ Versions Update
■ New Dashboards
■ Alerts
■ New Features
■ Scylla Manager Integration
Scylla Monitoring Stack
Scylla, Prometheus, Grafana
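As a rough illustration of how these pieces connect, the following is a minimal Prometheus configuration sketch, not the configuration shipped with the monitoring stack. The hostnames and the scrape interval are hypothetical placeholders; 9180 and 9093 are the default ports for Scylla metrics and Alertmanager respectively.

```yaml
# prometheus.yml (sketch; hostnames are hypothetical placeholders)
global:
  scrape_interval: 15s        # assumed interval, tune for your cluster

rule_files:
  - prometheus.rules.yml      # alert rules file

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager-host:9093']   # Alertmanager default port

scrape_configs:
  - job_name: scylla
    static_configs:
      - targets: ['scylla-node-1:9180']         # Scylla default metrics port
```

Grafana then uses Prometheus (and Alertmanager) as data sources to draw the dashboards.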
Versions upgrade
■ Grafana 6.4.3
■ Prometheus 2.13.1
■ Alertmanager 0.17
■ Scylla Open Source 3.1
■ Scylla Enterprise 2019.1
■ Scylla Manager 2.0 (upcoming version)
■ Scylla Alternator
■ Python 3
Dashboards Reorganization
User facing Dashboards
■ Overview - General overview of the cluster
■ Detailed - Drilldown look at a Scylla Node
■ CQL - CQL metrics and CQL Optimization
■ Alternator - Alternator metrics
■ Manager - Scylla Manager metrics
■ OS - OS related metrics (disk, network)
Scylla Support-Oriented Dashboards
■ Errors - Scylla’s internal errors
■ CPU
■ IO
New Dashboard - CQL
Commands
■ Inserts
■ Reads
■ Deletes
■ Updates
■ Batches
New Dashboard - CQL (cont.)
Optimization
■ Prepared Statements
■ Paged Queries
■ Token Aware
■ Reversed Read
■ Allow filtering
■ Consistency Level issues
■ Cross DC traffic
New Dashboard - Alternator (DynamoDB API)
■ Cluster overview
■ Data Plane Actions
■ Data Plane Latencies
■ Control Plane Actions
■ Cache
■ Timeouts
Additional Alerts
■ Alerts are shown in the dashboard and can connect to external
systems
■ New alerts:
● Low disk size
● CQL connectivity
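A low-disk alert of this kind could be expressed roughly as follows. This is a sketch under stated assumptions, not the rule shipped with the stack: the group name, alert name, and 20% threshold are hypothetical, the metrics are the standard node_exporter filesystem metrics, and `/var/lib/scylla` is assumed to be the data mount point.

```yaml
groups:
  - name: example.rules                 # hypothetical group name
    rules:
      - alert: DiskAlmostFull           # hypothetical alert name
        # Fraction of free space on the Scylla data mount (node_exporter metrics).
        expr: >
          node_filesystem_avail_bytes{mountpoint="/var/lib/scylla"}
          / node_filesystem_size_bytes{mountpoint="/var/lib/scylla"} < 0.2
        for: 30s
        labels:
          severity: "2"
        annotations:
          description: 'Less than 20% of disk space is free on {{ $labels.instance }}'
          summary: Low disk space
```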
How to Add an Alert
■ Part of the Prometheus configuration (prometheus.rules.yml)
■ Structure
● Name
● What happened
● For how long
● What to report
https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/
How to Add an Alert - Example
- alert: InstanceDown
  expr: up == 0
  for: 30s
  labels:
    severity: "2"
  annotations:
    description: 'description...'
    summary: Instance is down
● alert - Name
● expr - Prometheus expression
● for - Duration
● labels - Labels set on the alert (severity is important)
● description - Longer description
● summary - Summary
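In a rules file, alert entries sit inside a rule group. The sketch below shows the InstanceDown example in that context; the group name and the description text are illustrative assumptions, not taken from the shipped configuration.

```yaml
# prometheus.rules.yml
groups:
  - name: scylla.rules          # hypothetical group name
    rules:
      - alert: InstanceDown
        expr: up == 0           # the target failed its last scrape
        for: 30s                # the condition must hold this long before firing
        labels:
          severity: "2"
        annotations:
          description: '{{ $labels.instance }} has been unreachable for more than 30 seconds'
          summary: Instance is down
```

A file like this can be validated before reloading Prometheus with `promtool check rules prometheus.rules.yml`.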
Annotations
■ Annotations are markers on the graph
Monitoring/Manager tighter integration
■ Manager has a Consul API
■ Prometheus can read the node list from the Manager
■ No configuration files are needed
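On the Prometheus side, reading targets from a Consul-compatible API is done with `consul_sd_configs`. The following is a minimal sketch, assuming the Manager is reachable at a hypothetical `manager-host:56090`; the actual address, port, and relabeling used by the monitoring stack may differ.

```yaml
scrape_configs:
  - job_name: scylla
    consul_sd_configs:
      - server: 'manager-host:56090'    # hypothetical Scylla Manager address and port
    relabel_configs:
      # Scrape each discovered node on Scylla's default metrics port.
      - source_labels: [__meta_consul_address]
        target_label: __address__
        replacement: '${1}:9180'
```

With discovery in place, nodes added to or removed from the cluster are picked up without editing a static target list.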
Thank you
Stay in touch
Any questions?
Amnon Heiman
amnon@scylladb.com
@amnonheiman

Editor's notes

  1. Let's start with an overview of our monitoring stack. The monitoring stack uses Prometheus for metrics collection and storage. To display dashboards, we use Grafana, which reads these metrics from Prometheus. Prometheus can generate alerts; the Alertmanager receives these alerts and also serves as a data source for Grafana.
  2. Now, let’s discuss the changes. The applications and frameworks listed here have all been upgraded. In particular, Grafana 6 comes with a new look and extended capabilities. By the way, you no longer need Python.
  3. The major change is the dashboard reorganization, which makes the dashboards clearer and easier to use. The Overview dashboard shows at a glance how well the cluster is operating. The Detailed dashboard provides a drill-down look at a single Scylla node.
  4. Let's look at the new dashboards. The CQL dashboard is based on a talk by Shlomi at last year's summit. It has two parts; the first covers the CQL commands.
  5. The second part is for CQL optimization. When everything is functioning optimally, all gauges should be at zero. A gauge above zero indicates a potential problem.
  6. We recently introduced Scylla’s Alternator, which is a DynamoDB-compatible API for Scylla. The Alternator dashboard provides a picture of what Alternator is doing.
  7. We now alert on low disk space and CQL connectivity problems.
  8. Many of our users have been asking about adding alerts themselves. Prometheus fires an alert if a condition holds for a certain period of time. The alert contains additional text explaining what is happening.
  9. This is what an alert configuration looks like. To add an alert, you give it a name, write an expression, define a minimum duration, and typically add labels, a description, and a summary.
  10. Annotations are a new feature that highlights events, helping users understand the system's behaviour.
  11. Finally, we have tighter integration with the Manager. You can now set your monitoring to read its configuration directly from the Manager instead of configuring it manually.