An open-source monitoring solution.
Prometheus - Monitoring system &
time series database
Takeaways:
• What is Prometheus?
• Difference Between Nagios vs Prometheus
• PromQL (Prometheus Query Language)
• Time series DB
• Grafana
• Live Demo
What is Prometheus?
• Prometheus is an open-source systems monitoring and alerting
toolkit originally built at SoundCloud.
• Inspired by Google’s Borgmon Monitoring System
• Written in Go (also known as Golang). Go is syntactically
similar to C and is widely used in production at Google and in
many other organizations and open-source projects.
• It is now a standalone open source project and maintained
independently of any company. To emphasize this, and to clarify
the project's governance structure, Prometheus joined the CNCF
in 2016 as the second hosted project, after Kubernetes.
• The core Prometheus server is a single binary, with no
dependencies like Zookeeper, Consul, Cassandra, Hadoop or the
internet. All it needs is local disk, preferably an SSD.
• It is a systems and service monitoring system. It collects metrics
from configured targets at given intervals, evaluates rule
expressions, displays the results, and can trigger alerts if some
condition is observed to be true.
https://appinventiv.com/blog/mini-guide-to-go-programming-language/
ABOUT
• The Linux Foundation is the parent organization of the CNCF
(Cloud Native Computing Foundation).
• Open-source cloud computing for applications. Not
to be confused with OpenStack, which is for
infrastructure.
• Netflix pioneered the concept of cloud native as a
practical tool
• Cloud native is a term used to describe container-
based environments. Cloud native technologies are
used to develop applications built with services
packaged in containers, deployed as microservices
and managed on elastic infrastructure through
agile DevOps processes and continuous delivery
workflows.
• August 9, 2018 - CNCF Announces Prometheus
Graduation.
https://www.cncf.io/webinars/what-is-cloud-native-and-why-does-it-exist/
Why Prometheus?
• Multi-dimensional data model, e.g. instance, service, endpoint, and method.
• Operational simplicity
• Scalable data collection
• Powerful query language
All of these features existed in various systems.
However, Prometheus combined them all.
Nagios – an Overview
• The Industry Standard In IT Infrastructure Monitoring
• First launched in 1999. Nagios is officially sponsored by Nagios Enterprises.
• Nagios Core is a free and open-source computer-software application that monitors systems,
networks and infrastructure. Nagios offers monitoring and alerting services for servers, switches,
applications and services. It alerts users when things go wrong and alerts them a second time when
the problem has been resolved.
• NDOUtils: the NDOUtils addon is designed to store all configuration and event data from Nagios
in a database. It requires a MariaDB or MySQL database for storing Nagios Core data.
• RRDtool and Highcharts are included to create customizable graphs that can be displayed in
dashboards.
• (Nagios Core vs Nagios XI) Nagios Core is open source whereas Nagios XI is a commercial,
enterprise version of Nagios.
• Historical performance data that is used to generate graphs is stored in Round Robin Database
(RRD) files.
• Rrdcached - On a Nagios XI server, rrdcached collects host and service performance data and then
flushes it to the appropriate rrd files at a specified interval. This reduces the amount of disk activity
needed to keep a large number of rrd files current for performance graphs.
Nagios vs Prometheus
• Nagios is primarily about alerting based on the exit codes of
scripts.
• Nagios is host-based. Each host can have one or more services
and each service can perform one check.
• There is no notion of labels or a query language.
• Nagios has no storage per se, beyond the current check state.
There are plugins which can store data, for example for
visualisation.
• Nagios XI - Using Grafana With Existing Performance Data:
Grafana uses the existing performance data files (RRD) to
generate the graphs.
• Overall, Nagios is suitable for basic monitoring of small and/or
static systems where blackbox probing is sufficient. If you want
to do whitebox monitoring, or have a dynamic or cloud based
environment, then Prometheus is a good choice.
Cacti
Should we cry or laugh?
Prometheus – By Canonical
• Ref:
https://prometheus.io/blog/2016/11/16/interview-with-canonical/
Architecture
Architecture - Explanation
• Prometheus scrapes metrics from instrumented jobs, either directly or via an
intermediary push gateway for short-lived jobs. It stores all scraped samples
locally and runs rules over this data to either aggregate and record new time
series from existing data or generate alerts.
• Also, pulling is slightly better than pushing, though this should not be considered a
major point when choosing a monitoring system.
• For cases where you must push, we offer the Pushgateway as occasionally you
will need to monitor components which cannot be scraped. The Prometheus
Pushgateway allows you to push time series from short-lived service-level batch
jobs to an intermediary job which Prometheus can scrape.
• Limitation: not suited for billing based on the data collected for monitoring, as the
collected data will likely not be detailed and complete enough.
• Grafana or other API consumers can be used to visualize the collected data.
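To make the pull model above more concrete, here is a minimal Python sketch of what a scraper does with the text a target returns. It handles only a simplified subset of the text exposition format, and the payload and metric names are hypothetical, so treat it as an illustration rather than how Prometheus itself parses scrapes.

```python
from typing import Dict, List, Tuple

Sample = Tuple[str, Dict[str, str], float]


def parse_exposition(text: str) -> List[Sample]:
    """Parse a simplified subset of the Prometheus text exposition format.

    Assumption: label values contain no commas, spaces, or escaped quotes,
    and sample lines carry no trailing timestamp.
    """
    samples: List[Sample] = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        series, value = line.rsplit(" ", 1)
        name = series
        labels: Dict[str, str] = {}
        if "{" in series:
            name, raw = series.split("{", 1)
            for pair in raw.rstrip("}").split(","):
                if pair:
                    key, val = pair.split("=", 1)
                    labels[key] = val.strip('"')
        samples.append((name, labels, float(value)))
    return samples


# Hypothetical payload, as an instrumented job might return it from /metrics.
payload = """\
# HELP http_requests_total Total HTTP requests.
# TYPE http_requests_total counter
http_requests_total{method="get",code="200"} 1027
http_requests_total{method="post",code="500"} 3
process_open_fds 12
"""

scraped = parse_exposition(payload)
```

Each tuple in `scraped` holds a metric name, its labels, and the sampled value; a real server would also attach the scrape timestamp before storing it.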
Alertmanager
• Grouping: Useful during larger outages when many systems fail at once and
hundreds to thousands of alerts may be firing simultaneously
• Inhibition is a concept of suppressing notifications for certain alerts if certain
other alerts are already firing.
• Silences are a straightforward way to simply mute alerts for a given time
• The following external systems are supported:
Email
Generic Webhooks
HipChat
OpsGenie
PagerDuty
Pushover
Slack
• To make Prometheus highly available: Run identical Prometheus servers on two or
more separate machines. Identical alerts will be deduplicated by the Alertmanager.
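Grouping and deduplication can be illustrated with a small Python sketch. It assumes an alert is nothing more than its label set and that two alerts are "identical" when all labels match; the real Alertmanager has more state (timing, routing, silences) that is left out here.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Alert = Dict[str, str]  # simplification: an alert is just its label set


def deduplicate(alerts: List[Alert]) -> List[Alert]:
    """Drop alerts whose label sets are identical, e.g. sent by two HA replicas."""
    seen = set()
    unique: List[Alert] = []
    for alert in alerts:
        key = frozenset(alert.items())
        if key not in seen:
            seen.add(key)
            unique.append(alert)
    return unique


def group_alerts(alerts: List[Alert], group_by: List[str]) -> Dict[Tuple[str, ...], List[Alert]]:
    """Bundle alerts that share the same values for the group_by labels."""
    groups = defaultdict(list)
    for alert in alerts:
        groups[tuple(alert.get(label, "") for label in group_by)].append(alert)
    return dict(groups)


# Hypothetical alerts arriving from two identical Prometheus servers.
incoming = [
    {"alertname": "InstanceDown", "instance": "db1"},  # from replica A
    {"alertname": "InstanceDown", "instance": "db1"},  # same alert from replica B
    {"alertname": "InstanceDown", "instance": "db2"},
    {"alertname": "HighLatency", "instance": "api1"},
]
unique_alerts = deduplicate(incoming)
notifications = group_alerts(unique_alerts, ["alertname"])
```

With grouping by `alertname`, the two `InstanceDown` alerts end up in one notification instead of two separate pages.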
Time Series Database (TSDB)
• What is a time series? The value of something tracked over time.
• Each series has an identifier (a metric name plus labels, i.e. key/value pairs) and a sequence
of data points: identifier -> (t0, v0), (t1, v1), (t2, v2), (t3, v3), .... Each data
point is a tuple of a timestamp and a value. For the purpose of monitoring, the
timestamp is an integer and the value any number.
Example: this could be temperature once a day, or requests to your API once a minute.
The latter could look like:
my_api_requests: 5@1:00PM 2@1:01PM 18@1:02PM
• The data model is fundamentally the same as that of OpenTSDB.
• Prometheus includes a local on-disk time series database, but also optionally
integrates with remote storage systems
• Ingested samples are grouped into blocks of two hours. Each two-hour block
consists of a directory containing one or more chunk files that contain all time
series samples for that window of time, as well as a metadata file and index file
(which indexes metric names and labels to time series in the chunk files). When
series are deleted via the API, deletion records are stored in separate tombstone
files (instead of deleting the data immediately from the chunk files).
• A limitation of the local storage is that it is not clustered or replicated. Hence, using
RAID for disk availability, snapshots for backups, capacity planning, etc., is
recommended for improved durability. Alternatively, external storage may be used
via the remote read/write APIs.
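The data model and the two-hour blocks described above can be sketched in a few lines of Python. The sketch assumes millisecond timestamps and block boundaries aligned to multiples of two hours; it only shows the grouping idea and says nothing about the on-disk chunk, index, or tombstone formats.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

BLOCK_MS = 2 * 60 * 60 * 1000  # two hours, in milliseconds (assumed unit)


def series_id(name: str, labels: Dict[str, str]) -> Tuple[str, Tuple[Tuple[str, str], ...]]:
    """A series is identified by its metric name plus its sorted label pairs."""
    return (name, tuple(sorted(labels.items())))


def block_start(ts_ms: int) -> int:
    """Start of the two-hour window a timestamp falls into (assumed alignment)."""
    return ts_ms - ts_ms % BLOCK_MS


def group_into_blocks(samples: List[Tuple[int, float]]) -> Dict[int, List[Tuple[int, float]]]:
    """Group (timestamp, value) data points by their two-hour window."""
    blocks = defaultdict(list)
    for ts, value in samples:
        blocks[block_start(ts)].append((ts, value))
    return dict(blocks)


# Data points for one series: at 0h, 1h, 2h, and 2h01m.
samples = [(0, 5.0), (3_600_000, 2.0), (7_200_000, 18.0), (7_260_000, 4.0)]
blocks = group_into_blocks(samples)
```

Label order does not matter for identity: `series_id("my_api_requests", {"a": "1", "b": "2"})` equals the same call with the labels given in reverse order.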
TSDB Configuration:-
• Prometheus has several flags that allow configuring the local storage.
The most important ones are:
--storage.tsdb.path: This determines where Prometheus writes its database. Defaults to data/.
--storage.tsdb.retention.time: This determines when to remove old data. Defaults to 15d.
--storage.tsdb.retention.size: This determines the maximum number of bytes that storage blocks can use. The oldest
data will be removed first. Defaults to 0 or disabled.
--storage.tsdb.wal-compression: This flag enables compression of the write-ahead log (WAL). Depending on your data,
you can expect the WAL size to be halved with little extra cpu load.
• TSDB Storage as follows
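As a rough illustration of how the time-based and size-based retention flags above might interact, the following Python sketch drops blocks that are too old and then removes the oldest remaining blocks until the size limit is met. The block representation (`max_time`, `size`) is hypothetical and this is not Prometheus's actual implementation.

```python
from typing import Dict, List

DAY_MS = 24 * 60 * 60 * 1000


def apply_retention(blocks: List[Dict[str, int]], now_ms: int,
                    retention_ms: int, max_bytes: int = 0) -> List[Dict[str, int]]:
    """Return the blocks that survive retention, oldest first.

    blocks: dicts with 'max_time' (ms) and 'size' (bytes); hypothetical shape.
    max_bytes == 0 means the size limit is disabled.
    """
    kept = sorted(
        (b for b in blocks if now_ms - b["max_time"] <= retention_ms),
        key=lambda b: b["max_time"],
    )
    if max_bytes > 0:
        while kept and sum(b["size"] for b in kept) > max_bytes:
            kept.pop(0)  # the oldest data is removed first
    return kept


# Four blocks ending on days 1, 10, 19 and 20, each 100 bytes.
all_blocks = [{"max_time": d * DAY_MS, "size": 100} for d in (1, 10, 19, 20)]
by_time = apply_retention(all_blocks, now_ms=20 * DAY_MS, retention_ms=15 * DAY_MS)
by_time_and_size = apply_retention(all_blocks, now_ms=20 * DAY_MS,
                                   retention_ms=15 * DAY_MS, max_bytes=250)
```

With a 15-day retention the day-1 block is dropped; adding a 250-byte limit also drops the day-10 block.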
Prometheus - Demo
Free Online Demo:
http://demo.robustperception.io:9090/graph
• Prometheus means Forethinker
• Prometheus is a Titan. Figuratively, a titan is an
extremely important person: Albert Einstein
was a titan in the world of science.
• A Trickster figure, he was a champion of
mankind known for his wily intelligence,
who stole fire from Zeus and the gods and
gave it to mortals.
• Prometheus is also a 2012 science fiction film set
aboard a spaceship of the same name.
Are You a Titan or just wearing Titan Watch?
Let’s Start - Prometheus
• Prerequisite: configure prometheus.yml (i.e. scrape interval, target servers to be monitored, Alertmanager configuration, etc.)
• The config file is written in YAML format. Prometheus can reload its configuration at runtime. A configuration reload is triggered by sending a
SIGHUP to the Prometheus process or sending an HTTP POST request to the /-/reload endpoint (when the --web.enable-lifecycle flag is
enabled).
• The kill command can send signals to commands and processes. However, commands only respond if they are
programmed to recognize those signals. There are 64 signals (kill -l); particularly useful ones include:
- SIGHUP (1) - Hangup detected on controlling terminal or death of controlling process.
- SIGKILL (9) - Kill signal, i.e. kill the running process.
- SIGSTOP (19) - Stop process.
- SIGCONT (18) - Continue process if stopped.
To send a SIGKILL signal to PID 1234 use: kill -9 1234
To send a SIGHUP signal to PID 1234 use: kill -1 1234
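Both reload paths can also be triggered from a script. The Python sketch below uses only the standard library; the base URL (`http://localhost:9090`) is an assumption about where the server listens, the HTTP variant only works when --web.enable-lifecycle is enabled, and the signal variant is POSIX-only.

```python
import os
import signal
import urllib.request


def build_reload_request(base_url: str) -> urllib.request.Request:
    """Build the HTTP POST request for the /-/reload endpoint."""
    return urllib.request.Request(base_url.rstrip("/") + "/-/reload", data=b"", method="POST")


def reload_via_http(base_url: str, timeout: float = 5.0) -> int:
    """Send the reload request and return the HTTP status code."""
    with urllib.request.urlopen(build_reload_request(base_url), timeout=timeout) as resp:
        return resp.status


def reload_via_signal(pid: int) -> None:
    """Equivalent of `kill -1 <pid>`: send SIGHUP to the Prometheus process."""
    os.kill(pid, signal.SIGHUP)


# Assumed address of a local Prometheus server; nothing is sent here.
request = build_reload_request("http://localhost:9090/")
```

Calling `reload_via_http("http://localhost:9090")` or `reload_via_signal(1234)` would perform the actual reload against a running server.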
Prometheus – Exporter
• Exporters bridge the gap between Prometheus and systems which don't export metrics
in the Prometheus format.
• There are official and externally contributed exporters available, e.g. for MySQL, Oracle DB,
Dell/IBM hardware, Jira, Hadoop storage, Apache HTTP, AWS APIs, Docker, SNMP, etc.
https://prometheus.io/docs/instrumenting/exporters/
• Build your own exporter, for example to track:
- Whether an important cron job succeeded.
- Any new error from the TimesTen DB error.log.
- From an online shop's perspective: total order successes vs failures.
- Order data metrics for dashboard integration.
- Whether an important file was received/processed.
- Top-selling products/categories.
- 5-star to 1-star review metrics.
- etc.
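One way such a home-grown exporter could look, using only the Python standard library: it renders two metrics in the text exposition format and serves them on /metrics. The metric names (`cron_job_last_success`, `shop_orders_total`), the hard-coded values, and the port are all hypothetical; in practice the official client libraries are the usual choice.

```python
import http.server
from typing import Dict


def render_metrics(cron_success: bool, orders: Dict[str, int]) -> str:
    """Render hypothetical business metrics in the text exposition format."""
    lines = [
        "# HELP cron_job_last_success 1 if the last cron job run succeeded, else 0.",
        "# TYPE cron_job_last_success gauge",
        f"cron_job_last_success {1 if cron_success else 0}",
        "# HELP shop_orders_total Orders processed, by result.",
        "# TYPE shop_orders_total counter",
    ]
    for result in sorted(orders):
        lines.append(f'shop_orders_total{{result="{result}"}} {orders[result]}')
    return "\n".join(lines) + "\n"


class MetricsHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path != "/metrics":
            self.send_error(404)
            return
        # Placeholder values; a real exporter would read them from the system.
        body = render_metrics(True, {"success": 120, "failure": 4}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port: int = 8000) -> http.server.HTTPServer:
    """Create (but do not start) the HTTP server exposing /metrics."""
    return http.server.HTTPServer(("", port), MetricsHandler)


sample_output = render_metrics(False, {"success": 2, "failure": 1})
```

Running `make_server().serve_forever()` would start the exporter, after which the host and port can be added as a scrape target in prometheus.yml.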
Node Exporter - Monitors Hardware and OS Metrics
PromQL - Prometheus Query Language
• Prometheus provides a functional query
language.
• It lets the user select and aggregate time series data
in real time. The result of an expression can either
be shown as a graph, viewed as tabular data in
Prometheus's expression browser, or consumed
by external systems via the HTTP API.
• The Prometheus query language allows you to
slice and dice the dimensional data for ad-hoc
exploration, graphing, and alerting.
Time Series Selectors
• Instant vector - exactly one value per time series, all at the same timestamp.
In the simplest form, only a metric name is specified.
• Range vector - any number of values per time series between two timestamps.
A range duration is appended in square brackets ([]) at the end of a
vector selector.
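The difference between the two selector types can be sketched in Python for a single series. The 5-minute lookback for instant values and the exact boundary handling are simplifying assumptions; timestamps are in seconds.

```python
from typing import List, Optional, Tuple

Point = Tuple[int, float]  # (timestamp in seconds, value)

LOOKBACK_S = 300  # assumption: samples older than 5 minutes count as stale


def instant_value(samples: List[Point], eval_time: int) -> Optional[Point]:
    """Instant selector: the single most recent sample at or before eval_time."""
    candidates = [s for s in samples if eval_time - LOOKBACK_S < s[0] <= eval_time]
    return candidates[-1] if candidates else None


def range_values(samples: List[Point], eval_time: int, range_s: int) -> List[Point]:
    """Range selector: every sample in the window ending at eval_time."""
    return [s for s in samples if eval_time - range_s < s[0] <= eval_time]


# One series, sorted by time, sampled once a minute.
series = [(0, 1.0), (60, 2.0), (120, 4.0), (180, 7.0)]

latest = instant_value(series, eval_time=200)               # like: my_metric
window = range_values(series, eval_time=200, range_s=120)   # like: my_metric[2m]
stale = instant_value(series, eval_time=1000)
```

Here `latest` is one point, `window` holds the two points from the last two minutes, and `stale` is `None` because no sample is recent enough.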
Metric types
• Counter: A counter is a cumulative metric that
represents a single monotonically increasing counter
whose value can only increase or be reset to zero on
restart. For example, you can use a counter to represent
the number of requests served, tasks completed, or
errors.
• Gauge: A gauge is a metric that represents a single
numerical value that can arbitrarily go up and down, e.g.
temperatures or current memory usage.
• Histogram: A histogram samples observations (usually
things like request durations or response sizes) and
counts them in configurable buckets.
• Summary: Similar to a histogram, a summary samples
observations (usually things like request durations and
response sizes). It also provides a total count and sum of
observations and calculates configurable quantiles over a
sliding time window.
https://povilasv.me/prometheus-tracking-request-duration/
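A toy Python version of three of the metric types may help show how they differ. These classes are illustrative stand-ins, not the API of any Prometheus client library; the summary type is left out because its sliding-window quantiles need more machinery.

```python
import bisect
from typing import List, Tuple


class Counter:
    """Only goes up (a process restart would reset it to zero)."""

    def __init__(self) -> None:
        self.value = 0.0

    def inc(self, amount: float = 1.0) -> None:
        if amount < 0:
            raise ValueError("counters can only increase")
        self.value += amount


class Gauge:
    """Can be set to any value, up or down."""

    def __init__(self) -> None:
        self.value = 0.0

    def set(self, value: float) -> None:
        self.value = value


class Histogram:
    """Counts observations in buckets with upper bounds ("less than or equal")."""

    def __init__(self, bounds: List[float]) -> None:
        self.bounds = sorted(bounds)
        self.counts = [0] * (len(self.bounds) + 1)  # last slot is the +Inf bucket
        self.sum = 0.0

    def observe(self, value: float) -> None:
        # First bucket whose upper bound is >= value.
        self.counts[bisect.bisect_left(self.bounds, value)] += 1
        self.sum += value

    def cumulative(self) -> List[Tuple[float, int]]:
        """Buckets as exposed to Prometheus: each includes all smaller ones."""
        total, out = 0, []
        for bound, count in zip(self.bounds + [float("inf")], self.counts):
            total += count
            out.append((bound, total))
        return out


requests = Counter()
requests.inc()
requests.inc(2)

durations = Histogram([0.1, 0.5, 1.0])
for seconds in (0.05, 0.1, 0.3, 2.0):
    durations.observe(seconds)
```

The cumulative buckets are what functions such as histogram_quantile() work from on the server side.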
Operators
• Binary comparison operators:
==, !=, >, <, >=, <=
• Binary arithmetic operators:
+, -, *, /, % (modulo), ^ (power/exponentiation)
• Logical/set binary operators:
and (intersection), or (union), unless (complement)
• Built-in aggregation operators:
sum, min, max, avg, stddev, stdvar, count, count_values, bottomk, topk, quantile
- These operators can either be used to aggregate over all label dimensions or preserve
distinct dimensions using by or without.
https://blog.pvincent.io/2017/12/prometheus-blog-series-part-2-metric-types/
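The effect of `by` and `without` on an aggregation can be mimicked in Python. The sketch below implements only `sum`, represents each sample as a (labels, value) pair, and uses made-up label values; it illustrates the grouping semantics rather than the PromQL engine.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

LabeledSample = Tuple[Dict[str, str], float]
GroupKey = Tuple[Tuple[str, str], ...]


def sum_by(samples: List[LabeledSample], by: List[str]) -> Dict[GroupKey, float]:
    """sum by(<labels>): keep only the listed labels, add up everything else."""
    out = defaultdict(float)
    for labels, value in samples:
        key = tuple(sorted((name, labels.get(name, "")) for name in by))
        out[key] += value
    return dict(out)


def sum_without(samples: List[LabeledSample], without: List[str]) -> Dict[GroupKey, float]:
    """sum without(<labels>): drop the listed labels, keep all others."""
    out = defaultdict(float)
    for labels, value in samples:
        key = tuple(sorted((k, v) for k, v in labels.items() if k not in without))
        out[key] += value
    return dict(out)


# Hypothetical filesystem sizes for two instances.
fs_sizes = [
    ({"instance": "a", "device": "sda1", "mountpoint": "/"}, 100.0),
    ({"instance": "a", "device": "sdb1", "mountpoint": "/data"}, 50.0),
    ({"instance": "b", "device": "sda1", "mountpoint": "/"}, 200.0),
]

per_instance = sum_by(fs_sizes, ["instance"])
also_per_instance = sum_without(fs_sizes, ["device", "mountpoint"])
```

Both calls produce one total per instance here, because `instance` is the only label left once `device` and `mountpoint` are removed.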
Basic Functions
• PromQL has 46 functions & growing…
• Most of the mathematical functions and the
day, month, year, minute, hour, and time functions are
available.
• With Prometheus, we mostly use the
following:
- rate()
- irate(): should only be used when graphing
volatile, fast-moving counters.
- increase()
- label_join()/label_replace()
- <aggregation>_over_time():
min_over_time
max_over_time
avg_over_time
sum_over_time
count_over_time
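To show what increase() and rate() compute conceptually, here is a deliberately simplified Python sketch for one counter series. It handles counter resets but skips the extrapolation to the window boundaries that the real functions perform, so its numbers will not match Prometheus exactly.

```python
from typing import List, Optional, Tuple

Point = Tuple[int, float]  # (timestamp in seconds, counter value)


def increase(samples: List[Point]) -> float:
    """Total growth of a counter across the window, tolerating resets."""
    total = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop means the counter restarted from zero.
        total += (cur - prev) if cur >= prev else cur
    return total


def rate(samples: List[Point]) -> Optional[float]:
    """Average per-second growth between the first and last sample."""
    if len(samples) < 2:
        return None
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        return None
    return increase(samples) / elapsed


def avg_over_time(samples: List[Point]) -> float:
    """Plain average of the values in the window (meant for gauges)."""
    return sum(value for _, value in samples) / len(samples)


# Counter samples over 300 seconds, with a reset between t=100 and t=200.
counter_window = [(0, 10.0), (100, 40.0), (200, 5.0), (300, 35.0)]
grown = increase(counter_window)
per_second = rate(counter_window)
```

The reset is why `grown` is 65 rather than 25: the drop from 40 to 5 is counted as 5 new events after a restart.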
Wow! Functions
• delta()
• holt_winters()
• predict_linear()
• clamp_max()
• clamp_min()
• histogram_quantile()
Holt-Winters
https://www.otexts.org/fpp/7/5
New Relic Doc
• Averages unfortunately have the big drawback
of hiding the distribution and preventing the discovery
of outliers/deviations.
• Quantiles are a better measurement for this kind
of metric, as they help in understanding the
distribution. For example, if the request latency
0.5-quantile (50th percentile) is 100ms, it
means that 50% of requests completed under
100ms. Similarly, if the 0.99-quantile (99th
percentile) is 4s, it means that 1% of requests
responded in more than 4s.
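A small worked example in Python shows how an average can hide an outlier that quantiles expose. The latency values are invented, and the quantile function uses linear interpolation between closest ranks, which is one common definition among several.

```python
import math
from typing import List


def quantile(q: float, values: List[float]) -> float:
    """q-quantile (0 <= q <= 1) with linear interpolation between closest ranks."""
    ordered = sorted(values)
    rank = q * (len(ordered) - 1)
    lower = math.floor(rank)
    upper = math.ceil(rank)
    weight = rank - lower
    return ordered[lower] * (1 - weight) + ordered[upper] * weight


# Hypothetical request latencies in milliseconds: nine fast, one very slow.
latencies_ms = [100.0] * 9 + [4000.0]

mean_ms = sum(latencies_ms) / len(latencies_ms)
p50_ms = quantile(0.5, latencies_ms)
p99_ms = quantile(0.99, latencies_ms)
```

The mean comes out at 490 ms, which describes none of the requests; the median stays at 100 ms while the 0.99-quantile (about 3649 ms) reveals the slow tail.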
predict_linear()
Demo Queries
• max by(instance)(node_filesystem_size_bytes)
• max without(device, fstype, mountpoint)(node_filesystem_size_bytes)
• sum without(device, fstype, mountpoint)(node_filesystem_size_bytes)
• sum(node_filesystem_size_bytes)
• round(sum(node_filesystem_size_bytes)/1024/1024/1024)
• round(sum by(instance, device)(node_filesystem_size_bytes)/1024/1024/1024)
• rate(node_load1[5m])
• rate(node_cpu_seconds_total{mode="system"}[5m])
• min_over_time(node_load1[5m])
• max_over_time(node_load1[5m])
• avg_over_time(node_load1[5m])
• sum_over_time(node_load1[5m])
• count_over_time(node_load1[5m])
• delta(node_hwmon_temp_celsius[1h])
• clamp_max(node_load1,1.2)
• clamp_min(clamp_max(node_load1,1.2),1.05)
• predict_linear(node_load1[1h],4*3600)
• quantile without(cpu)(0.9, rate(node_cpu_seconds_total{mode="system"}[5m]))
• topk(3, sum by (mode) (node_cpu_seconds_total))
• bottomk(3, sum by (le) (alertmanager_http_request_duration_seconds_bucket))
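The idea behind predict_linear() and the clamp functions in the queries above can be sketched in Python. This version fits a least-squares line through the samples and projects it forward from the last sample; Prometheus projects from the evaluation time instead, so this is an approximation for illustration, with invented load values.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (timestamp in seconds, value)


def predict_linear(samples: List[Point], seconds_ahead: float) -> float:
    """Fit a least-squares line and evaluate it seconds_ahead after the last sample."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to fit a line")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    covariance = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    variance = sum((t - mean_t) ** 2 for t, _ in samples)
    slope = covariance / variance
    intercept = mean_v - slope * mean_t
    return slope * (samples[-1][0] + seconds_ahead) + intercept


def clamp(value: float, lower: float, upper: float) -> float:
    """Combined effect of clamp_min(clamp_max(value, upper), lower)."""
    return max(lower, min(upper, value))


# Hypothetical load values rising by 0.5 every 30 minutes over one hour.
load_1h = [(0.0, 1.0), (1800.0, 1.5), (3600.0, 2.0)]

in_four_hours = predict_linear(load_1h, 4 * 3600)
clamped = clamp(in_four_hours, 1.05, 1.2)
```

With a steady rise of 1.0 per hour, the projection four hours out is about 6.0, and clamping it to the 1.05 to 1.2 band yields 1.2.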
Grafana – Demo
• Download and install Grafana as described at https://grafana.com/grafana/download/beta
• After installation, start, stop, or check the status of the service as shown below. There are other ways
too; follow the installation guide for more details (attached logs).
gmv-evo@gmvevo:~/Downloads$ sudo systemctl start grafana-server
gmv-evo@gmvevo:~/Downloads$ sudo systemctl status grafana-server
gmv-evo@gmvevo:~/Downloads$ sudo systemctl stop grafana-server
• Open http://localhost:3000 and configure the login process.
• Configure the Prometheus dashboard as generic and import the Node Exporter dashboard:
https://grafana.com/grafana/dashboards/1860
Wow! Grafana – What a Dashboard Does for Us!
Out of Syllabus – Trigger to look out
• Remote Endpoints and Storage - long term storage
• Alertmanager - Webhook Receiver (Gmail, etc)
• Prometheus Concerns - fixed by Cortex and Thanos
https://grafana.com/blog/2019/11/21/promcon-recap-two-
households-both-alike-in-dignity-cortex-and-thanos/
• Prometheus open bugs and fixes:
https://github.com/prometheus/prometheus/issues?
• Cloud Monitoring : Nagios vs. Prometheus
• Google's mtail - Extract Prometheus metrics from application logs.
• Prometheus is a system to collect and process metrics, not an event
logging system. For logs, the ELK stack is the answer.
Study Material –Free & Cost
Free
• https://prometheus.io/docs/introduction/overview/
• https://promcon.io/2019-munich/stream/
• Prometheus Monitoring : The Definitive Guide in 2019
• subreddit collecting all Prometheus-related resources on the internet.
• https://training.robustperception.io/ - Introduction to Prometheus
• SoundCloud - What makes Prometheus a “next generation” monitoring
system?
Cost
• Understanding PromQL by Robust Perception
• Prometheus: Up & Running by O'Reilly
Thanks for Listening!!!
be happy and make happy @how? given by my aasan:-
Go below what you have # Dream above what you have # First love what you have
Spread info what you have # Get info what others have # Help as per what you have

Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 

Último (20)

The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 

Prometheus - Intro, CNCF, TSDB, PromQL, Grafana

  • 1. An open-source monitoring solution. Prometheus - Monitoring system & time series database
  • 2. Takeaways: • What is Prometheus? • Difference between Nagios and Prometheus • PromQL (Prometheus Query Language) • Time series DB • Grafana • Live Demo
  • 3. What is Prometheus? • Prometheus is an open-source systems monitoring and alerting toolkit originally built at SoundCloud. • Inspired by Google’s Borgmon monitoring system. • Written in Go (also known as Golang). Go is syntactically similar to C and is widely used in production at Google and in many other organizations and open-source projects. • It is now a standalone open-source project, maintained independently of any company. To emphasize this, and to clarify the project's governance structure, Prometheus joined the CNCF in 2016 as the second hosted project, after Kubernetes. • The core Prometheus server is a single binary with no dependencies such as Zookeeper, Consul, Cassandra, Hadoop or the internet. All it needs is local disk, preferably an SSD. • It is a systems and service monitoring system. It collects metrics from configured targets at given intervals, evaluates rule expressions, displays the results, and can trigger alerts if some condition is observed to be true. Ref (Go): https://appinventiv.com/blog/mini-guide-to-go-programming-language/
  • 4. ABOUT (CNCF) • The Linux Foundation is the parent organization. • Open-source cloud computing for applications; not to be confused with OpenStack, which is for infrastructure. • Netflix pioneered the concept of cloud native as a practical tool. • Cloud native is a term used to describe container-based environments. Cloud native technologies are used to develop applications built with services packaged in containers, deployed as microservices and managed on elastic infrastructure through agile DevOps processes and continuous delivery workflows. • August 9, 2018 - CNCF announces Prometheus graduation. Ref: https://www.cncf.io/webinars/what-is-cloud-native-and-why-does-it-exist/
  • 5. Why Prometheus? • Multi-dimensional data model, e.g. instance, service, endpoint, and method. • Operational simplicity • Scalable data collection • Powerful query language. All of these features existed in various systems. However, Prometheus combined them all.
  • 6. Nagios – an Overview • The industry standard in IT infrastructure monitoring. • First launched in 1999. Nagios is officially sponsored by Nagios Enterprises. • Nagios Core is a free and open-source software application that monitors systems, networks and infrastructure. Nagios offers monitoring and alerting services for servers, switches, applications and services. It alerts users when things go wrong and alerts them a second time when the problem has been resolved. • NDOUtils: the NDOUtils addon is designed to store all configuration and event data from Nagios in a database. It requires a MariaDB or MySQL database for storing Nagios Core data. • RRDtool and Highcharts are included to create customizable graphs that can be displayed in dashboards. • Nagios Core vs Nagios XI: Nagios Core is open source, whereas Nagios XI is a commercial, enterprise version of Nagios. • Historical performance data used to generate graphs is stored in Round Robin Database (RRD) files. • rrdcached: on a Nagios XI server, rrdcached collects host and service performance data and then flushes it to the appropriate RRD files at a specified interval. This reduces the amount of disk activity needed to keep a large number of RRD files current for performance graphs.
  • 7. Nagios vs Prometheus • Nagios is primarily about alerting based on the exit codes of scripts. • Nagios is host-based. Each host can have one or more services and each service can perform one check. • There is no notion of labels or a query language. • Nagios has no storage per se, beyond the current check state. There are plugins which can store data, for example for visualisation. • Nagios XI - Using Grafana with existing performance data: Grafana uses the existing performance data files (RRD) to generate the graphs. • Overall, Nagios is suitable for basic monitoring of small and/or static systems where blackbox probing is sufficient. If you want to do whitebox monitoring, or have a dynamic or cloud-based environment, then Prometheus is a good choice.
  • 8. Cacti: should we cry or laugh?
  • 9. Prometheus – By Canonical • Ref: https://prometheus.io/blog/2016/11/16/interview-with-canonical/
  • 11. Architecture - Explanation • Prometheus scrapes metrics from instrumented jobs, either directly or via an intermediary Pushgateway for short-lived jobs. It stores all scraped samples locally and runs rules over this data to either aggregate and record new time series from existing data or generate alerts. • The Prometheus project considers pulling slightly better than pushing; for example, a failed scrape makes it easy to tell that a target is down. • For cases where you must push, Prometheus offers the Pushgateway, as occasionally you will need to monitor components which cannot be scraped. The Pushgateway allows you to push time series from short-lived service-level batch jobs to an intermediary job which Prometheus can scrape. • Limitation: not for billing. Data collected for monitoring should not be used for billing, as it will likely not be detailed and complete enough. • Grafana or other API consumers can be used to visualize the collected data.
  • 12. Alertmanager • Grouping is useful during larger outages, when many systems fail at once and hundreds to thousands of alerts may be firing simultaneously. • Inhibition suppresses notifications for certain alerts if certain other alerts are already firing. • Silences are a straightforward way to mute alerts for a given time. • The following external systems are supported: Email, Generic Webhooks, HipChat, OpsGenie, PagerDuty, Pushover, Slack. • To make Prometheus highly available, run identical Prometheus servers on two or more separate machines. Identical alerts will be deduplicated by the Alertmanager.
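Grouping and deduplication can be pictured with a small sketch. This is not Alertmanager's implementation; the alert dictionaries, the `group_alerts` helper and the choice of `group_by` labels are all hypothetical, and alerts are treated as identical when their full label sets match.

```python
from collections import defaultdict

def group_alerts(alerts, group_by):
    """Deduplicate identical alerts, then bucket them by the chosen labels."""
    # Alerts with the same label set count as one alert, which is how
    # identical alerts from redundant Prometheus servers collapse.
    unique = {frozenset(alert.items()) for alert in alerts}
    groups = defaultdict(list)
    for labels in unique:
        alert = dict(labels)
        key = tuple(alert.get(name, "") for name in group_by)
        groups[key].append(alert)
    return dict(groups)

alerts = [
    {"alertname": "InstanceDown", "cluster": "eu", "instance": "a:9100"},
    # Same alert again, as sent by a second, identical Prometheus server.
    {"alertname": "InstanceDown", "cluster": "eu", "instance": "a:9100"},
    {"alertname": "InstanceDown", "cluster": "eu", "instance": "b:9100"},
    {"alertname": "DiskFull", "cluster": "us", "instance": "c:9100"},
]
grouped = group_alerts(alerts, group_by=["alertname", "cluster"])
```

With this input, one notification per group would cover two `InstanceDown` alerts in `eu` and one `DiskFull` alert in `us`.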
  • 13. Time Series Database (TSDB) • What is a time series? The value of something tracked over time. • Series are identified by labels (key/value pairs): identifier -> (t0, v0), (t1, v1), (t2, v2), (t3, v3), .... Each data point is a tuple of a timestamp and a value. For the purpose of monitoring, the timestamp is an integer and the value any number. Example: this could be temperature once a day, or requests to your API once a minute. The latter could look like: my_api_requests: 5@1:00PM 2@1:01PM 18@1:02PM • The data model is fundamentally the same as that of OpenTSDB. • Prometheus includes a local on-disk time series database, but also optionally integrates with remote storage systems. • Ingested samples are grouped into blocks of two hours. Each two-hour block consists of a directory containing one or more chunk files that contain all time series samples for that window of time, as well as a metadata file and an index file (which indexes metric names and labels to time series in the chunk files). When series are deleted via the API, deletion records are stored in separate tombstone files (instead of deleting the data immediately from the chunk files). • A limitation of the local storage is that it is not clustered or replicated. Hence, using RAID for disk availability, snapshots for backups, capacity planning, etc., is recommended for improved durability. Alternatively, external storage may be used via the remote read/write APIs.
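The data model and the two-hour grouping can be sketched as follows. This is only an illustration, not the on-disk format: timestamps are assumed to be plain seconds, and `series_id`, `block_start` and `ingest` are hypothetical helper names.

```python
from collections import defaultdict

BLOCK_SECONDS = 2 * 60 * 60  # two-hour blocks, as described above

def series_id(name, labels):
    """Identify a series by its metric name plus sorted label pairs."""
    return (name, tuple(sorted(labels.items())))

def block_start(timestamp):
    """Return the start of the two-hour window that contains the timestamp."""
    return timestamp - (timestamp % BLOCK_SECONDS)

def ingest(samples):
    """Group (name, labels, timestamp, value) samples by block, then by series."""
    blocks = defaultdict(lambda: defaultdict(list))
    for name, labels, ts, value in samples:
        blocks[block_start(ts)][series_id(name, labels)].append((ts, value))
    return blocks

samples = [
    ("my_api_requests", {"method": "GET"}, 0, 5),
    ("my_api_requests", {"method": "GET"}, 60, 2),
    ("my_api_requests", {"method": "GET"}, 7200, 18),
]
blocks = ingest(samples)
```

The first two samples land in the block starting at 0, and the third opens a new block starting at 7200.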
  • 14. TSDB Configuration
      • Prometheus has several flags that allow configuring the local storage. The most important ones are:
          • --storage.tsdb.path: determines where Prometheus writes its database. Defaults to data/.
          • --storage.tsdb.retention.time: determines when to remove old data. Defaults to 15d.
          • --storage.tsdb.retention.size: determines the maximum number of bytes that storage blocks can use. The oldest data will be removed first. Defaults to 0 (disabled).
          • --storage.tsdb.wal-compression: enables compression of the write-ahead log (WAL). Depending on your data, you can expect the WAL size to be halved with little extra CPU load.
      • TSDB storage as follows
  • 15. Prometheus - Demo Free Online Demo: http://demo.robustperception.io:9090/graph
  • 16. • Prometheus means "Forethinker". • Prometheus is a Titan. Figuratively, a titan is an extremely important person: Albert Einstein was a titan in the world of science. • A trickster figure, he was a champion of mankind known for his wily intelligence, who stole fire from Zeus and the gods and gave it to mortals. • Prometheus is also a 2012 science fiction film about a spaceship. Are you a Titan or just wearing a Titan watch?
  • 17. Let’s Start - Prometheus
      • Prerequisite: configure prometheus.yml (i.e. scrape interval, target servers to be monitored, Alertmanager configuration, etc.).
      • The config file is written in YAML. Prometheus can reload its configuration at runtime. A configuration reload is triggered by sending a SIGHUP to the Prometheus process or by sending an HTTP POST request to the /-/reload endpoint (when the --web.enable-lifecycle flag is enabled).
      • The kill command can send signals to commands and processes. However, a process only reacts in a custom way if it is programmed to recognize the signal (SIGKILL and SIGSTOP cannot be caught). There are 64 signals (see kill -l); particularly useful ones, with their numbers as on most Linux systems, include:
          • SIGHUP (1) - Hangup detected on controlling terminal or death of controlling process.
          • SIGKILL (9) - Kill signal, i.e. kill the running process.
          • SIGSTOP (19) - Stop process.
          • SIGCONT (18) - Continue process if stopped.
      • To send a SIGKILL signal to PID 1234, use: kill -9 1234
      • To send a SIGHUP signal to PID 1234, use: kill -1 1234
  • 18. Prometheus – Exporter • Exporters bridge the gap between Prometheus and systems that don’t export metrics in the Prometheus format. • There are official and externally contributed exporters available, e.g. for MySQL, Oracle DB, Dell/IBM hardware, Jira, Hadoop storage, Apache HTTP, AWS APIs, Docker, SNMP, etc. https://prometheus.io/docs/instrumenting/exporters/ • Build your own exporter, for example for: • Whether an important cron job succeeded. • Any new error from TimesTen DB (error.log). • Online selling website perspective: total order successes vs failures. • Order data metrics for dashboard integration. • Whether an important file was received/processed. • Top-selling product/category. • 5-star to 1-star review metric analysis. • etc.
  • 19. Node Exporter - Monitors hardware and OS metrics
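A home-grown exporter for the "order success vs failure" idea could look roughly like this sketch, which uses only the Python standard library rather than an official client library. The metric name `shop_orders_total`, the hard-coded counts and port 8000 are hypothetical.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

def render_metrics(orders_ok, orders_failed):
    """Render two order counters in the Prometheus text exposition format."""
    lines = [
        "# HELP shop_orders_total Orders processed, by result.",
        "# TYPE shop_orders_total counter",
        f'shop_orders_total{{result="success"}} {orders_ok}',
        f'shop_orders_total{{result="failure"}} {orders_failed}',
    ]
    return "\n".join(lines) + "\n"

class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Prometheus scrapes /metrics by default; everything else is a 404.
        if self.path != "/metrics":
            self.send_error(404)
            return
        # Hard-coded counts; a real exporter would read them from the shop.
        body = render_metrics(orders_ok=120, orders_failed=3).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

def serve(port=8000):
    """Serve metrics until interrupted."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling `serve()` and adding the host and port as a scrape target in prometheus.yml would let Prometheus collect the two series.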
  • 20. PromQL - Prometheus Query Language • Prometheus provides a functional query language. • It lets the user select and aggregate time series data in real time. The result of an expression can either be shown as a graph, viewed as tabular data in Prometheus's expression browser, or consumed by external systems via the HTTP API. • The Prometheus query language allows you to slice and dice the dimensional data for ad-hoc exploration, graphing, and alerting.
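As a sketch of the "consumed via the HTTP API" path, an external system can evaluate an expression through the `/api/v1/query` endpoint. The helper name `instant_query_url` and the `localhost:9090` address are assumptions for illustration.

```python
from urllib.parse import urlencode

def instant_query_url(base_url, promql, time=None):
    """Build the URL for an instant query against the Prometheus HTTP API."""
    params = {"query": promql}
    if time is not None:
        params["time"] = time  # evaluation timestamp; server time if omitted
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode(params)}"

url = instant_query_url("http://localhost:9090", "sum(node_filesystem_size_bytes)")
```

Fetching that URL (for example with `urllib.request.urlopen`) against a running server should return a JSON document with `status` and `data` fields.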
  • 21. Time Series Selectors • Instant vector: one value per time series, all at the same timestamp. In the simplest form, only a metric name is specified. • Range vector: any number of values per time series between two timestamps. A range duration is appended in square brackets ([]) at the end of a vector selector.
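The difference between the two selector kinds can be sketched for a single series held as `(timestamp, value)` pairs. This is a simplification: timestamps are plain seconds, the 5-minute lookback is assumed to be the default staleness window, and the exact boundary handling differs between Prometheus versions.

```python
LOOKBACK = 300  # seconds; assumed default lookback for instant selection

def instant_select(samples, at):
    """Return the newest (t, v) at or before `at` within the lookback, else None."""
    candidates = [(t, v) for t, v in samples if at - LOOKBACK < t <= at]
    return max(candidates) if candidates else None

def range_select(samples, at, window):
    """Return every (t, v) with at - window < t <= at."""
    return [(t, v) for t, v in samples if at - window < t <= at]

samples = [(0, 5), (60, 2), (120, 18), (180, 7)]
latest = instant_select(samples, at=130)              # one sample
recent = range_select(samples, at=180, window=120)    # several samples
```

Here `latest` is the single sample `(120, 18)`, while `recent` holds the two samples from the last two minutes.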
  • 22. Metric types • Counter: a counter is a cumulative metric that represents a single monotonically increasing counter whose value can only increase or be reset to zero on restart. For example, you can use a counter to represent the number of requests served, tasks completed, or errors. • Gauge: a gauge is a metric that represents a single numerical value that can arbitrarily go up and down, e.g. temperatures or current memory usage. • Histogram: a histogram samples observations (usually things like request durations or response sizes) and counts them in configurable buckets. • Summary: similar to a histogram, a summary samples observations (usually things like request durations and response sizes). Ref: https://povilasv.me/prometheus-tracking-request-duration/
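A rough sketch of how a counter and a histogram behave, independent of any client library. The class names and bucket bounds are hypothetical; the cumulative view mirrors the way histogram buckets are exposed with an upper bound (`le`) each.

```python
import bisect

class Counter:
    """Cumulative value that can only go up."""
    def __init__(self):
        self.value = 0.0

    def inc(self, amount=1.0):
        if amount < 0:
            raise ValueError("counters can only increase")
        self.value += amount

class Histogram:
    """Counts observations in buckets defined by upper bounds."""
    def __init__(self, bounds):
        self.bounds = sorted(bounds)
        self.counts = [0] * (len(self.bounds) + 1)  # last slot is +Inf
        self.total = 0.0

    def observe(self, value):
        # bisect_left finds the first bound that is >= value.
        self.counts[bisect.bisect_left(self.bounds, value)] += 1
        self.total += value

    def cumulative(self):
        """Return (upper_bound, count of observations <= bound) pairs."""
        running, out = 0, []
        for bound, count in zip(self.bounds + [float("inf")], self.counts):
            running += count
            out.append((bound, running))
        return out

requests = Counter()
durations = Histogram([0.1, 0.5, 1.0])
for seconds in (0.05, 0.3, 0.5, 2.0):
    requests.inc()
    durations.observe(seconds)
```

After four observations the counter reads 4, and the histogram reports 1, 3, 3 and 4 observations at or below 0.1, 0.5, 1.0 and +Inf respectively.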
  • 23. Operators • Binary comparison operators: ==, !=, >, <, >=, <= • Binary arithmetic operators: +, -, *, /, % (modulo), ^ (power/exponentiation) • Logical/set binary operators: and (intersection), or (union), unless (complement) • Built-in aggregation operators: sum, min, max, avg, stddev, stdvar, count, count_values, bottomk, topk, quantile. These operators can either be used to aggregate over all label dimensions or preserve distinct dimensions using by or without. Ref: https://blog.pvincent.io/2017/12/prometheus-blog-series-part-2-metric-types/
  • 24. Basic Functions • PromQL has 46 functions and growing… • Most of the mathematical functions, plus day, month, year, minute, hour and time, are available. • From a Prometheus perspective, the ones used most are: • rate() • irate() (should only be used when graphing volatile, fast-moving counters) • increase() • label_join()/label_replace() • <aggregation>_over_time(): min_over_time, max_over_time, avg_over_time, sum_over_time, count_over_time
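The idea behind `increase()` and `rate()` on counters can be sketched as below. This is a simplified model: it handles counter resets but skips the extrapolation to the window boundaries that the real functions perform, and the function names only mirror the PromQL ones.

```python
def increase(samples):
    """Total increase of a counter across (t, v) samples, tolerating resets."""
    total = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop means the counter restarted from zero, so the whole
        # current value counts as new increase.
        total += cur - prev if cur >= prev else cur
    return total

def rate(samples):
    """Average per-second increase, or None with fewer than two samples."""
    if len(samples) < 2:
        return None
    elapsed = samples[-1][0] - samples[0][0]
    return increase(samples) / elapsed

# The counter climbs to 160, restarts, and climbs again.
samples = [(0, 100), (60, 160), (120, 20), (180, 80)]
total = increase(samples)
per_second = rate(samples)
```

The reset between 160 and 20 is counted as an increase of 20, not as a drop of 140, which is why counters stay meaningful across restarts.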
  • 25. Wow! Functions
      • delta() • holt_winters() • predict_linear() • clamp_max() • clamp_min() • histogram_quantile()
      • Ref: Holt-Winters https://www.otexts.org/fpp/7/5; New Relic Doc
      • Averages unfortunately have the big drawback of hiding the distribution, which prevents the discovery of outliers and deviations.
      • Quantiles are a better measurement for this kind of metric, as they let you understand the distribution. For example, if the request latency 0.5-quantile (50th percentile) is 100ms, it means that 50% of requests completed under 100ms. Similarly, if the 0.99-quantile (99th percentile) is 4s, it means that 1% of requests responded in more than 4s.
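A small worked example of why averages hide outliers. It computes nearest-rank quantiles over raw values, which is not how `histogram_quantile()` works (that function interpolates inside histogram buckets); the latency figures are made up for illustration.

```python
import math

def percentile(values, q):
    """Nearest-rank q-quantile (0 < q <= 1) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]

# 98 fast requests and 2 very slow ones, in milliseconds.
latencies_ms = [100] * 98 + [4000] * 2
mean = sum(latencies_ms) / len(latencies_ms)
p50 = percentile(latencies_ms, 0.5)
p99 = percentile(latencies_ms, 0.99)
```

The mean comes out at 178 ms, which describes neither group of requests, while the median shows the typical 100 ms request and the 99th percentile exposes the 4 s outliers.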
  • 26. Demo Queries
      • max by(instance)(node_filesystem_size_bytes)
      • max without(device, fstype, mountpoint)(node_filesystem_size_bytes)
      • sum without(device, fstype, mountpoint)(node_filesystem_size_bytes)
      • sum(node_filesystem_size_bytes)
      • round(sum(node_filesystem_size_bytes)/1024/1024/1024)
      • round(sum by(instance, device)(node_filesystem_size_bytes)/1024/1024/1024)
      • rate(node_load1[5m])
      • rate(node_cpu_seconds_total{mode="system"}[5m])
      • min_over_time(node_load1[5m])
      • max_over_time(node_load1[5m])
      • avg_over_time(node_load1[5m])
      • sum_over_time(node_load1[5m])
      • count_over_time(node_load1[5m])
      • delta(node_hwmon_temp_celsius[1h])
      • clamp_max(node_load1,1.2)
      • clamp_min(clamp_max(node_load1,1.2),1.05)
      • predict_linear(node_load1[1h],4*3600)
      • quantile without(cpu)(0.9, rate(node_cpu_seconds_total{mode="system"}[5m]))
      • topk(3, sum by (mode) (node_cpu_seconds_total))
      • bottomk(3, sum by (le) (alertmanager_http_request_duration_seconds_bucket))
  • 27. Grafana – Demo
      • Download and install Grafana as described at https://grafana.com/grafana/download/beta
      • After installation, use the commands below to start, check the status of, or stop the server. There are other ways too; follow the installation guide for more details (attached logs).
          sudo systemctl start grafana-server
          sudo systemctl status grafana-server
          sudo systemctl stop grafana-server
      • Open http://localhost:3000 and configure the login.
      • Configure a generic Prometheus dashboard and import the Node Exporter dashboard: https://grafana.com/grafana/dashboards/1860
  • 28. Wow! Grafana – What a Dashboard Does for Us!
  • 29. Out of Syllabus – Triggers to look out for • Remote endpoints and storage: long-term storage • Alertmanager: webhook receiver (Gmail, etc.) • Prometheus concerns fixed by Cortex and Thanos: https://grafana.com/blog/2019/11/21/promcon-recap-two-households-both-alike-in-dignity-cortex-and-thanos/ • Prometheus open bugs and fixes: https://github.com/prometheus/prometheus/issues • Cloud monitoring: Nagios vs. Prometheus • Google's mtail: extract Prometheus metrics from application logs. • Prometheus is a system to collect and process metrics, not an event logging system; for that, the ELK stack is the answer.
  • 30. Study Material – Free & Cost
      • Free:
          • https://prometheus.io/docs/introduction/overview/
          • https://promcon.io/2019-munich/stream/
          • Prometheus Monitoring: The Definitive Guide in 2019
          • Subreddit collecting all Prometheus-related resources on the internet
          • https://training.robustperception.io/ - Introduction to Prometheus
          • SoundCloud - What makes Prometheus a “next generation” monitoring system?
      • Cost:
          • Understanding PromQL by Robust Perception
          • Prometheus: Up & Running by O'Reilly
  • 31. Thanks for Listening!!! be happy and make happy @how? given by my aasan:- Go below what you have # Dream above what you have # First love what you have Spread info what you have # Get info what others have # Help as per what you have