Observing the HashiCorp Ecosystem From Prometheus
Kris Buytaert & Julien Pivotto
June 21, 2022
O11y
Who are we?
Kris Buytaert
• I used to be a developer
• Then I became an Ops person
• Chief Trolling/Travel/Technical Officer @ Inuits.eu
• Chief Yak Shaver @ o11y.eu
• Organiser of #devopsdays, #cfgmgmtcamp, #loadays, ...
• Cofounder of all of the above
• Everything is a Freaking DNS Problem
• DNS : devops needs sushi
• @krisbuytaert on twitter/github
Julien Pivotto
• Prometheus maintainer
• Open Source Observability Expert
• Principal Software Architect & CoFounder @ o11y.eu
• DevOps believer
• @roidelapluie on twitter/github
O11y
• Inuits.eu Spinoff
• Open Source Observability
• Currently supporting the Prometheus Ecosystem
• Professional Services & Support (now)
• Long Term Enterprise Support (next month)
• Prometheus Distribution (soon)
Introduction, a brief history of Open Source Monitoring
July 2008 Ottawa Linux Symposium Paper
• Bloated Java Tools
• Dysfunctional Open Core Software
• DBA Required
• Nagios was king in the Open Source world
June 2011 #monitoringsucks
• John Vincent (@lusis), June 2011
• A #devops sub-movement
• (manual configuration, not in sync with reality, hosts only, services sometimes, applications never)
October 2011 #monitoringlove
• Ulf Mansson, #devopsdays Rome 2011
• A new found love for monitoring
• Triggered by { New Open Source Tools * Automation }
November 2012 Prometheus
What is monitoring?
• High level overview of the state of a service/component
• Availability
• Technical components
• Performance ?
What is going on?
Pitfalls of traditional monitoring
• Drift from reality
• Total lack of automation
• Total lack of automation
• Total lack of automation
• Total lack of automation
• Partial automation
• Lots of work to maintain
• Binary states: it works - it does not work
• Alert fatigue
• Alert fatigue
• Alert fatigue
• Alert fatigue
What is observability?
• Understand how your services behave
• As if you were in their place
• Without incident specific code
Why is this going on?
How do monitoring and observability connect?
• Monitoring is required
• If lucky, monitoring is enough
• Observability is removing luck <- @roidelapluie
What is observability - in Practice?
Three pillars:
• Metrics
• Logs
• Traces
Metrics
https://play.grafana.org/
Logs
https://play.grafana.org/
Traces
https://www.jaegertracing.io/
Prometheus
• Prometheus is an Open Source CNCF Project
• Collects and stores metrics
• Pull-based
• Service discovery (including Consul)
• Alerting
The Prometheus ecosystem
• Exporters for every piece of the infra
• Maintained by multiple companies
• Long-Term Support release coming Q3 2022
Prometheus data model
• Metrics have labels
• Labels differentiate metrics, e.g.:
  • HTTP response code
  • Datacenter name
PromQL
• Prometheus Query Language
• Powerful yet simple query language
rate(http_requests_total[5m])
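A minimal sketch of how the query above could be kept as a recording rule and split by one of the labels from the data model. The `code` label, the group name, and the rule name are assumptions for illustration; they do not come from the slides.

```yaml
groups:
  - name: http_example                 # hypothetical group name
    rules:
      # Per-second request rate over the last 5 minutes, summed per
      # HTTP response code (assumes the metric carries a `code` label).
      - record: code:http_requests:rate5m
        expr: sum by (code) (rate(http_requests_total[5m]))
```

Recording the aggregated rate once keeps dashboards and alerts from recomputing it on every evaluation.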
Prometheus + Consul
Observing your services
• consul_sd_configs
  • Stream the Consul service list to Prometheus
  • Up-to-date service list
• Use the flexibility of labels
  • Add relevant labels
  • Filter targets
consul_sd_configs labels
• __meta_consul_service
• __meta_consul_tags
• __meta_consul_node
• __meta_consul_service_metadata_<key>
• __meta_consul_dc
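The discovery labels listed here are typically consumed through relabeling. The following is a sketch only: the job name, the agent address, the `metrics` tag, and the target label names are assumptions, not values from the talk.

```yaml
scrape_configs:
  - job_name: consul-services            # hypothetical job name
    consul_sd_configs:
      - server: 'localhost:8500'         # assumed address of a local Consul agent
    relabel_configs:
      # Filter targets: keep only services tagged "metrics" (assumed tag).
      # __meta_consul_tags holds the tags joined and surrounded by commas.
      - source_labels: [__meta_consul_tags]
        regex: '.*,metrics,.*'
        action: keep
      # Add relevant labels from the discovery metadata.
      - source_labels: [__meta_consul_service]
        target_label: service
      - source_labels: [__meta_consul_node]
        target_label: node
      - source_labels: [__meta_consul_dc]
        target_label: datacenter
      # Copy every service metadata key to a label of the same name.
      - regex: '__meta_consul_service_metadata_(.+)'
        action: labelmap
```

Labels starting with `__` are dropped after relabeling, so only the copies made here end up on the scraped series.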
Alerting philosophy
• Page on actionable critical failure
• Avoid paging on Consul Health Check failure
• Keep “ambiance” alerts to get the atmosphere and quickly find the cause
Consul
consul_exporter
• Exporter maintained by the Prometheus team
• Exposes Consul cluster health
• Optionally exposes key/values
  • e.g. store desired state in KV for graphing
• Connects to a single instance
Consul telemetry
• Built-in
• Runtime metrics (memory, CPU, ...)
• Autopilot, raft metrics
• Calls (rate, errors, latency)
Configure Consul telemetry
Consul configuration:
telemetry {
  disable_hostname = true
  prometheus_retention_time = "1h"
}
Configure Consul telemetry
Prometheus configuration:
scrape_configs:
  - job_name: consul
    metrics_path: '/v1/agent/metrics'
    params:
      format: ["prometheus"]
    static_configs:
      - targets:
          - '<consulserver1>:8500'
          - '<consulserver2>:8500'
Consul alerts (consul_exporter)
Is consul running?
up{job="consul_exporter"} == 0
consul_up{job="consul_exporter"} == 0
Is there a leader?
consul_raft_leader != 1
Are peers in raft?
sum(consul_raft_peers) != count(up{job="consul"})
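As a sketch, the first three expressions above could be placed in a Prometheus rules file as follows. The alert names, `for` durations, and `severity` values are assumptions chosen for illustration.

```yaml
groups:
  - name: consul_exporter              # hypothetical group name
    rules:
      - alert: ConsulExporterDown      # hypothetical alert name
        expr: up{job="consul_exporter"} == 0
        for: 5m                        # assumed delay before firing
        labels:
          severity: warning            # assumed severity scheme
      - alert: ConsulDown              # hypothetical alert name
        expr: consul_up{job="consul_exporter"} == 0
        for: 5m
        labels:
          severity: critical
      - alert: ConsulNoRaftLeader      # hypothetical alert name
        expr: consul_raft_leader != 1
        for: 1m
        labels:
          severity: critical
```

The `for` clause keeps a single failed scrape from firing an alert, which helps against the alert fatigue described earlier.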
Consul alerts (Consul telemetry)
Is consul running?
up{job="consul"} == 0
Is my cluster healthy?
consul_autopilot_healthy == 0
Vault
Configure Vault telemetry
Vault configuration:
telemetry {
  disable_hostname = true
  prometheus_retention_time = "1h"
}
Configure Vault telemetry
Prometheus configuration:
scrape_configs:
  - job_name: vault
    metrics_path: '/v1/sys/metrics'
    params:
      format: ["prometheus"]
    static_configs:
      - targets:
          - '<vaultserver1>:8200'
          - '<vaultserver2>:8200'
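Vault's `/v1/sys/metrics` endpoint normally requires a token, unless unauthenticated metrics access has been enabled on the listener. Below is a sketch of a scrape job that sends one. It assumes that Vault is served over TLS and that a token allowed to read the endpoint is stored in a file; the file path is hypothetical.

```yaml
scrape_configs:
  - job_name: vault
    scheme: https                      # assumes the Vault listener uses TLS
    metrics_path: '/v1/sys/metrics'
    params:
      format: ["prometheus"]
    authorization:
      type: Bearer
      # Hypothetical path to a file holding a Vault token with read
      # access to sys/metrics.
      credentials_file: /etc/prometheus/vault-token
    static_configs:
      - targets:
          - '<vaultserver1>:8200'
          - '<vaultserver2>:8200'
```

Reading the token from a file keeps it out of the Prometheus configuration itself.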
Vault alerting
Is Vault up?
up{job="vault"} == 0
Is Vault sealed?
vault_core_unsealed == 0
Is audit log working?
rate(vault_audit_log_request_failure[5m]) > 0
rate(vault_audit_log_response_failure[5m]) > 0
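A possible rules file built from the sealed and audit log expressions above. `VaultIsSealed` is the alert name the deck uses for inhibition; the other alert name, the `for` durations, the `severity` values, and the annotation text are assumptions.

```yaml
groups:
  - name: vault                        # hypothetical group name
    rules:
      - alert: VaultIsSealed
        expr: vault_core_unsealed == 0
        for: 1m                        # assumed delay before firing
        labels:
          severity: critical           # assumed severity scheme
        annotations:
          summary: 'Vault instance {{ $labels.instance }} is sealed'
      - alert: VaultAuditLogFailing    # hypothetical alert name
        # Fires when either audit log failure counter is increasing.
        expr: |
          rate(vault_audit_log_request_failure[5m]) > 0
          or
          rate(vault_audit_log_response_failure[5m]) > 0
        for: 5m
        labels:
          severity: critical
```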
Alertmanager
Alert inhibition
• Suppresses notifications for some alerts while other alerts are firing.
• Reduces alert noise, e.g. when Vault is sealed.
Configuring inhibition
Alertmanager configuration:
inhibit_rules:
  - source_match:
      alertname: VaultIsSealed
    target_match:
      alertname: ErrorRateTooHigh
    equal: [ datacenter ]
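Alertmanager 0.22 and later also accept a matcher-list syntax for inhibition rules. An equivalent sketch of the rule above in that form, assuming both alerts carry a `datacenter` label:

```yaml
inhibit_rules:
  # While VaultIsSealed fires in a datacenter, mute ErrorRateTooHigh
  # notifications that have the same `datacenter` label value.
  - source_matchers:
      - 'alertname = "VaultIsSealed"'
    target_matchers:
      - 'alertname = "ErrorRateTooHigh"'
    equal: [datacenter]
```

The `equal` list limits the inhibition to alerts from the same datacenter, so a sealed Vault in one site does not hide error rates in another.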
Conclusion
• Alerting should come from your end services
• Consul & Vault focused alerts will pinpoint causes
• Specific Vault & Consul alerts can page you (e.g. sealed)
• Draft dashboards based on your needs (response times, errors, etc.)
Contact
O11y
https://o11y.eu
info@o11y.eu
