Monitoring a Vault and Consul Cluster
May 23, 2018
Copyright © 2018 HashiCorp
Introductions - Who is this person?
Peter Souter
Technical Account Manager at HashiCorp
Based in... London, UK
Been using... The HashiCorp stack for about 7 years (Vagrant FTW!)
Worn a lot of hats in my time... Developer, Consultant, Pre-Sales, TAM
Interested in... Making people’s operational life easier and more secure
DEVOPS ALL THE THINGS
Vault and Consul - What a team!
▪ Consul is the main recommended backend for Vault
▪ It allows Vault to have a proper HA and DR story
▪ More info: https://www.vaultproject.io/guides/operations/vault-ha-consul.html
Maturing of Products
▪ Consul hit 1.0 last year!
▪ Vault is at 0.10… 1.0 is coming “Sooner rather than later” - Mitchell
▪ Other products “Soon”™
▪ Also, cool stuff is coming, come to HashiDays Amsterdam and HashiConf!
Image: http://bit.do/barrels_image
With maturing comes operationalisation
▪ Architecture diagrams
▪ Scaling
▪ Performance
▪ Deployment Guides
▪ Monitoring
Come help us with Consul scaling research!
Our research team is right now working on Consul soaking and measuring at massive scale, so if you’re hitting edge cases or have information for us, we’d like to hear from you!
Today we’re going to focus on...
▪ Architecture diagrams
▪ Scaling
▪ Performance
▪ Deployment Guides
▪ Monitoring
Time-series Telemetry Data
▪ Time-series telemetry data: This involves capturing metrics from the application, storing them in a special database designed for that purpose, and analyzing trends in the data over time.
▪ Examples: Grafana, CloudWatch, DataDog, Circonus.
Log Analytics
▪ Log analytics: This means capturing log files from the system and the application, extracting useful signals from the text, and then analyzing that data.
▪ Examples: Splunk, ELK, SumoLogic.
Active health checks
▪ Active health checks: This involves active methods of connecting to the application and interacting with it to ensure it is responding properly.
▪ Examples: Nagios, Sensu, Keynote.
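As a rough illustration of an active health check against Vault: the sketch below maps the default status codes of Vault's `/v1/sys/health` endpoint to a state. The base URL, the `probe` and `classify` names, and the timeout are assumptions for this example, not part of the talk.

```python
import urllib.error
import urllib.request

# Default response codes of Vault's /v1/sys/health endpoint.
VAULT_HEALTH = {
    200: "initialized, unsealed, active",
    429: "unsealed, standby",
    501: "not initialized",
    503: "sealed",
}


def classify(status_code):
    """Translate a /v1/sys/health status code into a readable state."""
    return VAULT_HEALTH.get(status_code, "unknown")


def is_serving(status_code):
    """Active and standby nodes are both unsealed and reachable."""
    return status_code in (200, 429)


def probe(base_url="http://127.0.0.1:8200", timeout=2.0):
    """Actively connect to Vault and report its state.

    base_url is an assumption (a local Vault listener). Connection
    failures raise urllib.error.URLError and are left to the caller.
    """
    try:
        with urllib.request.urlopen(f"{base_url}/v1/sys/health", timeout=timeout) as resp:
            code = resp.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the code itself is the signal here.
        code = err.code
    return classify(code)
```

A scheduler such as Nagios or Sensu would run something like `probe()` on an interval and alert when the result is not one of the serving states.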
How do we get those metrics?
▪ Vault and Consul use the go-metrics library to export telemetry.
▪ Currently they support the following options:
  • Circonus
  • DataDog's DogStatsd
  • Statsite
  • Statsd
▪ Note that DataDog's agent and Statsite are implementations of statsd, so the last 3 options are nearly the same thing.
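To make "nearly the same thing" concrete, here is a small sketch of the wire format: a plain statsd datagram next to the DogStatsD variant, which adds a tag section. The `statsd_line` helper and the `host` tag are hypothetical names chosen for illustration.

```python
def statsd_line(name, value, metric_type, tags=None):
    """Build one statsd datagram; the tag section is the DogStatsD extension."""
    line = f"{name}:{value}|{metric_type}"
    if tags:
        # DogStatsD appends "|#key:value,key:value"; plain statsd has no tag field.
        line += "|#" + ",".join(f"{k}:{v}" for k, v in tags.items())
    return line


plain = statsd_line("consul.raft.apply", 1, "c")
tagged = statsd_line("consul.raft.apply", 1, "c", {"host": "server1"})  # hypothetical tag

print(plain)   # consul.raft.apply:1|c
print(tagged)  # consul.raft.apply:1|c|#host:server1
```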
Where do they go?
▪ Once the metrics reach your statsd-compatible agent, they need to be forwarded somewhere so they can be stored and displayed. There are many options...
▪ For this demo we’re sticking to a TIGK Stack:
  • Telegraf, InfluxDB, Grafana, Kapacitor
  • (Normally that would be TICK, but Chronograf’s dashboards are not as good as Grafana IMO)
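A minimal sketch of the forwarding step, assuming an agent receives a DogStatsD datagram and hands it on as an InfluxDB line-protocol record. This is not Telegraf's actual implementation (Telegraf may rename measurements and fields); `Metric`, `parse_dogstatsd`, and `to_line_protocol` are hypothetical names, and the field key `value` is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Metric:
    name: str
    value: float
    kind: str
    tags: dict = field(default_factory=dict)


def parse_dogstatsd(line):
    """Parse 'name:value|type[|@rate][|#k:v,...]' into a Metric."""
    head, *sections = line.strip().split("|")
    name, _, raw_value = head.rpartition(":")
    tags = {}
    for section in sections[1:]:
        if section.startswith("#"):  # tag section; a sample rate starts with "@"
            for tag in section[1:].split(","):
                key, _, val = tag.partition(":")
                tags[key] = val
    return Metric(name, float(raw_value), sections[0], tags)


def to_line_protocol(metric):
    """Render as 'measurement[,tag=value...] value=<number>' (no timestamp)."""
    tag_str = ",".join(f"{k}={v}" for k, v in sorted(metric.tags.items()))
    measurement = metric.name + ("," + tag_str if tag_str else "")
    return f"{measurement} value={metric.value}"


m = parse_dogstatsd("consul.kvs.apply:2.5|ms|#host:server1")
print(to_line_protocol(m))  # consul.kvs.apply,host=server1 value=2.5
```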
Where do they go? - Architecture
[Architecture diagram]
Consul Telemetry - How?
https://www.consul.io/docs/agent/telemetry.html
➔ Two Entries:
◆ dogstatsd_addr: hostname and port of the statsd daemon.
  ○ Using the DogStatsd format instead of plain statsd tells Consul to send tags with each metric. Tags can be used by Grafana to filter data on your dashboards.
◆ disable_hostname: true
  ○ Tells Consul not to insert the hostname in the names of the metrics it sends to statsd, since the hostnames will be sent as tags.
  ○ Without this option, the single metric consul.raft.apply would become multiple metrics, one per host.
{
  "telemetry": {
    "dogstatsd_addr": "localhost:8125",
    "disable_hostname": true
  }
}
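The effect of disable_hostname on metric names can be sketched as follows. This only illustrates the naming outcome described above, assuming the hostname is inserted right after the `consul` prefix; `emitted_name` and the server names are hypothetical.

```python
def emitted_name(key, hostname, disable_hostname, prefix="consul"):
    """Return the metric name a host would emit for the given key parts."""
    # Assumption: when enabled, the hostname sits between prefix and key.
    parts = [prefix] if disable_hostname else [prefix, hostname]
    return ".".join(parts + list(key))


hosts = ["server1", "server2", "server3"]  # hypothetical cluster members
with_host = {emitted_name(("raft", "apply"), h, False) for h in hosts}
without_host = {emitted_name(("raft", "apply"), h, True) for h in hosts}

print(sorted(with_host))     # three distinct names, one per host
print(sorted(without_host))  # ['consul.raft.apply']
```

With one shared name, a dashboard can chart the metric once and split it by host tag instead of maintaining a panel per server.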
Vault Telemetry - How?
https://www.vaultproject.io/docs/configuration/telemetry.html
Pretty much the same!
telemetry {
  dogstatsd_addr   = "localhost:8125"
  disable_hostname = true
}
Consul Telemetry - What?
▪ Consul has 86 different metrics
▪ That’s good but… which do I need to look at?
▪ And what’s the threshold before I should get worried?
▪ Halp
https://www.consul.io/docs/agent/telemetry.html
Consul Telemetry - Transaction Timing
Metric Name | Description
consul.kvs.apply | This measures the time it takes to complete an update to the KV store.
consul.txn.apply | This measures the time spent applying a transaction operation.
consul.raft.apply | This counts the number of Raft transactions occurring over the interval.
consul.raft.commitTime | This measures the time it takes to commit a new entry to the Raft log on the leader.
Why they're important: Taken together, these metrics indicate how long it takes to complete write operations in various parts of the Consul cluster. Generally these should all be fairly consistent and no more than a few milliseconds. Sudden changes in any of the timing values could be due to unexpected load on the Consul servers, or due to problems on the servers themselves.
What to look for: Deviations (in any of these metrics) of more than 50% from baseline over the previous hour.
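One way to read the "more than 50% from baseline" rule, as a sketch: take the mean of the previous hour's samples as the baseline (an assumption, the slide does not define it) and flag the current value if it differs by more than the threshold. The `deviates` name and the sample values are made up for illustration.

```python
from statistics import mean


def deviates(previous_hour, current, threshold=0.5):
    """True if current differs from the previous hour's mean by more than threshold."""
    baseline = mean(previous_hour)
    if baseline == 0:
        # No meaningful ratio against a zero baseline; any activity counts.
        return current != 0
    return abs(current - baseline) / baseline > threshold


kvs_apply_ms = [2.0, 2.2, 1.8]       # hypothetical samples, baseline 2.0 ms
print(deviates(kvs_apply_ms, 3.5))   # True  (75% above baseline)
print(deviates(kvs_apply_ms, 2.4))   # False (20% above baseline)
```

The check is symmetric, so a sudden drop of more than half also trips it.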
Vault Telemetry - Seal Status
Metric Name | Description
consul_health_checks[check_name="Vault Sealed Status"].passing | Value of 1 indicates Vault is unsealed; 0 means sealed.
Why they're important: By default, Vault is sealed on startup, so if this value changes to 0 during the day, Vault has restarted for some reason. And until it's unsealed, it won't answer requests from clients.
What to look for: A value of 0 being reported by any host.
NOTE: This metric is actually reported by the Consul plugin to Telegraf.
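The alert rule above is simple enough to sketch directly: given the latest `passing` value per host, list every host reporting 0. The `sealed_hosts` name and the host names are hypothetical; in the demo stack this rule would more likely live in Kapacitor or a Grafana alert.

```python
def sealed_hosts(passing_by_host):
    """Return hosts whose 'Vault Sealed Status' check reports 0 (sealed)."""
    return sorted(host for host, passing in passing_by_host.items() if passing == 0)


latest = {"vault-1": 1, "vault-2": 0, "vault-3": 1}  # hypothetical readings
print(sealed_hosts(latest))  # ['vault-2']
```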
We’re working on guide-ifying this!
Demo
😞
Q&A

Monitoring a Vault and Consul cluster - 24th May 2018
