Reveal Your Deepest Kubernetes Metrics
KubeCon EU 2018
@bob_cotton
About Me
● CTO & Co-Founder - FreshTracks.io - A CA Accelerator Incubation
● bob@freshtracks.io
● @bob_cotton
● Father, Fly Fisher & Avid homebrewer
Agenda
● Determining Important Metrics
○ Four Golden Signals
○ USE Method
○ RED Method
● Sources of metrics
○ Node
○ kubelet and containers
○ Kubernetes API
○ etcd
○ Derived metrics (kube-state-metrics)
● Metric Aggregation through the Kubernetes Hierarchy
What are the Important Metrics?
Four Golden Signals
● Latency
○ The time it takes to service a request.
● Errors
○ The rate of requests that fail, either explicitly, implicitly, or by policy
● Traffic
○ A measure of how much demand is being placed on your system
● Saturation
○ How "full" your service is.
USE Method
● Introduced by Brendan Gregg for reasoning about system resources
○ Resources are all physical server functional components (CPUs, disks, busses…)
● Utilization
○ The average time that the resource was busy servicing work
● Saturation
○ The degree to which the resource has extra work which it can't service, often queued
● Errors
○ The count of error events
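As a rough illustration of the three USE measurements, the sketch below computes them for a single resource. The sample shape and its field names (`busy_seconds`, `queue_length`, `error_count`) are hypothetical and not tied to any exporter; using mean queue depth as the saturation figure is likewise an assumption.

```python
from dataclasses import dataclass


@dataclass
class ResourceSample:
    """One observation window for a single resource (hypothetical shape)."""
    window_seconds: float  # length of the observation window
    busy_seconds: float    # time the resource spent servicing work
    queue_length: int      # work waiting that could not be serviced yet
    error_count: int       # error events seen during the window


def use_summary(samples):
    """Return (utilization, saturation, errors) across all samples."""
    total_window = sum(s.window_seconds for s in samples)
    total_busy = sum(s.busy_seconds for s in samples)
    utilization = total_busy / total_window  # fraction of time busy
    saturation = sum(s.queue_length for s in samples) / len(samples)  # mean queue depth
    errors = sum(s.error_count for s in samples)  # plain count of error events
    return utilization, saturation, errors


samples = [
    ResourceSample(60.0, 45.0, 2, 0),
    ResourceSample(60.0, 15.0, 0, 1),
]
utilization, saturation, errors = use_summary(samples)
print(f"utilization={utilization:.2f} saturation={saturation:.1f} errors={errors}")
```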
RED Method
● Introduced by Tom Wilkie
○ A subset of the Four Golden Signals for measuring Services
● Rate
○ The number of requests per second
● Errors
○ The number of errors per second
● Duration
○ The length of time required to service the request
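A minimal sketch of the three RED measurements for one service, computed from a short list of request records. The `Request` record is hypothetical, and treating any status of 500 or above as an error is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """A single served request (hypothetical record shape)."""
    status: int              # HTTP-style status code
    duration_seconds: float  # time taken to service the request


def red_summary(requests, window_seconds):
    """Return (rate, error_rate, mean_duration) for one observation window."""
    rate = len(requests) / window_seconds
    failed = [r for r in requests if r.status >= 500]  # assumption: 5xx means error
    error_rate = len(failed) / window_seconds
    mean_duration = sum(r.duration_seconds for r in requests) / len(requests)
    return rate, error_rate, mean_duration


requests = [
    Request(200, 0.10),
    Request(200, 0.30),
    Request(500, 0.50),
    Request(200, 0.10),
]
rate, error_rate, mean_duration = red_summary(requests, window_seconds=2.0)
print(rate, error_rate, round(mean_duration, 3))
```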
USE is for Resources
RED is for Services
Kubernetes Has Both!
Sources of Metrics in Kubernetes
Node
Node Metrics from node_exporter
● node_exporter installed as a DaemonSet
○ One instance per node
● Standard Host Metrics
○ Load Average
○ CPU
○ Memory
○ Disk
○ Network
○ Many others
● ~1000 Unique series in a typical node
[Diagram: node_exporter exposing /metrics]
USE for Node CPU
● Utilization: node_cpu
○ sum(rate(node_cpu{mode!="idle",mode!="iowait",mode!~"^(?:guest.*)$"}[5m])) BY (instance)
● Saturation: node_load1
○ sum(node_load1) by (node) / count(node_cpu{mode="system"}) by (node) * 100
● Errors: N/A (not exposed by node_exporter)
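To make the utilization query more concrete, here is a simplified sketch of what taking a rate over per-mode CPU counters and summing by instance amounts to. It compares just two scrapes and ignores counter resets and extrapolation, which the real `rate()` handles; the instance names and counter values are made up.

```python
from collections import defaultdict

EXCLUDED_MODES = ("idle", "iowait")


def cpu_busy_rate(earlier, later, interval_seconds):
    """Per-instance busy CPU seconds per second between two scrapes.

    `earlier` and `later` map (instance, cpu, mode) -> counter value in seconds.
    """
    busy = defaultdict(float)
    for (instance, cpu, mode), value in later.items():
        if mode in EXCLUDED_MODES or mode.startswith("guest"):
            continue  # mirrors the mode!= and mode!~ matchers in the query
        delta = value - earlier[(instance, cpu, mode)]
        busy[instance] += delta / interval_seconds
    return dict(busy)


earlier = {
    ("node-a", "cpu0", "user"): 100.0,
    ("node-a", "cpu0", "system"): 40.0,
    ("node-a", "cpu0", "idle"): 900.0,
    ("node-a", "cpu1", "user"): 80.0,
    ("node-a", "cpu1", "iowait"): 5.0,
}
later = {
    ("node-a", "cpu0", "user"): 130.0,
    ("node-a", "cpu0", "system"): 55.0,
    ("node-a", "cpu0", "idle"): 1155.0,
    ("node-a", "cpu1", "user"): 95.0,
    ("node-a", "cpu1", "iowait"): 6.0,
}
rates = cpu_busy_rate(earlier, later, interval_seconds=300.0)
print(rates)
```

A result of 0.2 for an instance reads as "on average, 0.2 CPU cores were busy over the window".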
USE for Node Memory
● Utilization: node_memory_MemAvailable, node_memory_MemTotal, kube_node_status_capacity_memory_bytes, kube_node_status_allocatable_memory_bytes
○ 1 - sum(node_memory_MemAvailable) by (node) / sum(node_memory_MemTotal) by (node)
○ 1 - sum(kube_node_status_allocatable_memory_bytes) by (exported_node) / sum(kube_node_status_capacity_memory_bytes) by (exported_node)
● Saturation: Don’t go into swap!
● Errors (only available on some systems):
○ node_edac_correctable_errors_total
○ node_edac_uncorrectable_errors_total
○ node_edac_csrow_correctable_errors_total
○ node_edac_csrow_uncorrectable_errors_total
Container Metrics from cAdvisor
● cAdvisor is embedded into the kubelet, so we scrape the kubelet to get container metrics
● These are the so-called Kubernetes “core” metrics
● For each container on the node:
○ CPU Usage (user and system) and time throttled
○ Filesystem read/writes/limits
○ Memory usage and limits
○ Network transmit/receive/dropped
[Diagram: a Node running the kubelet (with embedded cAdvisor) and node_exporter, with a /metrics endpoint]
USE for Container CPU
● Utilization: container_cpu_usage_seconds_total
○ sum(rate(container_cpu_usage_seconds_total[5m])) by (container_name)
● Saturation: container_cpu_cfs_throttled_seconds_total
○ sum(rate(container_cpu_cfs_throttled_seconds_total[5m])) by (container_name)
● Errors: N/A
USE for Container Memory
● Utilization: container_memory_usage_bytes, container_memory_working_set_bytes
○ sum(container_memory_working_set_bytes{name!~"POD"}) by (name)
● Saturation: ratio of container_memory_working_set_bytes / kube_pod_container_resource_limits_memory_bytes
○ sum(container_memory_working_set_bytes) by (container_name) / sum(label_join(kube_pod_container_resource_limits_memory_bytes, "container_name", "", "container")) by (container_name)
● Errors:
○ container_memory_failcnt: number of times memory usage hits limits
○ container_memory_failures_total: cumulative count of memory allocation failures
○ sum(rate(container_memory_failures_total{type="pgmajfault"}[5m])) by (container_name)
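The saturation ratio needs both sides keyed by the same label, which is why the limit series is passed through `label_join` first. The sketch below imitates that step on in-memory data; the helper functions are simplified stand-ins written for this example, and the container names and byte values are made up.

```python
def label_join(series, dst_label, separator, *src_labels):
    """Return copies of the series with dst_label set to the joined source label values."""
    joined = []
    for labels, value in series:
        new_labels = dict(labels)
        new_labels[dst_label] = separator.join(labels.get(name, "") for name in src_labels)
        joined.append((new_labels, value))
    return joined


def sum_by(series, label):
    """Sum values grouped by a single label."""
    totals = {}
    for labels, value in series:
        key = labels.get(label, "")
        totals[key] = totals.get(key, 0.0) + value
    return totals


working_set = [
    ({"container_name": "web", "pod_name": "web-1"}, 300e6),
    ({"container_name": "web", "pod_name": "web-2"}, 500e6),
    ({"container_name": "cache", "pod_name": "cache-1"}, 100e6),
]
limits = [  # the limit series names its label "container", not "container_name"
    ({"container": "web", "pod": "web-1"}, 1000e6),
    ({"container": "web", "pod": "web-2"}, 1000e6),
    ({"container": "cache", "pod": "cache-1"}, 400e6),
]

usage_by_container = sum_by(working_set, "container_name")
renamed_limits = label_join(limits, "container_name", "", "container")
limit_by_container = sum_by(renamed_limits, "container_name")
saturation = {
    name: usage_by_container[name] / limit_by_container[name]
    for name in usage_by_container
    if name in limit_by_container
}
print(saturation)
```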
Kubernetes Metrics from the K8s API Server
● Metrics about the performance of the K8s API Server
○ Performance of controller work queues
○ Request Rates and Latencies
○ Etcd helper cache work queues and cache performance
○ General process status (File Descriptors/Memory/CPU Seconds)
○ Golang status (GC/Memory/Threads)
[Diagram: /metrics endpoints on the API Server and on a Node running the kubelet (cAdvisor), node_exporter, and any other Pod]
RED for Kubernetes API Server
● Rate: apiserver_request_count
○ sum(rate(apiserver_request_count[5m])) by (verb)
● Errors: apiserver_request_count
○ rate(apiserver_request_count{code=~"^(?:5..)$"}[5m]) / rate(apiserver_request_count[5m])
● Duration: apiserver_request_latencies_bucket
○ histogram_quantile(0.9, rate(apiserver_request_latencies_bucket[5m])) / 1e+06
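The duration query estimates a quantile from cumulative histogram buckets. The sketch below is a simplified version of that estimate (linear interpolation inside the bucket that contains the target rank), not Prometheus's actual implementation. The bucket bounds and counts are made up, and the division by 1e6 assumes the buckets are in microseconds, as the query's `/ 1e+06` suggests.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative buckets.

    `buckets` is a list of (upper_bound, cumulative_count) pairs sorted by
    upper bound and ending with (math.inf, total_count).
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for index, (upper_bound, count) in enumerate(buckets):
        if count >= rank:
            if math.isinf(upper_bound):
                # Quantile falls in the +Inf bucket: report the highest finite bound.
                return buckets[index - 1][0]
            # Assume observations are spread evenly inside the bucket.
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return math.nan


latency_buckets_us = [
    (100_000.0, 50.0),
    (200_000.0, 80.0),
    (400_000.0, 95.0),
    (math.inf, 100.0),
]
p90_seconds = histogram_quantile(0.9, latency_buckets_us) / 1e6
print(round(p90_seconds, 4))
```

Because the result is interpolated, its accuracy depends on how finely the buckets are spaced around the quantile of interest.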
K8s Derived Metrics from kube-state-metrics
● Counts and metadata about many K8s types
○ Counts of many “nouns”
○ Resource Limits
○ Container states
■ ready/restarts/running/terminated/waiting
● *_labels series carry labels
○ Each series has a constant value of 1
○ Join to other series for on-the-fly labeling using group_left (PromQL's form of a left join)
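The labeling trick works because multiplying by a series whose value is always 1 leaves the sample unchanged while the join copies labels across. A small sketch of the idea on in-memory data follows; the function, pod names, and `label_app` label are illustrative assumptions. In PromQL the corresponding shape would be along the lines of `some_metric * on(namespace, pod) group_left(label_app) kube_pod_labels`.

```python
def join_info_labels(samples, info_series, on, copy):
    """Many-to-one join: copy labels from a matching info series onto each sample."""
    index = {}
    for labels, value in info_series:
        index[tuple(labels[name] for name in on)] = (labels, value)

    joined = []
    for labels, value in samples:
        match = index.get(tuple(labels[name] for name in on))
        if match is None:
            continue  # no matching info series: the sample drops out of the result
        info_labels, info_value = match
        enriched = dict(labels)
        for name in copy:
            enriched[name] = info_labels[name]
        joined.append((enriched, value * info_value))  # info_value is 1, so value is unchanged
    return joined


restarts = [
    ({"namespace": "shop", "pod": "web-1"}, 3.0),
    ({"namespace": "shop", "pod": "db-1"}, 0.0),
    ({"namespace": "shop", "pod": "orphan-1"}, 7.0),
]
pod_labels = [
    ({"namespace": "shop", "pod": "web-1", "label_app": "web"}, 1.0),
    ({"namespace": "shop", "pod": "db-1", "label_app": "db"}, 1.0),
]
labelled = join_info_labels(restarts, pod_labels, on=("namespace", "pod"), copy=("label_app",))
for labels, value in labelled:
    print(labels, value)
```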
Etcd Metrics from etcd
● Etcd is the “master of all truth” within a K8s cluster
○ Leader existence and leader change rate
○ Proposals committed/applied/pending/failed
○ Disk write performance
○ Inbound gRPC stats
■ etcd_http_received_total
■ etcd_http_failed_total
■ etcd_http_successful_duration_seconds_bucket
○ Intra-cluster gRPC stats
■ etcd_network_member_round_trip_time_seconds_bucket
■ ...
Core Metrics Aggregation
[Diagram: Namespace > Deployment > Pod > Container]
● K8s clusters form a hierarchy
● We can aggregate the “core” metrics to any level
● This allows for some interesting monitoring opportunities
● Use Prometheus “recording rules” to aggregate the core metrics at every level
● Insights into all levels of your Kubernetes cluster
● This also applies to any custom application metric
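Aggregating up the hierarchy is a matter of summing the same container-level samples by progressively fewer labels. The sketch below shows this on made-up data. The `deployment` label is an assumption for illustration; container metrics do not necessarily carry it, and in practice it may have to be attached through relabeling or a join.

```python
from collections import defaultdict


def aggregate(samples, by):
    """Sum sample values grouped by the given label names."""
    totals = defaultdict(float)
    for labels, value in samples:
        totals[tuple(labels[name] for name in by)] += value
    return dict(totals)


# CPU cores used per container (hypothetical values)
container_cpu = [
    ({"namespace": "shop", "deployment": "web", "pod": "web-1", "container": "app"}, 0.5),
    ({"namespace": "shop", "deployment": "web", "pod": "web-1", "container": "sidecar"}, 0.125),
    ({"namespace": "shop", "deployment": "web", "pod": "web-2", "container": "app"}, 0.25),
    ({"namespace": "shop", "deployment": "db", "pod": "db-1", "container": "postgres"}, 0.5),
]

per_pod = aggregate(container_cpu, by=("namespace", "pod"))
per_deployment = aggregate(container_cpu, by=("namespace", "deployment"))
per_namespace = aggregate(container_cpu, by=("namespace",))
print(per_pod)
print(per_deployment)
print(per_namespace)
```

A recording rule per level would store each of these aggregates as its own series, so dashboards and alerts do not have to recompute them.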
Thanks
Resources
● USE Method
● RED Method
● Deep Dive into Kubernetes Metrics
● kube-state-metrics
Scheduling and Autoscaling
i.e. The Metrics Pipeline
The New “Metrics Server”
● Replaces Heapster
● Standard (versioned and auth) API aggregated into the K8s API Server
● In “beta” in K8s 1.8
● Used by the scheduler and (eventually) the Horizontal Pod Autoscaler
● A stripped-down version of Heapster
● Reports on “core” metrics (CPU/Memory/Network) gathered from cAdvisor
● For use internal to K8s only
● Pluggable for custom metrics
Feeding the Horizontal Pod Autoscaler
● Before the metrics server, the HPA used Heapster for its core metrics
○ This will be the metrics-server going forward
● An API adapter will bridge to third-party monitoring systems
○ e.g. Prometheus
Labels, Re-Label and Recording Rules
Oh My...
Label/Value Based Data Model
● Graphite/StatsD
○ apache.192-168-5-1.home.200.http_requests_total
○ apache.192-168-5-1.home.500.http_requests_total
○ apache.192-168-5-1.about.200.http_requests_total
● Prometheus
○ http_requests_total{job="apache", instance="192.168.5.1", path="/home", status="200"}
○ http_requests_total{job="apache", instance="192.168.5.1", path="/home", status="500"}
○ http_requests_total{job="apache", instance="192.168.5.1", path="/about", status="200"}
● Selecting Series
○ *.*.home.200.http_requests_total
○ http_requests_total{status="200", path="/home"}
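The practical difference between the two models shows up when selecting series: a dotted name has to be matched by position, while labels are matched by name in any order. The sketch below contrasts the two on the example series. Both selector functions are simplified stand-ins written for this illustration, not either system's real matching logic.

```python
def graphite_select(pattern, names):
    """Select dotted names where every pattern segment is '*' or equals the name segment."""
    wanted = pattern.split(".")
    selected = []
    for name in names:
        parts = name.split(".")
        if len(parts) == len(wanted) and all(w in ("*", p) for w, p in zip(wanted, parts)):
            selected.append(name)
    return selected


def label_select(matchers, series):
    """Select label sets where every matcher label has exactly the given value."""
    return [labels for labels in series if all(labels.get(k) == v for k, v in matchers.items())]


graphite_names = [
    "apache.192-168-5-1.home.200.http_requests_total",
    "apache.192-168-5-1.home.500.http_requests_total",
    "apache.192-168-5-1.about.200.http_requests_total",
]
prometheus_series = [
    {"job": "apache", "instance": "192.168.5.1", "path": "/home", "status": "200"},
    {"job": "apache", "instance": "192.168.5.1", "path": "/home", "status": "500"},
    {"job": "apache", "instance": "192.168.5.1", "path": "/about", "status": "200"},
]

by_position = graphite_select("*.*.home.200.http_requests_total", graphite_names)
by_label = label_select({"status": "200", "path": "/home"}, prometheus_series)
print(by_position)
print(by_label)
```

Adding a new dimension to the dotted name would shift positions and break the first selector, whereas the label selector keeps working when extra labels appear.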
Kubernetes Labels
● Kubernetes gives us labels on all the things
● Our scrape targets live in the context of the K8s labels
○ This comes from service discovery
● We want to enhance the scraped metric labels with K8s labels
● This is why we need relabel rules in Prometheus
[Diagram: Prometheus (TSDB, service discovery via the K8s API Server) scraping kubelet (cAdvisor), node_exporter, kube-state-metrics, app containers, and other exporters]
[Diagram: Prometheus service discovery (via the K8s API Server) for one scrape target, with the TSDB as destination. Labels discovered for the target:]
0="{__address__ 300.196.17.41}"
1="{__meta_kubernetes_namespace default}"
2="{__meta_kubernetes_pod_annotation_freshtracks_io_data_sidecar true}"
3="{__meta_kubernetes_pod_annotation_freshtracks_io_path /metrics2}"
4="{__meta_kubernetes_pod_annotation_kubernetes_io_created_by "kind":"SerializedReference"?}"
5="{__meta_kubernetes_pod_annotation_kubernetes_io_limit_ranger LimitRanger plugin set: cpu request for container prometheus-configmap-reload; cpu request for container data-sidecar}"
6="{__meta_kubernetes_pod_annotation_prometheus_io_port 8077}"
7="{__meta_kubernetes_pod_annotation_prometheus_io_scrape false}"
8="{__meta_kubernetes_pod_container_name prometheus-configmap-reload}"
9="{__meta_kubernetes_pod_host_ip 172.20.42.119}"
10="{__meta_kubernetes_pod_ip 100.96.17.41}"
11="{__meta_kubernetes_pod_label_freshtracks_io_cluster bowl.freshtracks.io}"
12="{__meta_kubernetes_pod_label_pod_template_hash 1636686694}"
13="{__meta_kubernetes_pod_label_run data-sidecar}"
14="{__meta_kubernetes_pod_name data-sidecar-1636686694-83crm}"
15="{__meta_kubernetes_pod_node_name ip-xx-xxx-xx-xxx.us-west-2.compute.internal}"
16="{__meta_kubernetes_pod_ready false}"
17="{__metrics_path__ /metrics}"
18="{__scheme__ http}"
19="{job ftio-data-sidecar-calc}"
<relabel_config>
{__address__ 300.196.17.41:8077}
{__scheme__ http}
{__metrics_path__ /metrics}
{job ftio-data-sidecar-calc}
{kubernetes_namespace default}
{container_name prometheus-configmap-reload}
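The step from the discovered labels to the target labels above can be pictured as a series of `replace` relabel rules. The sketch below is a simplified model of that action. The three rules are assumptions chosen to reproduce the label set shown on the slide, not the speaker's actual configuration, and the `__meta_` cleanup is modelled on the slide's post-relabel result.

```python
import re


def relabel_replace(labels, source_labels, regex, target_label, replacement="$1", separator=";"):
    """Apply one simplified `replace` relabel step and return a new label dict."""
    joined = separator.join(labels.get(name, "") for name in source_labels)
    match = re.fullmatch(regex, joined)  # the regex must match the whole joined value
    if match is None:
        return dict(labels)  # no match: labels are left untouched
    template = re.sub(r"\$(\d+)", r"\\g<\1>", replacement)  # "$1" -> "\g<1>" for Python
    updated = dict(labels)
    updated[target_label] = match.expand(template)
    return updated


def drop_meta(labels):
    """Discard the temporary __meta_* discovery labels once relabeling is done."""
    return {k: v for k, v in labels.items() if not k.startswith("__meta_")}


discovered = {
    "__address__": "300.196.17.41",  # value copied from the slide as-is
    "__meta_kubernetes_namespace": "default",
    "__meta_kubernetes_pod_annotation_prometheus_io_port": "8077",
    "__meta_kubernetes_pod_container_name": "prometheus-configmap-reload",
    "__metrics_path__": "/metrics",
    "__scheme__": "http",
    "job": "ftio-data-sidecar-calc",
}

# Hypothetical rules: (source_labels, regex, target_label, replacement)
rules = [
    (["__address__", "__meta_kubernetes_pod_annotation_prometheus_io_port"],
     r"([^:]+)(?::\d+)?;(\d+)", "__address__", "$1:$2"),
    (["__meta_kubernetes_namespace"], r"(.*)", "kubernetes_namespace", "$1"),
    (["__meta_kubernetes_pod_container_name"], r"(.*)", "container_name", "$1"),
]

target = dict(discovered)
for source_labels, regex, target_label, replacement in rules:
    target = relabel_replace(target, source_labels, regex, target_label, replacement)
target = drop_meta(target)
print(target)
```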
http_requests_total{region="us-east", az="us-east-1", instance_type="m2.xlarge", instance="i-3582k8", hostname="host1"} = 5439
http_requests_total{region="us-east",
az="us-east-1",
instance_type="m2.xlarge",
instance="i-3582k8",
hostname="host1",
instance="300.196.17.41:8077",
job="ftio-data-sidecar-calc",
kubernetes_namespace="default",
container_name="prometheus-configmap-reload"} = 5439
<metric_relabel_config>
Recording Rules
Create a new series, derived from one or more existing series
# The name of the time series to output to. Must be a valid metric name.
record: <string>
# The PromQL expression to evaluate. Every evaluation cycle this is
# evaluated at the current time, and the result recorded as a new set of
# time series with the metric name as given by 'record'.
expr: <string>
# Labels to add or overwrite before storing the result.
labels:
  [ <labelname>: <labelvalue> ]
record: pod_name:cpu_usage_seconds:rate5m
expr: sum(rate(container_cpu_usage_seconds_total{pod_name=~"^(?:.+)$"}[5m])) BY (pod_name)
labels:
  ft_target: "true"

Anomaly detection and data imputation within time seriesAnomaly detection and data imputation within time series
Anomaly detection and data imputation within time series
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 

20180503 kube con eu kubernetes metrics deep dive

  • 1. @bob_cotton Reveal Your Deepest Kubernetes Metrics KubeCon EU 2018
  • 2. About Me ● CTO & Co-Founder - FreshTracks.io - A CA Accelerator Incubation ● bob@freshtracks.io ● @bob_cotton ● Father, Fly Fisher & Avid homebrewer
  • 3. Agenda ● Determining Important Metrics ○ Four Golden Signals ○ USE Method ○ RED Method ● Sources of metrics ○ Node ○ kubelet and containers ○ Kubernetes API ○ etcd ○ Derived metrics (kube-state-metrics) ● Metric Aggregation through the Kubernetes Hierarchy
  • 5. Four Golden Signals ● Latency ○ The time it takes to service a request. ● Errors ○ The rate of requests that fail, either explicitly, implicitly, or by policy ● Traffic ○ A measure of how much demand is being placed on your system ● Saturation ○ How "full" your service is.
  • 6. USE Method ● Introduced by Brendan Gregg for reasoning about system resources ○ Resources are all physical server functional components (CPUs, disks, busses…) ● Utilization ○ The average time that the resource was busy servicing work ● Saturation ○ The degree to which the resource has extra work which it can't service, often queued ● Errors ○ The count of error events
  • 7. RED Method ● Introduced by Tom Wilkie ○ A subset of the Four Golden Signals for measuring Services ● Rate ○ The number of requests per second ● Errors ○ The number of errors per second ● Duration ○ The length of time required to service the request
  • 8. USE is for Resources. RED is for Services. Kubernetes Has Both!
  • 10. Node Metrics from node_exporter ● node_exporter installed as a DaemonSet ○ One instance per node ● Standard Host Metrics ○ Load Average ○ CPU ○ Memory ○ Disk ○ Network ○ Many others ● ~1000 unique series on a typical node [Diagram: a Node running node_exporter, which exposes /metrics]
  • 11. USE for Node CPU
      ● Utilization
          ○ Metric: node_cpu
          ○ sum(rate(node_cpu{mode!="idle",mode!="iowait",mode!~"^(?:guest.*)$"}[5m])) BY (instance)
      ● Saturation
          ○ Metric: node_load1
          ○ sum(node_load1) by (node) / count(node_cpu{mode="system"}) by (node) * 100
      ● Errors
          ○ N/A: not exposed by node_exporter
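The two node CPU expressions could be precomputed as Prometheus recording rules. The following is a minimal sketch, assuming the Prometheus 2.x rule-file layout. The group name and the record names are illustrative and do not come from the talk, and the `node` label is assumed to be attached by relabeling. The metric name `node_cpu` follows the slide; node_exporter 0.16 and later expose it as `node_cpu_seconds_total`, so the selectors may need adjusting.

```yaml
groups:
  - name: node-cpu-use                                  # hypothetical group name
    rules:
      # Utilization: busy CPU time per instance, excluding idle, iowait and guest modes.
      - record: instance:node_cpu_utilization:rate5m    # illustrative record name
        expr: |
          sum(rate(node_cpu{mode!="idle",mode!="iowait",mode!~"^(?:guest.*)$"}[5m])) by (instance)
      # Saturation: 1-minute load average as a percentage of the CPU count.
      - record: node:node_cpu_saturation_load1:percent  # illustrative record name
        expr: |
          sum(node_load1) by (node)
            / count(node_cpu{mode="system"}) by (node)
            * 100
```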
  • 12. USE for Node Memory
      ● Utilization
          ○ Metrics: node_memory_MemAvailable, node_memory_MemTotal, kube_node_status_capacity_memory_bytes, kube_node_status_allocatable_memory_bytes
          ○ 1 - sum(node_memory_MemAvailable) by (node) / sum(node_memory_MemTotal) by (node)
          ○ 1 - sum(kube_node_status_allocatable_memory_bytes) by (exported_node) / sum(kube_node_status_capacity_memory_bytes) by (exported_node)
      ● Saturation
          ○ Don't go into swap!
      ● Errors
          ○ node_edac_correctable_errors_total
          ○ node_edac_uncorrectable_errors_total
          ○ node_edac_csrow_correctable_errors_total
          ○ node_edac_csrow_uncorrectable_errors_total
          ○ Only available on some systems
  • 13. Container Metrics from cAdvisor ● cAdvisor is embedded into the kubelet, so we scrape the kubelet to get container metrics ● These are the so-called Kubernetes “core” metrics ● For each container on the node: ○ CPU Usage (user and system) and time throttled ○ Filesystem read/writes/limits ○ Memory usage and limits ○ Network transmit/receive/dropped [Diagram: a Node running the kubelet (with embedded cAdvisor) and node_exporter, each exposing /metrics]
  • 14. USE for Container CPU
      ● Utilization
          ○ Metric: container_cpu_usage_seconds_total
          ○ sum(rate(container_cpu_usage_seconds_total[5m])) by (container_name)
      ● Saturation
          ○ Metric: container_cpu_cfs_throttled_seconds_total
          ○ sum(rate(container_cpu_cfs_throttled_seconds_total[5m])) by (container_name)
      ● Errors
          ○ N/A
  • 15. USE for Container Memory
      ● Utilization
          ○ Metrics: container_memory_usage_bytes, container_memory_working_set_bytes
          ○ sum(container_memory_working_set_bytes{name!~"POD"}) by (name)
      ● Saturation
          ○ Ratio of container_memory_working_set_bytes / kube_pod_container_resource_limits_memory_bytes
          ○ sum(container_memory_working_set_bytes) by (container_name) / sum(label_join(kube_pod_container_resource_limits_memory_bytes, "container_name", "", "container")) by (container_name)
      ● Errors
          ○ container_memory_failcnt: number of times memory usage hit the limit
          ○ container_memory_failures_total: cumulative count of memory allocation failures
          ○ sum(rate(container_memory_failures_total{type="pgmajfault"}[5m])) by (container_name)
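The saturation ratio lends itself to an alert. The following is a minimal sketch of a Prometheus alerting rule built on the slide's expression. The alert name, the 0.9 threshold, the 10-minute hold time and the severity label are all assumptions chosen for illustration, not values from the talk.

```yaml
groups:
  - name: container-memory-use              # hypothetical group name
    rules:
      - alert: ContainerMemoryNearLimit     # hypothetical alert name
        # Working set divided by the configured memory limit, per container name.
        expr: |
          sum(container_memory_working_set_bytes) by (container_name)
            /
          sum(label_join(kube_pod_container_resource_limits_memory_bytes,
                         "container_name", "", "container")) by (container_name)
            > 0.9
        for: 10m                            # assumed hold time before firing
        labels:
          severity: warning                 # assumed label
        annotations:
          summary: "{{ $labels.container_name }} is above 90% of its memory limit"
```

Containers without a memory limit have no limit series, so they simply never match this rule.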
  • 16. Kubernetes Metrics from the K8s API Server ● Metrics about the performance of the K8s API Server ○ Performance of controller work queues ○ Request Rates and Latencies ○ Etcd helper cache work queues and cache performance ○ General process status (File Descriptors/Memory/CPU Seconds) ○ Golang status (GC/Memory/Threads) [Diagram: a Node running the kubelet (cAdvisor), node_exporter and any other Pod, alongside the API Server, each exposing /metrics]
  • 17. RED for Kubernetes API Server
      ● Rate
          ○ Metric: apiserver_request_count
          ○ sum(rate(apiserver_request_count[5m])) by (verb)
      ● Errors
          ○ Metric: apiserver_request_count
          ○ sum(rate(apiserver_request_count{code=~"^(?:5..)$"}[5m])) / sum(rate(apiserver_request_count[5m]))
      ● Duration
          ○ Metric: apiserver_request_latencies_bucket
          ○ histogram_quantile(0.9, rate(apiserver_request_latencies_bucket[5m])) / 1e+06
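The three RED signals can be recorded per verb. A ratio of two unaggregated rate() vectors is matched label for label, so an error ratio needs both sides aggregated; this sketch aggregates by verb. The group and record names are illustrative assumptions. The metric names follow the slide; later Kubernetes releases renamed them (for example to `apiserver_request_total`), so check what your API server exposes.

```yaml
groups:
  - name: apiserver-red                                  # hypothetical group name
    rules:
      # Rate: requests per second, by verb.
      - record: verb:apiserver_request_count:rate5m      # illustrative record name
        expr: sum(rate(apiserver_request_count[5m])) by (verb)
      # Errors: fraction of requests answered with a 5xx code, by verb.
      - record: verb:apiserver_request_errors:ratio_rate5m
        expr: |
          sum(rate(apiserver_request_count{code=~"5.."}[5m])) by (verb)
            /
          sum(rate(apiserver_request_count[5m])) by (verb)
      # Duration: 90th percentile latency; the source histogram is in microseconds,
      # so dividing by 1e+06 yields seconds.
      - record: verb:apiserver_request_latency_seconds:p90_5m
        expr: |
          histogram_quantile(0.9,
            sum(rate(apiserver_request_latencies_bucket[5m])) by (le, verb)
          ) / 1e+06
```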
  • 18. K8s Derived Metrics from kube-state-metrics ● Counts and metadata about many K8s types ○ Counts of many “nouns” ○ Resource Limits ○ Container states ■ ready/restarts/running/terminated/waiting ● *_labels series carry labels ○ Each series has a constant value of 1 ○ Join to other series for on-the-fly labeling using group_left
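PromQL expresses this kind of join with the `group_left` modifier. Because the `*_labels` series always has the value 1, multiplying by it leaves the left-hand value unchanged and only copies labels across. The following is a sketch under several assumptions: the pods carry an `app` label (exposed by kube-state-metrics as `label_app`), cAdvisor names the pod label `pod_name` while kube-state-metrics names it `pod`, and a single kube-state-metrics instance is scraped. The group and record names are illustrative.

```yaml
groups:
  - name: pod-label-join                           # hypothetical group name
    rules:
      - record: pod:cpu_usage_seconds:rate5m_app   # illustrative record name
        expr: |
          sum by (namespace, pod) (
            # Copy pod_name into a "pod" label so both sides share a join key.
            label_replace(
              rate(container_cpu_usage_seconds_total[5m]),
              "pod", "$1", "pod_name", "(.+)"
            )
          )
          # Value is multiplied by 1; label_app is copied from kube_pod_labels.
          * on (namespace, pod) group_left(label_app)
          kube_pod_labels
```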
  • 19. Etcd Metrics from etcd ● Etcd is “master of all truth” within a K8s cluster ○ Leader existence and leader change rate ○ Proposals committed/applied/pending/failed ○ Disk write performance ○ Inbound gRPC stats ■ etcd_http_received_total ■ etcd_http_failed_total ■ etcd_http_successful_duration_seconds_bucket ○ Intra-cluster gRPC stats ■ etcd_network_member_round_trip_time_seconds_bucket ■ ...
  • 20. Core Metrics Aggregation [Diagram: hierarchy of Namespace, Deployment, Pod, Container] ● K8s clusters form a hierarchy ● We can aggregate the “core” metrics to any level ● This allows for some interesting monitoring opportunities ● Use Prometheus “recording rules” to aggregate the core metrics at every level ● Insights into all levels of your Kubernetes cluster ● This also applies to any custom application metric
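Aggregating up the hierarchy might look like the following sketch, which rolls container CPU usage up to the pod and then the namespace. It assumes the `namespace`, `pod_name` and `container_name` labels that cAdvisor attached at the time of the talk; the group name and the container and namespace record names are illustrative. The Deployment level is left out because it needs a join against ownership metadata that the core metrics do not carry.

```yaml
groups:
  - name: core-metrics-hierarchy                       # hypothetical group name
    rules:
      # Container level: CPU seconds per second for each container.
      - record: container_name:cpu_usage_seconds:rate5m
        expr: |
          sum(rate(container_cpu_usage_seconds_total[5m]))
            by (namespace, pod_name, container_name)
      # Pod level: built from the container-level series recorded above.
      - record: pod_name:cpu_usage_seconds:rate5m
        expr: sum(container_name:cpu_usage_seconds:rate5m) by (namespace, pod_name)
      # Namespace level: built from the pod-level series.
      - record: namespace:cpu_usage_seconds:rate5m
        expr: sum(pod_name:cpu_usage_seconds:rate5m) by (namespace)
```

Rules within a group are evaluated in order, which is why each level can build on the one recorded before it.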
  • 22. Resources ● USE Method ● RED Method ● Deep Dive into Kubernetes Metrics ● kube-state-metrics
  • 24. The New “Metrics Server” ● Replaces Heapster ● Standard (versioned and auth) API aggregated into the K8s API Server ● In “beta” in K8s 1.8 ● Used by the scheduler and (eventually) the Horizontal Pod Autoscaler ● A stripped-down version of Heapster ● Reports on “core” metrics (CPU/Memory/Network) gathered from cAdvisor ● For use internal to K8s only ● Pluggable for custom metrics
  • 26. Feeding the Horizontal Pod Autoscaler ● Before the metrics server, the HPA used Heapster for its core metrics ○ This will be the metrics-server going forward ● An API adapter will bridge to third-party monitoring systems ○ e.g. Prometheus
  • 28. Label/Value Based Data Model
      ● Graphite/StatsD
          ○ apache.192-168-5-1.home.200.http_requests_total
          ○ apache.192-168-5-1.home.500.http_requests_total
          ○ apache.192-168-5-1.about.200.http_requests_total
      ● Prometheus
          ○ http_requests_total{job="apache", instance="192.168.5.1", path="/home", status="200"}
          ○ http_requests_total{job="apache", instance="192.168.5.1", path="/home", status="500"}
          ○ http_requests_total{job="apache", instance="192.168.5.1", path="/about", status="200"}
      ● Selecting Series
          ○ Graphite/StatsD: *.*.home.200.http_requests_total
          ○ Prometheus: http_requests_total{status="200", path="/home"}
  • 29. Kubernetes Labels ● Kubernetes gives us labels on all the things ● Our scrape targets live in the context of the K8s labels ○ This comes from service discovery ● We want to enhance the scraped metric labels with K8s labels ● This is why we need relabel rules in Prometheus
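Relabel rules of this kind might look like the following sketch of a Prometheus scrape job using Kubernetes pod discovery. The job name and the target label names (`kubernetes_namespace`, `container_name`) are illustrative, and the `prometheus.io/scrape`, `prometheus.io/path` and `prometheus.io/port` pod annotations are a common convention that is assumed here rather than required by Prometheus.

```yaml
scrape_configs:
  - job_name: kubernetes-pods                  # hypothetical job name
    kubernetes_sd_configs:
      - role: pod                              # discover every pod via the K8s API
    relabel_configs:
      # Keep only pods annotated with prometheus.io/scrape: "true".
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
      # Let an annotation override the default /metrics path.
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        regex: (.+)
        target_label: __metrics_path__
      # Rewrite the scrape address to use the annotated port.
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__
      # Copy Kubernetes context onto every series scraped from the target.
      - source_labels: [__meta_kubernetes_namespace]
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_pod_container_name]
        target_label: container_name
```

Labels beginning with `__` are dropped after relabeling, so only the copied labels end up on stored series.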
  • 30. [Architecture diagram: Prometheus (with its TSDB) uses service discovery against the K8s API Server and scrapes the kubelet (cAdvisor), node_exporter, kube-state-metrics, app containers and other exporters]
  • 31. [Diagram: the K8s API Server feeds service discovery in Prometheus, which scrapes the target and writes to the TSDB]
      ● Labels discovered for the scrape target
          0="{__address__ 300.196.17.41}"
          1="{__meta_kubernetes_namespace default}"
          2="{__meta_kubernetes_pod_annotation_freshtracks_io_data_sidecar true}"
          3="{__meta_kubernetes_pod_annotation_freshtracks_io_path /metrics2}"
          4="{__meta_kubernetes_pod_annotation_kubernetes_io_created_by "kind":"SerializedReference"?}"
          5="{__meta_kubernetes_pod_annotation_kubernetes_io_limit_ranger LimitRanger plugin set: cpu request for container prometheus-configmap-reload; cpu request for container data-sidecar}"
          6="{__meta_kubernetes_pod_annotation_prometheus_io_port 8077}"
          7="{__meta_kubernetes_pod_annotation_prometheus_io_scrape false}"
          8="{__meta_kubernetes_pod_container_name prometheus-configmap-reload}"
          9="{__meta_kubernetes_pod_host_ip 172.20.42.119}"
          10="{__meta_kubernetes_pod_ip 100.96.17.41}"
          11="{__meta_kubernetes_pod_label_freshtracks_io_cluster bowl.freshtracks.io}"
          12="{__meta_kubernetes_pod_label_pod_template_hash 1636686694}"
          13="{__meta_kubernetes_pod_label_run data-sidecar}"
          14="{__meta_kubernetes_pod_name data-sidecar-1636686694-83crm}"
          15="{__meta_kubernetes_pod_node_name ip-xx-xxx-xx-xxx.us-west-2.compute.internal}"
          16="{__meta_kubernetes_pod_ready false}"
          17="{__metrics_path__ /metrics}"
          18="{__scheme__ http}"
          19="{job ftio-data-sidecar-calc}"
      ● <relabel_config> produces the target labels
          {__address__ 300.196.17.41:8077}
          {__scheme__ http}
          {__metrics_path__ /metrics}
          {job ftio-data-sidecar-calc}
          {kubernetes_namespace default}
          {container_name prometheus-configmap-reload}
      ● Sample as exposed by the target
          http_requests_total{region="us-east", az="us-east-1", instance_type="m2.xlarge", instance="i-3582k8", hostname="host1"} = 5439
      ● Sample with the target labels attached, then passed through <metric_relabel_config>
          http_requests_total{region="us-east", az="us-east-1", instance_type="m2.xlarge", instance="i-3582k8", hostname="host1", instance="300.196.17.41:8077", job="ftio-data-sidecar-calc", kubernetes_namespace="default", container_name="prometheus-configmap-reload"} = 5439
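The second stage, `metric_relabel_configs`, runs on each scraped sample just before it is stored, so it is the place to drop unwanted series or labels. The following is a minimal sketch: the job name is taken from the slide, but the choice to drop `go_gc_.*` series and the `hostname` label is purely an illustrative assumption.

```yaml
scrape_configs:
  - job_name: ftio-data-sidecar-calc
    kubernetes_sd_configs:
      - role: pod
    metric_relabel_configs:
      # Drop whole series whose metric name matches (assumed example pattern).
      - source_labels: [__name__]
        regex: go_gc_.*
        action: drop
      # Remove one label from every remaining series (assumed example label).
      - regex: hostname
        action: labeldrop
```

When using `labeldrop`, the series must stay uniquely identifiable once the label is gone.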
  • 32. Recording Rules: create a new series, derived from one or more existing series
      # The name of the time series to output to. Must be a valid metric name.
      record: <string>
      # The PromQL expression to evaluate. Every evaluation cycle this is
      # evaluated at the current time, and the result recorded as a new set of
      # time series with the metric name as given by 'record'.
      expr: <string>
      # Labels to add or overwrite before storing the result.
      labels:
        [ <labelname>: <labelvalue> ]
  • 33. Recording Rules: create a new series, derived from one or more existing series
      record: pod_name:cpu_usage_seconds:rate5m
      expr: sum(rate(container_cpu_usage_seconds_total{pod_name=~"^(?:.+)$"}[5m])) BY (pod_name)
      labels:
        ft_target: "true"