SlideShare a Scribd company logo
1 of 62
Download to read offline
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.S U M M I T
Automatically scaling
Kubernetes workloads
Sean O’Connor
Director of Engineering, Infra
Datadog
sean.oconnor@datadoghq.com
@theSeanOC
S V C 2 1 5 - S
“I think there is a
world market for
maybe 5 computers.”
—Thomas Watson, IBM
– Wildly misquoted!
“I think there is a
world market for
maybe 5 computers.”
—Thomas Watson, IBM
– Wildly misquoted!
– Sent prospectus to 20
clients, expected 5 orders
“I think there is a
world market for
maybe 5 computers.”
—Thomas Watson, IBM
– Wildly misquoted!
– Sent prospectus to 20
clients, expected 5 orders
– Received 18 orders!
“I think there is a
world market for
maybe 5 computers.”
—Thomas Watson, IBM
Who am I?
Sean O’Connor,
Director of Engineering
sean.oconnor@datadoghq.com
@theSeanOC
What is Datadog?
Observability platform
● Metrics
● Distributed traces (APM)
● Log analytics
● Synthetics
http://datadoghq.com
Capacity planning is
hard
More capacity?
Build a new office!
More capacity?
Add another rack!
Virtualization
FTW!
Config management
FTW!
The cloud!
The answer is always
containers*
*except when it’s not
Kubernetes
Horizontal Pod
Autoscaling
Basic in-cluster
metrics
1.2
Basic in-cluster
metrics
1.2 1.6
Custom in-cluster
metrics
Multiple metrics
Basic in-cluster
metrics
1.2 1.6 1.10
Custom in-cluster
metrics
Multiple metrics
Any metric, including
external sources
Choosing the
right metrics
– Throughput
– Success
– Error
– Performance
Work metrics
– Utilization
– Saturation
– Error
– Availability
Resource metrics
Services Pods
Almost like...
– Code/config changes
– Alerts
– Scaling events
Events
4 golden signals (LETS)
Latency
4 golden signals (LETS)
Latency Errors
4 golden signals (LETS)
Latency Errors
4 golden signals (LETS)
Traffic
Latency Errors
4 golden signals (LETS)
Traffic Saturation
How does the
Horizontal Pod
Autoscaler
(HPA) work?
External metrics
server
Getting external metrics
In Kubernetes
External metrics
server
Your metrics source
Getting external metrics
In Kubernetes e.g., Datadog
HPA External metrics
server
Your metrics source
Getting external metrics
In Kubernetes e.g., DatadogIn Kubernetes
Control loop
●Runs every
horizontal_pod_autoscaler_sync_period
Control loop
●Runs every
horizontal_pod_autoscaler_sync_period
●Polls and calculates current metric value
Control loop
●Runs every
horizontal_pod_autoscaler_sync_period
●Polls and calculates current metric value
●Calculates desired pod value
Control loop
●Runs every
horizontal_pod_autoscaler_sync_period
●Polls and calculates current metric value
●Calculates desired pod value
desired_pods = ceiling(
(current_metric/desired_metric) * current_pods
)
Calculating pods
Target metric value = 500 req/sec/pod
Current metric value = 1,000 req/sec/pod
Current number of pods = 2
Calculating pods
Target metric value = 500 req/sec/pod
Current metric value = 1,000 req/sec/pod
Current number of pods = 2
ceiling((1,000/500) * 2) = 4 pods
Calculating pods
Target metric value = 500 req/sec/pod
Current metric value = 250 req/sec/pod
Current number of pods = 4
Calculating pods
Target metric value = 500 req/sec/pod
Current metric value = 250 req/sec/pod
Current number of pods = 4
ceiling((250/500) * 4) = 2 pods
Control loop
●Runs every
horizontal_pod_autoscaler_sync_period
●Polls and calculates current metric value
●Calculates desired pod value
Control loop
●Runs every
horizontal_pod_autoscaler_sync_period
●Polls and calculates current metric value
●Calculates desired pod value
●Executes scaling
In practice
In practice
External metrics
server
In Kubernetes
Metrics server service (YAML)
kind: Service
apiVersion: v1
metadata:
name: datadog-custom-metrics-server
spec:
selector:
app: datadog-cluster-agent
ports:
- protocol: TCP
port: 443
targetPort: 443
Datadog Cluster Agent Deployment (more YAML!)
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
name: datadog-cluster-agent
spec:
template:
metadata:
labels:
app: datadog-cluster-agent
spec:
containers:
- image: datadog/cluster-agent:latest
name: datadog-cluster-agent
env:
- name: DD_EXTERNAL_METRICS_PROVIDER_ENABLED
value: 'true'
Expose it to Kubernetes (even more YAML!)
apiVersion: apiregistration.k8s.io/v1beta1
kind: APIService
metadata:
name: v1beta1.external.metrics.k8s.io
spec:
group: external.metrics.k8s.io
groupPriorityMinimum: 100
versionPriority: 100
service:
name: datadog-custom-metrics-server
version: v1beta1
In practice
External metrics
server
In Kubernetes
Your metrics source
e.g., Datadog
Metrics from Datadog
In practice
External metrics
server
In Kubernetes
Your metrics source
e.g., Datadog
HPA
In Kubernetes
Our example application (we love YAML!)
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
name: my-app
spec:
replicas: 1
template:
metadata:
name: my-app
spec:
containers:
- name: my-app
image: my-cool-app:latest
HPA (you love YAML too!)
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
name: my-hpa
spec:
minReplicas: 1
maxReplicas: 3
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: my-app
metrics:
- type: External
external:
metricName: nginx.net.request_per_s
targetAverageValue: 500
kubectl describe hpa my-hpa
Type Status Reason Message
---- ------ ------ -------
AbleToScale True SucceededGetScale the HPA controller was able
to get the target's current
scale
ScalingActive False FailedGetExternalMetric the HPA was unable to compute
the replica count: unable to
get external metric
nginx.net.request_per_s: no
metrics returned from external
metrics API
kubectl describe hpa my-hpa
Type Status Reason Message
---- ------ ------ -------
AbleToScale True SucceededGetScale the HPA controller was able
to get the target's current
scale
ScalingActive True ValidMetricFound the HPA was able to
successfully calculate a
replica count from external
metric nginx.net.request_per_s
ScalingLimited False DesiredWithinRange the desired count is within the
acceptable range
kubectl describe hpa my-hpa
Reference: Deployment/my-app
Metrics: ( current / target )
"nginx.net.request_per_s" (target average value): 500 / 500
Min replicas: 1
Max replicas: 3
Deployment pods: 2 current / 2 desired
Recap
Kubernetes is the
natural evolution
HPA is a great tool
Recap
Kubernetes is the
natural evolution
HPA is a great tool
Use the right metrics
Think of work vs.
resource metrics
Consider the 4 golden signals
Recap
Additional resources
●Autoscaling K8s blog post:
https://dtdg.co/autoscaling-k8s
●Datadog Cluster Agent:
https://dtdg.co/cluster-agent
●Kubernetes HPA docs:
https://dtdg.co/k8s-hpa-docs
Thank you
sean.oconnor@datadoghq.com
@theSeanOC

More Related Content

What's hot

What's hot (20)

사례로 알아보는 Database Migration Service : 데이터베이스 및 데이터 이관, 통합, 분리, 분석의 도구 - 발표자: ...
사례로 알아보는 Database Migration Service : 데이터베이스 및 데이터 이관, 통합, 분리, 분석의 도구 - 발표자: ...사례로 알아보는 Database Migration Service : 데이터베이스 및 데이터 이관, 통합, 분리, 분석의 도구 - 발표자: ...
사례로 알아보는 Database Migration Service : 데이터베이스 및 데이터 이관, 통합, 분리, 분석의 도구 - 발표자: ...
 
SRV410 Deep Dive on AWS Batch
SRV410 Deep Dive on AWS BatchSRV410 Deep Dive on AWS Batch
SRV410 Deep Dive on AWS Batch
 
Amazon EKS를 위한 AWS CDK와 CDK8s 활용법 - 염지원, 김광영 AWS 솔루션즈 아키텍트 :: AWS Summit Seou...
Amazon EKS를 위한 AWS CDK와 CDK8s 활용법 - 염지원, 김광영 AWS 솔루션즈 아키텍트 :: AWS Summit Seou...Amazon EKS를 위한 AWS CDK와 CDK8s 활용법 - 염지원, 김광영 AWS 솔루션즈 아키텍트 :: AWS Summit Seou...
Amazon EKS를 위한 AWS CDK와 CDK8s 활용법 - 염지원, 김광영 AWS 솔루션즈 아키텍트 :: AWS Summit Seou...
 
AWS Networking Fundamentals - SVC304 - Anaheim AWS Summit
AWS Networking Fundamentals - SVC304 - Anaheim AWS SummitAWS Networking Fundamentals - SVC304 - Anaheim AWS Summit
AWS Networking Fundamentals - SVC304 - Anaheim AWS Summit
 
Understanding Kubernetes
Understanding KubernetesUnderstanding Kubernetes
Understanding Kubernetes
 
Best Practices of Infrastructure as Code with Terraform
Best Practices of Infrastructure as Code with TerraformBest Practices of Infrastructure as Code with Terraform
Best Practices of Infrastructure as Code with Terraform
 
Let's Talk About: Azure Networking
Let's Talk About: Azure NetworkingLet's Talk About: Azure Networking
Let's Talk About: Azure Networking
 
Deep Dive into AWS SAM
Deep Dive into AWS SAMDeep Dive into AWS SAM
Deep Dive into AWS SAM
 
[2018] 오픈스택 5년 운영의 경험
[2018] 오픈스택 5년 운영의 경험[2018] 오픈스택 5년 운영의 경험
[2018] 오픈스택 5년 운영의 경험
 
Terraform을 기반한 AWS 기반 대규모 마이크로서비스 인프라 운영 노하우 - 이용욱, 삼성전자 :: AWS Summit Seoul ...
Terraform을 기반한 AWS 기반 대규모 마이크로서비스 인프라 운영 노하우 - 이용욱, 삼성전자 :: AWS Summit Seoul ...Terraform을 기반한 AWS 기반 대규모 마이크로서비스 인프라 운영 노하우 - 이용욱, 삼성전자 :: AWS Summit Seoul ...
Terraform을 기반한 AWS 기반 대규모 마이크로서비스 인프라 운영 노하우 - 이용욱, 삼성전자 :: AWS Summit Seoul ...
 
Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latenc...
Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latenc...Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latenc...
Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latenc...
 
Container Patterns
Container PatternsContainer Patterns
Container Patterns
 
Introduction to MongoDB Enterprise
Introduction to MongoDB EnterpriseIntroduction to MongoDB Enterprise
Introduction to MongoDB Enterprise
 
Kubernetes dealing with storage and persistence
Kubernetes  dealing with storage and persistenceKubernetes  dealing with storage and persistence
Kubernetes dealing with storage and persistence
 
Configuring global infrastructure in terraform
Configuring global infrastructure in terraformConfiguring global infrastructure in terraform
Configuring global infrastructure in terraform
 
Security and Compliance in the Cloud
Security and Compliance in the Cloud Security and Compliance in the Cloud
Security and Compliance in the Cloud
 
AWS Summit Seoul 2023 | Confluent와 함께하는 실시간 데이터와 클라우드 여정
AWS Summit Seoul 2023 | Confluent와 함께하는 실시간 데이터와 클라우드 여정AWS Summit Seoul 2023 | Confluent와 함께하는 실시간 데이터와 클라우드 여정
AWS Summit Seoul 2023 | Confluent와 함께하는 실시간 데이터와 클라우드 여정
 
Azure Resource Manager (ARM) Templates
Azure Resource Manager (ARM) TemplatesAzure Resource Manager (ARM) Templates
Azure Resource Manager (ARM) Templates
 
Azure DNS Private Resolver - Azure Example Scenarios _ Microsoft Learn.pdf
Azure DNS Private Resolver - Azure Example Scenarios _ Microsoft Learn.pdfAzure DNS Private Resolver - Azure Example Scenarios _ Microsoft Learn.pdf
Azure DNS Private Resolver - Azure Example Scenarios _ Microsoft Learn.pdf
 
Kubernetes 101 - an Introduction to Containers, Kubernetes, and OpenShift
Kubernetes 101 - an Introduction to Containers, Kubernetes, and OpenShiftKubernetes 101 - an Introduction to Containers, Kubernetes, and OpenShift
Kubernetes 101 - an Introduction to Containers, Kubernetes, and OpenShift
 

Similar to Automatically scaling Kubernetes workloads - SVC215-S - New York AWS Summit

KPI definition with Business Activity Monitor 2.0
KPI definition with Business Activity Monitor 2.0KPI definition with Business Activity Monitor 2.0
KPI definition with Business Activity Monitor 2.0
WSO2
 

Similar to Automatically scaling Kubernetes workloads - SVC215-S - New York AWS Summit (20)

Autoscaling in kubernetes v1
Autoscaling in kubernetes v1Autoscaling in kubernetes v1
Autoscaling in kubernetes v1
 
Apache Spark Listeners: A Crash Course in Fast, Easy Monitoring
Apache Spark Listeners: A Crash Course in Fast, Easy MonitoringApache Spark Listeners: A Crash Course in Fast, Easy Monitoring
Apache Spark Listeners: A Crash Course in Fast, Easy Monitoring
 
Spark + AI Summit 2019: Apache Spark Listeners: A Crash Course in Fast, Easy ...
Spark + AI Summit 2019: Apache Spark Listeners: A Crash Course in Fast, Easy ...Spark + AI Summit 2019: Apache Spark Listeners: A Crash Course in Fast, Easy ...
Spark + AI Summit 2019: Apache Spark Listeners: A Crash Course in Fast, Easy ...
 
AWS re:Invent re:Cap - 데이터 분석: Amazon EC2 C4 Instance + Amazon EBS - 김일호
AWS re:Invent re:Cap - 데이터 분석: Amazon EC2 C4 Instance + Amazon EBS - 김일호AWS re:Invent re:Cap - 데이터 분석: Amazon EC2 C4 Instance + Amazon EBS - 김일호
AWS re:Invent re:Cap - 데이터 분석: Amazon EC2 C4 Instance + Amazon EBS - 김일호
 
Creating PostgreSQL-as-a-Service at Scale
Creating PostgreSQL-as-a-Service at ScaleCreating PostgreSQL-as-a-Service at Scale
Creating PostgreSQL-as-a-Service at Scale
 
A Practical Deep Dive into Observability of Streaming Applications with Kosta...
A Practical Deep Dive into Observability of Streaming Applications with Kosta...A Practical Deep Dive into Observability of Streaming Applications with Kosta...
A Practical Deep Dive into Observability of Streaming Applications with Kosta...
 
Mind the App: How to Monitor Your Kafka Streams Applications | Bruno Cadonna,...
Mind the App: How to Monitor Your Kafka Streams Applications | Bruno Cadonna,...Mind the App: How to Monitor Your Kafka Streams Applications | Bruno Cadonna,...
Mind the App: How to Monitor Your Kafka Streams Applications | Bruno Cadonna,...
 
Bigdata meetup dwarak_realtime_score_app
Bigdata meetup dwarak_realtime_score_appBigdata meetup dwarak_realtime_score_app
Bigdata meetup dwarak_realtime_score_app
 
ql.io at NodePDX
ql.io at NodePDXql.io at NodePDX
ql.io at NodePDX
 
Knative Outro
Knative OutroKnative Outro
Knative Outro
 
KPI definition with Business Activity Monitor 2.0
KPI definition with Business Activity Monitor 2.0KPI definition with Business Activity Monitor 2.0
KPI definition with Business Activity Monitor 2.0
 
Optimizing Application Performance on Kubernetes
Optimizing Application Performance on KubernetesOptimizing Application Performance on Kubernetes
Optimizing Application Performance on Kubernetes
 
AWS Summit Singapore 2019 | Autoscaling Your Kubernetes Workloads
AWS Summit Singapore 2019 | Autoscaling Your Kubernetes WorkloadsAWS Summit Singapore 2019 | Autoscaling Your Kubernetes Workloads
AWS Summit Singapore 2019 | Autoscaling Your Kubernetes Workloads
 
OSDC 2018 | Hardware-level data-center monitoring with Prometheus by Conrad H...
OSDC 2018 | Hardware-level data-center monitoring with Prometheus by Conrad H...OSDC 2018 | Hardware-level data-center monitoring with Prometheus by Conrad H...
OSDC 2018 | Hardware-level data-center monitoring with Prometheus by Conrad H...
 
Apache Kafka, and the Rise of Stream Processing
Apache Kafka, and the Rise of Stream ProcessingApache Kafka, and the Rise of Stream Processing
Apache Kafka, and the Rise of Stream Processing
 
Automatically scaling your Kubernetes workloads - SVC210-S - Santa Clara AWS ...
Automatically scaling your Kubernetes workloads - SVC210-S - Santa Clara AWS ...Automatically scaling your Kubernetes workloads - SVC210-S - Santa Clara AWS ...
Automatically scaling your Kubernetes workloads - SVC210-S - Santa Clara AWS ...
 
Fabric - Realtime stream processing framework
Fabric - Realtime stream processing frameworkFabric - Realtime stream processing framework
Fabric - Realtime stream processing framework
 
FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...
FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...
FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...
 
eBay Pulsar: Real-time analytics platform
eBay Pulsar: Real-time analytics platformeBay Pulsar: Real-time analytics platform
eBay Pulsar: Real-time analytics platform
 
Deploying Kafka Streams Applications with Docker and Kubernetes
Deploying Kafka Streams Applications with Docker and KubernetesDeploying Kafka Streams Applications with Docker and Kubernetes
Deploying Kafka Streams Applications with Docker and Kubernetes
 

More from Amazon Web Services

Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
Amazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
Amazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
Amazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
Amazon Web Services
 

More from Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Automatically scaling Kubernetes workloads - SVC215-S - New York AWS Summit