SlideShare una empresa de Scribd logo
1 de 55
Descargar para leer sin conexión
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
AWS re:INVENT
Using Amazon CloudWatch for Amazon
ECS Resource Monitoring at Scale
B r e n d a n M c F a r l a n d
P l a t f o r m C o s t E n g i n e e r @ M a p b o x
N o v e m b e r 2 8 , 2 0 1 7
DEV333
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
DEV333: Agenda
- Mapbox + platform cost engineering
- EC2 → ECS
- ECS + the shared resource problem
- ECS + Amazon CloudWatch
- Building a solution
- Results and lessons learned
- What’s next @ Mapbox
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Maps Navigation Search
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
AWS from the beginning
- Thousands of customers
- Thousands of mobile applications
- Hundreds of millions of users
- Three billion probes a day, 100 million miles of anonymized telemetry
All this consumes trillions of compute seconds every day on Amazon EC2
Container Service (Amazon ECS)
Platform Cost Engineering
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Platform Cost Engineering
Role
- Cost monitoring
- On-call rotation
- Consult with stakeholders
- Innovate and advance
Principles
- Inform and empower stakeholders
- Stewardship of resource usage
- The x-axis is coming
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Scale @ Mapbox
Servers
Customers
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Scale @ Mapbox
Servers
Customers
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Scale @ Mapbox
Servers
Customers
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Scale @ Mapbox
Servers
Customers
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Many Dimensions of Scale
Scale: customers, requests, applications, products
Resources:
- Servers
- Engineers
- Cost
- Alarms
- Architectures
EC2 → ECS
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Moving to ECS
Lower Cost
- Utilization
- Spot by default
- Centralized Capacity
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Moving to ECS
Efficient Operations
- Separate Infrastructure from Code
- Uniform architecture
- Security / Credentialing
- Local development environments
C4 R3 M4R3 R3
R3 R3 R3
M4 M4
M4 M4 M4
C4 C4
C4 C4 C4
Map Service Search Service Directions Service
Before ECS
C4 R3 M4R3 R3
R3 R3 R3
M4 M4
M4 M4 M4
C4 C4
C4 C4 C4
Map Service Search ServiceDirections Service
ECS
C4 R3 R3 R3
R3 R3 R3
M4
M4
M4 M4
M4 M4
C4 C4
C4 C4 C4
Map Service Search ServiceDirections Service
Spot & Instance Diversity
SpotFleet
C4
C4
R3
R3
Improved Instance Packing
Map
service
CPU 55%
Mem 5%
Search
service
CPU 25%
Mem 75%
Combined
services
with ECS
CPU 80%
Mem 80%
ECS + the shared resource problem
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
C4 C4 C4
C4 C4 C4
Map Service
Before ECS
A maps instance is a maps instance
= $100
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
After ECS Migration
C4 R3 M4R3 R3
R3 R3 R3
M4 M4
M4 M4 M4
C4 C4
C4 C4 C4
Map Service Search ServiceDirections Service
A cluster instance is…
a maps instance,
a directions instance,
a search and directions instance,
a maps and search instance,
all within the same day…
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
ECS Billing
C4 R3 M4R3 R3
R3 R3 R3
M4 M4
M4 M4 M4
C4 C4
C4 C4 C4
Map Service Search ServiceDirections Service
= $1,000
= ?
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Still missing part of the picture
Challenges in a shared resource environment
- Tags at cluster level
- Multiple teams in one resource
- Loss of granularity for teams
We needed service & task resource consumption within our ECS environment
C4 R3 M4R3 R3
R3 R3 R3
M4 M4
M4 M4 M4
C4 C4
C4 C4 C4
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
ECS is great!
ECS is a whole different monitoring space
Hard to answer previously basic questions:
• How much did Maps cost this month?
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
ECS + CloudWatch
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
ECS CloudWatch Metrics
Cluster Level
- CPU utilization
- CPU reservation
- Memory utilization
- Memory reservation
Cluster and service level
- CPU utilization
- CPU reservation
- Memory utilization
- Memory reservation
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Mapbox uses 31 custom metrics per cluster*
Some examples
• AgentConnectedPercent
• FailedTaskPlacement
• ActiveInstanceCount
• StatusCheckFailedInstances
• RunningTasksPerInstances
• Spot Termination
Custom Metrics
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Cluster Dashboards
AWS CloudFormation creates a CloudWatch dashboard for each cluster
var dashboard = require('./dashboard');
Resources: {
// …..
Dashboard: {
Type: 'AWS::CloudWatch::Dashboard',
Properties: { DashboardName: cf.sub('${AWS::StackName}-${AWS::Region}'),
DashboardBody: cf.sub(dashboard) }
}
}
- Include dashboard widgets in a ./dashboard.js file
- cf references are for https://github.com/mapbox/cloudfriend an open source tool for cloudformation
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
A Sample Widget
{ type: 'metric', x: 0, y: 3, width: 24, height: 6,
properties: {
view: 'timeSeries',
stacked: false,
metrics: [
['AWS/ECS', 'CPUUtilization', 'ClusterName',
'${Cluster}'],
['.', 'CPUReservation', '.', '.'],
['.', 'MemoryUtilization', '.', '.', { yAxis: 'left' }],
['.', 'MemoryReservation', '.', '.', { yAxis: 'left' }] ],
region: '${AWS::Region}',
title: 'Utilization/Reservation',
period: 300,
yAxis: { left: { min: 0 }, right: { min: 0 } }
}
}
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Additional Dashboard
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
But wait, how much did Maps cost this
month?
Variables
- 10+ clusters
- 10+ Spotfleet mixes with different az’s, prices, etc.
- Very different task profiles running
- Services v. RunTask
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
But wait, how much did Maps cost this
month?
Data
- How long a task ran on a cluster
- What resources that task reserved on the cluster
- The cost of resources on the cluster while the task ran
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Recording Task Duration
CloudWatch Event Rules
- Collect the start and stop event for every task on our infrastructure
Take 1:
ECS Task start/stop → CloudWatch Event rule → Amazon Kinesis Firehose → Amazon Simple
Storage Service (Amazon S3) → AWS Lambda → S3 → Amazon Athena
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Recording Task Duration Part 2
Challenges:
- Infrastructure running in all but two ECS regions
- Kinesis Firehose not supported in all ECS regions
- Volume of start and stop events
Take 2:
ECS Task start/stop → CloudWatch Event Rule → Lambda → Kinesis Firehose → S3 → Athena
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Putting It Together
Duration x Resources reserved → Task Resource Cost
Resources reserved
- ECS task names – Service_Team_CostCategory_Suffix:Revision
Resource usage pipeline:
Task definition registered → CloudWatch Event Rule → Lambda → S3 → Athena
Cost pipeline
Bucket notification → ECS Task → S3 → Athena
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Taking a pause
How much spending on tracking spending?
- Massive Task Volume
- One Athena query cost what?
Social impacts of information
- Socializing why
- High cost of faulty reporting
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Recording Task Duration
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
One Athena query cost what?
Data Warehouse
- S3
- Athena Queries
- Partition
- Compression
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Socialize why
Empower Teams v. Punish Teams
- Big numbers can be scary
Creating a healthy system
- Monitor → Learn → Take action
High Cost of faulty information
- Alarms
- Regular checks on reporting
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Results with this pipeline
Task level detail within our clusters
Queryable database of task start and stop events
Can cost our containers
- Container Y ran for 112 seconds on cluster Z
- Container Y reserves 1024 cpu and 2048 memory
- A cpu second on cluster Z cost $0.00001
The cost of container Y is $0.0001 x 114,688 cpu/seconds = $1.14
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Reporting: Custom Cost Metrics
Scheduled Lambda → Athena → CloudWatch → S3
Team Cost
- Overview of what’s happening in the cluster at
the team level
Service Cost
- Stakeholders get feedback on application
level changes and their impact
Not every detail is logged to CloudWatch
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Reporting back out
Scheduled Lambda → Athena → CloudWatch → S3
Determining what to write to CloudWatch
- Cost on the Cluster can be rolled up by task, service, team and cost category.
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Immediate Benefits
Can answer key questions:
• How much did my Application cost this month?
• Why’s the EC2 Bill more expensive?
- Diagnose Capacity fluctuations
- Usage bill for stakeholder teams
- ECS clusters organized for engineering
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Additional Benefits
Put a focus on right sizing
- Charged by reservation not utilization
Debugging task placement problems
- Replay task history on a box with stop and start events
Debugging Instance Events
- Disk Utilization
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
What’s Next?
- Utilization and billing tightly linked in stakeholder reporting
- Following this model with other large AWS products
- Enhanced alarming / monitoring
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
What’s Next?
We’re growing!
- https://www.mapbox.com/jobs/
We look for incredibly smart and radically pragmatic people. Our teams members
have a range of cultural, ethnic, social, and economic backgrounds. We prioritize the
progression, growth, and leadership of individuals from all backgrounds and strongly
believe in the value of cultivating a diverse team and encourage people of all
backgrounds to apply.
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Thank you!

Más contenido relacionado

La actualidad más candente

Real world High Performance & High Throughput Computing on AWS
Real world High Performance & High Throughput Computing on AWSReal world High Performance & High Throughput Computing on AWS
Real world High Performance & High Throughput Computing on AWSAmazon Web Services
 
ATC303-Cache Me If You Can Minimizing Latency While Optimizing Cost Through A...
ATC303-Cache Me If You Can Minimizing Latency While Optimizing Cost Through A...ATC303-Cache Me If You Can Minimizing Latency While Optimizing Cost Through A...
ATC303-Cache Me If You Can Minimizing Latency While Optimizing Cost Through A...Amazon Web Services
 
STG307_Deep Dive on Amazon Elastic File System (Amazon EFS)
STG307_Deep Dive on Amazon Elastic File System (Amazon EFS)STG307_Deep Dive on Amazon Elastic File System (Amazon EFS)
STG307_Deep Dive on Amazon Elastic File System (Amazon EFS)Amazon Web Services
 
STG206_Big Data Data Lakes and Data Oceans
STG206_Big Data Data Lakes and Data OceansSTG206_Big Data Data Lakes and Data Oceans
STG206_Big Data Data Lakes and Data OceansAmazon Web Services
 
NET302_Global Traffic Management with Amazon Route 53
NET302_Global Traffic Management with Amazon Route 53NET302_Global Traffic Management with Amazon Route 53
NET302_Global Traffic Management with Amazon Route 53Amazon Web Services
 
Design patterns and best practices for data analytics with amazon emr (ABD305)
Design patterns and best practices for data analytics with amazon emr (ABD305)Design patterns and best practices for data analytics with amazon emr (ABD305)
Design patterns and best practices for data analytics with amazon emr (ABD305)Amazon Web Services
 
CMP217_Scale In-Memory Workloads on Amazon EC2 X1 and X1e Instances with up t...
CMP217_Scale In-Memory Workloads on Amazon EC2 X1 and X1e Instances with up t...CMP217_Scale In-Memory Workloads on Amazon EC2 X1 and X1e Instances with up t...
CMP217_Scale In-Memory Workloads on Amazon EC2 X1 and X1e Instances with up t...Amazon Web Services
 
NET201_Creating Your Virtual Data Center
NET201_Creating Your Virtual Data CenterNET201_Creating Your Virtual Data Center
NET201_Creating Your Virtual Data CenterAmazon Web Services
 
DAT316_Report from the field on Aurora PostgreSQL Performance
DAT316_Report from the field on Aurora PostgreSQL PerformanceDAT316_Report from the field on Aurora PostgreSQL Performance
DAT316_Report from the field on Aurora PostgreSQL PerformanceAmazon Web Services
 
NET308_VPC Design Scenarios for Real-Life Use Cases
NET308_VPC Design Scenarios for Real-Life Use CasesNET308_VPC Design Scenarios for Real-Life Use Cases
NET308_VPC Design Scenarios for Real-Life Use CasesAmazon Web Services
 
ARC304_From One to Many Evolving VPC Design
ARC304_From One to Many Evolving VPC DesignARC304_From One to Many Evolving VPC Design
ARC304_From One to Many Evolving VPC DesignAmazon Web Services
 
NET304_Deep Dive into the New Network Load Balancer
NET304_Deep Dive into the New Network Load BalancerNET304_Deep Dive into the New Network Load Balancer
NET304_Deep Dive into the New Network Load BalancerAmazon Web Services
 
ARC303_Running Lean Architectures How to Optimize for Cost Efficiency
ARC303_Running Lean Architectures How to Optimize for Cost EfficiencyARC303_Running Lean Architectures How to Optimize for Cost Efficiency
ARC303_Running Lean Architectures How to Optimize for Cost EfficiencyAmazon Web Services
 
DAT320_Moving a Galaxy into Cloud
DAT320_Moving a Galaxy into CloudDAT320_Moving a Galaxy into Cloud
DAT320_Moving a Galaxy into CloudAmazon Web Services
 
AWS Commercial Management and Cost Optimisation - Dec 2017
AWS Commercial Management and Cost Optimisation - Dec 2017AWS Commercial Management and Cost Optimisation - Dec 2017
AWS Commercial Management and Cost Optimisation - Dec 2017Amazon Web Services
 
AWS Database and Analytics State of the Union - 2017 - DAT201 - re:Invent 2017
AWS Database and Analytics State of the Union - 2017 - DAT201 - re:Invent 2017AWS Database and Analytics State of the Union - 2017 - DAT201 - re:Invent 2017
AWS Database and Analytics State of the Union - 2017 - DAT201 - re:Invent 2017Amazon Web Services
 
Scaling up to Your First 10 Million Users
Scaling up to Your First 10 Million UsersScaling up to Your First 10 Million Users
Scaling up to Your First 10 Million UsersAmazon Web Services
 

La actualidad más candente (20)

Real world High Performance & High Throughput Computing on AWS
Real world High Performance & High Throughput Computing on AWSReal world High Performance & High Throughput Computing on AWS
Real world High Performance & High Throughput Computing on AWS
 
STG306_Deep Dive on Amazon EBS
STG306_Deep Dive on Amazon EBSSTG306_Deep Dive on Amazon EBS
STG306_Deep Dive on Amazon EBS
 
ATC303-Cache Me If You Can Minimizing Latency While Optimizing Cost Through A...
ATC303-Cache Me If You Can Minimizing Latency While Optimizing Cost Through A...ATC303-Cache Me If You Can Minimizing Latency While Optimizing Cost Through A...
ATC303-Cache Me If You Can Minimizing Latency While Optimizing Cost Through A...
 
STG307_Deep Dive on Amazon Elastic File System (Amazon EFS)
STG307_Deep Dive on Amazon Elastic File System (Amazon EFS)STG307_Deep Dive on Amazon Elastic File System (Amazon EFS)
STG307_Deep Dive on Amazon Elastic File System (Amazon EFS)
 
STG206_Big Data Data Lakes and Data Oceans
STG206_Big Data Data Lakes and Data OceansSTG206_Big Data Data Lakes and Data Oceans
STG206_Big Data Data Lakes and Data Oceans
 
NET302_Global Traffic Management with Amazon Route 53
NET302_Global Traffic Management with Amazon Route 53NET302_Global Traffic Management with Amazon Route 53
NET302_Global Traffic Management with Amazon Route 53
 
Design patterns and best practices for data analytics with amazon emr (ABD305)
Design patterns and best practices for data analytics with amazon emr (ABD305)Design patterns and best practices for data analytics with amazon emr (ABD305)
Design patterns and best practices for data analytics with amazon emr (ABD305)
 
CMP217_Scale In-Memory Workloads on Amazon EC2 X1 and X1e Instances with up t...
CMP217_Scale In-Memory Workloads on Amazon EC2 X1 and X1e Instances with up t...CMP217_Scale In-Memory Workloads on Amazon EC2 X1 and X1e Instances with up t...
CMP217_Scale In-Memory Workloads on Amazon EC2 X1 and X1e Instances with up t...
 
NET201_Creating Your Virtual Data Center
NET201_Creating Your Virtual Data CenterNET201_Creating Your Virtual Data Center
NET201_Creating Your Virtual Data Center
 
DAT316_Report from the field on Aurora PostgreSQL Performance
DAT316_Report from the field on Aurora PostgreSQL PerformanceDAT316_Report from the field on Aurora PostgreSQL Performance
DAT316_Report from the field on Aurora PostgreSQL Performance
 
NET308_VPC Design Scenarios for Real-Life Use Cases
NET308_VPC Design Scenarios for Real-Life Use CasesNET308_VPC Design Scenarios for Real-Life Use Cases
NET308_VPC Design Scenarios for Real-Life Use Cases
 
ARC304_From One to Many Evolving VPC Design
ARC304_From One to Many Evolving VPC DesignARC304_From One to Many Evolving VPC Design
ARC304_From One to Many Evolving VPC Design
 
NET304_Deep Dive into the New Network Load Balancer
NET304_Deep Dive into the New Network Load BalancerNET304_Deep Dive into the New Network Load Balancer
NET304_Deep Dive into the New Network Load Balancer
 
ARC303_Running Lean Architectures How to Optimize for Cost Efficiency
ARC303_Running Lean Architectures How to Optimize for Cost EfficiencyARC303_Running Lean Architectures How to Optimize for Cost Efficiency
ARC303_Running Lean Architectures How to Optimize for Cost Efficiency
 
DAT320_Moving a Galaxy into Cloud
DAT320_Moving a Galaxy into CloudDAT320_Moving a Galaxy into Cloud
DAT320_Moving a Galaxy into Cloud
 
AWS Commercial Management and Cost Optimisation - Dec 2017
AWS Commercial Management and Cost Optimisation - Dec 2017AWS Commercial Management and Cost Optimisation - Dec 2017
AWS Commercial Management and Cost Optimisation - Dec 2017
 
AWS Database and Analytics State of the Union - 2017 - DAT201 - re:Invent 2017
AWS Database and Analytics State of the Union - 2017 - DAT201 - re:Invent 2017AWS Database and Analytics State of the Union - 2017 - DAT201 - re:Invent 2017
AWS Database and Analytics State of the Union - 2017 - DAT201 - re:Invent 2017
 
DAT325_Snapchat Stories
DAT325_Snapchat StoriesDAT325_Snapchat Stories
DAT325_Snapchat Stories
 
ARC205_Born in the Cloud
ARC205_Born in the CloudARC205_Born in the Cloud
ARC205_Born in the Cloud
 
Scaling up to Your First 10 Million Users
Scaling up to Your First 10 Million UsersScaling up to Your First 10 Million Users
Scaling up to Your First 10 Million Users
 

Similar a DEV333_Using Amazon CloudWatch for Amazon ECS Resource Monitoring at Scale

Born in the Cloud, Built like a Startup
Born in the Cloud, Built like a StartupBorn in the Cloud, Built like a Startup
Born in the Cloud, Built like a StartupAmazon Web Services
 
Moving to Amazon ECS – the Not-So-Obvious Benefits - CON356 - re:Invent 2017
Moving to Amazon ECS – the Not-So-Obvious Benefits - CON356 - re:Invent 2017Moving to Amazon ECS – the Not-So-Obvious Benefits - CON356 - re:Invent 2017
Moving to Amazon ECS – the Not-So-Obvious Benefits - CON356 - re:Invent 2017Amazon Web Services
 
SRV317_Unlocking High Performance Computing for Financial Services with Serve...
SRV317_Unlocking High Performance Computing for Financial Services with Serve...SRV317_Unlocking High Performance Computing for Financial Services with Serve...
SRV317_Unlocking High Performance Computing for Financial Services with Serve...Amazon Web Services
 
CON203_Driving Innovation with Containers
CON203_Driving Innovation with ContainersCON203_Driving Innovation with Containers
CON203_Driving Innovation with ContainersAmazon Web Services
 
Driving Innovation with Containers - CON203 - re:Invent 2017
Driving Innovation with Containers - CON203 - re:Invent 2017Driving Innovation with Containers - CON203 - re:Invent 2017
Driving Innovation with Containers - CON203 - re:Invent 2017Amazon Web Services
 
AWS SysOps Administrator Training | AWS SysOps Tutorial | Edureka
AWS SysOps Administrator Training | AWS SysOps Tutorial | EdurekaAWS SysOps Administrator Training | AWS SysOps Tutorial | Edureka
AWS SysOps Administrator Training | AWS SysOps Tutorial | EdurekaEdureka!
 
ATC301-Big Data & Analytics for Manufacturing Operations
ATC301-Big Data & Analytics for Manufacturing OperationsATC301-Big Data & Analytics for Manufacturing Operations
ATC301-Big Data & Analytics for Manufacturing OperationsAmazon Web Services
 
The Future of Research Computing on AWS - AWS Public Sector Summit Singapore ...
The Future of Research Computing on AWS - AWS Public Sector Summit Singapore ...The Future of Research Computing on AWS - AWS Public Sector Summit Singapore ...
The Future of Research Computing on AWS - AWS Public Sector Summit Singapore ...Amazon Web Services
 
Auto Scaling: The Fleet Management Solution for Planet Earth - CMP201 - re:In...
Auto Scaling: The Fleet Management Solution for Planet Earth - CMP201 - re:In...Auto Scaling: The Fleet Management Solution for Planet Earth - CMP201 - re:In...
Auto Scaling: The Fleet Management Solution for Planet Earth - CMP201 - re:In...Amazon Web Services
 
Building .NET-based Serverless Architectures and Running .NET Core Microservi...
Building .NET-based Serverless Architectures and Running .NET Core Microservi...Building .NET-based Serverless Architectures and Running .NET Core Microservi...
Building .NET-based Serverless Architectures and Running .NET Core Microservi...Amazon Web Services
 
CMP209_Getting started with Docker on AWS
CMP209_Getting started with Docker on AWSCMP209_Getting started with Docker on AWS
CMP209_Getting started with Docker on AWSAmazon Web Services
 
Build a Java Spring Application on Amazon ECS - CON332 - re:Invent 2017
Build a Java Spring Application on Amazon ECS - CON332 - re:Invent 2017Build a Java Spring Application on Amazon ECS - CON332 - re:Invent 2017
Build a Java Spring Application on Amazon ECS - CON332 - re:Invent 2017Amazon Web Services
 
Getting Started with Containers in the Cloud: AWS Developer Workshop at Web S...
Getting Started with Containers in the Cloud: AWS Developer Workshop at Web S...Getting Started with Containers in the Cloud: AWS Developer Workshop at Web S...
Getting Started with Containers in the Cloud: AWS Developer Workshop at Web S...Amazon Web Services
 
Advanced Patterns in Microservices Implementation with Amazon ECS - CON402 - ...
Advanced Patterns in Microservices Implementation with Amazon ECS - CON402 - ...Advanced Patterns in Microservices Implementation with Amazon ECS - CON402 - ...
Advanced Patterns in Microservices Implementation with Amazon ECS - CON402 - ...Amazon Web Services
 
Amazon Amazon Elastic Container Service (Amazon ECS)
Amazon Amazon Elastic Container Service (Amazon ECS)Amazon Amazon Elastic Container Service (Amazon ECS)
Amazon Amazon Elastic Container Service (Amazon ECS)Amazon Web Services
 
Accelerating Life Sciences with HPC on AWS - AWS Online Tech Talks
Accelerating Life Sciences with HPC on AWS - AWS Online Tech TalksAccelerating Life Sciences with HPC on AWS - AWS Online Tech Talks
Accelerating Life Sciences with HPC on AWS - AWS Online Tech TalksAmazon Web Services
 

Similar a DEV333_Using Amazon CloudWatch for Amazon ECS Resource Monitoring at Scale (20)

Born in the Cloud, Built like a Startup
Born in the Cloud, Built like a StartupBorn in the Cloud, Built like a Startup
Born in the Cloud, Built like a Startup
 
AWS 容器服務入門實務
AWS 容器服務入門實務AWS 容器服務入門實務
AWS 容器服務入門實務
 
Moving to Amazon ECS – the Not-So-Obvious Benefits - CON356 - re:Invent 2017
Moving to Amazon ECS – the Not-So-Obvious Benefits - CON356 - re:Invent 2017Moving to Amazon ECS – the Not-So-Obvious Benefits - CON356 - re:Invent 2017
Moving to Amazon ECS – the Not-So-Obvious Benefits - CON356 - re:Invent 2017
 
Containers on AWS
Containers on AWSContainers on AWS
Containers on AWS
 
SRV317_Unlocking High Performance Computing for Financial Services with Serve...
SRV317_Unlocking High Performance Computing for Financial Services with Serve...SRV317_Unlocking High Performance Computing for Financial Services with Serve...
SRV317_Unlocking High Performance Computing for Financial Services with Serve...
 
CON203_Driving Innovation with Containers
CON203_Driving Innovation with ContainersCON203_Driving Innovation with Containers
CON203_Driving Innovation with Containers
 
Driving Innovation with Containers - CON203 - re:Invent 2017
Driving Innovation with Containers - CON203 - re:Invent 2017Driving Innovation with Containers - CON203 - re:Invent 2017
Driving Innovation with Containers - CON203 - re:Invent 2017
 
AWS SysOps Administrator Training | AWS SysOps Tutorial | Edureka
AWS SysOps Administrator Training | AWS SysOps Tutorial | EdurekaAWS SysOps Administrator Training | AWS SysOps Tutorial | Edureka
AWS SysOps Administrator Training | AWS SysOps Tutorial | Edureka
 
ATC301-Big Data & Analytics for Manufacturing Operations
ATC301-Big Data & Analytics for Manufacturing OperationsATC301-Big Data & Analytics for Manufacturing Operations
ATC301-Big Data & Analytics for Manufacturing Operations
 
The Future of Research Computing on AWS - AWS Public Sector Summit Singapore ...
The Future of Research Computing on AWS - AWS Public Sector Summit Singapore ...The Future of Research Computing on AWS - AWS Public Sector Summit Singapore ...
The Future of Research Computing on AWS - AWS Public Sector Summit Singapore ...
 
AWS Cost Optimisation Solutions
AWS Cost Optimisation SolutionsAWS Cost Optimisation Solutions
AWS Cost Optimisation Solutions
 
Auto Scaling: The Fleet Management Solution for Planet Earth - CMP201 - re:In...
Auto Scaling: The Fleet Management Solution for Planet Earth - CMP201 - re:In...Auto Scaling: The Fleet Management Solution for Planet Earth - CMP201 - re:In...
Auto Scaling: The Fleet Management Solution for Planet Earth - CMP201 - re:In...
 
Building .NET-based Serverless Architectures and Running .NET Core Microservi...
Building .NET-based Serverless Architectures and Running .NET Core Microservi...Building .NET-based Serverless Architectures and Running .NET Core Microservi...
Building .NET-based Serverless Architectures and Running .NET Core Microservi...
 
CMP209_Getting started with Docker on AWS
CMP209_Getting started with Docker on AWSCMP209_Getting started with Docker on AWS
CMP209_Getting started with Docker on AWS
 
Build a Java Spring Application on Amazon ECS - CON332 - re:Invent 2017
Build a Java Spring Application on Amazon ECS - CON332 - re:Invent 2017Build a Java Spring Application on Amazon ECS - CON332 - re:Invent 2017
Build a Java Spring Application on Amazon ECS - CON332 - re:Invent 2017
 
Getting Started with Containers in the Cloud: AWS Developer Workshop at Web S...
Getting Started with Containers in the Cloud: AWS Developer Workshop at Web S...Getting Started with Containers in the Cloud: AWS Developer Workshop at Web S...
Getting Started with Containers in the Cloud: AWS Developer Workshop at Web S...
 
Advanced Patterns in Microservices Implementation with Amazon ECS - CON402 - ...
Advanced Patterns in Microservices Implementation with Amazon ECS - CON402 - ...Advanced Patterns in Microservices Implementation with Amazon ECS - CON402 - ...
Advanced Patterns in Microservices Implementation with Amazon ECS - CON402 - ...
 
Amazon ECS Deep Dive
Amazon ECS Deep DiveAmazon ECS Deep Dive
Amazon ECS Deep Dive
 
Amazon Amazon Elastic Container Service (Amazon ECS)
Amazon Amazon Elastic Container Service (Amazon ECS)Amazon Amazon Elastic Container Service (Amazon ECS)
Amazon Amazon Elastic Container Service (Amazon ECS)
 
Accelerating Life Sciences with HPC on AWS - AWS Online Tech Talks
Accelerating Life Sciences with HPC on AWS - AWS Online Tech TalksAccelerating Life Sciences with HPC on AWS - AWS Online Tech Talks
Accelerating Life Sciences with HPC on AWS - AWS Online Tech Talks
 

Más de Amazon Web Services

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Amazon Web Services
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Amazon Web Services
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateAmazon Web Services
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSAmazon Web Services
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Amazon Web Services
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Amazon Web Services
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...Amazon Web Services
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsAmazon Web Services
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareAmazon Web Services
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSAmazon Web Services
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAmazon Web Services
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareAmazon Web Services
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWSAmazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckAmazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without serversAmazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceAmazon Web Services
 

Más de Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

DEV333_Using Amazon CloudWatch for Amazon ECS Resource Monitoring at Scale

  • 1. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. AWS re:INVENT Using Amazon CloudWatch for Amazon ECS Resource Monitoring at Scale B r e n d a n M c F a r l a n d P l a t f o r m C o s t E n g i n e e r @ M a p b o x N o v e m b e r 2 8 , 2 0 1 7 DEV333
  • 2. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. DEV333: Agenda - Mapbox + platform cost engineering - EC2 → ECS - ECS + the shared resource problem - ECS + Amazon CloudWatch - Building a solution - Results and lessons learned - What’s next @ Mapbox
  • 3.
  • 4. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  • 5. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Maps Navigation Search
  • 6.
  • 7.
  • 8.
  • 9.
  • 10. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. AWS from the beginning - Thousands of customers - Thousands of mobile applications - Hundreds of millions of users - Three billion probes a day, 100 million miles of anonymized telemetry All this consumes trillions of compute seconds every day on Amazon EC2 Container Service (Amazon ECS)
  • 12. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Platform Cost Engineering Role - Cost monitoring - On-call rotation - Consult with stakeholders - Innovate and advance Principles - Inform and empower stakeholders - Stewardship of resource usage - The x-axis is coming
  • 13. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Scale @ Mapbox Servers Customers
  • 14. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Scale @ Mapbox Servers Customers
  • 15. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Scale @ Mapbox Servers Customers
  • 16. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Scale @ Mapbox Servers Customers
  • 17. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Many Dimensions of Scale Scale: customers, requests, applications, products Resources: - Servers - Engineers - Cost - Alarms - Architectures
  • 19. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Moving to ECS Lower Cost - Utilization - Spot by default - Centralized Capacity
  • 20. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Moving to ECS Efficient Operations - Separate Infrastructure from Code - Uniform architecture - Security / Credentialing - Local development environments
  • 21. C4 R3 M4R3 R3 R3 R3 R3 M4 M4 M4 M4 M4 C4 C4 C4 C4 C4 Map Service Search Service Directions Service Before ECS
  • 22. C4 R3 M4R3 R3 R3 R3 R3 M4 M4 M4 M4 M4 C4 C4 C4 C4 C4 Map Service Search ServiceDirections Service ECS
  • 23. C4 R3 R3 R3 R3 R3 R3 M4 M4 M4 M4 M4 M4 C4 C4 C4 C4 C4 Map Service Search ServiceDirections Service Spot & Instance Diversity SpotFleet C4 C4 R3 R3
  • 24. Improved Instance Packing Map service CPU 55% Mem 5% Search service CPU 25% Mem 75% Combined services with ECS CPU 80% Mem 80%
  • 25. ECS + the shared resource problem
  • 26. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. C4 C4 C4 C4 C4 C4 Map Service Before ECS A maps instance is a maps instance = $100
  • 27. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. After ECS Migration C4 R3 M4R3 R3 R3 R3 R3 M4 M4 M4 M4 M4 C4 C4 C4 C4 C4 Map Service Search ServiceDirections Service A cluster instance is… a maps instance, a directions instance, a search and directions instance, a maps and search instance, all within the same day…
  • 28. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. ECS Billing C4 R3 M4R3 R3 R3 R3 R3 M4 M4 M4 M4 M4 C4 C4 C4 C4 C4 Map Service Search ServiceDirections Service = $1,000 = ?
  • 29. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Still missing part of the picture Challenges in a shared resource environment - Tags at cluster level - Multiple teams in one resource - Loss of granularity for teams We needed service & task resource consumption within our ECS environment C4 R3 M4R3 R3 R3 R3 R3 M4 M4 M4 M4 M4 C4 C4 C4 C4 C4
  • 30. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. ECS is great! ECS is a whole different monitoring space Hard to answer previously basic questions: • How much did Maps cost this month?
  • 31. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  • 32. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  • 34. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. ECS CloudWatch Metrics Cluster Level - CPU utilization - CPU reservation - Memory utilization - Memory reservation Cluster and service level - CPU utilization - CPU reservation - Memory utilization - Memory reservation
  • 35. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Mapbox uses 31 custom metrics per cluster* Some examples • AgentConnectedPercent • FailedTaskPlacement • ActiveInstanceCount • StatusCheckFailedInstances • RunningTasksPerInstances • Spot Termination Custom Metrics
  • 36. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Cluster Dashboards AWS CloudFormation creates a CloudWatch dashboard for each cluster var dashboard = require('./dashboard'); Resources: { // ….. Dashboard: { Type: 'AWS::CloudWatch::Dashboard', Properties: { DashboardName: cf.sub('${AWS::StackName}-${AWS::Region}'), DashboardBody: cf.sub(dashboard) } } } - Include dashboard widgets in a ./dashboard.js file - cf references are for https://github.com/mapbox/cloudfriend an open source tool for cloudformation
  • 37. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. A Sample Widget { type: 'metric', x: 0, y: 3, width: 24, height: 6, properties: { view: 'timeSeries', stacked: false, metrics: [ ['AWS/ECS', 'CPUUtilization', 'ClusterName', '${Cluster}'], ['.', 'CPUReservation', '.', '.'], ['.', 'MemoryUtilization', '.', '.', { yAxis: 'left' }], ['.', 'MemoryReservation', '.', '.', { yAxis: 'left' }] ], region: '${AWS::Region}', title: 'Utilization/Reservation', period: 300, yAxis: { left: { min: 0 }, right: { min: 0 } } } }
  • 38. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Additional Dashboard
  • 39. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. But wait, how much did Maps cost this month? Variables - 10+ clusters - 10+ Spotfleet mixes with different az’s, prices, etc. - Very different task profiles running - Services v. RunTask
  • 40. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. But wait, how much did Maps cost this month? Data - How long a task ran on a cluster - What resources that task reserved on the cluster - The cost of resources on the cluster while the task ran
  • 41. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Recording Task Duration CloudWatch Event Rules - Collect the start and stop event for every task on our infrastructure Take 1: ECS Task start/stop → CloudWatch Event rule → Amazon Kinesis Firehose → Amazon Simple Storage Service (Amazon S3) → AWS Lambda → S3 → Amazon Athena
  • 42. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Recording Task Duration Part 2 Challenges: - Infrastructure running in all but two ECS regions - Kinesis Firehose not supported in all ECS regions - Volume of start and stop events Take 2: ECS Task start/stop → CloudWatch Event Rule → Lambda → Kinesis Firehose → S3 → Athena
  • 43. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Putting It Together Duration x Resources reserved → Task Resource Cost Resources reserved - ECS task names – Service_Team_CostCategory_Suffix:Revision Resource usage pipeline: Task definition registered → CloudWatch Event Rule → Lambda → S3 → Athena Cost pipeline Bucket notification → ECS Task → S3 → Athena
  • 44. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Taking a pause How much spending on tracking spending? - Massive Task Volume - One Athena query cost what? Social impacts of information - Socializing why - High cost of faulty reporting
  • 45. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Recording Task Duration
  • 46. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. One Athena query cost what? Data Warehouse - S3 - Athena Queries - Partition - Compression
  • 47. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Socialize why Empower Teams v. Punish Teams - Big numbers can be scary Creating a healthy system - Monitor → Learn → Take action High Cost of faulty information - Alarms - Regular checks on reporting
  • 48. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Results with this pipeline Task level detail within our clusters Queryable database of task start and stop events Can cost our containers - Container Y ran for 112 seconds on cluster Z - Container Y reserves 1024 cpu and 2048 memory - A cpu second on cluster Z cost $0.00001 The cost of container Y is $0.0001 x 114,688 cpu/seconds = $1.14
  • 49. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Reporting: Custom Cost Metrics Scheduled Lambda → Athena → CloudWatch → S3 Team Cost - Overview of what’s happening in the cluster at the team level Service Cost - Stakeholders get feedback on application level changes and their impact Not every detail is logged to CloudWatch
  • 50. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Reporting back out Scheduled Lambda → Athena → CloudWatch → S3 Determining what to write to CloudWatch - Cost on the Cluster can be rolled up by task, service, team and cost category.
  • 51. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Immediate Benefits Can answer key questions: • How much did my Application cost this month? • Why’s the EC2 Bill more expensive? - Diagnose Capacity fluctuations - Usage bill for stakeholder teams - ECS clusters organized for engineering
  • 52. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Additional Benefits Put a focus on right sizing - Charged by reservation not utilization Debugging task placement problems - Replay task history on a box with stop and start events Debugging Instance Events - Disk Utilization
  • 53. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. What’s Next? - Utilization and billing tightly linked in stakeholder reporting - Following this model with other large AWS products - Enhanced alarming / monitoring
  • 54. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. What’s Next? We’re growing! - https://www.mapbox.com/jobs/ We look for incredibly smart and radically pragmatic people. Our teams members have a range of cultural, ethnic, social, and economic backgrounds. We prioritize the progression, growth, and leadership of individuals from all backgrounds and strongly believe in the value of cultivating a diverse team and encourage people of all backgrounds to apply.
  • 55. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Thank you!