SlideShare una empresa de Scribd logo
1 de 13
Descargar para leer sin conexión
“Sh!@$ on Fire, Yo!”
True Stories Inspired by Real Events
Brendan Aye
Technical Director, Platform Architecture
James Webb
Member of Technical Staff
2
Platform and Infrastructure Engineering
§ 55 Team Members, including redundant and
geo-distributed Joes
§ Virtual Infrastructure
§ 5,000 Virtual Hosts
§ 50,000 Virtual Machines
§ CloudFoundry
§ 30 Foundations
§ 75,000 Application Instances
§ Kubernetes
§ 90 Clusters
§ 22,000 Pods
Who We Are
T-Mobile Confidential
3
Platform KPIs
Synthetic Transactions
BlackBox Monitoring
Server Infrastructure
Network Infrastructure
Slack
Application Requests
Container Metrics
What Do We Watch?
4
Architecting a Highly Available CloudFoundry App
Foundation A Foundation B Foundation C
Myapp.geo.cf.mydomain.com Myapp.geo.cf.mydomain.com Myapp.geo.cf.mydomain.com
Load Balancer Load Balancer Load Balancer
GSLB
Clients
Myapp.geo.cf.mydomain.com
Myapp.geo.cf.mydomain.com
Myapp.geo.cf.mydomain.com
Myapp.geo.cf.mydomain.com
Myapp.geo.cf.mydomain.com
Myapp.geo.cf.mydomain.com
55
§ Platform team built shiny GSLB-as-a-service
§ Customer team consumed shiny GSLB-as-a-
service
§ Clients queried GLSB to determine endpoint
and established persistent HTTP
connections
§ App teams took one region out of load which
correctly de-registered it with GSLB
§ Persistent connections don’t need to query
GSLB anymore and the LoadBalancer kept
the connections alive… L
What Went Wrong?
T-Mobile Confidential
6
§ Improved Documentation! GLSB is only one
method to load balance application traffic, so
explaining its benefits and drawbacks is
crucial to a successful partnership.
§ Sharing incident post-mortem with GSLB
customers so they understand what went
wrong, and how they can plan for expected
failure.
§ Suggesting disabling HTTP keep-alive when
using GLSB
§ Investing in alternative platform-supported
load balancing methodologies.
How Did We Get Better?
T-Mobile Confidential
77
§ Homebrew Java Application running on
WebLogic
§ Running in a single Kubernetes cluster, but
with many instances spread across
multiple share-nothing AZs
§ Application upgrades and restarts
working fine and not causing any impacts
to service
§ Multi-tenant cluster managed by Platform
Team with daytime upgrades planned
during CloudFoundry Summit 2019
Anatomy of a Failing Kubernetes App
88
§ Cluster upgrades kicked off with max-in-flight
of one
§ As nodes quickly cycled through upgrades,
application had fewer and fewer ‘ready’
pods
§ By the time remaining nodes were upgraded,
all customer pods were in a crashed state
and failing to come back up
§ Management was displeased with our
daytime upgrades with no Change Request
leading to a P1 Incident
What Went Wrong?
T-Mobile Confidential
9
§ Switching application to depend on
/dev/urandom instead of /dev/random
§ Customers implemented Pod Disruption
Budgets (PDB) to maintain a minimum
of 66% of ready pods before upgrades
can proceed
§ File a Change Request for anything that
touches a customer-facing cluster (yes,
even non-production)
How Did We Get Better?How Did We Get Better?
How Do You
Prevent
Incidents?
11
§ Adopt a policy of radical transparency
with your customers
§ Assume your customers are right until
you can demonstrate otherwise
§ Avoid seeing Mean-Time-To-Blame as a
useful KPI
§ When your platform is at fault, accept
responsibility, fix the issue, and explain
how you’ll improve
§ When a customer is doing something
that will lead to failure, ensure your
concern is heard and partner for
success
You
Can’t
T-Mobile Confidential
Let’s talk
“Sh*^%# on Fire, Yo!”: A True Story Inspired by Real Events

Más contenido relacionado

La actualidad más candente

Handling Secrets in Your Cloud Native Architecture
Handling Secrets in Your Cloud Native ArchitectureHandling Secrets in Your Cloud Native Architecture
Handling Secrets in Your Cloud Native ArchitectureVMware Tanzu
 
DevSecOps with Confidence
DevSecOps with ConfidenceDevSecOps with Confidence
DevSecOps with ConfidenceVMware Tanzu
 
Creating Polyglot Communication Between Kubernetes Clusters and Legacy System...
Creating Polyglot Communication Between Kubernetes Clusters and Legacy System...Creating Polyglot Communication Between Kubernetes Clusters and Legacy System...
Creating Polyglot Communication Between Kubernetes Clusters and Legacy System...VMware Tanzu
 
Spring Cloud Kubernetes: An Easier Path from Idea to Production
Spring Cloud Kubernetes: An Easier Path from Idea to ProductionSpring Cloud Kubernetes: An Easier Path from Idea to Production
Spring Cloud Kubernetes: An Easier Path from Idea to ProductionVMware Tanzu
 
State of Steeltoe 2020
State of Steeltoe 2020State of Steeltoe 2020
State of Steeltoe 2020VMware Tanzu
 
Observability Enhancements in Steeltoe
Observability Enhancements in Steeltoe Observability Enhancements in Steeltoe
Observability Enhancements in Steeltoe VMware Tanzu
 
DevOps KPIs as a Service: Daimler’s Solution
DevOps KPIs as a Service: Daimler’s SolutionDevOps KPIs as a Service: Daimler’s Solution
DevOps KPIs as a Service: Daimler’s SolutionVMware Tanzu
 
The Path Towards Spring Boot Native Applications
The Path Towards Spring Boot Native ApplicationsThe Path Towards Spring Boot Native Applications
The Path Towards Spring Boot Native ApplicationsVMware Tanzu
 
A Leader’s Guide to DevOps Practices and Culture
A Leader’s Guide to DevOps Practices and CultureA Leader’s Guide to DevOps Practices and Culture
A Leader’s Guide to DevOps Practices and CultureVMware Tanzu
 
Not Just Initializing
Not Just InitializingNot Just Initializing
Not Just InitializingVMware Tanzu
 
VMware Tanzu Introduction- June 11, 2020
VMware Tanzu Introduction- June 11, 2020VMware Tanzu Introduction- June 11, 2020
VMware Tanzu Introduction- June 11, 2020VMware Tanzu
 
From Pivotal to VMware Tanzu: What you need to know
From Pivotal to VMware Tanzu: What you need to knowFrom Pivotal to VMware Tanzu: What you need to know
From Pivotal to VMware Tanzu: What you need to knowVMware Tanzu
 
Building Kubernetes images at scale with Tanzu Build Service
Building Kubernetes images at scale with Tanzu Build ServiceBuilding Kubernetes images at scale with Tanzu Build Service
Building Kubernetes images at scale with Tanzu Build ServiceVMware Tanzu
 
Adopting Azure, Cloud Foundry and Microservice Architecture at Merrill Corpor...
Adopting Azure, Cloud Foundry and Microservice Architecture at Merrill Corpor...Adopting Azure, Cloud Foundry and Microservice Architecture at Merrill Corpor...
Adopting Azure, Cloud Foundry and Microservice Architecture at Merrill Corpor...VMware Tanzu
 
Accelerate Spring Apps to Cloud at Scale
Accelerate Spring Apps to Cloud at ScaleAccelerate Spring Apps to Cloud at Scale
Accelerate Spring Apps to Cloud at ScaleAsir Selvasingh
 
Spring: Your Next Java Micro-Framework
Spring: Your Next Java Micro-FrameworkSpring: Your Next Java Micro-Framework
Spring: Your Next Java Micro-FrameworkVMware Tanzu
 
IoT Scale Event-Stream Processing for Connected Fleet at Penske
IoT Scale Event-Stream Processing for Connected Fleet at PenskeIoT Scale Event-Stream Processing for Connected Fleet at Penske
IoT Scale Event-Stream Processing for Connected Fleet at PenskeVMware Tanzu
 
Successful and Sustainable Business Transformation: The 4 x 3 Approach
Successful and Sustainable Business Transformation: The 4 x 3 ApproachSuccessful and Sustainable Business Transformation: The 4 x 3 Approach
Successful and Sustainable Business Transformation: The 4 x 3 ApproachVMware Tanzu
 
vSphere with Kubernetes Virtual Event- June 16, 2020
vSphere with Kubernetes Virtual Event- June 16, 2020vSphere with Kubernetes Virtual Event- June 16, 2020
vSphere with Kubernetes Virtual Event- June 16, 2020VMware Tanzu
 

La actualidad más candente (20)

Handling Secrets in Your Cloud Native Architecture
Handling Secrets in Your Cloud Native ArchitectureHandling Secrets in Your Cloud Native Architecture
Handling Secrets in Your Cloud Native Architecture
 
DevSecOps with Confidence
DevSecOps with ConfidenceDevSecOps with Confidence
DevSecOps with Confidence
 
Creating Polyglot Communication Between Kubernetes Clusters and Legacy System...
Creating Polyglot Communication Between Kubernetes Clusters and Legacy System...Creating Polyglot Communication Between Kubernetes Clusters and Legacy System...
Creating Polyglot Communication Between Kubernetes Clusters and Legacy System...
 
Spring Cloud Kubernetes: An Easier Path from Idea to Production
Spring Cloud Kubernetes: An Easier Path from Idea to ProductionSpring Cloud Kubernetes: An Easier Path from Idea to Production
Spring Cloud Kubernetes: An Easier Path from Idea to Production
 
State of Steeltoe 2020
State of Steeltoe 2020State of Steeltoe 2020
State of Steeltoe 2020
 
Observability Enhancements in Steeltoe
Observability Enhancements in Steeltoe Observability Enhancements in Steeltoe
Observability Enhancements in Steeltoe
 
What Is Spring?
What Is Spring?What Is Spring?
What Is Spring?
 
DevOps KPIs as a Service: Daimler’s Solution
DevOps KPIs as a Service: Daimler’s SolutionDevOps KPIs as a Service: Daimler’s Solution
DevOps KPIs as a Service: Daimler’s Solution
 
The Path Towards Spring Boot Native Applications
The Path Towards Spring Boot Native ApplicationsThe Path Towards Spring Boot Native Applications
The Path Towards Spring Boot Native Applications
 
A Leader’s Guide to DevOps Practices and Culture
A Leader’s Guide to DevOps Practices and CultureA Leader’s Guide to DevOps Practices and Culture
A Leader’s Guide to DevOps Practices and Culture
 
Not Just Initializing
Not Just InitializingNot Just Initializing
Not Just Initializing
 
VMware Tanzu Introduction- June 11, 2020
VMware Tanzu Introduction- June 11, 2020VMware Tanzu Introduction- June 11, 2020
VMware Tanzu Introduction- June 11, 2020
 
From Pivotal to VMware Tanzu: What you need to know
From Pivotal to VMware Tanzu: What you need to knowFrom Pivotal to VMware Tanzu: What you need to know
From Pivotal to VMware Tanzu: What you need to know
 
Building Kubernetes images at scale with Tanzu Build Service
Building Kubernetes images at scale with Tanzu Build ServiceBuilding Kubernetes images at scale with Tanzu Build Service
Building Kubernetes images at scale with Tanzu Build Service
 
Adopting Azure, Cloud Foundry and Microservice Architecture at Merrill Corpor...
Adopting Azure, Cloud Foundry and Microservice Architecture at Merrill Corpor...Adopting Azure, Cloud Foundry and Microservice Architecture at Merrill Corpor...
Adopting Azure, Cloud Foundry and Microservice Architecture at Merrill Corpor...
 
Accelerate Spring Apps to Cloud at Scale
Accelerate Spring Apps to Cloud at ScaleAccelerate Spring Apps to Cloud at Scale
Accelerate Spring Apps to Cloud at Scale
 
Spring: Your Next Java Micro-Framework
Spring: Your Next Java Micro-FrameworkSpring: Your Next Java Micro-Framework
Spring: Your Next Java Micro-Framework
 
IoT Scale Event-Stream Processing for Connected Fleet at Penske
IoT Scale Event-Stream Processing for Connected Fleet at PenskeIoT Scale Event-Stream Processing for Connected Fleet at Penske
IoT Scale Event-Stream Processing for Connected Fleet at Penske
 
Successful and Sustainable Business Transformation: The 4 x 3 Approach
Successful and Sustainable Business Transformation: The 4 x 3 ApproachSuccessful and Sustainable Business Transformation: The 4 x 3 Approach
Successful and Sustainable Business Transformation: The 4 x 3 Approach
 
vSphere with Kubernetes Virtual Event- June 16, 2020
vSphere with Kubernetes Virtual Event- June 16, 2020vSphere with Kubernetes Virtual Event- June 16, 2020
vSphere with Kubernetes Virtual Event- June 16, 2020
 

Similar a “Sh*^%# on Fire, Yo!”: A True Story Inspired by Real Events

Blue Shield of California: Improving Service and Competitiveness with IBM Pur...
Blue Shield of California: Improving Service and Competitiveness with IBM Pur...Blue Shield of California: Improving Service and Competitiveness with IBM Pur...
Blue Shield of California: Improving Service and Competitiveness with IBM Pur...Perficient, Inc.
 
Webinar - Order out of Chaos: Avoiding the Migration Migraine
Webinar - Order out of Chaos: Avoiding the Migration MigraineWebinar - Order out of Chaos: Avoiding the Migration Migraine
Webinar - Order out of Chaos: Avoiding the Migration MigrainePeak Hosting
 
Blue Shield of CA Revolutionizes its Portal Environment on IBM PureApplicatio...
Blue Shield of CA Revolutionizes its Portal Environment on IBM PureApplicatio...Blue Shield of CA Revolutionizes its Portal Environment on IBM PureApplicatio...
Blue Shield of CA Revolutionizes its Portal Environment on IBM PureApplicatio...Perficient, Inc.
 
Planning for a (Mostly) Hassle-Free Cloud Migration | VTUG 2016 Winter Warmer
Planning for a (Mostly) Hassle-Free Cloud Migration | VTUG 2016 Winter WarmerPlanning for a (Mostly) Hassle-Free Cloud Migration | VTUG 2016 Winter Warmer
Planning for a (Mostly) Hassle-Free Cloud Migration | VTUG 2016 Winter WarmerJoe Conlin
 
A Framework to Measure and Maximize Cloud ROI
A Framework to Measure and Maximize Cloud ROIA Framework to Measure and Maximize Cloud ROI
A Framework to Measure and Maximize Cloud ROIRightScale
 
(DVO201) Scaling Your Web Applications with AWS Elastic Beanstalk
(DVO201) Scaling Your Web Applications with AWS Elastic Beanstalk(DVO201) Scaling Your Web Applications with AWS Elastic Beanstalk
(DVO201) Scaling Your Web Applications with AWS Elastic BeanstalkAmazon Web Services
 
The simplest cloud migration in the world by Webscale
The simplest cloud migration in the world by WebscaleThe simplest cloud migration in the world by Webscale
The simplest cloud migration in the world by WebscaleWebscale Networks
 
Why Cloud-Native Kafka Matters: 4 Reasons to Stop Managing it Yourself
Why Cloud-Native Kafka Matters: 4 Reasons to Stop Managing it YourselfWhy Cloud-Native Kafka Matters: 4 Reasons to Stop Managing it Yourself
Why Cloud-Native Kafka Matters: 4 Reasons to Stop Managing it YourselfDATAVERSITY
 
Turning up the HEAT with IBM MobileFirst for iOS Apps
Turning up the HEAT with IBM MobileFirst for iOS AppsTurning up the HEAT with IBM MobileFirst for iOS Apps
Turning up the HEAT with IBM MobileFirst for iOS AppsMichael Elder
 
Technical Webinar with AWS - Everything You Need to Measure in Your Migration
Technical Webinar with AWS - Everything You Need to Measure in Your MigrationTechnical Webinar with AWS - Everything You Need to Measure in Your Migration
Technical Webinar with AWS - Everything You Need to Measure in Your MigrationNew Relic
 
Accelerate Application Migration - August 5, 2020
Accelerate Application Migration - August 5, 2020Accelerate Application Migration - August 5, 2020
Accelerate Application Migration - August 5, 2020VMware Tanzu
 
How to move to the cloud
How to move to the cloudHow to move to the cloud
How to move to the cloudInterxion
 
Cloud Foundry Summit 2015: Leaving your Comfort Zone - Garmin and Cloud Foundry
Cloud Foundry Summit 2015: Leaving your Comfort Zone - Garmin and Cloud FoundryCloud Foundry Summit 2015: Leaving your Comfort Zone - Garmin and Cloud Foundry
Cloud Foundry Summit 2015: Leaving your Comfort Zone - Garmin and Cloud FoundryVMware Tanzu
 
Implementing Service Oriented Architecture
Implementing Service Oriented ArchitectureImplementing Service Oriented Architecture
Implementing Service Oriented ArchitectureAmazon Web Services
 
DevOps and Application Delivery for Hybrid Cloud - DevOpsSummit session
DevOps and Application Delivery for Hybrid Cloud  - DevOpsSummit sessionDevOps and Application Delivery for Hybrid Cloud  - DevOpsSummit session
DevOps and Application Delivery for Hybrid Cloud - DevOpsSummit sessionSanjeev Sharma
 
Implementing Service Oriented Architecture
Implementing Service Oriented ArchitectureImplementing Service Oriented Architecture
Implementing Service Oriented ArchitectureAmazon Web Services
 
Implementing Service Oriented Architecture
Implementing Service Oriented Architecture Implementing Service Oriented Architecture
Implementing Service Oriented Architecture Amazon Web Services
 
WebSphere Application Server - Meeting Your Cloud and On-Premise Demands
WebSphere Application Server - Meeting Your Cloud and On-Premise DemandsWebSphere Application Server - Meeting Your Cloud and On-Premise Demands
WebSphere Application Server - Meeting Your Cloud and On-Premise DemandsIan Robinson
 
Cloud Roundtable | Pivoltal: Agile platform
Cloud Roundtable | Pivoltal: Agile platformCloud Roundtable | Pivoltal: Agile platform
Cloud Roundtable | Pivoltal: Agile platformCodemotion
 

Similar a “Sh*^%# on Fire, Yo!”: A True Story Inspired by Real Events (20)

Blue Shield of California: Improving Service and Competitiveness with IBM Pur...
Blue Shield of California: Improving Service and Competitiveness with IBM Pur...Blue Shield of California: Improving Service and Competitiveness with IBM Pur...
Blue Shield of California: Improving Service and Competitiveness with IBM Pur...
 
Webinar - Order out of Chaos: Avoiding the Migration Migraine
Webinar - Order out of Chaos: Avoiding the Migration MigraineWebinar - Order out of Chaos: Avoiding the Migration Migraine
Webinar - Order out of Chaos: Avoiding the Migration Migraine
 
Blue Shield of CA Revolutionizes its Portal Environment on IBM PureApplicatio...
Blue Shield of CA Revolutionizes its Portal Environment on IBM PureApplicatio...Blue Shield of CA Revolutionizes its Portal Environment on IBM PureApplicatio...
Blue Shield of CA Revolutionizes its Portal Environment on IBM PureApplicatio...
 
Planning for a (Mostly) Hassle-Free Cloud Migration | VTUG 2016 Winter Warmer
Planning for a (Mostly) Hassle-Free Cloud Migration | VTUG 2016 Winter WarmerPlanning for a (Mostly) Hassle-Free Cloud Migration | VTUG 2016 Winter Warmer
Planning for a (Mostly) Hassle-Free Cloud Migration | VTUG 2016 Winter Warmer
 
A Framework to Measure and Maximize Cloud ROI
A Framework to Measure and Maximize Cloud ROIA Framework to Measure and Maximize Cloud ROI
A Framework to Measure and Maximize Cloud ROI
 
(DVO201) Scaling Your Web Applications with AWS Elastic Beanstalk
(DVO201) Scaling Your Web Applications with AWS Elastic Beanstalk(DVO201) Scaling Your Web Applications with AWS Elastic Beanstalk
(DVO201) Scaling Your Web Applications with AWS Elastic Beanstalk
 
The simplest cloud migration in the world by Webscale
The simplest cloud migration in the world by WebscaleThe simplest cloud migration in the world by Webscale
The simplest cloud migration in the world by Webscale
 
Why Cloud-Native Kafka Matters: 4 Reasons to Stop Managing it Yourself
Why Cloud-Native Kafka Matters: 4 Reasons to Stop Managing it YourselfWhy Cloud-Native Kafka Matters: 4 Reasons to Stop Managing it Yourself
Why Cloud-Native Kafka Matters: 4 Reasons to Stop Managing it Yourself
 
Turning up the HEAT with IBM MobileFirst for iOS Apps
Turning up the HEAT with IBM MobileFirst for iOS AppsTurning up the HEAT with IBM MobileFirst for iOS Apps
Turning up the HEAT with IBM MobileFirst for iOS Apps
 
Technical Webinar with AWS - Everything You Need to Measure in Your Migration
Technical Webinar with AWS - Everything You Need to Measure in Your MigrationTechnical Webinar with AWS - Everything You Need to Measure in Your Migration
Technical Webinar with AWS - Everything You Need to Measure in Your Migration
 
Accelerate Application Migration - August 5, 2020
Accelerate Application Migration - August 5, 2020Accelerate Application Migration - August 5, 2020
Accelerate Application Migration - August 5, 2020
 
How to move to the cloud
How to move to the cloudHow to move to the cloud
How to move to the cloud
 
Cloud Foundry Summit 2015: Leaving your Comfort Zone - Garmin and Cloud Foundry
Cloud Foundry Summit 2015: Leaving your Comfort Zone - Garmin and Cloud FoundryCloud Foundry Summit 2015: Leaving your Comfort Zone - Garmin and Cloud Foundry
Cloud Foundry Summit 2015: Leaving your Comfort Zone - Garmin and Cloud Foundry
 
Implementing Service Oriented Architecture
Implementing Service Oriented ArchitectureImplementing Service Oriented Architecture
Implementing Service Oriented Architecture
 
DevOps and Application Delivery for Hybrid Cloud - DevOpsSummit session
DevOps and Application Delivery for Hybrid Cloud  - DevOpsSummit sessionDevOps and Application Delivery for Hybrid Cloud  - DevOpsSummit session
DevOps and Application Delivery for Hybrid Cloud - DevOpsSummit session
 
Implementing Service Oriented Architecture
Implementing Service Oriented ArchitectureImplementing Service Oriented Architecture
Implementing Service Oriented Architecture
 
Implementing Service Oriented Architecture
Implementing Service Oriented Architecture Implementing Service Oriented Architecture
Implementing Service Oriented Architecture
 
WebSphere Application Server - Meeting Your Cloud and On-Premise Demands
WebSphere Application Server - Meeting Your Cloud and On-Premise DemandsWebSphere Application Server - Meeting Your Cloud and On-Premise Demands
WebSphere Application Server - Meeting Your Cloud and On-Premise Demands
 
Cloud Roundtable | Pivoltal: Agile platform
Cloud Roundtable | Pivoltal: Agile platformCloud Roundtable | Pivoltal: Agile platform
Cloud Roundtable | Pivoltal: Agile platform
 
DevOps Case Studies
DevOps Case StudiesDevOps Case Studies
DevOps Case Studies
 

Más de VMware Tanzu

What AI Means For Your Product Strategy And What To Do About It
What AI Means For Your Product Strategy And What To Do About ItWhat AI Means For Your Product Strategy And What To Do About It
What AI Means For Your Product Strategy And What To Do About ItVMware Tanzu
 
Make the Right Thing the Obvious Thing at Cardinal Health 2023
Make the Right Thing the Obvious Thing at Cardinal Health 2023Make the Right Thing the Obvious Thing at Cardinal Health 2023
Make the Right Thing the Obvious Thing at Cardinal Health 2023VMware Tanzu
 
Enhancing DevEx and Simplifying Operations at Scale
Enhancing DevEx and Simplifying Operations at ScaleEnhancing DevEx and Simplifying Operations at Scale
Enhancing DevEx and Simplifying Operations at ScaleVMware Tanzu
 
Spring Update | July 2023
Spring Update | July 2023Spring Update | July 2023
Spring Update | July 2023VMware Tanzu
 
Platforms, Platform Engineering, & Platform as a Product
Platforms, Platform Engineering, & Platform as a ProductPlatforms, Platform Engineering, & Platform as a Product
Platforms, Platform Engineering, & Platform as a ProductVMware Tanzu
 
Building Cloud Ready Apps
Building Cloud Ready AppsBuilding Cloud Ready Apps
Building Cloud Ready AppsVMware Tanzu
 
Spring Boot 3 And Beyond
Spring Boot 3 And BeyondSpring Boot 3 And Beyond
Spring Boot 3 And BeyondVMware Tanzu
 
Spring Cloud Gateway - SpringOne Tour 2023 Charles Schwab.pdf
Spring Cloud Gateway - SpringOne Tour 2023 Charles Schwab.pdfSpring Cloud Gateway - SpringOne Tour 2023 Charles Schwab.pdf
Spring Cloud Gateway - SpringOne Tour 2023 Charles Schwab.pdfVMware Tanzu
 
Simplify and Scale Enterprise Apps in the Cloud | Boston 2023
Simplify and Scale Enterprise Apps in the Cloud | Boston 2023Simplify and Scale Enterprise Apps in the Cloud | Boston 2023
Simplify and Scale Enterprise Apps in the Cloud | Boston 2023VMware Tanzu
 
Simplify and Scale Enterprise Apps in the Cloud | Seattle 2023
Simplify and Scale Enterprise Apps in the Cloud | Seattle 2023Simplify and Scale Enterprise Apps in the Cloud | Seattle 2023
Simplify and Scale Enterprise Apps in the Cloud | Seattle 2023VMware Tanzu
 
tanzu_developer_connect.pptx
tanzu_developer_connect.pptxtanzu_developer_connect.pptx
tanzu_developer_connect.pptxVMware Tanzu
 
Tanzu Virtual Developer Connect Workshop - French
Tanzu Virtual Developer Connect Workshop - FrenchTanzu Virtual Developer Connect Workshop - French
Tanzu Virtual Developer Connect Workshop - FrenchVMware Tanzu
 
Tanzu Developer Connect Workshop - English
Tanzu Developer Connect Workshop - EnglishTanzu Developer Connect Workshop - English
Tanzu Developer Connect Workshop - EnglishVMware Tanzu
 
Virtual Developer Connect Workshop - English
Virtual Developer Connect Workshop - EnglishVirtual Developer Connect Workshop - English
Virtual Developer Connect Workshop - EnglishVMware Tanzu
 
Tanzu Developer Connect - French
Tanzu Developer Connect - FrenchTanzu Developer Connect - French
Tanzu Developer Connect - FrenchVMware Tanzu
 
Simplify and Scale Enterprise Apps in the Cloud | Dallas 2023
Simplify and Scale Enterprise Apps in the Cloud | Dallas 2023Simplify and Scale Enterprise Apps in the Cloud | Dallas 2023
Simplify and Scale Enterprise Apps in the Cloud | Dallas 2023VMware Tanzu
 
SpringOne Tour: Deliver 15-Factor Applications on Kubernetes with Spring Boot
SpringOne Tour: Deliver 15-Factor Applications on Kubernetes with Spring BootSpringOne Tour: Deliver 15-Factor Applications on Kubernetes with Spring Boot
SpringOne Tour: Deliver 15-Factor Applications on Kubernetes with Spring BootVMware Tanzu
 
SpringOne Tour: The Influential Software Engineer
SpringOne Tour: The Influential Software EngineerSpringOne Tour: The Influential Software Engineer
SpringOne Tour: The Influential Software EngineerVMware Tanzu
 
SpringOne Tour: Domain-Driven Design: Theory vs Practice
SpringOne Tour: Domain-Driven Design: Theory vs PracticeSpringOne Tour: Domain-Driven Design: Theory vs Practice
SpringOne Tour: Domain-Driven Design: Theory vs PracticeVMware Tanzu
 
SpringOne Tour: Spring Recipes: A Collection of Common-Sense Solutions
SpringOne Tour: Spring Recipes: A Collection of Common-Sense SolutionsSpringOne Tour: Spring Recipes: A Collection of Common-Sense Solutions
SpringOne Tour: Spring Recipes: A Collection of Common-Sense SolutionsVMware Tanzu
 

Más de VMware Tanzu (20)

What AI Means For Your Product Strategy And What To Do About It
What AI Means For Your Product Strategy And What To Do About ItWhat AI Means For Your Product Strategy And What To Do About It
What AI Means For Your Product Strategy And What To Do About It
 
Make the Right Thing the Obvious Thing at Cardinal Health 2023
Make the Right Thing the Obvious Thing at Cardinal Health 2023Make the Right Thing the Obvious Thing at Cardinal Health 2023
Make the Right Thing the Obvious Thing at Cardinal Health 2023
 
Enhancing DevEx and Simplifying Operations at Scale
Enhancing DevEx and Simplifying Operations at ScaleEnhancing DevEx and Simplifying Operations at Scale
Enhancing DevEx and Simplifying Operations at Scale
 
Spring Update | July 2023
Spring Update | July 2023Spring Update | July 2023
Spring Update | July 2023
 
Platforms, Platform Engineering, & Platform as a Product
Platforms, Platform Engineering, & Platform as a ProductPlatforms, Platform Engineering, & Platform as a Product
Platforms, Platform Engineering, & Platform as a Product
 
Building Cloud Ready Apps
Building Cloud Ready AppsBuilding Cloud Ready Apps
Building Cloud Ready Apps
 
Spring Boot 3 And Beyond
Spring Boot 3 And BeyondSpring Boot 3 And Beyond
Spring Boot 3 And Beyond
 
Spring Cloud Gateway - SpringOne Tour 2023 Charles Schwab.pdf
Spring Cloud Gateway - SpringOne Tour 2023 Charles Schwab.pdfSpring Cloud Gateway - SpringOne Tour 2023 Charles Schwab.pdf
Spring Cloud Gateway - SpringOne Tour 2023 Charles Schwab.pdf
 
Simplify and Scale Enterprise Apps in the Cloud | Boston 2023
Simplify and Scale Enterprise Apps in the Cloud | Boston 2023Simplify and Scale Enterprise Apps in the Cloud | Boston 2023
Simplify and Scale Enterprise Apps in the Cloud | Boston 2023
 
Simplify and Scale Enterprise Apps in the Cloud | Seattle 2023
Simplify and Scale Enterprise Apps in the Cloud | Seattle 2023Simplify and Scale Enterprise Apps in the Cloud | Seattle 2023
Simplify and Scale Enterprise Apps in the Cloud | Seattle 2023
 
tanzu_developer_connect.pptx
tanzu_developer_connect.pptxtanzu_developer_connect.pptx
tanzu_developer_connect.pptx
 
Tanzu Virtual Developer Connect Workshop - French
Tanzu Virtual Developer Connect Workshop - FrenchTanzu Virtual Developer Connect Workshop - French
Tanzu Virtual Developer Connect Workshop - French
 
Tanzu Developer Connect Workshop - English
Tanzu Developer Connect Workshop - EnglishTanzu Developer Connect Workshop - English
Tanzu Developer Connect Workshop - English
 
Virtual Developer Connect Workshop - English
Virtual Developer Connect Workshop - EnglishVirtual Developer Connect Workshop - English
Virtual Developer Connect Workshop - English
 
Tanzu Developer Connect - French
Tanzu Developer Connect - FrenchTanzu Developer Connect - French
Tanzu Developer Connect - French
 
Simplify and Scale Enterprise Apps in the Cloud | Dallas 2023
Simplify and Scale Enterprise Apps in the Cloud | Dallas 2023Simplify and Scale Enterprise Apps in the Cloud | Dallas 2023
Simplify and Scale Enterprise Apps in the Cloud | Dallas 2023
 
SpringOne Tour: Deliver 15-Factor Applications on Kubernetes with Spring Boot
SpringOne Tour: Deliver 15-Factor Applications on Kubernetes with Spring BootSpringOne Tour: Deliver 15-Factor Applications on Kubernetes with Spring Boot
SpringOne Tour: Deliver 15-Factor Applications on Kubernetes with Spring Boot
 
SpringOne Tour: The Influential Software Engineer
SpringOne Tour: The Influential Software EngineerSpringOne Tour: The Influential Software Engineer
SpringOne Tour: The Influential Software Engineer
 
SpringOne Tour: Domain-Driven Design: Theory vs Practice
SpringOne Tour: Domain-Driven Design: Theory vs PracticeSpringOne Tour: Domain-Driven Design: Theory vs Practice
SpringOne Tour: Domain-Driven Design: Theory vs Practice
 
SpringOne Tour: Spring Recipes: A Collection of Common-Sense Solutions
SpringOne Tour: Spring Recipes: A Collection of Common-Sense SolutionsSpringOne Tour: Spring Recipes: A Collection of Common-Sense Solutions
SpringOne Tour: Spring Recipes: A Collection of Common-Sense Solutions
 

Último

Powering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsPowering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsSafe Software
 
VictoriaMetrics Anomaly Detection Updates: Q1 2024
VictoriaMetrics Anomaly Detection Updates: Q1 2024VictoriaMetrics Anomaly Detection Updates: Q1 2024
VictoriaMetrics Anomaly Detection Updates: Q1 2024VictoriaMetrics
 
Understanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM ArchitectureUnderstanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM Architecturerahul_net
 
Leveraging AI for Mobile App Testing on Real Devices | Applitools + Kobiton
Leveraging AI for Mobile App Testing on Real Devices | Applitools + KobitonLeveraging AI for Mobile App Testing on Real Devices | Applitools + Kobiton
Leveraging AI for Mobile App Testing on Real Devices | Applitools + KobitonApplitools
 
Precise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalPrecise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalLionel Briand
 
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...confluent
 
Ronisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited CatalogueRonisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited Catalogueitservices996
 
Osi security architecture in network.pptx
Osi security architecture in network.pptxOsi security architecture in network.pptx
Osi security architecture in network.pptxVinzoCenzo
 
SAM Training Session - How to use EXCEL ?
SAM Training Session - How to use EXCEL ?SAM Training Session - How to use EXCEL ?
SAM Training Session - How to use EXCEL ?Alexandre Beguel
 
What’s New in VictoriaMetrics: Q1 2024 Updates
What’s New in VictoriaMetrics: Q1 2024 UpdatesWhat’s New in VictoriaMetrics: Q1 2024 Updates
What’s New in VictoriaMetrics: Q1 2024 UpdatesVictoriaMetrics
 
Keeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository worldKeeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository worldRoberto Pérez Alcolea
 
Odoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 EnterpriseOdoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 Enterprisepreethippts
 
Best Angular 17 Classroom & Online training - Naresh IT
Best Angular 17 Classroom & Online training - Naresh ITBest Angular 17 Classroom & Online training - Naresh IT
Best Angular 17 Classroom & Online training - Naresh ITmanoharjgpsolutions
 
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Angel Borroy López
 
Salesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZSalesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZABSYZ Inc
 
UI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptxUI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptxAndreas Kunz
 
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...OnePlan Solutions
 
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full RecordingOpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full RecordingShane Coughlan
 
The Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptx
The Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptxThe Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptx
The Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptxRTS corp
 
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...OnePlan Solutions
 

Último (20)

Powering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsPowering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data Streams
 
VictoriaMetrics Anomaly Detection Updates: Q1 2024
VictoriaMetrics Anomaly Detection Updates: Q1 2024VictoriaMetrics Anomaly Detection Updates: Q1 2024
VictoriaMetrics Anomaly Detection Updates: Q1 2024
 
Understanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM ArchitectureUnderstanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM Architecture
 
Leveraging AI for Mobile App Testing on Real Devices | Applitools + Kobiton
Leveraging AI for Mobile App Testing on Real Devices | Applitools + KobitonLeveraging AI for Mobile App Testing on Real Devices | Applitools + Kobiton
Leveraging AI for Mobile App Testing on Real Devices | Applitools + Kobiton
 
Precise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalPrecise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive Goal
 
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
 
Ronisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited CatalogueRonisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited Catalogue
 
Osi security architecture in network.pptx
Osi security architecture in network.pptxOsi security architecture in network.pptx
Osi security architecture in network.pptx
 
SAM Training Session - How to use EXCEL ?
SAM Training Session - How to use EXCEL ?SAM Training Session - How to use EXCEL ?
SAM Training Session - How to use EXCEL ?
 
What’s New in VictoriaMetrics: Q1 2024 Updates
What’s New in VictoriaMetrics: Q1 2024 UpdatesWhat’s New in VictoriaMetrics: Q1 2024 Updates
What’s New in VictoriaMetrics: Q1 2024 Updates
 
Keeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository worldKeeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository world
 
Odoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 EnterpriseOdoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 Enterprise
 
Best Angular 17 Classroom & Online training - Naresh IT
Best Angular 17 Classroom & Online training - Naresh ITBest Angular 17 Classroom & Online training - Naresh IT
Best Angular 17 Classroom & Online training - Naresh IT
 
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
 
Salesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZSalesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZ
 
UI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptxUI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptx
 
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
 
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full RecordingOpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
 
The Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptx
The Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptxThe Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptx
The Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptx
 
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
 

“Sh*^%# on Fire, Yo!”: A True Story Inspired by Real Events

  • 1. “Sh!@$ on Fire, Yo!” True Stories Inspired by Real Events Brendan Aye Technical Director, Platform Architecture James Webb Member of Technical Staff
  • 2. 2 Platform and Infrastructure Engineering § 55 Team Members, including redundant and geo-distributed Joes § Virtual Infrastructure § 5,000 Virtual Hosts § 50,000 Virtual Machines § CloudFoundry § 30 Foundations § 75,000 Application Instances § Kubernetes § 90 Clusters § 22,000 Pods Who We Are T-Mobile Confidential
  • 3. 3 Platform KPIs Synthetic Transactions BlackBox Monitoring Server Infrastructure Network Infrastructure Slack Application Requests Container Metrics What Do We Watch?
  • 4. 4 Architecting a Highly Available CloudFoundry App Foundation A Foundation B Foundation C Myapp.geo.cf.mydomain.com Myapp.geo.cf.mydomain.com Myapp.geo.cf.mydomain.com Load Balancer Load Balancer Load Balancer GSLB Clients Myapp.geo.cf.mydomain.com Myapp.geo.cf.mydomain.com Myapp.geo.cf.mydomain.com Myapp.geo.cf.mydomain.com Myapp.geo.cf.mydomain.com Myapp.geo.cf.mydomain.com
  • 5. 55 § Platform team built shiny GSLB-as-a-service § Customer team consumed shiny GSLB-as-a- service § Clients queried GLSB to determine endpoint and established persistent HTTP connections § App teams took one region out of load which correctly de-registered it with GSLB § Persistent connections don’t need to query GSLB anymore and the LoadBalancer kept the connections alive… L What Went Wrong? T-Mobile Confidential
  • 6. 6 § Improved Documentation! GLSB is only one method to load balance application traffic, so explaining its benefits and drawbacks is crucial to a successful partnership. § Sharing incident post-mortem with GSLB customers so they understand what went wrong, and how they can plan for expected failure. § Suggesting disabling HTTP keep-alive when using GLSB § Investing in alternative platform-supported load balancing methodologies. How Did We Get Better? T-Mobile Confidential
  • 7. 77 § Homebrew Java Application running on WebLogic § Running in a single Kubernetes cluster, but with many instances spread across multiple share-nothing AZs § Application upgrades and restarts working fine and not causing any impacts to service § Multi-tenant cluster managed by Platform Team with daytime upgrades planned during CloudFoundry Summit 2019 Anatomy of a Failing Kubernetes App
  • 8. 88 § Cluster upgrades kicked off with max-in-flight of one § As nodes quickly cycled through upgrades, application had fewer and fewer ‘ready’ pods § By the time remaining nodes were upgraded, all customer pods were in a crashed state and failing to come back up § Management was displeased with our daytime upgrades with no Change Request leading to a P1 Incident What Went Wrong? T-Mobile Confidential
  • 9. 9 § Switching application to depend on /dev/urandom instead of /dev/random § Customers implemented Pod Disruption Budgets (PDB) to maintain a minimum of 66% of ready pods before upgrades can proceed § File a Change Request for anything that touches a customer-facing cluster (yes, even non-production) How Did We Get Better?How Did We Get Better?
  • 11. 11 § Adopt a policy of radical transparency with your customers § Assume your customers are right until you can demonstrate otherwise § Avoid seeing Mean-Time-To-Blame as a useful KPI § When your platform is at fault, accept responsibility, fix the issue, and explain how you’ll improve § When a customer is doing something that will lead to failure, ensure your concern is heard and partner for success You Can’t T-Mobile Confidential