SlideShare a Scribd company logo
1 of 57
© 2019, Amazon Web Services, Inc. or its Affiliates.© 2019, Amazon Web Services, Inc. or its Affiliates.
Colm MacCárthaigh
PID loops and the art of keeping
systems stable
@colmmacc
2019-06-24
InfoQ.com: News & Community Site
• 750,000 unique visitors/month
• Published in 4 languages (English, Chinese, Japanese and Brazilian
Portuguese)
• Post content from our QCon conferences
• News 15-20 / week
• Articles 3-4 / week
• Presentations (videos) 12-15 / week
• Interviews 2-3 / week
• Books 1 / month
Watch the video with slide
synchronization on InfoQ.com!
https://www.infoq.com/presentations/
pid-loops/
Presented at QCon New York
www.qconnewyork.com
Purpose of QCon
- to empower software development by facilitating the spread of
knowledge and innovation
Strategy
- practitioner-driven conference designed for YOU: influencers of
change and innovation in your teams
- speakers and topics driving the evolution and innovation
- connecting and catalyzing the influencers and innovators
Highlights
- attended by more than 12,000 delegates since 2007
- held in 9 cities worldwide
© 2019, Amazon Web Services, Inc. or its Affiliates.© 2019, Amazon Web Services, Inc. or its Affiliates.
Control Theory: Where the fruit is
hanging so low IT IS TOUCHING
THE GROUND
© 2019, Amazon Web Services, Inc. or its Affiliates.© 2019, Amazon Web Services, Inc. or its Affiliates.
META
© 2019, Amazon Web Services, Inc. or its Affiliates.
© 2019, Amazon Web Services, Inc. or its Affiliates.
© 2019, Amazon Web Services, Inc. or its Affiliates.
ObservePresent
FeedbackReact
© 2019, Amazon Web Services, Inc. or its Affiliates.© 2019, Amazon Web Services, Inc. or its Affiliates.
Control Theory
© 2019, Amazon Web Services, Inc. or its Affiliates.
Prior Art
© 2019, Amazon Web Services, Inc. or its Affiliates.
Prior Art
© 2019, Amazon Web Services, Inc. or its Affiliates.
Control Theory and PID loops
Comes up in the context of …
Autoscaling and placement: Instances, Storage, Network, etc.
Fairness algorithms: TCP, Queues, Throttling
Systems stability
© 2019, Amazon Web Services, Inc. or its Affiliates.
The Furnace
© 2019, Amazon Web Services, Inc. or its Affiliates.
The Furnace
© 2019, Amazon Web Services, Inc. or its Affiliates.
The Furnace
Measure
React
© 2019, Amazon Web Services, Inc. or its Affiliates.
The Furnace
error
Time
© 2019, Amazon Web Services, Inc. or its Affiliates.
The Furnace
error
Time
© 2019, Amazon Web Services, Inc. or its Affiliates.
The Furnace
error
Time
© 2019, Amazon Web Services, Inc. or its Affiliates.
Autoscaling
error
Time
© 2019, Amazon Web Services, Inc. or its Affiliates.
Autoscaling
Measure
React
© 2019, Amazon Web Services, Inc. or its Affiliates.
Autoscaling : forecasting and fancy integrals!
Any signal can be processed with Fourier Analysis to find underlying constituent
frequencies
Real-world operational systems often have strong daily, weekly, annual cycles, etc.
Holt-Winters Forecasting can simulate these cycles into the future
Machine Learning can do even better!
© 2019, Amazon Web Services, Inc. or its Affiliates.
Autoscaling : forecasting and fancy integrals!
© 2019, Amazon Web Services, Inc. or its Affiliates.
Placement and fairness
© 2019, Amazon Web Services, Inc. or its Affiliates.
© 2019, Amazon Web Services, Inc. or its Affiliates.© 2019, Amazon Web Services, Inc. or its Affiliates.
X-Ray Vision: Open Loops
1
© 2019, Amazon Web Services, Inc. or its Affiliates.
X-Ray Vision: Open Loops
# launch 10 Instances
for (i = 0; i < 10; i++)
instance[i] = ec2_launch_instance()
# wait a minute
sleep(60);
# Register the instances
for (i = 0; i < 10; i++)
register_instance(instance[i]);
© 2019, Amazon Web Services, Inc. or its Affiliates.
X-Ray Vision: Open Loops
• A surprising number of real-world systems are Open Loops
• Potential reasons:
• Organic out-growth from scripts
• Imperative programming “Do this, then do this” is very natural
• Infrastructure is very very reliable these days
• Infrequent actions
© 2019, Amazon Web Services, Inc. or its Affiliates.
X-Ray Vision: Open Loops
© 2019, Amazon Web Services, Inc. or its Affiliates.
X-Ray Vision: Open Loops
© 2019, Amazon Web Services, Inc. or its Affiliates.
X-Ray Vision: Open Loops
• Closing loops:
• Embrace “Measure first. Then react.”
• Measure a lot of things. Check everything you can think to.
• Avoid infrequent operations – make them more frequent where possible.
© 2019, Amazon Web Services, Inc. or its Affiliates.
X-Ray Vision: Open Loops
© 2019, Amazon Web Services, Inc. or its Affiliates.
X-Ray Vision: Open Loops
• Measure-first systems tend to be more naturally declarative
• In general, declarative are easily to formally verify
• TLA+ , F*, CoQ, SAW/Cryptol
© 2019, Amazon Web Services, Inc. or its Affiliates.© 2019, Amazon Web Services, Inc. or its Affiliates.
X-Ray Vision: Power Laws
2
© 2019, Amazon Web Services, Inc. or its Affiliates.
Power Laws
© 2019, Amazon Web Services, Inc. or its Affiliates.
Power Laws
© 2019, Amazon Web Services, Inc. or its Affiliates.
Power Laws
© 2019, Amazon Web Services, Inc. or its Affiliates.
Power Laws
© 2019, Amazon Web Services, Inc. or its Affiliates.
Power Laws
© 2019, Amazon Web Services, Inc. or its Affiliates.
Power Laws
• First, compartmentalize.
• More compartments means relatively smaller blast radius.
• Many real-world control systems reflect this lesson of scale.
• What next?
© 2019, Amazon Web Services, Inc. or its Affiliates.
Power Laws
• Exponential Back-off
• Brings our own power-law to the table
• Rate-limiters
• Simple token buckets can be incredibly effective
• Working Backpressure
• AWS SDK retry strategy = Token buckets + Rate-Limiters + persistent state
© 2019, Amazon Web Services, Inc. or its Affiliates.
Power Laws
© 2019, Amazon Web Services, Inc. or its Affiliates.© 2019, Amazon Web Services, Inc. or its Affiliates.
X-Ray Vision: Liveness and Lag
3
© 2019, Amazon Web Services, Inc. or its Affiliates.
X-Ray Vision: Liveness and Lag
• Operating on old information can be worse than operating on no information
• Simple example: system gets very busy and workflows and metrics pipelines can build
up
• Ephemeral “shocks” such as spiky loads or brief outages can end up taking very long
to recover
© 2019, Amazon Web Services, Inc. or its Affiliates.
X-Ray Vision: Liveness and Lag
• Strive for O(1) scaling as much as possible
• Provision everything, every time
• Report everything, every time
• Do everything, every time
© 2019, Amazon Web Services, Inc. or its Affiliates.
X-Ray Vision: Liveness and Lag
• If you need to use a bus or queue, think carefully about limits on the size of that
queue
• In general: short queues are safer
• LIFO queues can be a great strategy for information channels
• Naturally prioritizes recent state
• Out of order back-fill for any “catching up”
© 2019, Amazon Web Services, Inc. or its Affiliates.© 2019, Amazon Web Services, Inc. or its Affiliates.
X-Ray Vision: False Functions
4
© 2019, Amazon Web Services, Inc. or its Affiliates.
False functions
load
utilization
© 2019, Amazon Web Services, Inc. or its Affiliates.
False functions
load
utilization
© 2019, Amazon Web Services, Inc. or its Affiliates.
False functions
• Hall of fame false function: Unix load
• Runners-up: system latency, network latency
• Hard to predict Garbage Collector behavior can be confounding
• CPU can be surprisingly effective
© 2019, Amazon Web Services, Inc. or its Affiliates.© 2019, Amazon Web Services, Inc. or its Affiliates.
X-Ray Vision: Edge Triggering
5
© 2019, Amazon Web Services, Inc. or its Affiliates.
Edge Triggering
load
utilization
© 2019, Amazon Web Services, Inc. or its Affiliates.
Edge Triggering
• Edge Triggering invites modal behavior
• Often the new mode kicks in at a time of high-stress
• Edge Triggering often associated with the “Deliver exactly once” problem
• O.k. for alerting humans but usually an anti-pattern for control systems
© 2019, Amazon Web Services, Inc. or its Affiliates.© 2019, Amazon Web Services, Inc. or its Affiliates.
Summary
© 2019, Amazon Web Services, Inc. or its Affiliates.
Summary
• “Measure first” and “Integrate feedback” are deeply rewarding concepts
• Right now, this knowledge is highly leveraged
• We can think of distributed systems in terms of control theory, with 100 years of
powerful mental models available
• Control Theory can help us formally analyze the stability of systems
© 2019, Amazon Web Services, Inc. or its Affiliates.© 2019, Amazon Web Services, Inc. or its Affiliates.
Q&A
Colm MacCárthaigh
© 2019, Amazon Web Services, Inc. or its Affiliates.© 2019, Amazon Web Services, Inc. or its Affiliates.
Thank you!
Watch the video with slide
synchronization on InfoQ.com!
https://www.infoq.com/presentations/
pid-loops/

More Related Content

More from C4Media

Shifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CDShifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CDC4Media
 
CI/CD for Machine Learning
CI/CD for Machine LearningCI/CD for Machine Learning
CI/CD for Machine LearningC4Media
 
Fault Tolerance at Speed
Fault Tolerance at SpeedFault Tolerance at Speed
Fault Tolerance at SpeedC4Media
 
Architectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep SystemsArchitectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep SystemsC4Media
 
ML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.jsML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.jsC4Media
 
Build Your Own WebAssembly Compiler
Build Your Own WebAssembly CompilerBuild Your Own WebAssembly Compiler
Build Your Own WebAssembly CompilerC4Media
 
User & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix ScaleUser & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix ScaleC4Media
 
Scaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's EdgeScaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's EdgeC4Media
 
Make Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home EverywhereMake Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home EverywhereC4Media
 
The Talk You've Been Await-ing For
The Talk You've Been Await-ing ForThe Talk You've Been Await-ing For
The Talk You've Been Await-ing ForC4Media
 
Future of Data Engineering
Future of Data EngineeringFuture of Data Engineering
Future of Data EngineeringC4Media
 
Automated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and MoreAutomated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and MoreC4Media
 
Navigating Complexity: High-performance Delivery and Discovery Teams
Navigating Complexity: High-performance Delivery and Discovery TeamsNavigating Complexity: High-performance Delivery and Discovery Teams
Navigating Complexity: High-performance Delivery and Discovery TeamsC4Media
 
High Performance Cooperative Distributed Systems in Adtech
High Performance Cooperative Distributed Systems in AdtechHigh Performance Cooperative Distributed Systems in Adtech
High Performance Cooperative Distributed Systems in AdtechC4Media
 
Rust's Journey to Async/await
Rust's Journey to Async/awaitRust's Journey to Async/await
Rust's Journey to Async/awaitC4Media
 
Opportunities and Pitfalls of Event-Driven Utopia
Opportunities and Pitfalls of Event-Driven UtopiaOpportunities and Pitfalls of Event-Driven Utopia
Opportunities and Pitfalls of Event-Driven UtopiaC4Media
 
Datadog: a Real-Time Metrics Database for One Quadrillion Points/Day
Datadog: a Real-Time Metrics Database for One Quadrillion Points/DayDatadog: a Real-Time Metrics Database for One Quadrillion Points/Day
Datadog: a Real-Time Metrics Database for One Quadrillion Points/DayC4Media
 
Are We Really Cloud-Native?
Are We Really Cloud-Native?Are We Really Cloud-Native?
Are We Really Cloud-Native?C4Media
 
CockroachDB: Architecture of a Geo-Distributed SQL Database
CockroachDB: Architecture of a Geo-Distributed SQL DatabaseCockroachDB: Architecture of a Geo-Distributed SQL Database
CockroachDB: Architecture of a Geo-Distributed SQL DatabaseC4Media
 
A Dive into Streams @LinkedIn with Brooklin
A Dive into Streams @LinkedIn with BrooklinA Dive into Streams @LinkedIn with Brooklin
A Dive into Streams @LinkedIn with BrooklinC4Media
 

More from C4Media (20)

Shifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CDShifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CD
 
CI/CD for Machine Learning
CI/CD for Machine LearningCI/CD for Machine Learning
CI/CD for Machine Learning
 
Fault Tolerance at Speed
Fault Tolerance at SpeedFault Tolerance at Speed
Fault Tolerance at Speed
 
Architectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep SystemsArchitectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep Systems
 
ML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.jsML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.js
 
Build Your Own WebAssembly Compiler
Build Your Own WebAssembly CompilerBuild Your Own WebAssembly Compiler
Build Your Own WebAssembly Compiler
 
User & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix ScaleUser & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix Scale
 
Scaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's EdgeScaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's Edge
 
Make Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home EverywhereMake Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home Everywhere
 
The Talk You've Been Await-ing For
The Talk You've Been Await-ing ForThe Talk You've Been Await-ing For
The Talk You've Been Await-ing For
 
Future of Data Engineering
Future of Data EngineeringFuture of Data Engineering
Future of Data Engineering
 
Automated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and MoreAutomated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and More
 
Navigating Complexity: High-performance Delivery and Discovery Teams
Navigating Complexity: High-performance Delivery and Discovery TeamsNavigating Complexity: High-performance Delivery and Discovery Teams
Navigating Complexity: High-performance Delivery and Discovery Teams
 
High Performance Cooperative Distributed Systems in Adtech
High Performance Cooperative Distributed Systems in AdtechHigh Performance Cooperative Distributed Systems in Adtech
High Performance Cooperative Distributed Systems in Adtech
 
Rust's Journey to Async/await
Rust's Journey to Async/awaitRust's Journey to Async/await
Rust's Journey to Async/await
 
Opportunities and Pitfalls of Event-Driven Utopia
Opportunities and Pitfalls of Event-Driven UtopiaOpportunities and Pitfalls of Event-Driven Utopia
Opportunities and Pitfalls of Event-Driven Utopia
 
Datadog: a Real-Time Metrics Database for One Quadrillion Points/Day
Datadog: a Real-Time Metrics Database for One Quadrillion Points/DayDatadog: a Real-Time Metrics Database for One Quadrillion Points/Day
Datadog: a Real-Time Metrics Database for One Quadrillion Points/Day
 
Are We Really Cloud-Native?
Are We Really Cloud-Native?Are We Really Cloud-Native?
Are We Really Cloud-Native?
 
CockroachDB: Architecture of a Geo-Distributed SQL Database
CockroachDB: Architecture of a Geo-Distributed SQL DatabaseCockroachDB: Architecture of a Geo-Distributed SQL Database
CockroachDB: Architecture of a Geo-Distributed SQL Database
 
A Dive into Streams @LinkedIn with Brooklin
A Dive into Streams @LinkedIn with BrooklinA Dive into Streams @LinkedIn with Brooklin
A Dive into Streams @LinkedIn with Brooklin
 

Recently uploaded

Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Victor Rentea
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKJago de Vreede
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdfSandro Moreira
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...apidays
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Angeliki Cooney
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 

Recently uploaded (20)

Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 

PID Loops and the Art of Keeping Systems Stable

  • 1. © 2019, Amazon Web Services, Inc. or its Affiliates.© 2019, Amazon Web Services, Inc. or its Affiliates. Colm MacCárthaigh PID loops and the art of keeping systems stable @colmmacc 2019-06-24
  • 2. InfoQ.com: News & Community Site • 750,000 unique visitors/month • Published in 4 languages (English, Chinese, Japanese and Brazilian Portuguese) • Post content from our QCon conferences • News 15-20 / week • Articles 3-4 / week • Presentations (videos) 12-15 / week • Interviews 2-3 / week • Books 1 / month Watch the video with slide synchronization on InfoQ.com! https://www.infoq.com/presentations/ pid-loops/
  • 3. Presented at QCon New York www.qconnewyork.com Purpose of QCon - to empower software development by facilitating the spread of knowledge and innovation Strategy - practitioner-driven conference designed for YOU: influencers of change and innovation in your teams - speakers and topics driving the evolution and innovation - connecting and catalyzing the influencers and innovators Highlights - attended by more than 12,000 delegates since 2007 - held in 9 cities worldwide
  • 4. © 2019, Amazon Web Services, Inc. or its Affiliates.© 2019, Amazon Web Services, Inc. or its Affiliates. Control Theory: Where the fruit is hanging so low IT IS TOUCHING THE GROUND
  • 5. © 2019, Amazon Web Services, Inc. or its Affiliates.© 2019, Amazon Web Services, Inc. or its Affiliates. META
  • 6. © 2019, Amazon Web Services, Inc. or its Affiliates.
  • 7. © 2019, Amazon Web Services, Inc. or its Affiliates.
  • 8. © 2019, Amazon Web Services, Inc. or its Affiliates. ObservePresent FeedbackReact
  • 9. © 2019, Amazon Web Services, Inc. or its Affiliates.© 2019, Amazon Web Services, Inc. or its Affiliates. Control Theory
  • 10. © 2019, Amazon Web Services, Inc. or its Affiliates. Prior Art
  • 11. © 2019, Amazon Web Services, Inc. or its Affiliates. Prior Art
  • 12. © 2019, Amazon Web Services, Inc. or its Affiliates. Control Theory and PID loops Comes up in the context of … Autoscaling and placement: Instances, Storage, Network, etc. Fairness algorithms: TCP, Queues, Throttling Systems stability
  • 13. © 2019, Amazon Web Services, Inc. or its Affiliates. The Furnace
  • 14. © 2019, Amazon Web Services, Inc. or its Affiliates. The Furnace
  • 15. © 2019, Amazon Web Services, Inc. or its Affiliates. The Furnace Measure React
  • 16. © 2019, Amazon Web Services, Inc. or its Affiliates. The Furnace error Time
  • 17. © 2019, Amazon Web Services, Inc. or its Affiliates. The Furnace error Time
  • 18. © 2019, Amazon Web Services, Inc. or its Affiliates. The Furnace error Time
  • 19. © 2019, Amazon Web Services, Inc. or its Affiliates. Autoscaling error Time
  • 20. © 2019, Amazon Web Services, Inc. or its Affiliates. Autoscaling Measure React
  • 21. © 2019, Amazon Web Services, Inc. or its Affiliates. Autoscaling : forecasting and fancy integrals! Any signal can be processed with Fourier Analysis to find underlying constituent frequencies Real-world operational systems often have strong daily, weekly, annual cycles, etc. Holt-Winters Forecasting can simulate these cycles into the future Machine Learning can do even better!
  • 22. © 2019, Amazon Web Services, Inc. or its Affiliates. Autoscaling : forecasting and fancy integrals!
  • 23. © 2019, Amazon Web Services, Inc. or its Affiliates. Placement and fairness
  • 24. © 2019, Amazon Web Services, Inc. or its Affiliates.
  • 25. © 2019, Amazon Web Services, Inc. or its Affiliates.© 2019, Amazon Web Services, Inc. or its Affiliates. X-Ray Vision: Open Loops 1
  • 26. © 2019, Amazon Web Services, Inc. or its Affiliates. X-Ray Vision: Open Loops # launch 10 Instances for (i = 0; i < 10; i++) instance[i] = ec2_launch_instance() # wait a minute sleep(60); # Register the instances for (i = 0; i < 10; i++) register_instance(instance[i]);
  • 27. © 2019, Amazon Web Services, Inc. or its Affiliates. X-Ray Vision: Open Loops • A surprising number of real-world systems are Open Loops • Potential reasons: • Organic out-growth from scripts • Imperative programming “Do this, then do this” is very natural • Infrastructure is very very reliable these days • Infrequent actions
  • 28. © 2019, Amazon Web Services, Inc. or its Affiliates. X-Ray Vision: Open Loops
  • 29. © 2019, Amazon Web Services, Inc. or its Affiliates. X-Ray Vision: Open Loops
  • 30. © 2019, Amazon Web Services, Inc. or its Affiliates. X-Ray Vision: Open Loops • Closing loops: • Embrace “Measure first. Then react.” • Measure a lot of things. Check everything you can think to. • Avoid infrequent operations – make them more frequent where possible.
  • 31. © 2019, Amazon Web Services, Inc. or its Affiliates. X-Ray Vision: Open Loops
  • 32. © 2019, Amazon Web Services, Inc. or its Affiliates. X-Ray Vision: Open Loops • Measure-first systems tend to be more naturally declarative • In general, declarative are easily to formally verify • TLA+ , F*, CoQ, SAW/Cryptol
  • 33. © 2019, Amazon Web Services, Inc. or its Affiliates.© 2019, Amazon Web Services, Inc. or its Affiliates. X-Ray Vision: Power Laws 2
  • 34. © 2019, Amazon Web Services, Inc. or its Affiliates. Power Laws
  • 35. © 2019, Amazon Web Services, Inc. or its Affiliates. Power Laws
  • 36. © 2019, Amazon Web Services, Inc. or its Affiliates. Power Laws
  • 37. © 2019, Amazon Web Services, Inc. or its Affiliates. Power Laws
  • 38. © 2019, Amazon Web Services, Inc. or its Affiliates. Power Laws
  • 39. © 2019, Amazon Web Services, Inc. or its Affiliates. Power Laws • First, compartmentalize. • More compartments means relatively smaller blast radius. • Many real-world control systems reflect this lesson of scale. • What next?
  • 40. © 2019, Amazon Web Services, Inc. or its Affiliates. Power Laws • Exponential Back-off • Brings our own power-law to the table • Rate-limiters • Simple token buckets can be incredibly effective • Working Backpressure • AWS SDK retry strategy = Token buckets + Rate-Limiters + persistent state
  • 41. © 2019, Amazon Web Services, Inc. or its Affiliates. Power Laws
  • 42. © 2019, Amazon Web Services, Inc. or its Affiliates.© 2019, Amazon Web Services, Inc. or its Affiliates. X-Ray Vision: Liveness and Lag 3
  • 43. © 2019, Amazon Web Services, Inc. or its Affiliates. X-Ray Vision: Liveness and Lag • Operating on old information can be worse than operating on no information • Simple example: system gets very busy and workflows and metrics pipelines can build up • Ephemeral “shocks” such as spiky loads or brief outages can end up taking very long to recover
  • 44. © 2019, Amazon Web Services, Inc. or its Affiliates. X-Ray Vision: Liveness and Lag • Strive for O(1) scaling as much as possible • Provision everything, every time • Report everything, every time • Do everything, every time
  • 45. © 2019, Amazon Web Services, Inc. or its Affiliates. X-Ray Vision: Liveness and Lag • If you need to use a bus or queue, think carefully about limits on the size of that queue • In general: short queues are safer • LIFO queues can be a great strategy for information channels • Naturally prioritizes recent state • Out of order back-fill for any “catching up”
  • 46. © 2019, Amazon Web Services, Inc. or its Affiliates.© 2019, Amazon Web Services, Inc. or its Affiliates. X-Ray Vision: False Functions 4
  • 47. © 2019, Amazon Web Services, Inc. or its Affiliates. False functions load utilization
  • 48. © 2019, Amazon Web Services, Inc. or its Affiliates. False functions load utilization
  • 49. © 2019, Amazon Web Services, Inc. or its Affiliates. False functions • Hall of fame false function: Unix load • Runners-up: system latency, network latency • Hard to predict Garbage Collector behavior can be confounding • CPU can be surprisingly effective
  • 50. © 2019, Amazon Web Services, Inc. or its Affiliates.© 2019, Amazon Web Services, Inc. or its Affiliates. X-Ray Vision: Edge Triggering 5
  • 51. © 2019, Amazon Web Services, Inc. or its Affiliates. Edge Triggering load utilization
  • 52. © 2019, Amazon Web Services, Inc. or its Affiliates. Edge Triggering • Edge Triggering invites modal behavior • Often the new mode kicks in at a time of high-stress • Edge Triggering often associated with the “Deliver exactly once” problem • O.k. for alerting humans but usually an anti-pattern for control systems
  • 53. © 2019, Amazon Web Services, Inc. or its Affiliates.© 2019, Amazon Web Services, Inc. or its Affiliates. Summary
  • 54. © 2019, Amazon Web Services, Inc. or its Affiliates. Summary • “Measure first” and “Integrate feedback” are deeply rewarding concepts • Right now, this knowledge is highly leveraged • We can think of distributed systems in terms of control theory, with 100 years of powerful mental models available • Control Theory can help us formally analyze the stability of systems
  • 55. © 2019, Amazon Web Services, Inc. or its Affiliates.© 2019, Amazon Web Services, Inc. or its Affiliates. Q&A Colm MacCárthaigh
  • 56. © 2019, Amazon Web Services, Inc. or its Affiliates.© 2019, Amazon Web Services, Inc. or its Affiliates. Thank you!
  • 57. Watch the video with slide synchronization on InfoQ.com! https://www.infoq.com/presentations/ pid-loops/