Fault-Tolerance on the Cheap: Making systems that probably won’t fall over.
Hi, folks!
I do things to/with computers.
I’m a real-time, networked systems engineer.
Real-Time Systems
• Computation on a deadline
• Fail-safe / Fail-operational
• Guaranteed response / Best effort
• Resource adequate / inadequate
Networked Systems
• Out-of-order messages
• No legitimate concept of “now”
• High-latency transmission
• Lossy transmission
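With no legitimate “now”, one standard way to order events across machines is a logical clock. Below is a minimal sketch of a Lamport clock, assuming each node keeps its own counter and stamps every message it sends. The LamportClock class is hypothetical and does not come from this talk.

```python
from dataclasses import dataclass


@dataclass
class LamportClock:
    """Logical clock: orders events by causality, not by wall time."""

    time: int = 0

    def tick(self) -> int:
        # Any local event advances this node's clock by one.
        self.time += 1
        return self.time

    def send(self) -> int:
        # Sending is a local event; the result is the message's stamp.
        return self.tick()

    def receive(self, msg_time: int) -> int:
        # Jump past both our own clock and the sender's stamp, so the
        # receive is always ordered after the matching send.
        self.time = max(self.time, msg_time) + 1
        return self.time


a, b = LamportClock(), LamportClock()
stamp = a.send()            # a.time is now 1
for _ in range(3):
    b.tick()                # unrelated local work on b: b.time is now 3
print(b.receive(stamp))     # prints 4, i.e. max(3, 1) + 1
```

Lamport timestamps respect causality (a receive is always stamped after its send), but they say nothing about wall-clock time, and concurrent events on different nodes can share a timestamp.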
Punk Rock Version
Here’s a socket. Here’s an interrupt. Go program a computer!
AdRoll real-time bidding
Erlang
Fault Tolerance
Sub-components fail. The system does not.
Well, not right away…
What’s it take?
Option 1: Perfection
• Total control over the whole mechanism.
• Total understanding of the problem domain.
• Specific, explicit system goals.
• Well-known service lifetime.
“They Write the Right Stuff”, Charles Fishman, Fast Company, 1996
Option 1: Perfection
• Extremely expensive.
• Intentionally stifles creativity.
• Design up front.
• Even “complete” control of the system is never complete.
Option 2: Hope for the Best
• Little up-front knowledge of the problem domain.
• Implicit or short-term system goals.
• No money down.
• Ingenuity under pressure.
“Move fast and break things!”
Option 2: Hope for the Best
• Ignorance of the problem domain leads to long-term system issues.
• Failures do propagate out toward users.
• No, money down!
• Hard to change cultural values.
Option 3: Embrace Faults
• Partial control over the whole mechanism.
• Partial understanding of the problem domain.
• Sorta explicit system goals.
• Able to spot a failure when you see one.
“Fail fast. Either do the right thing or stop.”
“Why Do Computers Stop and What Can Be Done About It?”, Jim Gray, 1985 (paraphrase)
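Gray’s rule can be sketched in a few lines of Python. The function below either returns a correct result or raises immediately; it never hands back a plausible-looking wrong answer. apply_bid and its cents-based arguments are hypothetical, invented for illustration.

```python
def apply_bid(balance_cents: int, bid_cents: int) -> int:
    """Return the balance left after a bid, or stop loudly.

    Fail fast: reject impossible states at the boundary instead of
    continuing with a corrupted balance.
    """
    if bid_cents <= 0:
        raise ValueError(f"bid must be positive, got {bid_cents}")
    if bid_cents > balance_cents:
        raise ValueError(f"bid {bid_cents} exceeds balance {balance_cents}")
    remaining = balance_cents - bid_cents
    # Invariant check: if this ever fires, the bug is here, not downstream.
    assert remaining >= 0
    return remaining


print(apply_bid(500, 200))  # prints 300
try:
    apply_bid(100, 200)
except ValueError as err:
    print(f"stopped: {err}")  # prints "stopped: bid 200 exceeds balance 100"
```

Stopping is only half the story: something above this function (a caller, or a supervisor) has to decide what happens next.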
Option 3: Embrace Faults
• Faults are isolated but must be resolved in production.
• Must carefully design for introspection.
• Moderate design up front.
• Pay a little now, pay a little later.
Let’s talk about embracing faults.
There are four conceptual stages to consider.
Component
The most atomic level of the system. Progress here has an outsized impact.
Immutable Data Structures
Isolate Side-Effects
Compile-Time Guarantees
Why test, when you can prove?
This is Functional Programming.
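Immutable data and isolated side effects can be approximated even in a language that does not enforce them. A sketch in Python, assuming a hypothetical Campaign record: frozen dataclasses reject attribute assignment, the update function returns a new value instead of mutating its input, and printing happens only in main. Python’s guarantees here are shallow and checked at runtime, not at compile time.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Campaign:
    """Hypothetical record; frozen=True makes attribute assignment raise."""

    name: str
    budget_cents: int
    spent_cents: int = 0


def record_spend(campaign: Campaign, amount_cents: int) -> Campaign:
    # Pure function: builds a new Campaign and leaves the input untouched.
    return replace(campaign, spent_cents=campaign.spent_cents + amount_cents)


def main() -> None:
    # Side effects (here, printing) are confined to the program's edge.
    before = Campaign("spring", budget_cents=10_000)
    after = record_spend(before, 2_500)
    print(before.spent_cents, after.spent_cents)  # prints "0 2500"


if __name__ == "__main__":
    main()
```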
Machine
Faults in components are exercised here.
Faults in interactions are exercised here.
Supervise, and restart.
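A rough, single-process approximation of “supervise and restart”, assuming a worker can be restarted simply by calling it again. Erlang/OTP supervisors restart separate, isolated processes and cap restarts within a time window; this sketch (supervise is a hypothetical helper) only caps the total count.

```python
import logging
from typing import Callable, TypeVar

T = TypeVar("T")


def supervise(worker: Callable[[], T], max_restarts: int = 3) -> T:
    """Run worker; if it crashes, call it again from scratch.

    After max_restarts restarts the last exception is re-raised, so a
    worker that can never succeed escalates instead of looping forever.
    """
    restarts = 0
    while True:
        try:
            return worker()
        except Exception:
            if restarts >= max_restarts:
                raise  # escalate to whoever supervises the supervisor
            restarts += 1
            logging.warning("worker crashed; restart %d of %d",
                            restarts, max_restarts, exc_info=True)


# Usage: a worker that hits two transient faults before succeeding.
attempts = {"n": 0}


def flaky() -> str:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("transient fault")
    return "ok"


print(supervise(flaky))  # prints "ok" after two restarts
```

The restart cap matters as much as the restart: without it, a deterministic bug turns into an infinite crash loop that hides the fault instead of reporting it.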
Use only addressable names.
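Addressing a component by name rather than by a direct reference means a restarted replacement is found automatically. A minimal sketch with a hypothetical Registry class; whereis borrows its name from Erlang’s whereis/1, but unlike Erlang, where a registered name is released when its process dies, this sketch simply lets a new instance overwrite the old entry.

```python
from typing import Any, Dict


class Registry:
    """Map stable names to the current instance of each service."""

    def __init__(self) -> None:
        self._services: Dict[str, Any] = {}

    def register(self, name: str, instance: Any) -> None:
        # Re-registering replaces the old (for example, crashed) instance.
        self._services[name] = instance

    def whereis(self, name: str) -> Any:
        # Callers look up by name on every use, never caching the instance.
        # An unknown name raises KeyError: fail fast rather than guess.
        return self._services[name]


class TallyService:
    """Hypothetical stateful service used to demonstrate a restart."""

    def __init__(self) -> None:
        self.value = 0


registry = Registry()
registry.register("tally", TallyService())
registry.whereis("tally").value += 5
# Simulate a crash and restart: a fresh instance takes over the same name.
registry.register("tally", TallyService())
print(registry.whereis("tally").value)  # prints 0: new instance, same name
```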
Distinguish your critical components.
Cluster
Redundant Components
No Single Points of Failure
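One simple way to keep a call site from depending on any single replica: try each replica in turn and fail only if all of them fail. A sketch assuming replicas are interchangeable zero-argument callables; call_any, down and up are hypothetical names.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def call_any(replicas: Sequence[Callable[[], T]]) -> T:
    """Return the first successful result from interchangeable replicas.

    The call fails only if every replica fails, and then all of the
    underlying errors are reported together.
    """
    errors: List[Exception] = []
    for replica in replicas:
        try:
            return replica()
        except Exception as exc:  # one replica failing is expected, not fatal
            errors.append(exc)
    raise RuntimeError(f"all {len(replicas)} replicas failed: {errors!r}")


def down() -> str:
    raise ConnectionError("replica down")


def up() -> str:
    return "bid-response"


print(call_any([down, up]))  # prints "bid-response"
```

Trying replicas in a fixed order puts most of the load on the first one; real clients often shuffle or load-balance instead, and only idempotent requests are safe to resend.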
Mean Time to Failure Estimates
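A first-cut MTTF estimate fits in a few lines if you assume independent components with constant failure rates (exponentially distributed lifetimes) and no repair. In series, any one failure stops the system, so the rates add: λ_sys = Σ 1/MTTF_i, and MTTF_sys = 1/λ_sys. For n identical redundant components, the system lasts until the last one fails, which takes MTTF_c × (1 + 1/2 + ... + 1/n) on average. The helper names below are hypothetical.

```python
from typing import Sequence


def series_mttf(mttfs: Sequence[float]) -> float:
    """MTTF when any single component failure takes the system down.

    Assumes independent components with constant failure rates, so the
    rates add: lambda_sys = sum(1 / MTTF_i) and MTTF_sys = 1 / lambda_sys.
    """
    return 1.0 / sum(1.0 / m for m in mttfs)


def parallel_mttf(component_mttf: float, n: int) -> float:
    """MTTF of n identical redundant components with no repair.

    The system fails when the last component fails; for independent
    exponential lifetimes that takes MTTF * (1 + 1/2 + ... + 1/n) on average.
    """
    return component_mttf * sum(1.0 / k for k in range(1, n + 1))


print(series_mttf([1000.0, 1000.0]))  # about 500 hours: two parts in series
print(parallel_mttf(1000.0, 2))       # 1500.0 hours: a redundant pair
```

Under these assumptions, doubling the parts halves MTTF in series but adds only 50% in parallel. The parallel figure also assumes independent failures; shared power, a shared network or a common software bug breaks that assumption.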
Instrument and Monitor
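A sketch of the smallest useful instrumentation: count calls and errors and record latency for each function. The instrumented decorator, the module-level tables and parse_bid are hypothetical; in practice these numbers would be exported to a monitoring system, which is out of scope here.

```python
import time
from collections import Counter
from functools import wraps
from typing import Any, Callable, Dict, List

calls: Counter = Counter()
errors: Counter = Counter()
latencies: Dict[str, List[float]] = {}


def instrumented(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Count calls and errors and record wall-clock latency per function."""
    name = fn.__name__

    @wraps(fn)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        calls[name] += 1
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        except Exception:
            errors[name] += 1
            raise  # measure the failure, but don't swallow it
        finally:
            latencies.setdefault(name, []).append(time.perf_counter() - start)

    return wrapper


@instrumented
def parse_bid(raw: str) -> int:
    return int(raw)


parse_bid("42")
try:
    parse_bid("not a number")
except ValueError:
    pass
print(calls["parse_bid"], errors["parse_bid"])  # prints "2 1"
```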
Organization
A finely built machine without a supporting organization is a disaster waiting to happen.
A finely built machine without organization is a disaster waiting to happen.
Chernobyl
STS-51-L
Deepwater Horizon
Magnitogorsk
Damascus Incident
Chevron Refinery
BART ATC
Asiana #214
Therac-25
New Orleans Levee
Correct the conditions that allowed mistakes, as well as the mistake.
Process is Priceless
Build flexible tools for experts.
Separate Your Concerns
Build with failure in mind.
Have resources you’re willing to sacrifice.
Study accidents.
Every system carries the potential for its own destruction.
Some things aren’t worth building.
Understand networks.
0. The network is unreliable.
1. Latency is non-zero.
2. Bandwidth is finite.
3. The network is insecure.
4. Topology changes.
5. There are many administrators.
6. Transport cost is non-zero.
7. The network is heterogeneous.
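Because the network is unreliable and latency is non-zero, a remote call needs a timeout and a bounded retry policy. Below is a sketch of capped exponential backoff with “full jitter”, assuming the operation raises ConnectionError or TimeoutError on failure and carries its own timeout (for example via the timeout argument of socket.create_connection). retry, its parameters and flaky_fetch are hypothetical, and retrying is only safe for idempotent operations.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(op: Callable[[], T], attempts: int = 4,
          base_delay: float = 0.05, max_delay: float = 1.0) -> T:
    """Call op, retrying transient network errors with capped, jittered backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return op()
        except (ConnectionError, TimeoutError):
            if attempt == attempts - 1:
                raise  # out of attempts: surface the failure to the caller
            # Exponential backoff capped at max_delay, with full jitter so
            # many clients don't retry against a struggling peer in lockstep.
            delay = min(max_delay, base_delay * 2 ** attempt)
            time.sleep(random.uniform(0, delay))
    raise AssertionError("unreachable")


state = {"calls": 0}


def flaky_fetch() -> str:
    # Hypothetical remote call that times out twice, then answers.
    state["calls"] += 1
    if state["calls"] < 3:
        raise TimeoutError("simulated slow peer")
    return "pong"


print(retry(flaky_fetch))  # prints "pong" on the third attempt
```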
Thanks so much!
@bltroutwine
Recommended Reading
“Normal Accidents: Living with High-Risk Technologies”, Charles Perrow
“Digital Apollo: Human and Machine in Spaceflight”, David A. Mindell
“Command and Control: Nuclear Weapons, the Damascus Accident, and the Illusion of Safety”, Eric Schlosser
“Erlang Programming”, Simon Thompson and Francesco Cesarini
“Steeltown, USSR”, Stephen Kotkin
“Crash-Only Software”, George Candea and Armando Fox
“The Truth About Chernobyl”, Grigorii Medvedev
“Real-Time Systems: Design Principles for Distributed Embedded Applications”, Hermann Kopetz
“The Apollo Guidance Computer: Architecture and Operation”, Frank O’Brien
“Why Do Computers Stop and What Can Be Done About It?”, Jim Gray
“Thirteen: The Apollo Flight That Failed”, Henry S.F. Cooper Jr.

 
Erlang, LFE, Joxa and Elixir: Established and Emerging Languages in the Erlan...
Erlang, LFE, Joxa and Elixir: Established and Emerging Languages in the Erlan...Erlang, LFE, Joxa and Elixir: Established and Emerging Languages in the Erlan...
Erlang, LFE, Joxa and Elixir: Established and Emerging Languages in the Erlan...
 
Instrumentation as a Living Documentation: Teaching Humans About Complex Systems
Instrumentation as a Living Documentation: Teaching Humans About Complex SystemsInstrumentation as a Living Documentation: Teaching Humans About Complex Systems
Instrumentation as a Living Documentation: Teaching Humans About Complex Systems
 
10 Billion a Day, 100 Milliseconds Per: Monitoring Real-Time Bidding at AdRoll
10 Billion a Day, 100 Milliseconds Per: Monitoring Real-Time Bidding at AdRoll10 Billion a Day, 100 Milliseconds Per: Monitoring Real-Time Bidding at AdRoll
10 Billion a Day, 100 Milliseconds Per: Monitoring Real-Time Bidding at AdRoll
 
Monitoring with exometer at AdRoll
Monitoring with exometer at AdRollMonitoring with exometer at AdRoll
Monitoring with exometer at AdRoll
 

Último

Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsAlberto González Trastoy
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...ICS
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsAndolasoft Inc
 
Test Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and BackendTest Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and BackendArshad QA
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfkalichargn70th171
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVshikhaohhpro
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Steffen Staab
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...gurkirankumar98700
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfjoe51371421
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providermohitmore19
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsArshad QA
 
Clustering techniques data mining book ....
Clustering techniques data mining book ....Clustering techniques data mining book ....
Clustering techniques data mining book ....ShaimaaMohamedGalal
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...panagenda
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software DevelopersVinodh Ram
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxbodapatigopi8531
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsJhone kinadey
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...kellynguyen01
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...MyIntelliSource, Inc.
 

Último (20)

Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.js
 
Test Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and BackendTest Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and Backend
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdf
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
Clustering techniques data mining book ....
Clustering techniques data mining book ....Clustering techniques data mining book ....
Clustering techniques data mining book ....
 
Exploring iOS App Development: Simplifying the Process
Exploring iOS App Development: Simplifying the ProcessExploring iOS App Development: Simplifying the Process
Exploring iOS App Development: Simplifying the Process
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software Developers
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptx
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 

Fault-tolerance on the Cheap: Making Systems That (Probably) Won't Fall Over