1. National Aeronautics and Space Administration
How Complex Systems Fail
David Fuller
NASA Glenn Research Center
www.nasa.gov 1
How Complex Systems Fail
A Short Treatise by Richard Cook, MD
• Written by Richard Cook, MD, Director of the
Cognitive Technologies Laboratory at the University
of Chicago
• http://www.ctlab.org/documents/How%20Complex%20Systems%20Fail.pdf
• 18 short paragraphs on complex systems that will
help every project manager understand and reduce
risk in their project
1. Complex Systems are Intrinsically
Hazardous Systems
• Complex systems are found in transportation,
healthcare, power generation, and space.
• Because they are complex, they are inherently and
unavoidably hazardous.
• The defenses that are created against these hazards
characterize these systems.
2. Complex Systems are Heavily and
Successfully Defended Against Failure
• Multiple layers of defense against hazards in:
– Machine
– Human
– Organizational
– Institutional
– Regulatory
• These defenses normally steer operations away from
accidents.
3. Catastrophe Requires Multiple Failures
• Defenses are generally successful.
• Catastrophic failures occur when small or
disconnected failures come together.
• Most initial failure trajectories are blocked by the
system's safety components.
• Trajectories that reach the operational level are
blocked by the humans operating the system.
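To make the layering argument concrete, here is a minimal sketch of why a catastrophe needs several failures to line up. It assumes each defensive layer fails to block a trajectory independently of the others, which is a simplification, and the layer names and probabilities are hypothetical values chosen only for illustration.

```python
from math import prod


def breach_probability(layer_failure_probs):
    """Chance that one failure trajectory passes every layer.

    Assumes the layers fail independently of one another (a simplification).
    """
    return prod(layer_failure_probs)


# Hypothetical chance that each layer fails to block a given trajectory.
layers = {"machine": 0.1, "human": 0.1, "organizational": 0.2}

p = breach_probability(layers.values())
print(f"Chance a trajectory gets past every layer: {p:.4f}")
```

Under these assumptions no single layer is very reliable, yet a trajectory must defeat all of them at once to become an accident. Real layers are not fully independent, so the product should be read as an illustration rather than an estimate.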
4. Complex Systems Contain Changing
Mixtures of Latent Failures
• Multiple flaws are always present.
• Individual flaws are considered minor factors
because they are insufficient individually to cause
failure.
• Eradication of latent failures is limited by economic
cost.
• It is difficult to foresee how these minor flaws
might contribute to accidents.
• The mix of latent failures changes constantly because of:
– Changing technology
– Changing work organization
– Changing efforts to eradicate failures.
5. Complex Systems Run in Degraded Mode
• Complex systems run as broken systems.
• They continue to function because they contain many
redundancies.
• Human operators learn to make them function.
• System operations are dynamic:
– Organization changes
– Human behavior changes
– Technology changes.
6. Catastrophe is Always
Just Around the Corner
• Human operators are in close physical and temporal
proximity to these potential failures.
• Failure can occur at any time and any place.
• It is impossible to eliminate this potential.
• The potential for disaster is always present by the
system's own nature.
7. Post-Accident Attribution to a “Root
Cause” is Fundamentally Wrong
• There is never an isolated cause of an accident.
• Many individual causes join together to produce an
accident.
• These causes are often not coupled to one another.
• Evaluations based on finding the “root cause” show a
misunderstanding of the nature of accidents.
• Insistence on a “root cause” reflects the social and
cultural need to blame specific, localized forces for
accidents.
8. Hindsight Biases Post-Accident
Assessments of Human Performance
• Knowledge of the outcome makes the investigator
unable to understand the human factors present at
the time of the accident.
• Knowledge of the outcome poisons the ability of the
investigator to recreate the views of the humans
involved.
• Hindsight bias remains the primary obstacle to
accident investigation, especially when expert human
performance is involved.
9. Human Operators have Dual Roles:
Producers and Defenders Against Failure
• Operators work to produce the desired product and
also work to forestall accidents.
• Operators balance production against safety in a
dynamic environment.
• In times of no accidents, production is emphasized.
• After accidents, the defensive role is emphasized.
10. All Practitioner Actions are Gambles
• All decisions are made in the face of uncertainty.
• The degree of uncertainty changes from moment to
moment.
• The “gamble” appears clear after accidents (see 8
above).
• Post hoc analysis of accidents regards these
gambles as poor ones.
• Successful outcomes are also the result of gambles,
but are seen in a much more favorable light.
11. Actions at the Sharp End Resolve All
Ambiguity
• Organizations are ambiguous about the relationship
between:
– Production
– Efficient use of resources
– Economy/costs of operations
– Acceptable risk
• All of this ambiguity is resolved moment by moment
by the operators.
12. Human Practitioners are the Adaptable
Element of Complex Systems
• Operators actively adapt the system to maximize
production and minimize accidents.
• These adaptations include:
– Restructuring the system to reduce exposure of vulnerable
parts to failure
– Concentrating critical resources in areas of high demand
– Providing pathways for retreat or recovery from faults
– Establishing means for early detection of changed system
performance.
13. Human Expertise in Complex Systems is
Constantly Changing
• Expertise changes as technology changes.
• Experts are replaced (turnover).
• Operators are continually being trained and their skills refined.
• The cognitive abilities of humans are variable from
moment to moment.
14. Change Introduces New Forms of
Failure
• A low rate of accidents may encourage changes.
• Changes create opportunities for new failure modes.
• New technologies introduce new failure pathways.
• Because the failure rate is low, multiple system
changes may occur before an accident, making it
hard to understand the contribution of the new
technology.
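As a rough worked example of the last point, the sketch below estimates how many changes accumulate before the first accident. It assumes every operating period has the same, independent chance of an accident, so the wait for the first accident is geometric with a mean of 1/p periods. The function name and the numbers are hypothetical.

```python
def expected_changes_before_accident(accident_prob_per_period, changes_per_period):
    """Mean number of changes made by the time the first accident occurs.

    Assumes each period independently has the same accident probability, so
    the waiting time is geometric with mean 1 / accident_prob_per_period.
    """
    if not 0 < accident_prob_per_period <= 1:
        raise ValueError("accident probability must be in (0, 1]")
    return changes_per_period / accident_prob_per_period


# Hypothetical: a 1% chance of an accident per month and 2 changes per month.
print(expected_changes_before_accident(0.01, 2))
```

With a low accident rate and a steady stream of changes, many changes pile up between accidents, which is why it is hard to isolate the contribution of any one of them.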
15. Views of “Cause” Limit the Effectiveness
of Defenses Against Future Events
• Post-accident remedies for “human error” are usually
predicated on obstructing activities that “cause”
accidents.
• These measures do little to reduce the likelihood of
further accidents.
• The likelihood of an identical accident is very low
because the pattern of latent failures changes constantly.
• Post-accident remedies usually increase the coupling
and complexity of the system.
16. Safety is a Characteristic of Systems
and not their Components
• Safety is an emergent property.
• It does not reside in any one person, device, or
department within the organization.
• The state of safety is always dynamic.
• The whole is greater than the sum of the parts.
17. People Continuously Create Safety
• Failure-free operations are the result of the activities
of people who work to keep the system within the
boundaries of tolerable performance.
• These activities are part of normal operations.
• Because system operations are never trouble-free,
operators adapt to changing conditions.
• Operators are creating safety from moment to
moment.
• Safety is at the mercy of the operators' perception of
the situation.
18. Failure-Free Operations Require
Experience with Failure
• Recognizing hazards and successfully manipulating
system operations requires intimate contact with
failure.
• Operators must be able to see the “edge of the
envelope.”
• Improved safety depends on providing operators with
calibrated views of the hazards.
• Training allows errors to be experienced in a
controlled environment.