2. Attitude
I could talk about techniques, tools, FV environments, algorithms, machinery, languages, suites, training…
but I think attitude is more important than any of those
3. No Perfect Designs
Nothing is perfect, everything has bugs
– Shortcomings, compromises, defects, design errata, gaffes, goofs,
fumbles, errors, boneheaded mistakes, bobbles, bungles, boo-boos
– But not all bugs are equal!
Can’t test to saturation: schedule matters too
Why is everything always so darned buggy?
– Software…need say no more…
– Why did Titanic not have waterproof compartments?
– Why did Ford Pinto have gas tank in back?
– Why did Challenger fly with leaky O-rings?
– Why did torpedoes not explode in WWII?
Entropy has a preferred direction
Only a genius could paint the Mona Lisa, but any small child can destroy it quickly
1000 ways to do things wrong, 1 or 2 that work
5. Accidents Are Inevitable
– It's the nature of engineering to push designs to the edge of failure (schedule, reliability, thermals, materials, tools, judgment of unknowns)
– P(accident) = ε, for ε ≠ 0 (worked example below)
– World rewards this behavior
– Cool new features + first to market often preferred to dependability
– Other markets (life-support) make (or should make) this trade-off differently!
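A quick worked consequence of that formula (the numbers are illustrative assumptions, not from the talk): any nonzero per-unit accident probability compounds toward near-certainty at scale.

    \[
    P(\text{at least one accident in } N \text{ trials}) = 1 - (1-\varepsilon)^{N} \longrightarrow 1 \quad \text{as } N \to \infty
    \]
    \[
    \text{e.g. } \varepsilon = 10^{-4},\ N = 10^{4} \text{ units shipped:}\quad 1 - (1-10^{-4})^{10^{4}} \approx 1 - e^{-1} \approx 0.63
    \]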
6. Isn’t that just ?
Close. But Murphy is not quite right.
1. #Near-misses >> #disasters
2. Competent design/test finds simple errors
3. Complex sequences & unlikely event cascades survive to prod'n (simulated below)
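A minimal simulation sketch of point 3 (the stage count and per-stage catch rates are assumptions for illustration, not data from the talk): simple bugs rarely survive screening, so what reaches production is dominated by the unlikely cascades.

    import random

    # A defect reaches production only if it evades every screening stage
    # (design review, unit test, system test). Simple single-cause bugs are
    # caught easily; multi-factor cascades evade each stage far more often.
    STAGES = 3
    CATCH_P = {"simple": 0.9, "cascade": 0.3}   # per-stage catch probability (assumed)

    def survives(kind: str) -> bool:
        """True if a defect of this kind slips past all screening stages."""
        return all(random.random() > CATCH_P[kind] for _ in range(STAGES))

    random.seed(1)
    TRIALS = 100_000
    for kind in ("simple", "cascade"):
        n = sum(survives(kind) for _ in range(TRIALS))
        print(f"{kind:8s}: {n / TRIALS:.3%} reach production")
    # simple: ~0.1% survive; cascade: ~34% survive --
    # the bugs you ship are the complex, unlikely ones.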
7. Failures Getting Worse
Mechanical things usually fail predictably due to physics
– Wings bend, bridges groan, engines rattle, knees ache
– By contrast, computer-based things fail “all over the place”
Helpful Engineering Attitude:
1. Nature does not want your engineered system to work; it will actively work against you
2. Your design will do only what you've constrained it to do, and only as long as it has to
3. Watch out for normalization of deviance (Challenger O-rings, Apollo 1 fire)
8. The Steely-Eyed Missile Validator
Apollo 12
– 2nd try to land on moon, launched 11/14/69
– 36 seconds after liftoff, spacecraft struck by lightning => power surge
– All telemetry went haywire; book said to abort liftoff
– Both spacecraft pilot and mission controller were furiously considering that option
– But John Aaron was on shift, and thought he'd seen this malfunction before
During testing 1 year earlier, Aaron observed a test that went off into the weeds
– Aaron took it on himself to investigate; this led him to the obscure SCE subsystem
In the critical "abort or not" few seconds, with lives on the line, Aaron made one of the most famous calls in NASA history
– "Flight, try SCE to 'Aux'"
– Neither Flight nor spacecraft pilot Conrad knew what that even meant, but Alan Bean tried it
– Telemetry came right back, vaulted Aaron into validation stardom
He could have blown off the earlier test, but he didn't
– His inner validator wanted to know "what just happened?"
[Sidebar: Isaac Asimov once said the 3 most important words in science are "What was THAT?"]
9. Complexity Implies Surprises
…and surprises are bad
Chaos effects in complex µP's
– Decomposability is a fundamental tenet of complex system design
– Butterfly wings ruin decomposability
– "Improve design, get slower performance" not at all uncommon
We must stop designing large systems as though small ones simply scale up
– Lesson from comm engineers: assume errors (see the sketch below)
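A minimal sketch of that comm-engineering stance (the 4-byte checksum framing is an illustrative choice, not any particular protocol): assume the channel WILL corrupt data, so make corruption detectable and recoverable instead of hoping it never happens.

    import hashlib

    def frame(payload: bytes) -> bytes:
        """Append a 4-byte checksum: assume errors will happen; make them detectable."""
        return payload + hashlib.sha256(payload).digest()[:4]

    def receive(data: bytes) -> bytes:
        """Verify before trusting; reject (so the sender can retry) on mismatch."""
        payload, check = data[:-4], data[-4:]
        if hashlib.sha256(payload).digest()[:4] != check:
            raise ValueError("corrupted frame -- request retransmission")
        return payload

    msg = frame(b"SCE to AUX")
    print(receive(msg))                            # clean channel: b'SCE to AUX'
    corrupted = bytes([msg[0] ^ 0x01]) + msg[1:]   # one bit flipped in transit
    try:
        receive(corrupted)
    except ValueError as err:
        print(err)                                 # detected, not silently wrong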
10. Thinking About Validation
Ability to think in analogies is highest form of intelligence
– IQ tests like "a:b :: c:d"
– Hofstadter's book: numerical sequences
Analogies may illuminate a subject in a way that direct introspection cannot
– They drive our minds to their creative limits
11. Listen to Your Inner Validator
0, 1, 2, …?
You knew it wouldn’t be 3, didn’t you?
– You sensed something’s not quite as it seems
Answer: 0, 1, 2, 720!, …
      = 0, 1, 2, 6!!
      = 0, 1!, 2!!, 3!!!, …
(D. Hofstadter, Fluid Concepts and Creative Analogies)
That was the voice of your inner validator that you were hearing
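If your inner validator wants to check that, a few lines of Python confirm the pattern (term n is n with n factorials applied):

    from math import factorial

    def iterated_factorial(n: int, times: int) -> int:
        """Apply factorial `times` times: 3, applied 3 times -> ((3!)!)! = 720!."""
        for _ in range(times):
            n = factorial(n)
        return n

    terms = [iterated_factorial(n, n) for n in range(4)]   # 0, 1!, 2!!, 3!!!
    print(terms[:3])            # [0, 1, 2] -- looks perfectly innocent
    print(len(str(terms[3])))   # 3!!! = 6!! = 720!, roughly 1747 digits
    assert terms[3] == factorial(factorial(factorial(3)))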
13. What Happened?
Spec was marginal
40' threaded rods "too hard", changed to 2x20' by contractor
No simulation, no test
Who goofed? Engineer, contractor, inspector…everyone
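This is the 1981 Kansas City Hyatt Regency walkway collapse, and the arithmetic that made the substitution fatal is simple (a standard reconstruction of the failure, not from the slide itself). With one continuous rod, the fourth-floor box beam carries only its own walkway; with two offset rods, it also carries the second-floor walkway hanging beneath it:

    \[
    F_{\text{upper beam, original}} = P
    \qquad\qquad
    F_{\text{upper beam, as built}} = P_{\text{4th floor}} + P_{\text{2nd floor}} = 2P
    \]

A connection that was already marginal at P was silently asked to hold 2P.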
14. Therac-25
Medical particle accelerator
– Electrons, X-rays
Six fatalities from poor system/SW design
– And blind, naïve faith in computers!
15. Question Everything
Test assumptions as well as design
– If assumptions are broken, design surely is too (see the sketch below)
– Try to “catch the field goals”
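A minimal sketch of testing an assumption as well as the design (the function and data are hypothetical): binary search is correct only if its input really is sorted, so check that assumption loudly instead of trusting it.

    import bisect

    def index_of(sorted_xs: list, x: float) -> int:
        """Binary search -- correct ONLY under the assumption the input is sorted."""
        # Test the assumption, not just the design: a broken assumption would
        # otherwise return a plausible-looking wrong answer.
        assert all(a <= b for a, b in zip(sorted_xs, sorted_xs[1:])), \
            "assumption broken: input not sorted -- every result is now suspect"
        i = bisect.bisect_left(sorted_xs, x)
        if i == len(sorted_xs) or sorted_xs[i] != x:
            raise ValueError(f"{x} not found")
        return i

    print(index_of([1.0, 2.0, 3.5], 3.5))    # 2
    # index_of([3.0, 1.0, 2.0], 2.0) fails the assumption check loudly.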
16. Fight Urge to Relax Requirements
Challenger
– Not ok to slip design assumptions (launch temp, # of unburnt O-rings) to suit desires
Airbus
– Blaming pilot not reasonable explanation; pilot is part of system design
Runway "incursions" up 71% since '93
– Near-misses are trying to tell us something
(Diane Vaughan, The Challenger Launch Decision, University of Chicago Press, 1996; Nancy Leveson, Safeware, Addison-Wesley, 1995)
17. If You Didn't Test It, It Doesn't Work
Mir: fire extinguishers bolted to wall
– Still had strong metal launch straps
– Had never been needed before, so never tested
– Discovered with a roaring fire several feet away
18. Complexity Makes Everything Worse
Some things must be complicated to do their job
– Our brains, for example
But complex sequences are root of most disasters
– Challenger, Bhopal, Chernobyl, FDIV, Exxon Valdez
Where does complexity come from? Why does it keep increasing? Where are the limits?
– Pentium 4
"In the small" vs "in the large" design (micros vs comm systems)
What to do? Vigilance, testing, awareness…we are all validators
19. What To Do
Get the spec right
Design for correctness but…
design knowing perfection is unattainable
Users are part of the system
Formal methods
Pre-production testing and validation
Post-production testing and verification
Education of the public
20. Roles
Engineers must stand their ground
– There are always doubts, incomplete data; don't let 'em use those against you
Judgment is crucially needed -- YOURS
– Remember the Challenger mgt vs. engineers:
"My God, Thiokol, when do you want me to launch? Next April?"
– Be careful with "data"
"Risk assessment data is like a captured spy; if you torture it long enough, it will tell you anything you want to know…" (Wm. Ruckelshaus)
– Crushing, conflicting demands are the norm
Design must push the envelope w/o ceding responsibility
Validation establishes whether they've pushed it too far
Management must beware overriding tech judgment
Public must understand limits of human design process
All players must value roles of others!
21. Roles cont.
Management
– wants to assume a product is safe
– knows nothing's ever perfect; comes a time to "shoot the engineers" or they'll never stop tinkering
Validators
– want to prove a product is safe
– assume it is not by default
– only informed arbiters of when product is ready
– don't fall for "might as well sign, we're…"
22. Future Directions: Public Expectations
Andy Grove's FDIV epiphany
– Paradoxically, the more high tech the product, the more the public expects of it
Users caused Chernobyl, TMI by going "off book", but prevented many other disasters with real-time creativity…lessons are subtle
Takes exquisite understanding & judgment to discern accidents from reasonable risk-taking, bonehead errors, or incompetence
– This is what a jury must do. How?
Can't keep trending this way
23. Future of Validation: Multiple Culture Changes Needed
Public needs to stop expecting perfection
Design teams must explicitly limit complexity and avoid auto-scale-up assumptions
Companies must mature past point of viewing validation as an unpleasant overhead
– Does your company have "Validation Fellows"?
– Validation is a profession of its own.
Cultivate the Validation Attitude!