It is a well-known maxim that complexity is an essential property of software. In spite of that, software-implemented functionality has increased dramatically in almost all safety-critical sectors. This increasing reliance on software poses great challenges to the traditional practice of system safety, which focuses on managing system hazards in order to mitigate safety risk. Software safety has emerged as a sub-discipline of system safety to help address these challenges. However, the marriage of software and system safety has been an uneasy one. This talk discusses some of the issues that arise when software meets safety. It addresses why safety is a property distinct from quality, reliability and the other "-ilities", discusses the impact of software on system safety, and briefly touches on the need for safety verification of software-intensive systems.
Software Safety: An Oxymoron? - VanQ 2007
1. Software Safety:
An Oxymoron?
March 29, 2007
Ken Wong, Ph.D., Senior Systems Analyst
McKesson Medical Imaging Group
2. Points to Ponder*
A system can be correct and reliable and yet
unsafe
Software safety is not about bugs
Program testing can be used to show the
presence of bugs, but never to show their
absence
* We will return to these statements in the
discussion
5. Software In the Real World
Therac 25 accidents
Ariane 5 Flight 501 explosion
Titan 4 Centaur/Milstar failure
TCAS collision near Überlingen, Germany
7. Ariane 501 Events
Destruction of Ariane 501 on 4 June 1996
(from final report):
nominal behaviour of the launcher up to H0 + 36
seconds;
failure of the back-up Inertial Reference System
(SRI) followed immediately by failure of the active
SRI;
9. Safety is a Distinct Property
Safety is a distinct part of the interlocking puzzle
of how to build dependable software
A system can be “correct” and “reliable” and yet
unsafe!
Improved software process alone does not mean a
safer system
Note: These can be contentious claims even
among safety engineers.
12. “Is it Safe”?
Christian Szell: Is it safe?
Babe: Yes, it's safe, it's very safe, it's so safe you wouldn't
believe it.
- Marathon Man 1976
13. System Safety
“System Safety” is a systematic approach to
safety primarily developed in the US for the
aerospace and defense industries
Spreading to other industries, e.g., health care
Focus on managing system hazards
E.g., FDA Quality System Regulation recommends
“risk analysis” (A.K.A. hazard analysis)
14. System Safety
Hazard ID → Hazard Analysis → Risk Assessment → Hazard Mitigation → Safety Verification
15. Hazard
A hazard is the system’s potential contribution
to a mishap
E.g., brake failure, engine overheating
Key is understanding the system environment
17. Ariane 501: SRI Bug?
Uncaught exception from floating point
conversion
From high value of BH (Horizontal Bias)
Programming 101!
Conversion check deliberately removed for
performance reasons
SRI reused from Ariane 4
Check not required for Ariane 4 trajectory
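The conversion problem on this slide can be sketched in a few lines. The inquiry report describes it as a conversion from a 64-bit floating point value to a 16-bit signed integer. Everything else below is an assumption made for illustration: the function names, the exception name, and the two BH values are hypothetical, and the saturating guard is just one possible protection, not a description of the actual SRI code.

```python
INT16_MIN, INT16_MAX = -32768, 32767


class OperandError(Exception):
    """Illustrative stand-in for the conversion fault."""


def convert_bh(bh: float) -> int:
    """Convert BH to a 16-bit signed integer with no protection.

    An out-of-range value raises, and nothing here handles it.
    """
    value = int(bh)
    if not INT16_MIN <= value <= INT16_MAX:
        raise OperandError(f"BH={bh} does not fit in 16 bits")
    return value


def convert_bh_guarded(bh: float) -> int:
    """One possible protection: saturate to the representable range."""
    return max(INT16_MIN, min(INT16_MAX, int(bh)))


# Hypothetical values: one inside the legacy envelope, one outside it.
ARIANE4_LIKE_BH = 20_000.0
ARIANE5_LIKE_BH = 50_000.0
```

With these assumed values, `convert_bh(ARIANE4_LIKE_BH)` returns normally, which is why the missing check looked harmless for the earlier trajectory, while `convert_bh(ARIANE5_LIKE_BH)` raises.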
18. Safety is a System Property
SRI worked exactly as specified – for Ariane 4!
Ariane 5 trajectory different from Ariane 4
SRI spec did NOT include Ariane 5 trajectory data
SRI NOT tested with Ariane 5 trajectory data
“Safety” cannot be understood without knowing
the operational environment
FDA “use-related” vs “device failure” hazards
E.g., TCAS collision in Germany
19. When Software Met Safety
… there was a definite risk in assuming that critical
equipment such as the SRI had been validated by
qualification on its own, or by previous use on Ariane 4.
ARIANE 5 Flight 501 Failure Report
21. In the beginning (or Europe) …*
Mechanical systems with well understood
designs
Hazards caused by component failure from
random hardware faults
Mitigation through integrity and redundancy
* Myth, but there is underlying truth in all good myths
22. Fault Tree Analysis
[Figure: fault tree. Legend: Basic Event, Intermediate Event, OR gate]
Steering Fails
  OR
    Steering Assembly Fails
      OR
        Steering Wheel Fails
        Drive Shaft Fails
        Steering Control Software Fails
    Driver Error
23. Is Software Another Component?
What is the probability that the steering
control software fails?
If software is just another component:
1. Software cannot wear out or break down like a
mechanical component
2. Only “fault” is a programming bug
3. Assuming programmers do their job, failure rate
should be zero*
*Paraphrased from talk by a system safety engineer
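To see why the question matters, here is a minimal sketch of how a fault tree of OR gates is usually quantified, assuming the basic events are independent. The probabilities are hypothetical placeholders, not data from the talk, and the function names are invented for this example.

```python
from math import prod


def or_gate(*probabilities: float) -> float:
    """P(at least one input event occurs), assuming independent events."""
    return 1.0 - prod(1.0 - p for p in probabilities)


# Hypothetical per-mission failure probabilities for the hardware events.
P_STEERING_WHEEL = 1e-6
P_DRIVE_SHAFT = 1e-6
P_DRIVER_ERROR = 1e-4


def p_steering_fails(p_software: float) -> float:
    """Top-event probability for the steering fault tree."""
    assembly = or_gate(P_STEERING_WHEEL, P_DRIVE_SHAFT, p_software)
    return or_gate(assembly, P_DRIVER_ERROR)
```

Calling `p_steering_fails(0.0)` reproduces the "software never fails" assumption. The arithmetic is straightforward; the difficulty is that no defensible number exists to pass in as `p_software`, because a design error is not a random event.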
24. Software Revealed
[Figure: fault tree. Legend: Basic Event, Intermediate Event, OR gate]
Steering Fails
  OR
    Steering Assembly Fails
      OR
        Steering Wheel Fails
        Drive Shaft Fails
        Steering Control Software Fails
    Driver Error
25. The Software Werewolf
Of all the monsters that fill the nightmares of our
folklore, none terrify more than werewolves, because
they transform unexpectedly from the familiar into
horrors … The familiar software project, at least
as seen by the nontechnical manager, has something
of this character …
Frederick P. Brooks, Jr., from No Silver Bullet:
Essence and Accidents of Software Engineering
26. Ariane 501: Safety in Numbers?
In response to “fault”, the Primary SRI was
deliberately shut down
Attempt made to switch to backup SRI
Typical strategy in face of random failures
However, BOTH SRIs shut down!
“Fault” due to same design in both SRIs
Exception in non-essential component
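The limits of redundancy against a shared design error can be illustrated with a small sketch. All names are hypothetical, the 16-bit limit stands in for the conversion fault, and the ordering of events is simplified to a plain primary-then-backup failover.

```python
INT16_MAX = 32767


class SriShutdown(Exception):
    """Illustrative: the unit declares a fault and shuts itself down."""


def sri_channel(bh: float) -> int:
    """One SRI unit. Both units run this same design."""
    if abs(bh) > INT16_MAX:
        raise SriShutdown("conversion fault")
    return int(bh)


def randomly_failed_channel(bh: float) -> int:
    """Models a random hardware fault affecting one unit only."""
    raise SriShutdown("hardware fault")


def read_with_failover(bh: float, primary=sri_channel, backup=sri_channel) -> int:
    """Use the primary unit and switch to the backup if it shuts down."""
    try:
        return primary(bh)
    except SriShutdown:
        # If the backup shares the design fault, this raises again.
        return backup(bh)
```

Under these assumptions, `read_with_failover(100.0, primary=randomly_failed_channel)` succeeds because the backup is healthy, whereas `read_with_failover(50_000.0)` raises `SriShutdown`: the same input triggers the same fault in both units.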
27. Safety is an Emergent Property
Software safety is not about “faults”
Many potential “faults” but not all created equal –
most have no impact on safety
“Correct” behaviour can contribute to the
hazard!
Hazards can emerge from complex interactions
between “correct” components
28. When Safety Met Software
An underlying theme in the development of Ariane 5 is
the bias towards the mitigation of random failure.
… The Board wishes to point out that software is an expression
of a highly detailed design and does not fail in the same
sense as a mechanical system.
ARIANE 5 Flight 501 Failure Report
30. Software and Safety Process
[Figure: development artifacts (Requirements, Design, Source Code) shown alongside Hazards and the safety activities (Hazard ID, Analysis and Mitigation; Safety Verification; Verification)]
31. Limits of Testing
Program testing can be used to show the presence of
bugs, but never to show their absence
E. Dijkstra in Structured Programming
32. Hazard-Driven Testing
Focus on hazard – force it to occur
Consider:
Hazard risk (“risk-based testing”)
Mishap scenarios
Hazard causes identified during hazard analysis
Problem reports/issues with safety implications
See Jeffrey J. Joyce and Ken Wong, Hazard-driven Testing of
Safety-Related Software
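A hazard-driven test starts from the hazard causes rather than from the specification. The sketch below is one possible shape for such a test, not the method from the cited paper: the scenario list stands in for the output of a hazard analysis, the hazard is taken to be "conversion fails or yields an out-of-range value", and all names and values are hypothetical.

```python
INT16_MIN, INT16_MAX = -32768, 32767

# Hypothetical scenarios, as might come out of a hazard analysis.
HAZARD_SCENARIOS = [
    ("nominal trajectory", 20_000.0),
    ("steeper trajectory than legacy vehicle", 50_000.0),
    ("extreme negative bias", -1e9),
]


def unguarded_convert(bh: float) -> int:
    """Component under test, without range protection."""
    value = int(bh)
    if not INT16_MIN <= value <= INT16_MAX:
        raise OverflowError("value does not fit in 16 bits")
    return value


def guarded_convert(bh: float) -> int:
    """Component under test, with a saturating guard."""
    return max(INT16_MIN, min(INT16_MAX, int(bh)))


def run_hazard_driven_tests(convert) -> list[str]:
    """Try to force the hazard; return the scenarios where it occurred."""
    occurred = []
    for name, bh in HAZARD_SCENARIOS:
        try:
            result = convert(bh)
        except Exception:
            occurred.append(name)  # unhandled fault: hazard reached
            continue
        if not INT16_MIN <= result <= INT16_MAX:
            occurred.append(name)  # out-of-range output: hazard reached
    return occurred
```

Here `run_hazard_driven_tests(unguarded_convert)` reports the two out-of-envelope scenarios, and `run_hazard_driven_tests(guarded_convert)` reports none. The scenarios are chosen to provoke the hazard, so a test suite limited to the legacy envelope would never exercise them.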
33. Summary and Conclusions
Safety is a distinct property
Safety is a system property
Operational and development environment factors
Safety is an emergent property
Hazards can emerge from complex interactions
between “correct” components
35. References*
ARIANE 5 Flight 501 Failure Report by the
Inquiry Board, Paris, July 1996
Frederick P. Brooks, Jr., No Silver Bullet: Essence
and Accidents of Software Engineering, Computer
Magazine, April 1987
Jeffrey J. Joyce and Ken Wong, Hazard-driven
Testing of Safety-Related Software, 21st
International System Safety Conference, Ottawa,
Ontario, August 4-8, 2003
*All available on-line