Testing Safety Critical Systems
Theory and Experiences
J.vanEkris@Delta-Pi.nl
http://www.slideshare.net/Jaap_van_Ekris/
My Job
Your life’s goal will be to stay out of
the newspapers
Gerard Duin (KEMA)
Worked at
My Projects
Agenda
• The Goal
• The requirements
• The challenge
• Go with the process flow
– Development Process
– System design
– Testing Techniques
• Trends
• Reality
Goals of testing safety critical systems
• Verify contractually agreed functionality
• Verify correct functional safety-behaviour

• Verify safety-behaviour during degraded and
failure conditions
THE REQUIREMENTS
What is so different about safety critical systems?
Some people live on the edge…
How would you feel if you were getting
ready to launch and knew you were
sitting on top of two million parts
-- all built by the lowest bidder on a
government contract.
John Glenn
Actually, we all do…
We might have become overprotective…
The public is mostly unaware of risk…
Until it is too late…
• February 1st 1953
• Spring tide and heavy
winds broke dykes
• Killed 1836 humans
and 30.000 animals
The battle against flood risk…
• Cost €2.500.000.000
• The largest moving
structure on the
planet
• Defends
– 500 km² land
– 80.000 people

• Partially controlled
by software
Nothing is flawless, by design…
No matter how well the
design has been:
• Some scenarios will be
missed
• Some scenarios are
too expensive to
prevent:
– Accept risk
– Communicate to stakeholders
When is software good enough?
• Dutch Law on
storm surge
barriers

• Equalizes risk
of dying due
to unnatural
causes across
the Netherlands
Risks have to be balanced…
Availability of the service vs. safety of the service
Oosterschelde Storm Surge Barrier
• Chance of
– Failure to close: 10^-7
per usage
– Unexpected closure:
10^-4 per year
To put things in perspective…
• Having a drunk pilot: 10^-2 per flight
• Hurting yourself when using a chainsaw: 10^-3 per use
• Dating a supermodel: 10^-5 per lifetime
• Drowning in a bathtub: 10^-7 per lifetime
• Being hit by falling airplane parts: 10^-8 per lifetime
• Being killed by lightning: 10^-9 per lifetime
• Winning the lottery: 10^-10 per lifetime
• Your house being hit by a meteor: 10^-15 per lifetime
• Winning the lottery twice: 10^-20 per lifetime
Small chances do happen…
Risk balance does change over time...
9/11...
• Identified a
fundamental (new) risk
to ATC systems
• Changed the ATC
system dramatically
• Doubled our safety
critical scenarios
Are software risks acceptable?
Software plays a significant role...
The industry statistics are against us…
• Capers Jones: at least 2 high-severity
errors per 10 KLOC
• Industry consensus is that software
will never be more reliable than
– 10^-5 per usage
– 10^-9 per operating hour
THE CHALLENGE
Why is testing safety critical systems so hard?
The value of testing
Program testing can be used to show the
presence of bugs, but never to show
their absence!
Edsger W. Dijkstra
Is just testing enough?
• 64 bits of input isn’t that
uncommon
• 2^64 is the global rice
production in 1000
years, measured in
individual grains
• Fully testing all binary
inputs on a simple 64-bit
stimulus-response system
once takes 2 centuries
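The arithmetic behind the two-century figure can be checked with a few lines of Python. The slide does not state a test rate; the 3 billion tests per second used here is an assumption, chosen because it roughly reproduces the figure. The function name is hypothetical.

```python
# Rough arithmetic: how long does one exhaustive pass over all inputs take?
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def exhaustive_test_years(input_bits: int, tests_per_second: float) -> float:
    """Years needed to apply every possible input pattern exactly once."""
    total_inputs = 2 ** input_bits
    return total_inputs / tests_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    # Assumed rate: 3e9 tests per second (not stated on the slide).
    print(f"{exhaustive_test_years(64, 3e9):.0f} years at 3e9 tests/s")
    # A more modest (also assumed) rate of one million tests per second.
    print(f"{exhaustive_test_years(64, 1e6):.0f} years at 1e6 tests/s")
```

At a more modest million tests per second the same pass would take over half a million years, which is why exhaustive testing is not an option even for simple systems.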
THE SOFTWARE DEVELOPMENT
PROCESS
Quality and reliability start at conception, not at testing…
IEC 61508: Safety Integrity Level and
acceptable risk
IEC61508: Risk distribution
IEC 61508: A process for safety critical functions
SYSTEM DESIGN
What do safety critical systems look like and what are their most important drivers?
Design Principles
• Keep it simple...
• Risk analysis drives design (decisions)
• Safety first (production later)
• Fail-to-safe
• There shall be no single source of
(catastrophic) failure
A simple design of a storm surge barrier
Relay
(€10,00/piece)

Water detector
(€17,50)

Design documentation
(Sponsored by Heineken)
Risk analysis
Broken cable
Chance: Medium
Cause: digging, seagulls
Effect: Catastrophic

Relay failure
Chance: Small
Cause: aging
Effect: Catastrophic

Water detector fails
Chance: Huge
Causes: rust, driftwood, seagulls
(eating, shitting)
Effect: Catastrophic

Measurement errors
Chance: Colossal
Causes: waves, wind
Effect: False positive
System Architecture
Risk analysis
Typical risks identified
• Components making the wrong decisions
• Power failure
• Hardware failure of PLCs/servers
• Network failure
• Ship hitting water sensors
• Human maintenance error
Risk ≠ system crash
• Understandability of
the GUI
• Wrongful functional
behaviour
• Data accuracy
• Lack of response speed
• Tolerance towards
illogical inputs
• Resistance to hackers
Usability of an MMI is key to safety
Systems do misbehave...
Systems can be late…
Systems aren’t your only problem
StuurX: Component architecture design
Stuurx::Functionality, initial global design
[State diagram. States: Init, Wacht (wait), Start_D, W_O_D, Sluit_? (close). Transition labels: Waterlevel < 3 meter; Waterlevel > 3 meter; “Start” signal to Diesels; “Diesels ready”; “Close Barrier”.]
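The state names and transition labels on this slide suggest a small state machine. The sketch below is one possible reading, not the actual StuurX design: the order of the transitions, the handling of a water level of exactly 3 meters (treated as "keep waiting"), and the names `State`, `step` and `run` are all assumptions made for illustration.

```python
from enum import Enum, auto

CLOSE_THRESHOLD_M = 3.0  # from the slide: "Waterlevel > 3 meter"


class State(Enum):
    INIT = auto()
    WACHT = auto()    # wait for high water
    START_D = auto()  # start the diesels
    W_O_D = auto()    # presumably "wait for diesels"
    SLUIT = auto()    # close the barrier


def step(state, water_level_m, diesels_ready):
    """Return (next_state, command); command is None when nothing is sent."""
    if state is State.INIT:
        return State.WACHT, None
    if state is State.WACHT:
        if water_level_m > CLOSE_THRESHOLD_M:
            return State.START_D, None
        return State.WACHT, None  # includes exactly 3.0 m (assumption)
    if state is State.START_D:
        return State.W_O_D, "Start"
    if state is State.W_O_D:
        if diesels_ready:
            return State.SLUIT, "Close Barrier"
        return State.W_O_D, None
    return State.SLUIT, None  # stay closed once closing was commanded


def run(inputs):
    """Feed (water_level_m, diesels_ready) samples; collect sent commands."""
    state, commands = State.INIT, []
    for water_level_m, diesels_ready in inputs:
        state, command = step(state, water_level_m, diesels_ready)
        if command is not None:
            commands.append(command)
    return state, commands
```

Writing the design down this way makes the open questions visible, for example what should happen if the diesels never report ready, or if the water level drops again while waiting.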
Stuurx::Functionality, final global design
Stuurx::Functionality, Wait_For_Diesels, detailed design
VERIFICATION
What is getting tested, and how?
Design completion...
An example of safety critical components
IEC 61508 SIL4: Required verification activities
Design Validation and Verification
• Peer reviews by
– System architect
– 2nd designer
– Programmers
– Test manager (system testing)

• Fault Tree Analysis / Failure Mode and Effect
Analysis
• Performance modeling
• Static Verification / Dynamic Simulation by
(Twente University)
Programming (in C/C++)
• Coding standard:
– Based on “Safer C”, by Les Hatton
– May only use safe subset of the compiler
– Verified by Lint and 5 other tools

• Code is peer reviewed by 2nd developer
• Certified and calibrated compiler
Unit tests
• Focus on conformance to specifications
• Required coverage: 100% with respect to:
– Code paths
– Input equivalence classes

• Boundary Value analysis
• Probabilistic testing
• Execution:
– Fully automated scripts, running 24x7
– Creates 100Mb/hour of logs and measurement data

• Upon bug detection
– 3 strikes is out → After 3 implementation errors it is built by another developer
– 2 strikes is out → Need for a 2nd rebuild implies a redesign by another designer
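Input equivalence classes and boundary value analysis can be sketched as a table-driven unit test. The function `must_close`, the 3 meter threshold and the sensor range below are hypothetical stand-ins, not the project's real code or values.

```python
import math

CLOSE_THRESHOLD_M = 3.0   # hypothetical closing threshold
SENSOR_MIN_M = -5.0       # hypothetical valid sensor range
SENSOR_MAX_M = 10.0


def must_close(water_level_m: float) -> bool:
    """Decide whether the barrier must close; reject implausible readings."""
    if math.isnan(water_level_m):
        raise ValueError("water level is not a number")
    if not SENSOR_MIN_M <= water_level_m <= SENSOR_MAX_M:
        raise ValueError("water level outside sensor range")
    return water_level_m > CLOSE_THRESHOLD_M


# One representative value per valid equivalence class.
EQUIVALENCE_CASES = [(-2.0, False), (1.5, False), (4.0, True)]

# Values on and directly around every boundary between classes.
BOUNDARY_CASES = [
    (SENSOR_MIN_M, False),
    (CLOSE_THRESHOLD_M - 1e-9, False),
    (CLOSE_THRESHOLD_M, False),
    (CLOSE_THRESHOLD_M + 1e-9, True),
    (SENSOR_MAX_M, True),
]

# The invalid class: every one of these must be rejected, not guessed at.
INVALID_CASES = [math.nan, SENSOR_MIN_M - 1e-9, SENSOR_MAX_M + 1e-9, math.inf]


def run_cases() -> int:
    """Run all cases; return how many were checked."""
    for level, expected in EQUIVALENCE_CASES + BOUNDARY_CASES:
        assert must_close(level) is expected, level
    for level in INVALID_CASES:
        try:
            must_close(level)
        except ValueError:
            continue
        raise AssertionError(f"invalid input {level!r} was accepted")
    return len(EQUIVALENCE_CASES) + len(BOUNDARY_CASES) + len(INVALID_CASES)
```

Keeping the cases in tables makes it easy to review whether every class and every boundary from the specification is actually covered.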
Representative testing is difficult
Integration testing
• Focus on
– Functional behaviour of chain of components
– Failure scenarios based on risk analysis

• Required coverage
– 100% coverage on input classes

• Probabilistic testing
• Execution:
– Fully automated scripts, running 24x7, speed times 10
– Creates 250Mb/hour of logs and measurement data

• Upon detection
– Each bug → Root cause analysis
Redundancy is a nasty beast
• You do get functional
behaviour of your entire
system
• It is nearly impossible to
see if all components
are working correctly
• Is EVERYTHING working
ok, or is it the safety net?
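One way to keep redundancy from hiding a broken component during testing is to have the voter report which channels disagreed, and to assert on that as well as on the final decision. The 2-out-of-3 voter below is an illustrative assumption; the slides do not say which voting scheme the real system uses, and `vote_2oo3` is a hypothetical name.

```python
def vote_2oo3(readings):
    """Majority vote over three boolean channels.

    readings maps a channel name to its boolean output.
    Returns (decision, dissenters): the majority decision and the sorted
    names of the channels that disagreed with it.
    """
    if len(readings) != 3:
        raise ValueError("exactly three channels are required")
    decision = sum(bool(value) for value in readings.values()) >= 2
    dissenters = sorted(
        name for name, value in readings.items() if bool(value) != decision
    )
    return decision, dissenters


if __name__ == "__main__":
    decision, dissenters = vote_2oo3({"A": True, "B": True, "C": False})
    # The system-level behaviour is correct, yet channel C is not.
    print(decision, dissenters)
```

A test that only checks the decision passes in both cases; a test that also requires an empty dissenter list fails as soon as the safety net is doing the work.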

System testing
• Focus on
– Functional behaviour
– Failure scenarios based on risk analysis

• Required coverage
– 100% complete environment (simulation)
– 100% coverage on input classes

• Execution:
– Fully automated scripts, running 24x7, speed times 10
– Creates 250Mb/hour of logs and measurement data

• Upon detection
– Each bug → Root cause analysis
Endurance testing
• Look for the “one in a
million times” problem
• Challenge:
– Software is deterministic
– execution is not (timing,
transmission-errors,
system load)

• Have an automated
script run it over and
over again
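How often "over and over again" has to be can be estimated with a standard statistical argument: if the true failure probability per run is p, the chance of seeing n failure-free runs is (1 - p)^n, so n must be large enough that this chance drops below 1 minus the desired confidence. The sketch assumes independent, representative runs, which real endurance tests only approximate; the function name is hypothetical.

```python
import math


def failure_free_runs_needed(max_failure_prob: float, confidence: float) -> int:
    """Smallest n with (1 - p)^n <= 1 - confidence.

    After n consecutive failure-free runs, a failure probability above
    max_failure_prob per run can be rejected at the given confidence.
    """
    if not 0.0 < max_failure_prob < 1.0:
        raise ValueError("max_failure_prob must be between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be between 0 and 1")
    # log1p(-p) computes log(1 - p) accurately for very small p.
    return math.ceil(math.log(1.0 - confidence) / math.log1p(-max_failure_prob))


if __name__ == "__main__":
    runs = failure_free_runs_needed(1e-6, 0.95)
    print(f"{runs} failure-free runs for p <= 1e-6 at 95% confidence")
```

For a "one in a million" target at 95% confidence this comes to roughly three million failure-free runs, and a single observed failure resets the count.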
Results of Endurance Tests
[Chart: Reliability Growth of Function M, Project S. Y-axis: Chance of Failure (logarithmic scale, 1.E+00 down to 1.E-05). X-axis: Platform Version (4.35, 4.36, 4.37).]
Acceptance testing
• Acceptance testing
1. Functional acceptance
2. Failure behaviour, all top 50 (FMECA) risks tested
3. A year of operational verification

• Execution:
– Tests performed on a working storm surge barrier
– Creates 250Mb/hour of logs and measurement data

• Upon detection
– Each bug → Root cause analysis
A risk limit to testing

• Some things are too
dangerous to test

• Some tests introduce
more risks than they
try to mitigate
• There should always be
a safe way out of a test
procedure
Testing safety critical functions is
dangerous...
GUI Acceptance testing

• Looking for

– quality in use for interactive
systems
– Understandability of the
GUI

• Structural investigation of
the performance of the
man-machine interactions
• Looking for “abuse” by the
users
• Looking at real-life handling
of emergency operations
Avalanche testing
• To test the capabilities of
alarming and control
• Usually starts with one
simple trigger
• Generally followed by
millions of alarms
• Generally brings your
network and systems
to the breaking point
Crash and recovery procedure testing
• Validation of system
behaviour after massive
crash and restart
• Usually identifies many
issues about emergency
procedures
• Sometimes identifies issues
around power supply
• Usually identifies some
(combination of) systems
incapable of unattended
recovery...
Production has its challenges…
• Are equipment and
processes optimally
arranged?
• Are the humans up to
their task?
• Does everything
perform as expected?
TRENDS
What is the newest and hottest?
Model Driven Design
A real-life example
A root-cause analysis of this flaw
REALITY
What are the real-life challenges of a test manager of safety critical systems?
Difference between theory and reality
Working together…
Requires true commitment to results…
• Romans put the architect
under the arches when
removing the scaffolding
• Boeing and Airbus put all
lead-engineers on the first
test-flight
• Dijkstra put his
“rekenmeisjes” (human calculators) on the
opposite dock when
launching ships
It is about keeping your back straight…
• Thomas Andrews, Jr.
• Naval architect in charge of RMS Titanic
• He recognized regulations were
insufficient for a ship the size of the Titanic
• Decisions “forced upon him” by the client:
– Limit the range of double hulls
– Limit the number of lifeboats

• He was on the maiden voyage to spot
improvements
• He knowingly went down with the ship,
saving as many as he could
It requires a specific breed of people

The fates of developers and
testers are linked to safety
critical systems into
eternity
Conclusion
• Stop reading newspapers
• Safety Critical Testing is a
lot of work, making sure
nothing happens
• Technically it isn’t that
much different, we’re just
more rigorous and use a
specific breed of
people....
Questions?

Testing Safety Critical Systems (10-02-2014, VU amsterdam)

  • 1. Testing Safety Critical Systems Theory and Experiences J.vanEkris@Delta-Pi.nl http://www.slideshare.net/Jaap_van_Ekris/
  • 2. My Job Your life’s goal will be to stay out of the newspapers Gerard Duin (KEMA) Worked at
  • 4. Agenda • • • • The Goal The requirements The challenge Go with the process flow – Development Process – System design – Testing Techniques • Trends • Reality 4
  • 5. Goals of testing safety critical systems • Verify contractually agreed functionality • Verify correct functional safety-behaviour • Verify safety-behaviour during degraded and failure conditions
  • 6. THE REQUIREMENTS What is so different about safety critical systems?
  • 7. Some people live on the edge… How would you feel if you were getting ready to launch and knew you were sitting on top of two million parts -- all built by the lowest bidder on a government contract. John Glenn
  • 9. We might have become overprotective…
  • 10. The public is mostly unaware of risk…
  • 11. Until it is too late… • February 1st 1953 • Spring tide and heavy winds broke dykes • Killed 1836 humans and 30.000 animals
  • 12. The battle against flood risk… • Cost €2.500.000.000 • The largest moving structure on the planet • Defends – 500 km² land – 80.000 people • Partially controlled by software
  • 13. Nothing is flawless, by design… No matter how well the design has been: • Some scenarios will be missed • Some scenarios are too expensive to prevent: – Accept risk – Communicate to stakeholders
  • 14. When is software good enough? • Dutch Law on storm surge barriers • Equalizes risk of dying due to unnatural causes across the Netherlands
  • 15. Risks have to be balanced… VS. Availability of the service Safety of the service
  • 16. Oosterschelde Storm Surge Barrier • Chance of – Failure to close: 10⁻⁷ per usage – Unexpected closure: 10⁻⁴ per year
  • 17. To put things in perspective… • Having a drunk pilot: 10⁻² per flight • Hurting yourself when using a chainsaw: 10⁻³ per use • Dating a supermodel: 10⁻⁵ in a lifetime • Drowning in a bathtub: 10⁻⁷ in a lifetime • Being hit by falling airplane parts: 10⁻⁸ in a lifetime • Being killed by lightning: 10⁻⁹ per lifetime • Winning the lottery: 10⁻¹⁰ per lifetime • Your house being hit by a meteor: 10⁻¹⁵ per lifetime • Winning the lottery twice: 10⁻²⁰ per lifetime
  • 18. Small chances do happen…
  • 19. Risk balance does change over time...
  • 20. 9/11... • Identified a fundamental (new) risk to ATC systems • Changed the ATC system dramatically • Doubled our safety critical scenarios
  • 21. Are software risks acceptable?
  • 22. Software plays a significant role...
  • 23. The industry statistics are against us… • Capers Jones: at least 2 high severity errors per 10 KLOC • Industry consensus is that software will never be more reliable than – 10⁻⁵ per usage – 10⁻⁹ per operating hour
  • 24. THE CHALLENGE Why is testing safety critical systems so hard?
  • 25. The value of testing Program testing can be used to show the presence of bugs, but never to show their absence! Edsger W. Dijkstra
  • 26. Is just testing enough? • 64 bits of input isn’t that uncommon • 2⁶⁴ is the global rice production in 1000 years, measured in individual grains • Fully testing all binary inputs on a simple 64-bit stimulus-response system once takes 2 centuries
  • 27. THE SOFTWARE DEVELOPMENT PROCESS Quality and reliability start at conception, not at testing…
  • 28. IEC 61508: Safety Integrity Level and acceptable risk
  • 30. IEC 61508: A process for safety critical functions
  • 31. SYSTEM DESIGN What do safety critical systems look like and what are their most important drivers?
  • 32. Design Principles • Keep it simple... • Risk analysis drives design (decisions) • Safety first (production later) • Fail-to-safe • There shall be no single source of (catastrophic) failure
  • 33. A simple design of a storm surge barrier Relay (€10,00/piece) Water detector (€17,50) Design documentation (Sponsored by Heineken)
  • 34. Risk analysis • Broken cable – Chance: Medium – Cause: digging, seagulls – Effect: Catastrophic • Relay failure – Chance: Small – Cause: aging – Effect: Catastrophic • Water detector fails – Chance: Huge – Causes: rust, driftwood, seagulls (eating, shitting) – Effect: Catastrophic • Measurement errors – Chance: Colossal – Causes: waves, wind – Effect: False positive
  • 37. Typical risks identified • Components making the wrong decisions • Power failure • Hardware failure of PLCs/servers • Network failure • Ship hitting water sensors • Human maintenance error
  • 38. Risk ≠ system crash • Understandability of the GUI • Wrongful functional behaviour • Data accuracy • Lack of response speed • Tolerance towards illogical inputs • Resistance to hackers
  • 39. Usability of an MMI is key to safety
  • 41. Systems can be late…
  • 42. Systems aren’t your only problem
  • 44. Stuurx::Functionality, initial global design [State diagram. States: Init, Wacht, Start_D, W_O_D, Sluit_?. Labels: “Waterlevel < 3 meter”, “Waterlevel > 3 meter”, “Start” signal to Diesels, “Diesels ready”, “Close Barrier”]
  • 47. VERIFICATION What is getting tested, and how?
  • 49. An example of safety critical components
  • 50. IEC 61508 SIL4: Required verification activities
  • 51. Design Validation and Verification • Peer reviews by – System architect – 2nd designer – Programmers – Test manager system testing • Fault Tree Analysis / Failure Mode and Effect Analysis • Performance modeling • Static Verification / Dynamic Simulation by (Twente University)
  • 52. Programming (in C/C++) • Coding standard: – Based on “Safer C”, by Les Hatton – May only use safe subset of the compiler – Verified by Lint and 5 other tools • Code is peer reviewed by 2nd developer • Certified and calibrated compiler
  • 53. Unit tests • Focus on conformance to specifications • Required coverage: 100% with respect to: – Code paths – Input equivalence classes • Boundary value analysis • Probabilistic testing • Execution: – Fully automated scripts, running 24x7 – Creates 100Mb/hour of logs and measurement data • Upon bug detection – 3 strikes is out → After 3 implementation errors it is built by another developer – 2 strikes is out → Need for a 2nd rebuild implies a redesign by another designer
  • 55. Integration testing • Focus on – Functional behaviour of chain of components – Failure scenarios based on risk analysis • Required coverage – 100% coverage on input classes • Probabilistic testing • Execution: – Fully automated scripts, running 24x7, speed times 10 – Creates 250Mb/hour of logs and measurement data • Upon detection – Each bug → Root-cause analysis
  • 56. Redundancy is a nasty beast • You do get functional behaviour of your entire system • It is nearly impossible to see if all components are working correctly • Is EVERYTHING working ok, or is it the safety net?
  • 57. System testing • Focus on – Functional behaviour – Failure scenarios based on risk analysis • Required coverage – 100% complete environment (simulation) – 100% coverage on input classes • Execution: – Fully automated scripts, running 24x7, speed times 10 – Creates 250Mb/hour of logs and measurement data • Upon detection – Each bug → Root-cause analysis
  • 58. Endurance testing • Look for the “one in a million times” problem • Challenge: – Software is deterministic – execution is not (timing, transmission-errors, system load) • Have an automated script run it over and over again
  • 59. Results of Endurance Tests [Chart: Reliability Growth of Function M, Project S. Y-axis: Chance of Failure (logarithmic scale, 1.E+00 to 1.E-05). X-axis: Platform Version (4.35, 4.36, 4.37)]
  • 60. Acceptance testing • Acceptance testing 1. Functional acceptance 2. Failure behaviour, all top 50 (FMECA) risks tested 3. A year of operational verification • Execution: – Tests performed on a working storm surge barrier – Creates 250Mb/hour of logs and measurement data • Upon detection – Each bug → Root-cause analysis
  • 61. A risk limit to testing • Some things are too dangerous to test • Some tests introduce more risks than they try to mitigate • There should always be a safe way out of a test procedure
  • 62. Testing safety critical functions is dangerous...
  • 63. GUI Acceptance testing • Looking for – quality in use for interactive systems – Understandability of the GUI • Structural investigation of the performance of the man-machine interactions • Looking for “abuse” by the users • Looking at real-life handling of emergency operations
  • 64. Avalanche testing • To test the capabilities of alarming and control • Usually starts with one simple trigger • Generally followed by millions of alarms • Generally brings your network and systems to the breaking point
  • 65. Crash and recovery procedure testing • Validation of system behaviour after massive crash and restart • Usually identifies many issues about emergency procedures • Sometimes identifies issues around power supply • Usually identifies some (combination of) systems incapable of unattended recovery...
  • 66. Production has its challenges… • Are equipment and processes optimally arranged? • Are the humans up to their task? • Does everything perform as expected?
  • 67. TRENDS What is the newest and hottest?
  • 70. A root-cause analysis of this flaw
  • 71. REALITY What are the real-life challenges of a test manager of safety critical systems?
  • 74. Requires true commitment to results… • Romans put the architect under the arches when removing the scaffolding • Boeing and Airbus put all lead engineers on the first test flight • Dijkstra put his “rekenmeisjes” (“computing girls”) on the opposite dock when launching ships
  • 75. It is about keeping your back straight… • Thomas Andrews, Jr. • Naval architect in charge of RMS Titanic • He recognized regulations were insufficient for a ship the size of the Titanic • Decisions “forced upon him” by the client: – Limit the range of double hulls – Limit the number of lifeboats • He was on the maiden voyage to spot improvements • He knowingly went down with the ship, saving as many as he could
  • 76. It requires a specific breed of people The fates of developers and testers are linked to safety critical systems into eternity
  • 77. Conclusion • Stop reading newspapers • Safety Critical Testing is a lot of work, making sure nothing happens • Technically it isn’t that much different, we’re just more rigorous and use a specific breed of people...
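
The claim on slide 26 that exhaustively testing a 64-bit input space takes about two centuries can be checked with a few lines of arithmetic. The slide does not state a test rate, so the sketch below assumes a hypothetical rate of 3 billion test cases per second. The names `TESTS_PER_SECOND` and `years_to_exhaust` are illustrative and do not come from the talk.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year, in seconds

# Assumption: 3e9 test cases per second. The slide gives no rate;
# this value is chosen only to show the order of magnitude.
TESTS_PER_SECOND = 3_000_000_000


def years_to_exhaust(input_bits: int, tests_per_second: float = TESTS_PER_SECOND) -> float:
    """Years needed to apply every possible input pattern exactly once."""
    total_inputs = 2 ** input_bits
    return total_inputs / tests_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    for bits in (16, 32, 64):
        print(f"{bits} bits: {years_to_exhaust(bits):.3g} years")
```

Under that assumed rate the 64-bit case comes out just under 200 years, and every extra input bit doubles it, which is why coverage is defined over input equivalence classes rather than over raw input values.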
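
Slide 53 asks for coverage of input equivalence classes plus boundary value analysis. A minimal sketch of what that could look like for a single decision is shown below. It borrows the 3 meter water level from slide 44; the function `should_close`, the choice to close at exactly 3.0, the choice to reject non-finite readings, and the selected input classes are all assumptions made for illustration, not the project's actual code or test set.

```python
import math

CLOSE_THRESHOLD_M = 3.0  # water level from slide 44; unit assumed to be metres


def should_close(water_level_m: float) -> bool:
    """Hypothetical decision function: True when the barrier must close.

    Assumptions: the exact threshold counts as 'close', and non-finite
    readings are rejected so the caller can apply its own fail-safe policy.
    """
    if not math.isfinite(water_level_m):
        raise ValueError("implausible sensor reading")
    return water_level_m >= CLOSE_THRESHOLD_M


def boundary_cases(threshold: float, step: float = 0.01):
    """Yield (input, expected) pairs just below, on, and just above the boundary."""
    yield threshold - step, False
    yield threshold, True
    yield threshold + step, True


def equivalence_class_cases():
    """One representative per valid input class (classes chosen for illustration)."""
    yield 0.0, False  # normal water level
    yield 5.0, True   # storm surge


def run_cases() -> int:
    """Run all valid-input cases and return the number of mismatches."""
    cases = [*boundary_cases(CLOSE_THRESHOLD_M), *equivalence_class_cases()]
    return sum(1 for level, expected in cases if should_close(level) != expected)


def invalid_inputs_rejected() -> bool:
    """Check the invalid class: NaN and infinities must raise ValueError."""
    for bad in (float("nan"), float("inf"), float("-inf")):
        try:
            should_close(bad)
        except ValueError:
            continue
        return False
    return True


if __name__ == "__main__":
    print("mismatches:", run_cases(), "| invalid rejected:", invalid_inputs_rejected())
```

Note that the slide's labels only cover levels below and above 3 meter. Writing the boundary case forces a decision about the level of exactly 3 meter, which is the kind of specification gap this technique is meant to expose.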

Editor's notes

  1. I spent the last 15 years working on highly mission and safety critical systems
  2. Glenn's advantage was that it only had to work once... And that was the 1960s (when that was still possible), and astronauts still had guts. Source: http://www.historicwings.com/features98/mercury/seven-left-bottom.html
  3. Surfers sunbathing in front of one of the most dangerous nuclear power plants in the world: San Onofre
  4. It is carved in stone: nobody asks if we can make the storm surge barriers less safe when the crime rate in Amsterdam goes up…
  5. We have to explicitly balance the availability of a service and its safety. Otherwise we would just permanently close the barrier and be happy about it!
  6. Please note that: The first bullet is the reason we have auto-pilots (copilots are good drinking buddies). People are still anxious about flying, but don’t get up in the morning thinking they are going to score a date with a supermodel
  7. Won the 10 million jackpot in 2012 and the 3 million jackpot in 2013. Odds: winning once: 1 in 14 million; winning twice: 1 in 195 trillion
  8. But we are still (unknowingly) living in that world...
  9. Glenn's advantage was that it only had to work once... Source: http://www.historicwings.com/features98/mercury/seven-left-bottom.html
  10. You start with your primary concern. The process is simple: you break your problem down until you have those 2 million parts and you know the contribution of each component. You take the most important 10, or 100, and take targeted measures
  11. Three Mile Island nuclear disaster, 28 March 1979. Lack of understanding of the situation led to loss of control of the #2 reactor, which led to a partial meltdown. Some say it killed approx. 330 people
  12. Tickles security: hard on the outside, butter-soft on the inside
  13. There is a bug in this one: this code is NOT fail-safe because it has a potentially catastrophic deadlock (when the Diesels don’t report Ready)...
  14. Please be reminded: the presented code has a deadlock!
  15. Do you know the difference between validation and verification? Validation = meets external expectations, does what it is supposed to do. Verification = meets internal expectations, conforming to specs
  16. Note: this is NOT related to the storm surge barrier!
  17. You look for safety critical situations, sometimes without a safety net. Chernobyl was a test of a pump, without a safety net
  18. Three Mile Island is a good example
  19. Most beautiful example: UPSes using too much power to charge, blowing all fuses... Current example: found out that the identity management server was a single point of failure... Eurocontrol example: the control unit wasn’t ready for the CWPs, and after that got overloaded
  20. T5 problems: People couldn’t find their way in the installation…
  21. This is functional nonsense: DirMsgResponse is sent to the output, no matter what.
  22. Reality isn’t nice and clean. Reality is messy, chaotic, stressed and nobody completely understands it… Reality is much more fun
  23. The Agile movement is right: people before processes! A process can’t compensate for idiots inside it. Aqueduct of Segovia, Iberian Peninsula. Built by the Romans between 50 AD and 125 AD. The architects over-engineered their equipment so heavily that it is still standing 2000 years later. People do have to realize that the commitment of people to get it right the first time is essential. At Eurocontrol, we mentioned a projected death toll on every bug
  24. Our successes are unknown, our failures make the headlines… When a system fails in production, it is actual blood on our hands. At Eurocontrol, each bug had a body count attached to it...
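
Notes 13 and 14 point out that the design on slide 44 deadlocks when the diesels never report ready. The sketch below is one possible reading of that state diagram, written to make the deadlock visible. The order of the transitions is an assumption inferred from the slide's labels, the state `SLUIT` stands in for the slide's `Sluit_?`, and the `ALARM` state with its `timeout_ticks` parameter is a hypothetical mitigation. The talk does not say how the flaw was actually resolved.

```python
from enum import Enum, auto


class State(Enum):
    INIT = auto()
    WACHT = auto()    # "wait": monitor the water level
    START_D = auto()  # send the "Start" signal to the diesels
    W_O_D = auto()    # wait for "Diesels ready"
    SLUIT = auto()    # "close": issue "Close Barrier"
    ALARM = auto()    # hypothetical escalation state, not on the slide


def step(state, water_level_m, diesels_ready, ticks_waiting, timeout_ticks=None):
    """Return (next_state, ticks_waiting) for one control cycle.

    With timeout_ticks=None the machine behaves as the notes describe:
    W_O_D is never left unless the diesels report ready.
    """
    if state is State.INIT:
        return State.WACHT, 0
    if state is State.WACHT:
        return (State.START_D, 0) if water_level_m > 3.0 else (State.WACHT, 0)
    if state is State.START_D:
        return State.W_O_D, 0
    if state is State.W_O_D:
        if diesels_ready:
            return State.SLUIT, 0
        if timeout_ticks is not None and ticks_waiting + 1 >= timeout_ticks:
            return State.ALARM, 0  # assumed mitigation: escalate instead of waiting forever
        return State.W_O_D, ticks_waiting + 1
    return state, ticks_waiting  # SLUIT and ALARM are terminal in this sketch


def run(cycles, water_level_m, diesels_ready, timeout_ticks=None):
    """Drive the machine for a fixed number of cycles with constant inputs."""
    state, ticks = State.INIT, 0
    for _ in range(cycles):
        state, ticks = step(state, water_level_m, diesels_ready, ticks, timeout_ticks)
    return state


if __name__ == "__main__":
    print(run(1000, 4.0, diesels_ready=False))                    # stuck waiting
    print(run(1000, 4.0, diesels_ready=False, timeout_ticks=10))  # escalates
```

A failure scenario taken from the risk analysis ("diesels do not start") is enough to expose the problem: high water with `diesels_ready=False` leaves the machine in `W_O_D` no matter how many cycles are run.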