Introduction to Software Testing
1. Università degli Studi dell’Aquila
L19: Introduction to Software Testing
Henry Muccini
DISIM, University of L’Aquila
www.henrymuccini.com, henry.muccini@univaq.it
2. The material in these slides may be freely reproduced and
distributed, partially or totally, as long as an explicit
reference or acknowledgment of the material's author is
preserved.
With very special thanks to Antonia Bertolino and Debra J.
Richardson, who collaborated on previous versions of
these lecture notes
3. AGENDA
Software Failures: examples
Verification and Validation
Software Testing: Intro
Software Testing: Basics
Testing Process
Types of Testing
5. Therac-25 safety failure:
•approximately 100 times the intended dose of radiation
•3 people died, and 6 were injured
Factors:
•Overconfidence in software
•Confusing reliability with safety
•Lack of defensive design
•Failure to eliminate fault causes
•Inadequate software engineering practices
•…
see article at: http://sunnyday.mit.edu/papers/therac.pdf
7. Trains in the Netherlands (March 22, 2012)
Tens of thousands of people around the large cities weren’t able to travel by train
Thursday morning. No trains from and to Amsterdam and Airport Schiphol from
early morning until after the morning rush hour. A failure in the back-up system
was the cause. ProRail said that there was a fault in the ‘switch software’. The
system therefore didn’t start. And then the signals and switches could not be
operated.
Other articles confirm that both the primary and the backup
system failed, hence no operations at all.
Links:
http://www.elsevier.nl/web/Nieuws/Nederland/334086/Oorzaak-van-treinstoring-blijkt-fout-in-software.htm
http://www.rnw.nl/english/bulletin/trains-amsterdam-running-again
On the impact on people:
http://www.dutchnews.nl/news/archives/2012/03/signalling_problems_cause_rail.php
9. The MIUR system for the "maturità" (Italian state exam):
"Maturità 2.0 starts with a flop. The «commissione web» system, the novelty of the 2012
state exam, did not work. The software, built to let the exam boards communicate in real
time with the MIUR central mainframe about all exam-related activities, broke down before
even starting. In the schools of Florence, the boards were unable to upload online the
minutes of the opening meetings held this morning."
http://corrierefiorentino.corriere.it/firenze/notizie/cronaca/2012/18-giugno-2012/maturita-20-partenza-flop-201657781657.shtml
Trenitalia reservations:
"The new Ferrovie dello Stato system is a disaster: some users can no longer use their
code, but cannot unregister, because unregistering requires the code.
From 1 to 3 a.m. it does not work, because of maintenance, but nobody tells you…"
http://righedidiomira.blogspot.it/2012/01/sempre-trenitalia-sempre-piu-disservizi.html
Fineco, IMU payment via form F24:
With the first-home deduction, my taxable amount goes below zero and the system breaks down.
15. NIST (National Institute of Standards and
Technology) study in 2002 [NIST]:
→software errors cost the U.S. economy $59.5 billion
every year.
Standish Chaos report [Standish]:
→a clear statement of requirements is one of the three
main reasons that lead to project success, while
incomplete requirements are one of the main reasons
for project failure.
[NIST] The economic impacts of inadequate infrastructure for software testing.
NIST Planning Report 02-3, 2002. http://www.nist.gov/public affairs/releases/n02-10.htm.
[Standish] The standish group report: Chaos. 1995.
http://www.projectsmart.co.uk/docs/chaos-report.pdf.
16. Validation:
does the software system meet the user's real needs?
are we building the right software?
(valid with respect to the users' needs)
Verification:
does the software system meet the requirements
specification?
are we building the software right?
(valid with respect to the system specification)
17. Software Inspection (static analysis technique)
Debugging (to locate and fix bugs)
Theorem proving
Model checking (to prove the correctness of a property)
Software Testing
(None is the absolute perfect solution)
18. Completeness & Correctness
• Correctness properties are undecidable
• False positive and False negative
Timeliness
• The V&V process stops (most of the time) when there is
no more time
• Time is one of the stopping rules
Cost-effectiveness
• “Select the less that gives you the most”
• V&V is justified especially when failures are expensive
20. An all-inclusive definition IMP
Software testing consists of:
the dynamic verification of the behavior of a program
on a finite set of test cases
suitably selected from the (in practice infinite) input
domain
against the specified expected behavior
[A. Bertolino]
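The definition above can be made concrete with a minimal sketch: dynamic verification runs the program on a finite set of test cases, suitably selected from a practically infinite input domain, and compares the observed behavior against the specified one. The program and the expected values below are hypothetical examples, not part of the original slides.

```python
def program_under_test(x: int) -> int:
    """Hypothetical implementation being verified (absolute value)."""
    return x if x >= 0 else -x

# A finite selection from the (practically infinite) integer input domain,
# paired with the specified expected behavior.
test_cases = [(-5, 5), (0, 0), (7, 7)]  # (input, expected output)

def run_tests(program, cases):
    """Dynamically execute the program on each test case and collect failures."""
    return [(d, program(d), expected)
            for d, expected in cases
            if program(d) != expected]

print(run_tests(program_under_test, test_cases))  # -> [] (all cases pass)
```

Note that an empty failure list only says something about the selected test cases, not about the whole input domain.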
21. What testing is NOT IMP
(citation from Hamlet, 1994):
I've searched hard for defects in this program, found a
lot of them, and repaired them. I can't find any more,
so I'm confident there aren't any.
Testing is NOT exhaustive
⇒ What to test?
⇒ When to stop?
Testing is NOT cheap
⇒ test less and best!!
⇒ When to stop?
23. (1) Testing Process:
Glossary
Systematic vs. Ad Hoc
Test Selection (Category partition)
Test Execution
Oracle
Regression Testing
(2) Types of Testing:
Black Box and White Box
Unit, Integration, System
24. IMP
Testing involves several demanding tasks:
→Test selection
─ how to identify a suitable finite set of test cases
→Test execution
─ how to translate test cases into executable runs
→Test oracle
─ Deciding whether the test outcome is acceptable or not
─ If not, evaluating the impact of the failure and its direct cause (the
fault)
→Testing adequacy
─ Judging whether the test campaign is sufficient
→Test coverage
25. Test selection consists of the identification of a
“suitable” and finite set of test cases.
The test selection activity provides guidelines on how to
select test cases. It is driven by a ‘‘test criterion’’ and has
to produce ‘‘suitable’’ test cases
27. Test Criterion:
A test criterion provides the guidelines, rules, and strategy by which
test cases are selected.
In general, a test criterion is a means of deciding what constitutes a
‘‘good’’ set of test cases (Reference 117 of [Muccini08]).
Suitability:
A test case is suitable if it contributes to discovering
as many failures as possible, according to a test
criterion.
[Muccini08] Henry Muccini, Software Testing: Testing New Software Paradigms and
New Artifacts, in: Wiley Encyclopedia of Computer Science and Engineering, John Wiley &
Sons, Inc., 2008
28. Test Case:
A test case is a set of inputs, execution conditions, and a
pass/fail criterion (Ref. 116 of [Muccini08]) .
A test case thus includes not only input data but also any
relevant execution conditions and procedures, and includes a
way of determining whether the program has passed or
failed the test on a particular execution (Ref. 8 of
[Muccini08]).
Test Suite:
A test suite is a collection of test cases.
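The two definitions above can be sketched directly in code: a test case bundles input data, execution conditions, and a pass/fail criterion, and a test suite is just a collection of test cases. The program under test (`int_sqrt`) and all names are illustrative assumptions, not part of the original.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

@dataclass
class TestCase:
    inputs: Any
    setup: Callable[[], None]      # execution conditions / precondition
    passes: Callable[[Any], bool]  # pass/fail criterion on the outcome

@dataclass
class TestSuite:
    cases: list = field(default_factory=list)  # a collection of test cases

def int_sqrt(n: int) -> int:
    """Hypothetical program under test: integer square root."""
    return int(n ** 0.5)

suite = TestSuite(cases=[
    TestCase(inputs=9,  setup=lambda: None, passes=lambda out: out == 3),
    TestCase(inputs=10, setup=lambda: None, passes=lambda out: out == 3),
])

for tc in suite.cases:
    tc.setup()                     # establish the execution conditions
    outcome = int_sqrt(tc.inputs)  # execute the program on the input data
    print(tc.inputs, "pass" if tc.passes(outcome) else "fail")
```

The point of the structure is that the pass/fail criterion travels with the test case, so a run of the suite needs no external judgment.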
29. The EasyLine system is
composed of three sub-systems:
The SP system
The Mobile app
The server-side application
30. How to select test cases? (test selection technique)
How many test cases? (when to stop --stopping rule?)
Which artefacts to use for selecting test cases?
(code, spec?)
Ad hoc or Systematic testing?
31. Tester’s intuition and expertise
• “Ad hoc testing” (sometimes quite effective)
• Special cases
Specifications
• Equivalence partitioning
• Boundary-value analysis
• Decision table
• Automated derivation from formal specs (conformance testing)
• ....
Code
• Control-flow based
• Data-flow based
Fault-based
• Error guessing/special cases
• Mutation
Usage
• SRET
• Field testing
Nature of application, e.g.:
• Object Oriented
• Web
• GUI
• Real-time, embedded
• Scientific
• .....
No one is the best technique, but a combination of different criteria has empirically
been shown to be the most effective approach
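Two of the specification-based criteria listed above, equivalence partitioning and boundary-value analysis, can be illustrated on a small hypothetical grading function whose valid input domain is 0..100 (the function and the chosen values are assumptions for illustration).

```python
def grade(score: int) -> str:
    """Hypothetical program under test: pass/fail grading over 0..100."""
    if score < 0 or score > 100:
        raise ValueError("score out of range")
    return "pass" if score >= 60 else "fail"

# Equivalence partitioning: one representative per class of the input domain.
partitions = {"invalid-low": -10, "fail": 30, "pass": 80, "invalid-high": 150}

# Boundary-value analysis: values at and just around each partition boundary.
boundaries = [-1, 0, 59, 60, 100, 101]

for name, score in partitions.items():
    try:
        print(name, score, grade(score))
    except ValueError:
        print(name, score, "rejected")

for score in boundaries:
    try:
        print(score, grade(score))
    except ValueError:
        print(score, "rejected")
```

Partitioning keeps the test set finite (one case per class), while boundary values target the off-by-one faults that tend to cluster at class edges.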
32. Code-based:
→Structural/White Box Testing
─ Test cases selected based on the structure of the code
─ Views the program/component as a white box
(also called glass box testing)
─ (Test) inputs → source code → output; the internal behavior is visible
Specification-based:
→Functional/Black Box Testing
─ Test cases selected based on the specification (input-output)
─ Views the program/component as a black box
─ (Test) inputs → binary code or spec → output
33. We focus on “systematic” testing:
→Repeatable
→Measurable IMP
─ best tester
─ coverage
→Based on sampling:
─ Infinite input domain, but finite set of test
cases
34. There are two main sub-activities to be performed:
→B1) identify those “inputs" which force the execution of the
selected test case,
→B2) put the system in a state from which the specified test
can be launched.
B1 -- Forcing the execution of the test cases derived
according to one criterion might not be obvious
→In code-based testing, we have entry-exit paths over the
graph model, and test inputs that execute the corresponding
program paths need to be found
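Sub-activity B1 can be sketched for code-based testing: each entry-exit path of the control-flow graph needs an input that drives execution along it. The function and the chosen inputs are hypothetical; here they are found by inspection, whereas in general this search is hard (and impossible for infeasible paths).

```python
def classify(x: int) -> str:
    """Hypothetical unit with one branch, hence two entry-exit paths."""
    if x > 0:          # branch node of the control-flow graph
        return "positive"
    return "non-positive"

# One input per program path, chosen so that each path is actually executed.
path_inputs = {"then-path": 5, "else-path": -3}

for path, d in path_inputs.items():
    print(path, "->", classify(d))
```

With these two inputs every entry-exit path of `classify` is exercised, which is exactly what a path-based criterion asks of the test inputs.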
35. B2 -- put the system in a state from which the specified
test can be launched
→Also called, Test Pre-condition
→In Synchronous Systems:
─ Several runs in sequence are required to put the system in the test
pre-condition
→In Concurrent Systems:
─ Non-determinism problem
─ Replay problem
36. The EasyLine system is
composed of:
Web services
Sensors
Mobile applications
routing algorithms
…
37. A test oracle is a mechanism for verifying the behavior of
test execution
→ extremely costly and error prone to verify
→ oracle design is a critical part of test planning
Sources of oracles
→ input/outcome oracle
→ tester decision
→ regression test suites
→ standardized test suites and oracles
→ gold or existing program
→ formal specification
38. The expected output is f*(d).
Given input d, the oracle answers YES iff
f(d) = f*(d)
» In some cases easier (e.g., an existing version, an
existing formal specification), but generally very
difficult (e.g., operational testing)
» A research problem that has not been emphasized enough
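The f(d) = f*(d) check above can be sketched using one of the oracle sources listed earlier, a gold (existing, trusted) program. The two implementations below are hypothetical: a new closed-form sum is checked against a slow but trusted reference.

```python
def f(d: int) -> int:
    """New implementation under test: sum of 1..d in closed form."""
    return d * (d + 1) // 2

def f_star(d: int) -> int:
    """Gold oracle: an existing, trusted (if slow) implementation."""
    return sum(range(1, d + 1))

# The oracle answers YES for input d iff f(d) == f*(d).
for d in [0, 1, 10, 100]:
    assert f(d) == f_star(d), f"failure revealed for input {d}"
print("all checks passed")
```

A gold-program oracle sidesteps the cost of deciding expected outputs by hand, at the price of trusting (and keeping around) the reference implementation.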
39. Theoretical notions of test adequacy are usually
defined in terms of adequacy criteria
→ Coverage metrics (sufficient percentage of the program
structure has been exercised)
→ Empirical assurance (failures/test curve flatten out)
→ Error seeding (percentage of seeded faults found is
proportional to the percentage of real faults found)
→ Independent testing (faults found in common are
representative of total population of faults)
Adequacy criteria are evaluated with respect to a test
suite and a program under test
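The error-seeding criterion above lends itself to a small numeric sketch: if the test suite detects seeded and real faults with roughly the same effectiveness, the percentage of seeded faults found estimates the percentage of real faults found, and hence the total number of real faults. All figures below are invented for illustration.

```python
# Hypothetical campaign data.
seeded_total = 20   # faults deliberately seeded into the program
seeded_found = 15   # seeded faults the test suite detected (75%)
real_found = 30     # real faults the same suite detected

# Assumption of the criterion: detection is equally effective on
# seeded and real faults, so real_found / real_total ≈ seeded_found / seeded_total.
estimated_real_total = real_found * seeded_total / seeded_found
print(estimated_real_total)               # -> 40.0
print(estimated_real_total - real_found)  # estimated remaining real faults -> 10.0
```

A low estimate of remaining faults is then one (fallible) signal that the test suite is adequate.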
BACK
40. (1) Testing Process:
Glossary
Systematic vs. Ad Hoc
Test Selection (Category partition)
Test Execution
Oracle
Regression Testing
(2) Types of Testing:
Black Box and White Box
Unit, Integration, System
41. Black box vs White box [in next lectures]
Unit, Integration, System
Performance, Stress
Regression Testing [in next lectures]
…
43. Unit:
→The purpose of unit testing is to ensure that the unit satisfies its functional
specification and/or that its implemented structure matches the intended
design structure
→Unit tests can also be applied to test interfaces or local data structures.
Integration:
→Integration testing is specifically aimed at exposing the problems that
arise from the combination of components
→The communicating interfaces among integrated components need to be tested
→Types: big-bang or incremental (top-down, bottom-up, mixed)
System:
→It attempts to reveal bugs which depend on the environment
→Recovery testing, security testing, stress testing and performance testing
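Incremental top-down integration, named above, can be sketched with a stub: the top-level component is integration-tested first, while a stub with canned answers stands in for a lower-level component that has not been integrated yet. All component names are hypothetical.

```python
class PaymentServiceStub:
    """Stub replacing the real, not-yet-integrated payment subsystem."""
    def charge(self, amount: float) -> bool:
        return True  # canned answer instead of real payment logic

class CheckoutComponent:
    """Top-level component whose interface to the payment subsystem
    is exercised by the integration test."""
    def __init__(self, payment_service):
        self.payment = payment_service

    def checkout(self, amount: float) -> str:
        return "ok" if self.payment.charge(amount) else "declined"

# Integration test of the communicating interface:
result = CheckoutComponent(PaymentServiceStub()).checkout(9.99)
print(result)  # -> ok
```

When the real payment subsystem is ready, it replaces the stub and the same test is re-run, which is the incremental part of the strategy (bottom-up integration is symmetric, using drivers instead of stubs).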
44. The V-model: levels of abstraction on the vertical axis, time on the
horizontal axis. Artifacts on the left are analyzed and designed; the
corresponding levels on the right are integrated and tested; each artifact
is used to plan, validate and verify the matching test level:
User Requirements ↔ Acceptance Testing
Software Requirements Specification ↔ System Testing
Architecture Design Specification ↔ Integration Testing
Component Design Specifications ↔ Component Testing
Unit Implementations ↔ Unit Testing
45. The EasyLine system is
composed of three sub-systems:
The SP system
The Mobile app
The server-side application
46. Stress testing: designed to exercise the software in abnormal
situations.
→Stress testing attempts to find the limits at which the system will fail,
through an abnormal quantity or frequency of inputs.
→The test is expected to succeed even when the system is stressed with
higher rates of inputs, maximum use of memory, or maximum use of system resources.
Performance testing is usually applied to real-time, embedded
systems, in which poor performance may have a serious impact on
normal execution.
→Performance testing checks the run-time performance of the system and
may be coupled with stress testing.
→Performance is not strictly related to functional requirements: functional
tests may fail while performance ones succeed, and vice versa.
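A minimal sketch in the spirit of the slide: measure the run-time of an operation under increasing load (the stress dimension: growing input quantity) and compare it against a performance budget, independently of functional correctness. The operation and the budget are assumptions for illustration.

```python
import time

def operation(n: int) -> int:
    """Hypothetical operation under test: sum of squares 0..n-1."""
    return sum(i * i for i in range(n))

BUDGET_SECONDS = 1.0  # assumed performance requirement

for load in [10_000, 100_000, 1_000_000]:  # stress: increasing input quantity
    start = time.perf_counter()
    operation(load)
    elapsed = time.perf_counter() - start
    print(load, "ok" if elapsed <= BUDGET_SECONDS else "TOO SLOW")
```

Note how the check is orthogonal to the functional one: the result of `operation` is not inspected here at all, only how long producing it takes.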