Finding Bugs, Fixing Bugs, Preventing Bugs — Exploiting Automated Tests to Increase Reliability

Universiteit Antwerpen
Keynote SHIFT — IWSF 2020
The 4th International Workshop on Software Faults (IWSF 2020)
The 2nd Annual International Workshop on Software Hardware Interaction Faults
October 2020 — © Prof. Serge Demeyer

Books
• 2 books
• 3 proceedings (editor)
Best Teacher’s Award
Top Publications
Spin Off & Start Up

As part of this edition, we will continue to put a special emphasis on how emerging technologies such as those based on artificial intelligence can be used to detect faults and predict crashes and system failures.
Artificial Intelligence Inside
Topics of Interest
[…] especially in the context of Cyber Physical Systems
[…] Fault diagnosis, analysis, detection and prediction. Especially at the borders of hardware and software.

Ericsson, Bombardier, Saab, System Verification, Empear, Verifyter, KTH, MDH, RISE
Comiq, EfiCode, Ponsse, Siili, Qentinel, Symbio, Uni. Oulu, VTT
Axini, Testwerk, TNO, Open Uni.
AKKA, Expleo, EKS, FFT, Fraunhofer, IFAK, OFFIS, Parasoft
Alerion, Prodevelop, Uni. Mondragon
Kuveyt Bank, Saha BT
The TESTOMAT project will allow software teams to increase the development speed without sacrificing quality.
To achieve this goal, the project will advance the state-of-the-art in test automation for software teams moving towards a more agile development process.

Industry 4.0 Internet of Things

variability
• "lot size 1"
safety
• IEC 62061 (mechatronics)
• RTCA/DO-178C (avionics)
• IEC 62304 (medical)
• ISO 26262 (automotive)
agility
• rapid customer feedback
• faster release cycles

Six decades into the computer revolution, four decades since the invention of the microprocessor, and two decades into the rise of the modern Internet, all of the technology required to transform industries through software finally works and can be widely delivered at global scale.


Software Testing is the process of executing a program or system with the intent of finding errors.
(Myers, Glenford J., The art of software testing. Wiley, 1979)

Continuous Integration Pipeline
<<Breaking the Build>>
version control → build → developer tests → deploy → scenario tests → deploy to production → measure & validate
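The pipeline can be read as an ordered list of stages in which the first failing stage "breaks the build" and blocks everything downstream. A minimal Java sketch under that reading; the `Stage` class and `firstBrokenStage` method are illustrative names and do not come from any CI tool:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.BooleanSupplier;

final class PipelineSketch {
    // Hypothetical stage: a name plus a check that returns true on success.
    static final class Stage {
        final String name;
        final BooleanSupplier check;

        Stage(String name, BooleanSupplier check) {
            this.name = name;
            this.check = check;
        }
    }

    // Runs the stages in order and returns the name of the first failing stage,
    // or an empty Optional when every stage passes. Later stages are not run.
    static Optional<String> firstBrokenStage(List<Stage> stages) {
        for (Stage stage : stages) {
            if (!stage.check.getAsBoolean()) {
                return Optional.of(stage.name);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<Stage> stages = Arrays.asList(
                new Stage("build", () -> true),
                new Stage("developer tests", () -> false), // simulated failing test
                new Stage("scenario tests", () -> true));
        System.out.println(firstBrokenStage(stages).orElse("all stages passed"));
    }
}
```

Stopping at the first failure is the design choice that makes weak tests costly: a stage that always passes lets defects travel all the way to production.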

[Khom2014] Khomh, F., Adams, B., Dhaliwal, T. and Zou, Y. Understanding the Impact of Rapid Releases on Software Quality: The Case of Firefox. Empirical Software Engineering, Springer. http://link.springer.com/article/10.1007/s10664-014-9308-x
[Figure 1. Timeline of Firefox versions 1.0 to 9.0, traditional release cycle versus rapid release cycle: (a) time line of major versions, (b) time line of minor versions]
Excerpt from the paper:
[…] channels are respectively 100,000 for NIGHTLY, 1 million for AURORA, 10 million for BETA and 100+ millions for a major Firefox version [11]. NIGHTLY reaches Firefox developers and contributors, while other channels (i.e., AURORA and BETA) recruit external users for testing. The source code on AURORA is tested by web developers who are interested in the latest standards, and by Firefox add-on developers who are willing to experiment with new browser APIs. The BETA channel is tested by Firefox’s regular beta […]
[…] by bug triaging developers and assigned for fixing. When a developer fixes a bug, he typically submits a patch to Bugzilla. Once approved, the patch code is integrated into the source code of Firefox on the corresponding channel and migrated through the other channels for release. Bugs that take too long to get fixed and hence miss a scheduled release are picked up by the next release’s channel.

[Khom2014] Khomh, F., Adams, B., Dhaliwal, T. and Zou, Y. Understanding the Impact of Rapid Releases on Software Quality: The Case of Firefox. Empirical Software Engineering, Springer. http://link.springer.com/article/10.1007/s10664-014-9308-x
✓ bugs are fixed faster (but … harder bugs propagated to later releases)
✓ amount of pre- & post-release bugs ± the same
✓ the program crashes earlier (perhaps due to recent features)
[Figure 2. Development and Release Process of Mozilla Firefox: versions 5.0 to 8.0 move through the NIGHTLY, AURORA, BETA and MAIN channels in steps of 6 weeks, with new feature development on NIGHTLY]
Excerpt from the paper:
[…] major release was made. Figure 1(b) shows the release dates of the minor versions of Firefox. With the advent of shorter release cycles in March 2011, new features need to be tested and delivered to users faster. To achieve this goal, Firefox changed its development process. First, versions are no longer supported in parallel, i.e., […]

How strong are your tests?
version control → build → developer tests → deploy → scenario tests → deploy to production → measure & validate

import org.junit.Test;
import static org.junit.Assert.assertEquals;

public class TestEmployeeDetails {
    EmpBusinessLogic empBusinessLogic = new EmpBusinessLogic();
    EmployeeDetails employee = new EmployeeDetails();

    // happy day scenario for calculation of appraisal and salary
    @Test
    public void testCalculateAppraisal() {
        employee.setName("Rajeev");
        employee.setAge(25);
        employee.setMonthlySalary(8000);
        double appraisal = empBusinessLogic.calculateAppraisal(employee);
        double salary = empBusinessLogic.calculateYearlySalary(employee);
        // no assertion follows: both results are computed but never checked
    }
}
assertionless test
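An assertionless test executes the code but can only fail when an exception is thrown. The sketch below contrasts it with a test that has an oracle. It uses plain Java without JUnit to stay self-contained, and the salary rule (yearly salary is twelve times the monthly salary) is an assumption for illustration, since the slide does not show the implementation of `EmpBusinessLogic`:

```java
import java.util.function.DoubleUnaryOperator;

final class AssertionSketch {
    // Hypothetical business rule standing in for calculateYearlySalary.
    static double yearlySalary(double monthlySalary) {
        return monthlySalary * 12;
    }

    // Deliberately broken variant, used to see which test notices the defect.
    static double brokenYearlySalary(double monthlySalary) {
        return monthlySalary * 10;
    }

    // Assertionless: runs the code and ignores the result,
    // so it passes unless an exception is thrown.
    static boolean assertionlessTest(DoubleUnaryOperator logic) {
        logic.applyAsDouble(8000);
        return true;
    }

    // With an oracle: compares the result against the expected value.
    static boolean assertingTest(DoubleUnaryOperator logic) {
        return logic.applyAsDouble(8000) == 96000.0;
    }

    public static void main(String[] args) {
        System.out.println("assertionless test passes on broken code: "
                + assertionlessTest(AssertionSketch::brokenYearlySalary));
        System.out.println("asserting test passes on broken code: "
                + assertingTest(AssertionSketch::brokenYearlySalary));
    }
}
```

Both tests reach the same statements, so a coverage report cannot tell them apart; only the second one fails on the broken variant.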

int compare(int v1, int v2) {
    if (v1 < v2)
        return 1;
    return -1;
}

int compare(int v1, int v2) {
    if (v1 >= v2)
        return 1;
    return -1;
}
🙂 🙁
Operator  Description                               Example (before)  Example (after)
CBM       Mutates the boundary conditions           a > b             a >= b
IM        Mutates increment operators               a++               a--
INM       Inverts negation operator                 -a                a
MM        Mutates arithmetic & logical operators    a & b             a | b
NCM       Negates a conditional operator            a == b            a != b
RVM       Mutates the return value of a function    return true       return false
VMCM      Removes a void method call                voidCall(x)       (removed)

Competent Programmer Hypothesis (Program is close to correct)
Coupling Effect (Test suites capable of detecting simple errors will also detect complex errors)
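A sketch of how such operators can be applied: every occurrence of a "before" token yields one mutant in which only that occurrence is replaced, in line with the idea that a nearly correct program differs from the correct one by small changes. The token table covers only three of the operators and works on raw text, which is a simplification assumed here to keep the example short; a real tool would more likely work on a parsed form of the code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class MutantGeneratorSketch {
    // Before -> after token pairs for three operators (text-level simplification).
    static final Map<String, String> OPERATORS = new LinkedHashMap<>();
    static {
        OPERATORS.put(" > ", " >= ");  // CBM: boundary condition
        OPERATORS.put("++", "--");     // IM: increment operator
        OPERATORS.put(" == ", " != "); // NCM: negated conditional
    }

    // Produces one mutant per token occurrence; each mutant differs
    // from the source in exactly one place.
    static List<String> mutate(String source) {
        List<String> mutants = new ArrayList<>();
        for (Map.Entry<String, String> op : OPERATORS.entrySet()) {
            String before = op.getKey();
            int at = source.indexOf(before);
            while (at >= 0) {
                mutants.add(source.substring(0, at) + op.getValue()
                        + source.substring(at + before.length()));
                at = source.indexOf(before, at + 1);
            }
        }
        return mutants;
    }

    public static void main(String[] args) {
        for (String mutant : mutate("if (a > b) i++;")) {
            System.out.println(mutant);
        }
    }
}
```

For the input `if (a > b) i++;` this yields two mutants, one per applicable operator.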

Industrial Case Study
• 83K lines of code
• Complicated structure
• Lots of legacy code
• Lots of black-box tests
Ali Parsai, Serge Demeyer; “Comparing Mutation Coverage Against Branch Coverage in an Industrial Setting”.
Software Tools for Technology Transfer

http://littledarwin.parsai.net

Industrial Case
Unit tests only!
[Chart "Segmentation": Mutation Coverage versus Branch Coverage; axis: Percentage, 0 to 100]

CI: Develop → Build → Test
Way too slow
We witnessed 48 hours of mutation testing time on a test suite comprising 272 unit tests and 5,258 lines of test code for testing a project with 48,873 lines of production code.
Sten Vercammen, Serge Demeyer, Markus Borg, and Sigrid Eldh; “Speeding up Mutation Testing via the Cloud: Lessons Learned for Further Optimisations”. Proceedings ESEM 2018

Master
1) Initial test build
2) ∀ files to mutate: queue file names
3a) Generate mutants
3b) Store mutants
3c) Queue mutant references
4a) Execute mutants
4b) Store results
5) Process results
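The split between queueing work and executing mutants can be sketched on a single machine with a thread pool: one task is queued per mutant, a fixed number of workers execute the tasks in parallel, and the results are processed at the end. This is an analogy under stated assumptions and not the cloud setup from the paper; the `isKilled` predicate is a hypothetical stand-in for building a mutant and running the test suite against it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Predicate;

final class MasterWorkerSketch {
    // Queues one task per mutant, lets the workers execute them,
    // and returns how many mutants were killed.
    static int countKilled(List<String> mutantIds, int workers, Predicate<String> isKilled) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Boolean>> pending = new ArrayList<>();
            for (String id : mutantIds) {
                Callable<Boolean> task = () -> isKilled.test(id); // execute one mutant
                pending.add(pool.submit(task));                   // queue the mutant reference
            }
            int killed = 0;
            for (Future<Boolean> result : pending) {              // process results
                if (result.get()) {
                    killed++;
                }
            }
            return killed;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("mutant execution failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> mutants = java.util.Arrays.asList("m1", "m2", "m3", "m4");
        // Pretend every mutant except m3 is killed by the test suite.
        System.out.println(countKilled(mutants, 2, id -> !id.equals("m3")) + " of 4 killed");
    }
}
```

Because each mutant is executed independently of the others, the work parallelises naturally; the number of workers changes the elapsed time but not the result.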

[Two charts: mutation testing time for LittleDarwin versus the distributed setup with 1, 2, 4, 8 and 16 workers; time axes range from 0 to 3h and from 0 to 2d 12h]

https://github.com/joakim-brannstrom/dextool

Presence of defect
+ reach the defect
+ infect the program state
+ observable on output
coverage
mutants
Mutation Testing
= Actionable !
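The three conditions can be made concrete with a small seeded defect. In the sketch below, `faultyAbs` is a hypothetical function with an off-by-one boundary; the inputs are chosen so that one only reaches the defect, one also infects the state and shows it on the output, and one infects the state without the difference being observable:

```java
final class RipSketch {
    // Intended behaviour: absolute value.
    static int abs(int x) {
        return x < 0 ? -x : x;
    }

    // Seeded defect (hypothetical): the boundary should be 0, not -1.
    static int faultyAbs(int x) {
        return x < -1 ? -x : x;
    }

    // Downstream code that can mask an infected state.
    static boolean isNonZero(int x) {
        return faultyAbs(x) != 0;
    }

    public static void main(String[] args) {
        // Reach only: the faulty condition is evaluated, but the result is still correct.
        System.out.println("x = 5 differs: " + (faultyAbs(5) != abs(5)));
        // Reach + infect + observable: the wrong value shows up on the output.
        System.out.println("x = -1 differs: " + (faultyAbs(-1) != abs(-1)));
        // Reach + infect, not observable: the wrong value is masked by the comparison with 0.
        System.out.println("isNonZero(-1) differs: " + (isNonZero(-1) != (abs(-1) != 0)));
    }
}
```

A coverage tool counts all three inputs as covering the defective line; only a test that checks the value of `faultyAbs(-1)` detects the defect.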


[Figure: design workflow for a cyber-physical system. A control and a plant interact through sensors and actuators within an environment. The same loop is shown at four stages: Model-in-the-Loop (MiL: control model, plant model, virtual sensors and actuators), Software- and Processor-in-the-Loop (SiL & PiL: generated code), Hardware-in-the-Loop (HiL: deployed code, real-time plant) and the realized system (realized control, plant, sensors and actuators in the real environment). Activities run from Define Architecture to Realize System and System Test; the legend distinguishes control flow from data flow, and artifacts from manual and (semi-)automatic activities.]
Actual System
Hardware in the loop
Software in the loop
Model in the loop
Conceptual Model
Ken Vanherpen; “A Contract-Based Approach for Multi-Viewpoint Consistency in the Concurrent Design of Cyber-Physical Systems”. PhD Thesis. Universiteit Antwerpen
Mutation Testing @ Simulink
Calibrate Faithful “Digital Twin”


Spectrum Based Fault Localisation
[Two charts: wasted effort per fault, Patterned Spectrum Analysis versus Raw Spectrum Analysis]
Artificial Intelligence Inside
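A minimal sketch of the raw variant of spectrum based fault localisation: the spectrum records which tests execute which program elements, and elements executed mostly by failing tests rank as most suspicious. The slides do not name a ranking formula, so the Ochiai coefficient used here is an assumption, as is the definition of wasted effort as the number of elements ranked strictly above the faulty one. Patterned spectrum analysis is not shown.

```java
final class SpectrumSketch {
    // coverage[t][e] is true when test t executes program element e;
    // failed[t] is true when test t fails.
    // Returns one Ochiai suspiciousness score per program element.
    static double[] ochiai(boolean[][] coverage, boolean[] failed) {
        int elements = coverage[0].length;
        int totalFailed = 0;
        for (boolean f : failed) {
            if (f) totalFailed++;
        }
        double[] score = new double[elements];
        for (int e = 0; e < elements; e++) {
            int execFailed = 0;
            int execPassed = 0;
            for (int t = 0; t < coverage.length; t++) {
                if (coverage[t][e]) {
                    if (failed[t]) execFailed++; else execPassed++;
                }
            }
            double denominator = Math.sqrt((double) totalFailed * (execFailed + execPassed));
            score[e] = denominator == 0 ? 0 : execFailed / denominator;
        }
        return score;
    }

    // Wasted effort (assumed definition): elements inspected before the faulty one.
    static int wastedEffort(double[] score, int faultyElement) {
        int above = 0;
        for (double s : score) {
            if (s > score[faultyElement]) above++;
        }
        return above;
    }

    public static void main(String[] args) {
        boolean[][] coverage = {
            {true, true, false},   // passing test
            {true, false, true},   // failing test
            {true, true, false}};  // passing test
        boolean[] failed = {false, true, false};
        double[] score = ochiai(coverage, failed);
        System.out.println("wasted effort if element 2 is faulty: " + wastedEffort(score, 2));
    }
}
```

In this example element 2 is executed only by the failing test, so it ranks first and the wasted effort is 0.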

Test Amplification
Benjamin Danglot, Oscar Vera-Pérez, Benoit Baudry, Martin Monperrus. Automatic Test Improvement with DSpot: a Study with Ten Mature Open-Source Projects. Empirical Software Engineering, Springer Verlag, 2019, pp. 1-35. doi:10.1007/s10664-019-09692-y.
Mehrdad Abdi, Henrique Rocha and Serge Demeyer. Test Amplification in the Pharo Smalltalk Ecosystem. Proceedings IWST 2019 (International Workshop on Smalltalk Technologies)
input generation + assertion generation

testWithdraw
    | b |
    b := SmallBank new.
    b deposit: 100.
    self assert: b balance equals: 100.
    b withdraw: 30.
    self assert: b balance equals: 70

Genetic Algorithms Inside

testWithdraw_12
    | b |
    b := SmallBank new.
    b deposit: 100.
    b withdraw: SmallInteger maxVal.
    self assert: b balance equals: 100
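The two steps, input generation and assertion generation, can be sketched in Java as follows. The `SmallBank` class is a hypothetical counterpart of the Smalltalk class; that it refuses overdrafts is an assumption inferred from the amplified test, where the balance stays at 100. The sketch tries a few boundary values only and does not include any search or selection step.

```java
import java.util.ArrayList;
import java.util.List;

final class AmplificationSketch {
    // Hypothetical Java counterpart of SmallBank (overdrafts are refused, by assumption).
    static final class SmallBank {
        private long balance = 0;

        void deposit(long amount) { balance += amount; }

        void withdraw(long amount) {
            if (amount <= balance) balance -= amount;
        }

        long balance() { return balance; }
    }

    // Input generation: replace the withdraw amount by boundary values.
    // Assertion generation: run the new input and record the observed
    // balance as the expected value of the new test.
    static List<String> amplify(long originalAmount) {
        long[] candidates = {0, originalAmount + 1, Integer.MAX_VALUE};
        List<String> tests = new ArrayList<>();
        for (long amount : candidates) {
            SmallBank b = new SmallBank();
            b.deposit(100);
            b.withdraw(amount);
            tests.add("deposit(100); withdraw(" + amount + "); assert balance == " + b.balance());
        }
        return tests;
    }

    public static void main(String[] args) {
        for (String test : amplify(30)) {
            System.out.println(test);
        }
    }
}
```

The generated assertions capture the current behaviour, so a developer still has to confirm that the observed values are the intended ones before merging an amplified test.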

12 pull requests
9 merged
3 pending

Description → Text mining
Stack Traces → Link to source code
Product/Component → Specific vocabulary
Suggestions?

Question | Cases | Precision | Recall
Who should fix this bug? | Eclipse, Firefox, gcc | eclipse: 57%; firefox: 64%; gcc: 6% | -
How long will it take to fix this bug? (*) | JBoss | depends on the component; many similar reports: off by one hour; few similar reports: off by 7 hours
What is the severity of this bug? (**) | Mozilla, Eclipse, Gnome | mozilla, eclipse: 67%-73%; gnome: 75%-82% | mozilla, eclipse: 50%-75%; gnome: 68%-84%
Who should fix this bug? Irrelevant for Practitioners
Promising results but …
• how much training is needed? (cross-project training?)
• how reliable is the data? (estimates, severity, assigned-to)
• does this generalise? (on industrial scale?)
replication is needed
(*) In CSMR2012 Proceedings
(**) In CSMR2011; MSR 2010 Proceedings
Artificial Intelligence Inside
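The precision and recall columns can be computed from predicted and actual labels per bug report. A minimal sketch with made-up labels; the data below is purely illustrative and is not taken from any of the studies in the table:

```java
final class PrecisionRecallSketch {
    // Precision for one label: of the reports predicted as `label`, the fraction that really are.
    static double precision(String[] predicted, String[] actual, String label) {
        int truePositives = 0;
        int falsePositives = 0;
        for (int i = 0; i < predicted.length; i++) {
            if (predicted[i].equals(label)) {
                if (actual[i].equals(label)) truePositives++; else falsePositives++;
            }
        }
        int denominator = truePositives + falsePositives;
        return denominator == 0 ? 0 : (double) truePositives / denominator;
    }

    // Recall for one label: of the reports that really are `label`, the fraction that was found.
    static double recall(String[] predicted, String[] actual, String label) {
        int truePositives = 0;
        int falseNegatives = 0;
        for (int i = 0; i < actual.length; i++) {
            if (actual[i].equals(label)) {
                if (predicted[i].equals(label)) truePositives++; else falseNegatives++;
            }
        }
        int denominator = truePositives + falseNegatives;
        return denominator == 0 ? 0 : (double) truePositives / denominator;
    }

    public static void main(String[] args) {
        String[] predicted = {"severe", "severe", "minor", "minor"};  // illustrative only
        String[] actual = {"severe", "minor", "severe", "severe"};    // illustrative only
        System.out.println("precision: " + precision(predicted, actual, "severe"));
        System.out.println("recall: " + recall(predicted, actual, "severe"));
    }
}
```

With these labels, one of the two "severe" predictions is right (precision 0.5) and one of the three truly severe reports is found (recall one third).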

Q&A support
In CHI2016 Proceedings
Artificial Intelligence Inside

version control → build → developer tests → deploy → scenario tests → deploy to production → measure & validate
Artificial Intelligence Inside
If your AI system cannot motivate its decisions, practitioners will not accept it.
