3. Executing all test suites
takes too long
Often release several times
in one day!
4. Defect models can help QA teams to
allocate limited resources effectively
Defect prediction
model
5. Defect models are trained using historical
data to predict the defect-prone modules
[Figure: historical changes to modules a, b, and c, each recording the reason for change, the changed modules, and the developer responsible, alongside a new change.]
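As a rough illustration of training a defect model on historical data, the sketch below fits a logistic regression with plain gradient descent and then scores unseen modules. The two metrics, their values, and the function names are hypothetical; the slides do not specify which metrics or implementation were used.

```python
import math


def train_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit logistic regression weights with batch gradient descent."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            z = bias + sum(w * xi for w, xi in zip(weights, x))
            error = 1.0 / (1.0 + math.exp(-z)) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias


def predict_proba(model, x):
    """Return the predicted probability that a module is defect-prone."""
    weights, bias = model
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


# ASSUMPTION: hypothetical historical data, one row per module:
# (scaled size, scaled number of past changes), plus whether it was defective.
history = [[0.1, 0.0], [0.2, 0.1], [0.3, 0.2], [0.7, 0.8], [0.8, 0.6], [0.9, 0.9]]
was_defective = [0, 0, 0, 1, 1, 1]

model = train_logistic(history, was_defective)
risky = predict_proba(model, [0.85, 0.9])   # large, frequently changed module
safe = predict_proba(model, [0.15, 0.05])   # small, rarely changed module
```

A QA team could sort new modules by this probability and review the highest-scoring ones first.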
7. Defect models are trained using
various techniques
Simple techniques: Decision Trees, Logistic Regression
Advanced techniques: Logistic Model Trees (LMT), which combine decision trees with logistic regression
8. Most classification techniques produce
models that achieve similar performance?
The performance of 17 of the 22 studied techniques is indistinguishable
S. Lessmann, B. Baesens, C. Mues, S. Pietsch. Benchmarking classification models for software defect prediction. [TSE 2008]
9. Limitations of the prior work
Overlapping statistical ranks
Noisy data
Limited scope
10. Do most techniques produce models
with similar performance, when we use:
Non-overlapping statistical ranks (instead of overlapping statistical ranks)
Clean data (instead of noisy data)
Expanded scope (instead of limited scope)
13. Our approach to study the impact of
classification techniques on defect models
Train and test models using different techniques (repeat 100 times)
Performance scores for each technique (a, b, ..., z)
Rank techniques using statistical clustering
Rank  Tech.
1     z, …
2     a, b, …
3     …
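The evaluation loop above can be sketched as follows: each technique is trained and tested repeatedly, and every repetition yields one AUC score. The random 50/50 split, the `technique` callback interface, and the toy `size_only` technique are assumptions made for illustration; the slides only state that the process is repeated 100 times.

```python
import random


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one defective and one clean module")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def repeated_auc(technique, rows, labels, repeats=100, seed=0):
    """Evaluate `technique` on `repeats` random 50/50 train/test splits."""
    rng = random.Random(seed)
    indices = list(range(len(rows)))
    results = []
    while len(results) < repeats:
        rng.shuffle(indices)
        half = len(indices) // 2
        train, test = indices[:half], indices[half:]
        test_labels = [labels[i] for i in test]
        if len(set(test_labels)) < 2:
            continue  # skip splits whose test set contains a single class
        scorer = technique([rows[i] for i in train], [labels[i] for i in train])
        results.append(auc([scorer(rows[i]) for i in test], test_labels))
    return results


def size_only(train_rows, train_labels):
    """Hypothetical technique: ignore training data, score by the first metric."""
    return lambda row: row[0]


rows = [[size] for size in (1, 2, 3, 4, 5, 6, 7, 8)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
scores = repeated_auc(size_only, rows, labels)
example = auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

The list returned for each technique is the "performance scores" input that the ranking step consumes.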
15. Non-overlapping ranks using a
double Scott-Knott test
[Figure: for each project (Project 1, Project 2, ..., Project M), the mean AUC values (10x) of each technique (Technique 1, Technique 2, ..., Technique N) are input to a Scott-Knott test (1st run), which groups the techniques into ranks.]
Project 1, Scott-Knott test (1st run):
Rank  Technique
1     T3, T7, T8
2     T2, T10
3     T1, T4, T6
4     T5, T9
Project 2, Scott-Knott test (1st run):
Rank  Technique
1     T2, T5, T7
2     T1, T10
3     T3, T4, T6
4     T8, T9
16. Non-overlapping ranks using a
double Scott-Knott test
[Figure: the per-project rankings produced by the Scott-Knott tests (1st run) are input to a Scott-Knott test (2nd run), which produces the final ranking.]
Scott-Knott test (2nd run):
Rank  Technique
1     T2, T5
2     T1, T7, T10
3     T3, T4, T6
4     T8, T9
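The two-stage ranking can be sketched as below. This is a simplified stand-in, not the actual Scott-Knott test: it keeps the partitioning step (choose the split that maximizes the between-group sum of squares) but replaces the statistical significance test with a fixed gap threshold. The thresholds `auc_gap` and `rank_gap`, the function names, and the sample AUC values are all hypothetical.

```python
from statistics import mean


def best_split(sorted_means):
    """Return the split index that maximizes the between-group sum of squares."""
    grand = mean(sorted_means)
    best_index, best_b = 1, -1.0
    for i in range(1, len(sorted_means)):
        left, right = sorted_means[:i], sorted_means[i:]
        b = (len(left) * (mean(left) - grand) ** 2
             + len(right) * (mean(right) - grand) ** 2)
        if b > best_b:
            best_index, best_b = i, b
    return best_index


def cluster(items, min_gap):
    """Recursively split (name, mean) pairs, sorted best-first, into groups.

    ASSUMPTION: a split is kept when the two group means differ by at least
    `min_gap`; the real Scott-Knott test uses a significance test instead.
    """
    if len(items) < 2:
        return [items]
    i = best_split([value for _, value in items])
    left, right = items[:i], items[i:]
    gap = abs(mean([v for _, v in left]) - mean([v for _, v in right]))
    if gap < min_gap:
        return [items]
    return cluster(left, min_gap) + cluster(right, min_gap)


def rank_techniques(scores, min_gap, higher_is_better=True):
    """Map each technique to a rank (1 = best) from {name: [values]}."""
    items = sorted(((name, mean(values)) for name, values in scores.items()),
                   key=lambda item: item[1], reverse=higher_is_better)
    groups = cluster(items, min_gap)
    return {name: rank
            for rank, group in enumerate(groups, start=1)
            for name, _ in group}


def double_ranking(auc_by_project, auc_gap=0.05, rank_gap=0.5):
    """First run ranks techniques per project; second run clusters those ranks."""
    ranks = {}
    for project_scores in auc_by_project.values():
        for name, rank in rank_techniques(project_scores, auc_gap).items():
            ranks.setdefault(name, []).append(rank)
    # In the second run a lower mean rank is better.
    return rank_techniques(ranks, rank_gap, higher_is_better=False)


auc_by_project = {
    "Project 1": {"T1": [0.80, 0.82], "T2": [0.79, 0.81], "T3": [0.60, 0.62]},
    "Project 2": {"T1": [0.75, 0.77], "T2": [0.76, 0.74], "T3": [0.55, 0.57]},
}
final = double_ranking(auc_by_project)
```

Because every group is a contiguous slice of the sorted list, each technique lands in exactly one rank, which is what makes the ranks non-overlapping.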
17. Non-overlapping test:
Most techniques have similar performance
Rank  Technique
1     Ad+NB, EM, RBFs, …
2     Rsub+SMO, J48, …
Similar to the prior work, techniques are grouped into 2 distinct ranks
18. Do most techniques produce models
with similar performance, when we use:
Non-overlapping statistical ranks: Yes, techniques are grouped into 2 distinct ranks
Clean data
Expanded scope
20. Clean NASA dataset:
Cleaning criteria of prior work
M. Shepperd, Q. Song, Z. Sun, C. Mair. Data Quality: Some Comments on the NASA Software Defect Datasets. [TSE 2013]
Identical cases
Missing values
Constraint violations
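The three cleaning criteria listed above could be applied along these lines. This is a minimal sketch: the column names, the two constraint checks, and the choice to keep the first copy of each identical case are assumptions, not the exact rules of the cited study.

```python
def clean_dataset(rows, constraints):
    """Drop rows with missing values, constraint violations, or duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        if any(value is None for value in row.values()):
            continue  # missing values
        if not all(check(row) for check in constraints):
            continue  # constraint violations
        key = tuple(sorted(row.items()))
        if key in seen:
            continue  # identical cases (ASSUMPTION: keep the first copy)
        seen.add(key)
        cleaned.append(row)
    return cleaned


# ASSUMPTION: hypothetical metric names and integrity constraints.
constraints = [
    lambda r: r["loc_total"] >= 0,
    lambda r: r["loc_comments"] <= r["loc_total"],
]

rows = [
    {"loc_total": 100, "loc_comments": 10, "defective": 1},
    {"loc_total": 100, "loc_comments": 10, "defective": 1},   # identical case
    {"loc_total": 50, "loc_comments": None, "defective": 0},  # missing value
    {"loc_total": 20, "loc_comments": 30, "defective": 0},    # constraint violation
    {"loc_total": 80, "loc_comments": 5, "defective": 0},
]
cleaned = clean_dataset(rows, constraints)
```

Missing values are checked before the constraints so that a check never has to compare against `None`.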
21. Clean NASA dataset:
Many distinct ranks of techniques
Rank  Technique
1     LMT, SL, …
2     KNN, RBFs, …
3     J48, K-means, …
4     SMO, Ridor, …
Unlike the prior work, techniques are grouped into 4 distinct ranks
Top performers are LMT and logistic regression
22. Do most techniques produce models
with similar performance, when we use:
Non-overlapping statistical ranks: Yes, techniques are grouped into 2 distinct ranks
Clean data: No, unlike the prior work, techniques are grouped into 4 distinct ranks
Expanded scope
25. Another dataset:
Four significant ranks of techniques
Rank  Technique
1     LMT, SL, …
2     KNN, RBFs, …
3     J48, K-means, …
4     SMO, Ridor, …
Unlike the prior work, techniques are grouped into 4 distinct ranks
Top performers are LMT and logistic regression
26. Do most techniques produce models
with similar performance, when we use:
Non-overlapping statistical ranks: Yes, techniques are grouped into 2 distinct ranks
Clean data: No, unlike the prior work, techniques are grouped into 4 distinct ranks
Expanded scope: No, similar to the clean data study, techniques are grouped into 4 distinct ranks