Revisiting the Impact of Classification Techniques on the Performance of Defect Prediction Models
Baljinder Ghotra, Ahmed E. Hassan, Shane McIntosh

Quality assurance teams have limited resources
Personnel, schedules

Executing all test suites takes too long
Projects often release several times in one day!

Defect models can help QA teams to allocate limited resources effectively

Defect models are trained using historical data to predict the defect-prone modules
[Figure: historical change data (reason for change, changed modules, developer responsible) is used to train the defect prediction model. For a new change, the model labels modules a and b as low risk and module c as high risk.]

Defect models are trained using various techniques
Simple techniques: Decision Trees, Logistic Regression
Advanced techniques: Logistic Model Trees (LMT), which combine decision trees with logistic regression

Most classification techniques produce models that achieve similar performance?
Decision Trees, Logistic Model Trees (LMT)
The performance of 17 of the 22 studied techniques is indistinguishable
Benchmarking classification models for software defect prediction. S. Lessmann, B. Baesens, C. Mues, S. Pietsch [TSE 2008]

Limitations of the prior work
Overlapping statistical ranks
Noisy data
Limited scope

Do most techniques produce models with similar performance, when we use:
Non-overlapping statistical ranks (prior work: overlapping statistical ranks)
Clean data (prior work: noisy data)
Expanded scope (prior work: limited scope)

Our approach to study the impact of classification techniques on defect models
Train and test models using different techniques (repeat 100 times)
Collect the performance scores for each technique
Rank techniques using statistical clustering, producing a table of ranks (1, 2, 3, …) and the techniques in each rank

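The train-and-test loop above can be sketched in a few lines of Python. This is a minimal sketch, not the authors' tooling: it assumes the 100 repetitions come from 10 repeats of 10-fold cross-validation (the slide only says "repeat 100 times"), it uses plain accuracy as a stand-in score where the study reports AUC, and the two techniques and the data are made-up toys.

```python
import random
from statistics import mean


def k_fold_indices(n, k, rng):
    """Shuffle row indices 0..n-1 and deal them into k disjoint test folds."""
    indices = list(range(n))
    rng.shuffle(indices)
    return [indices[i::k] for i in range(k)]


def accuracy(actual, predicted):
    """Stand-in performance score: fraction of correct predictions."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def repeated_cv_scores(techniques, rows, labels, repeats=10, k=10, seed=0):
    """Return {technique name: list of repeats * k performance scores}."""
    rng = random.Random(seed)
    scores = {name: [] for name in techniques}
    for _ in range(repeats):
        for test_idx in k_fold_indices(len(rows), k, rng):
            held_out = set(test_idx)
            train_idx = [i for i in range(len(rows)) if i not in held_out]
            train_rows = [rows[i] for i in train_idx]
            train_labels = [labels[i] for i in train_idx]
            actual = [labels[i] for i in test_idx]
            for name, fit in techniques.items():
                predict = fit(train_rows, train_labels)
                predicted = [predict(rows[i]) for i in test_idx]
                scores[name].append(accuracy(actual, predicted))
    return scores


def fit_majority(train_rows, train_labels):
    """Toy technique: always predict the most common training label."""
    majority = int(sum(train_labels) * 2 > len(train_labels))
    return lambda row: majority


def fit_size_threshold(train_rows, train_labels):
    """Toy technique: flag modules larger than a learned size threshold."""
    clean = [r[0] for r, y in zip(train_rows, train_labels) if y == 0]
    defective = [r[0] for r, y in zip(train_rows, train_labels) if y == 1]
    threshold = (mean(clean) + mean(defective)) / 2
    return lambda row: int(row[0] > threshold)


# Made-up data: one size metric per module; larger modules are defective.
rows = [[size] for size in range(10, 510, 10)]
labels = [int(row[0] > 300) for row in rows]
techniques = {"majority": fit_majority, "size_threshold": fit_size_threshold}
scores = repeated_cv_scores(techniques, rows, labels)
```

Each technique ends up with one list of 100 scores, which is the input that the ranking step consumes.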
Unfortunately, some projects yield poorer results than others
[Figure: AUC values (axis from 0.5 to 0.9) for the projects CM1, JM1, KC1, KC3, KC4, MW1, PC1, PC2, PC3, PC4]
Performance values rarely overlap!

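For readers unfamiliar with the AUC axis in the plot above: AUC is the probability that a randomly chosen defective module receives a higher predicted risk than a randomly chosen clean one, so 0.5 corresponds to random guessing and 1.0 to a perfect ranking. The function below is a minimal pairwise sketch of that definition; the function name and the example numbers are illustrative and not taken from the study.

```python
def auc(labels, scores):
    """Area under the ROC curve, computed by comparing every
    (defective, clean) pair of modules. Ties count as half a win."""
    defective = [s for y, s in zip(labels, scores) if y == 1]
    clean = [s for y, s in zip(labels, scores) if y == 0]
    if not defective or not clean:
        raise ValueError("AUC needs at least one module of each class")
    wins = 0.0
    for d in defective:
        for c in clean:
            if d > c:
                wins += 1.0
            elif d == c:
                wins += 0.5
    return wins / (len(defective) * len(clean))


# Four modules: the first two are clean, the last two are defective.
example_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

The pairwise loop is quadratic in the number of modules, which is fine for a sketch; a rank-based formulation gives the same value faster on large datasets.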
Non-overlapping ranks using a double Scott-Knott test
[Figure: for each project (Project 1, Project 2, ..., Project M), the mean AUC values of each technique (Technique 1 to Technique N, 10 values per technique) are the input to a first run of the Scott-Knott test, which groups the techniques of that project into ranks.]
Project 1, Scott-Knott test (1st run):
Rank 1: T3, T7, T8
Rank 2: T2, T10
Rank 3: T1, T4, T6
Rank 4: T5, T9
Project 2, Scott-Knott test (1st run):
Rank 1: T2, T5, T7
Rank 2: T1, T10
Rank 3: T3, T4, T6
Rank 4: T8, T9

[Figure: the ranks from the first run of every project are the input to a second run of the Scott-Knott test, which produces the final ranks.]
Scott-Knott test (2nd run):
Rank 1: T2, T5
Rank 2: T1, T7, T10
Rank 3: T3, T4, T6
Rank 4: T8, T9

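The two-stage idea can be sketched as follows. This is a simplified illustration and not the statistical test itself: the real Scott-Knott test decides whether to split a group with a significance test, whereas this sketch splits whenever the difference between the two sub-group means exceeds a fixed threshold (`min_gap`, a hypothetical parameter). It also takes a single mean AUC per technique and project, and feeds each technique's mean rank into the second run; both are assumptions made to keep the example short. All names and numbers are made up.

```python
from statistics import mean


def best_split(chunk):
    """chunk: list of (name, value) sorted best first. Return the cut
    index that maximises the between-group sum of squares."""
    values = [v for _, v in chunk]
    grand = mean(values)
    best_i, best_ss = 1, -1.0
    for i in range(1, len(values)):
        left, right = values[:i], values[i:]
        ss = (len(left) * (mean(left) - grand) ** 2
              + len(right) * (mean(right) - grand) ** 2)
        if ss > best_ss:
            best_i, best_ss = i, ss
    return best_i


def scott_knott_like(means, min_gap):
    """Recursively partition techniques into non-overlapping groups.
    means: {name: value}, higher is better. Returns groups, best first."""
    items = sorted(means.items(), key=lambda kv: kv[1], reverse=True)

    def recurse(chunk):
        if len(chunk) < 2:
            return [chunk]
        i = best_split(chunk)
        left, right = chunk[:i], chunk[i:]
        gap = mean([v for _, v in left]) - mean([v for _, v in right])
        if gap < min_gap:  # stand-in for the significance test
            return [chunk]
        return recurse(left) + recurse(right)

    return [[name for name, _ in group] for group in recurse(items)]


def double_scott_knott(auc_by_project, min_gap=0.02, min_rank_gap=0.5):
    """First run: rank techniques within each project.
    Second run: cluster techniques by their ranks across projects."""
    ranks = {}
    for means in auc_by_project.values():
        groups = scott_knott_like(means, min_gap)
        for rank, group in enumerate(groups, start=1):
            for name in group:
                ranks.setdefault(name, []).append(rank)
    # Lower rank is better, so negate before clustering "higher is better".
    mean_rank = {name: -mean(r) for name, r in ranks.items()}
    return scott_knott_like(mean_rank, min_rank_gap)


# Made-up mean AUC values for four techniques on two projects.
example = {
    "P1": {"T1": 0.80, "T2": 0.79, "T3": 0.70, "T4": 0.60},
    "P2": {"T1": 0.78, "T2": 0.79, "T3": 0.68, "T4": 0.61},
}
final_ranks = double_scott_knott(example)
```

Because the second run works on ranks rather than raw AUC values, a project where every technique scores poorly does not drag down the techniques that happen to be evaluated on it.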
Non-overlapping test: Most techniques have similar performance
Rank 1: Ad+NB, EM, RBFs, …
Rank 2: Rsub+SMO, J48, …
Similar to the prior work, techniques are grouped into 2 distinct ranks

Do most techniques produce models with similar performance, when we use:
Non-overlapping statistical ranks: Yes, techniques are grouped into 2 distinct ranks
Clean data
Expanded scope

Clean NASA dataset: Cleaning criteria of prior work
Data Quality: Some Comments on the NASA Software Defect Datasets. M. Shepperd, Q. Song, Z. Sun, C. Mair [TSE 2013]
Identical cases
Missing values
Constraint violations
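
The three cleaning criteria above can be sketched as a single filtering pass. This is a minimal sketch under stated assumptions, not the cleaning procedure of the cited work: the column names, the example constraint (comment lines cannot exceed total lines) and the choice to keep the first copy of identical cases are all hypothetical.

```python
def clean_dataset(rows, constraints):
    """Drop rows with missing values, rows that violate any constraint,
    and repeated identical rows (the first copy is kept)."""
    seen = set()
    cleaned = []
    for row in rows:
        # Missing values
        if any(value is None for value in row.values()):
            continue
        # Constraint violations
        if not all(check(row) for check in constraints):
            continue
        # Identical cases
        key = tuple(sorted(row.items()))
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(row)
    return cleaned


# Made-up module records with hypothetical metric names.
modules = [
    {"loc_total": 100, "loc_comments": 10, "defective": 0},
    {"loc_total": 100, "loc_comments": 10, "defective": 0},   # identical case
    {"loc_total": 50, "loc_comments": None, "defective": 1},  # missing value
    {"loc_total": 20, "loc_comments": 30, "defective": 1},    # constraint violation
    {"loc_total": 80, "loc_comments": 5, "defective": 1},
]
constraints = [lambda r: r["loc_comments"] <= r["loc_total"]]
cleaned = clean_dataset(modules, constraints)
```

Missing values are checked first so that constraint checks never have to compare against `None`.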

Clean NASA dataset: Many distinct ranks of techniques
Rank 1: LMT, SL, …
Rank 2: KNN, RBFs, …
Rank 3: J48, K-means, …
Rank 4: SMO, Ridor, …
Unlike the prior work, techniques are grouped into 4 distinct ranks
Top performers are LMT and logistic regression

Do most techniques produce models with similar performance, when we use:
Non-overlapping statistical ranks: Yes, techniques are grouped into 2 distinct ranks
Clean data: No, unlike the prior work, techniques are grouped into 4 distinct ranks
Expanded scope

Another dataset: The PROMISE corpus

Another dataset: Four significant ranks of techniques
Rank 1: LMT, SL, …
Rank 2: KNN, RBFs, …
Rank 3: J48, K-means, …
Rank 4: SMO, Ridor, …
Unlike the prior work, techniques are grouped into 4 distinct ranks
Top performers are LMT and logistic regression

Do most techniques produce models with similar performance, when we use:
Non-overlapping statistical ranks: Yes, techniques are grouped into 2 distinct ranks
Clean data: No, unlike the prior work, techniques are grouped into 4 distinct ranks
Expanded scope: No, similar to the clean data study, techniques are grouped into 4 distinct ranks

Classification technique matters!
Decision Trees, Logistic Model Trees (LMT)

Low-cost suggestion: Experiment with the available techniques
6,618 packages are available on CRAN
148 packages are available in package explorer
shanemcintosh@acm.org

 
What Do Programmers Know about Software Energy Consumption?
What Do Programmers Know about Software Energy Consumption?What Do Programmers Know about Software Energy Consumption?
What Do Programmers Know about Software Energy Consumption?
 
Threshold for Size and Complexity Metrics: A Case Study from the Perspective ...
Threshold for Size and Complexity Metrics: A Case Study from the Perspective ...Threshold for Size and Complexity Metrics: A Case Study from the Perspective ...
Threshold for Size and Complexity Metrics: A Case Study from the Perspective ...
 
Revisiting the Experimental Design Choices for Approaches for the Automated R...
Revisiting the Experimental Design Choices for Approaches for the Automated R...Revisiting the Experimental Design Choices for Approaches for the Automated R...
Revisiting the Experimental Design Choices for Approaches for the Automated R...
 
Measuring Program Comprehension: A Large-Scale Field Study with Professionals
Measuring Program Comprehension: A Large-Scale Field Study with ProfessionalsMeasuring Program Comprehension: A Large-Scale Field Study with Professionals
Measuring Program Comprehension: A Large-Scale Field Study with Professionals
 

Ghotra icse

  • 1. Revisiting the Impact of Classification Techniques on the Performance of Defect Prediction Models. Baljinder Ghotra, Ahmed E. Hassan, Shane McIntosh
  • 2. Quality assurance teams have limited resources: personnel, schedules
  • 3. Executing all test suites takes too long. Teams often release several times in one day!
  • 4. Defect models can help QA teams to allocate limited resources effectively [Diagram: defect prediction model]
  • 5. Defect models are trained using historical data to predict the defect-prone modules [Diagram: historical data covering the changed modules, the reason for change, and the developer responsible]
  • 6. Defect models are trained using historical data to predict the defect-prone modules [Diagram: the defect prediction model labels modules a and b as low risk and module c as high risk]
  • 7. Defect models are trained using various techniques. Simple techniques: Decision Trees, Logistic Regression. Advanced techniques: Logistic Model Trees (LMT), which combine decision trees with logistic regression
  • 8. Most classification techniques produce models that achieve similar performance? Prior work: "The performance of 17 of 22 studied techniques is indistinguishable". Benchmarking classification models for software defect prediction, S. Lessmann, B. Baesens, C. Mues, S. Pietsch [TSE 2008]
  • 9. Limitations of the prior work: overlapping statistical ranks, noisy data, limited scope
  • 10. Do most techniques produce models with similar performance, when we use: non-overlapping statistical ranks (instead of overlapping statistical ranks), clean data (instead of noisy data), expanded scope (instead of limited scope)
  • 11. Do most techniques produce models with similar performance, when we use: non-overlapping statistical ranks, expanded scope, clean data
  • 12. Do most techniques produce models with similar performance, when we use: non-overlapping statistical ranks, expanded scope, clean data
  • 13. Our approach to study the impact of classification techniques on defect models: train and test models using different techniques (repeat 100 times) to obtain performance scores for each technique, then rank techniques using statistical clustering
  • 14. Unfortunately, some projects yield poorer results than others. Performance values rarely overlap! [Plot: AUC (0.5 to 0.9) per project: CM1, JM1, KC1, KC3, KC4, MW1, PC1, PC2, PC3, PC4]
  • 15. Non-overlapping ranks using a double Scott-Knott test. For each project (Project 1, Project 2, ..., Project M), each technique (Technique 1, Technique 2, ..., Technique N) contributes 10 mean AUC values to a Scott-Knott test (1st run). Project 1 ranks: 1: T3, T7, T8; 2: T2, T10; 3: T1, T4, T6; 4: T5, T9. Project 2 ranks: 1: T2, T5, T7; 2: T1, T10; 3: T3, T4, T6; 4: T8, T9
  • 16. Non-overlapping ranks using a double Scott-Knott test. The per-project rankings from the Scott-Knott test (1st run) feed a Scott-Knott test (2nd run). 1st-run ranks (one project): 1: T2, T5, T7; 2: T1, T10; 3: T3, T4, T6; 4: T8, T9. 1st-run ranks (another project): 1: T3, T7, T8; 2: T2, T10; 3: T1, T4, T6; 4: T5, T9. 2nd-run ranks: 1: T2, T5; 2: T1, T7, T10; 3: T3, T4, T6; 4: T8, T9
  • 17. Non-overlapping test: Most techniques have similar performance. Rank 1: Ad+NB, EM, RBFs, …; rank 2: Rsub+SMO, J48, …. Similar to the prior work, techniques are grouped into 2 distinct ranks
  • 18. Do most techniques produce models with similar performance, when we use: non-overlapping statistical ranks (yes, techniques are grouped into 2 distinct ranks), expanded scope, clean data
  • 19. Do most techniques produce models with similar performance, when we use: non-overlapping statistical ranks (yes, techniques are grouped into 2 distinct ranks), expanded scope, clean data
  • 20. Clean NASA dataset: Cleaning criteria of prior work: identical cases, missing values, constraint violations. Data Quality: Some Comments on the NASA Software Defect Datasets, M. Shepperd, Q. Song, Z. Sun, C. Mair [TSE 2013]
  • 21. Clean NASA dataset: Many distinct ranks of techniques. Rank 1: LMT, SL, …; rank 2: KNN, RBFs, …; rank 3: J48, K-means, …; rank 4: SMO, Ridor, …. Unlike the prior work, techniques are grouped into 4 distinct ranks. Top performers are LMT and logistic regression
  • 22. Do most techniques produce models with similar performance, when we use: non-overlapping statistical ranks (yes, techniques are grouped into 2 distinct ranks), expanded scope, clean data (no, unlike the prior work, techniques are grouped into 4 distinct ranks)
  • 23. Do most techniques produce models with similar performance, when we use: non-overlapping statistical ranks (yes, techniques are grouped into 2 distinct ranks), expanded scope, clean data (no, unlike the prior work, techniques are grouped into 4 distinct ranks)
  • 25. Another dataset: Four significant ranks of techniques. Rank 1: LMT, SL, …; rank 2: KNN, RBFs, …; rank 3: J48, K-means, …; rank 4: SMO, Ridor, …. Unlike the prior work, techniques are grouped into 4 distinct ranks. Top performers are LMT and logistic regression
  • 26. Do most techniques produce models with similar performance, when we use: non-overlapping statistical ranks (yes, techniques are grouped into 2 distinct ranks), expanded scope (no, similar to the clean data study, techniques are grouped into 4 distinct ranks), clean data (no, unlike the prior work, techniques are grouped into 4 distinct ranks)
  • 28. Low-cost suggestion: Experiment with the available techniques. 6,618 packages are available on CRAN; 148 packages are available in package explorer
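
The two-stage ranking on slides 15 and 16 can be illustrated with a short Python sketch. It is a simplified stand-in, not the authors' implementation: the clustering step splits the sorted means where the between-group sum of squares is largest, as Scott-Knott does, but it replaces the statistical significance test with a hypothetical minimum-gap threshold (`auc_gap`, `rank_gap`). All function names, thresholds, and AUC values below are made up for illustration.

```python
from statistics import mean


def best_split(values):
    """Index that splits sorted values into two groups with the largest
    between-group sum of squares."""
    overall = mean(values)
    best_i, best_b = 1, -1.0
    for i in range(1, len(values)):
        left, right = values[:i], values[i:]
        b = (len(left) * (mean(left) - overall) ** 2
             + len(right) * (mean(right) - overall) ** 2)
        if b > best_b:
            best_i, best_b = i, b
    return best_i


def cluster(items, min_gap):
    """Recursively partition sorted (name, value) pairs into groups.
    ASSUMPTION: a split is kept only if the group means differ by at least
    min_gap; the real Scott-Knott test uses a significance test here."""
    if len(items) < 2:
        return [items]
    i = best_split([v for _, v in items])
    left, right = items[:i], items[i:]
    gap = abs(mean([v for _, v in left]) - mean([v for _, v in right]))
    if gap < min_gap:
        return [items]
    return cluster(left, min_gap) + cluster(right, min_gap)


def rank_techniques(scores, min_gap, higher_is_better=True):
    """Map each technique to a rank (1 = best) from its list of scores."""
    items = sorted(((t, mean(v)) for t, v in scores.items()),
                   key=lambda kv: kv[1], reverse=higher_is_better)
    groups = cluster(items, min_gap)
    return {t: r for r, g in enumerate(groups, start=1) for t, _ in g}


def double_rank(auc_by_project, auc_gap=0.05, rank_gap=0.4):
    """First run: rank techniques within each project by AUC.
    Second run: rank techniques by the ranks they earned across projects."""
    ranks = {}
    for scores in auc_by_project.values():
        for tech, r in rank_techniques(scores, auc_gap).items():
            ranks.setdefault(tech, []).append(r)
    return rank_techniques(ranks, rank_gap, higher_is_better=False)


# Made-up AUC values for two hypothetical projects and four techniques.
auc_by_project = {
    "P1": {"T1": [0.80, 0.82], "T2": [0.79, 0.81],
           "T3": [0.60, 0.62], "T4": [0.61, 0.59]},
    "P2": {"T1": [0.75, 0.77], "T2": [0.74, 0.76],
           "T3": [0.55, 0.57], "T4": [0.70, 0.72]},
}
final_ranks = double_rank(auc_by_project)
print(final_ranks)
```

Ranking on per-project ranks rather than on raw AUC keeps projects with generally low AUC values from dominating the final grouping, which is the concern raised on slide 14.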