4. 1. Search within past bug reports
• Find similar bug reports and identify the patches linked to them
2. Search within source code
• Search comments, method names, variable names, etc., to identify code regions with high text overlap
5. No dependence on program size, programming language, type of fault, or the presence of passing & failing test inputs, unlike existing program-analysis-based approaches:
• Program slicing
• Statistical debugging / spectra-based techniques
• Delta debugging / mutation-based approaches
Can be readily applied to jumpstart debugging
Possible tactic: identify a small set of files with text search and feed that as input to a program-analysis-based technique to localize to a set of lines
6. IR systems have been proposed in different areas of software maintenance to recommend relevant artifacts in the context of developer tasks
• Hipikat, Lassie, DebugAdvisor
The efficacy of different language models has been evaluated for fault localization (Rao et al., Marcus et al., Cleary et al.)
• Vector Space Model, Latent Semantic Indexing, Latent Dirichlet Allocation, Cluster-Based Decision Making
Rao and Kak suggest that IR-based bug localization is at least as effective as static and dynamic analysis techniques
Enslen proposed identifier splitting to increase the vocabulary overlap between bug reports and the code base
• E.g., the code word TextFieldTool is split into three words: text, field, tool (a sketch follows)
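A minimal sketch of splitting in this spirit; Enslen's actual algorithm is frequency-based and more involved, so this handles only camel case and underscores:

    import re

    def split_identifiers(text):
        # Insert spaces at lowercase/digit-to-uppercase boundaries and
        # replace underscores, e.g. "TextFieldTool" -> "text field tool"
        return re.sub(r'(?<=[a-z0-9])(?=[A-Z])|_', ' ', text).lower()

    print(split_identifiers("TextFieldTool"))  # text field tool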
7. [Architecture diagram – Indexing strategies: from the repository of past resolved bugs and the linked code repository, an Index Creator builds a Bug Index (BI) from past resolved bugs, a Code Index (CI) from the code repository, and a Meta Index (MI) from the code repository processed through identifier splitting. Querying strategies: a Query Creator turns an incoming bug into a query – collate title & description (A), optionally boost the weight of title words or of code words. A Search Module fires the query on the created indices, and a Results Collator produces the search results: a ranked list of files.]
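A minimal sketch of this pipeline, using scikit-learn TF-IDF and cosine similarity as a stand-in for the deck's unspecified search engine, and reusing split_identifiers from the earlier sketch; the toy corpora are illustrative:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    def build_index(documents):
        vectorizer = TfidfVectorizer(stop_words='english')
        return vectorizer, vectorizer.fit_transform(documents)

    def search(index, query, top_k=5):
        vectorizer, matrix = index
        scores = cosine_similarity(vectorizer.transform([query]), matrix)[0]
        return sorted(enumerate(scores), key=lambda p: -p[1])[:top_k]

    # Toy corpora standing in for the real repositories
    past_bug_texts = ["save fails in text field tool", "crash on file open"]
    source_file_texts = ["class TextFieldTool { void save() {} }",
                         "class FileOpener { void open() {} }"]

    bi = build_index(past_bug_texts)                                     # Bug Index
    ci = build_index(source_file_texts)                                  # Code Index
    mi = build_index([split_identifiers(t) for t in source_file_texts])  # Meta Index

    query = "saving in the text field tool throws an exception"  # incoming bug
    for name, index in [("BI", bi), ("CI", ci), ("MI", mi)]:
        print(name, search(index, query, top_k=1))

Note how the query's words match MI (where TextFieldTool has been split) but not CI, which is exactly the overlap that identifier splitting is meant to recover.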
8. RQ1: How do the following search approaches compare in terms of efficacy? Are they any better than chance?
• Search on past bug reports – Bug Index (BI)
• Search on the code repository – Code Index (CI)
• Search on the processed code repository – Meta Index (MI)
RQ2: Can we combine them to increase efficacy?
RQ3: How do different features of the source code and the bugs available in a project impact the effectiveness of search?
9. 4 open-source subjects
• BIRT, Datatools (Eclipse)
• Derby, Hadoop (Apache)
Linking bug reports to change-sets
• Mined from references to bug IDs in commit comments
• Traced through JIRA links
The test set has bug reports with at least one source file associated with them
• 1177 bugs in the test set
• 35% of the total bugs in the chosen releases
• 3-4% of the bug repositories
10. Average Precision, Recall and F1-Score
• For each bug in the test set taken as a query, we calculate precision, recall and F1-score, and then average across the test set.
Bug Coverage
• Percentage of bugs in the test set for which at least one file in the recommendation set matches the ground truth.
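A minimal sketch of these metrics, assuming per-bug recommendation sets and ground-truth sets of fixed files; all names are illustrative:

    def prf1(recommended, ground_truth):
        hits = len(set(recommended) & set(ground_truth))
        precision = hits / len(recommended) if recommended else 0.0
        recall = hits / len(ground_truth) if ground_truth else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        return precision, recall, f1

    def evaluate(results):
        # results: one (recommended_files, fixed_files) pair per bug in the test set
        scores = [prf1(rec, gt) for rec, gt in results]
        averages = [sum(column) / len(scores) for column in zip(*scores)]
        coverage = sum(1 for rec, gt in results if set(rec) & set(gt)) / len(results)
        return averages, coverage  # ([avg P, avg R, avg F1], bug coverage)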
11. RQ1: How do the following search approaches compare in terms of efficacy? Are they any better than chance?
• Search on past bug reports – Bug Index (BI)
• Search on the code repository – Code Index (CI)
• Search on the processed code repository – Meta Index (MI)
RQ2: Can we combine them to increase efficacy?
RQ3: How do different features of the source code and the bugs available in a project impact the effectiveness of search?
12. [Charts: precision, recall and F1-score vs. result-set size for CI:A, MI:A and BI:A]
• The increase in recall is much slower than the drop in precision, so the F-score dips beyond a result-set size of 3
• Suggests that search techniques may NOT help in identifying ALL files that need to be fixed
13. [Charts: bug coverage vs. result-set size for BIRT, Datatools, Derby and Hadoop]
• Bug coverage increases with increasing result-set size
• None of the techniques emerges as the clear winner
• MI isn't any better than CI; sometimes it performs worse
• Hadoop gives much better results than the other three subjects
14. Compare with the efficacy of a user who randomly selects source files from the code repository as the files to be fixed to resolve a bug
• Think of the code repository as a bin of black and white balls, where the files that need a fix to resolve the bug are white balls; the rest are black balls.
• The hypergeometric distribution gives the probability of choosing white balls without replacement
• Probability p of getting at least x files that require a fix by choosing k files at random from the repository:
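Presumably the standard hypergeometric tail; writing N for the number of files in the repository and w for the number that need a fix (notation assumed here, not taken from the slide):

$$ p \;=\; P(X \ge x) \;=\; \sum_{i=x}^{\min(k,\,w)} \frac{\binom{w}{i}\,\binom{N-w}{k-i}}{\binom{N}{k}} $$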
If p < 0.05, reject the null hypothesis that the search technique is no better than chance. Apply an FDR correction for multiple hypothesis testing.
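A minimal sketch of this test with SciPy and statsmodels; the function and variable names (and the toy numbers) are illustrative:

    from scipy.stats import hypergeom
    from statsmodels.stats.multitest import multipletests

    def chance_p_value(repo_size, num_fix_files, result_size, num_hits):
        # P(X >= num_hits) when drawing result_size files uniformly at random,
        # without replacement, from a repository in which num_fix_files
        # files actually need a fix ("white balls")
        return hypergeom.sf(num_hits - 1, repo_size, num_fix_files, result_size)

    # One p-value per bug taken as a query, then Benjamini-Hochberg FDR at 5%
    p_values = [chance_p_value(5000, 3, 5, 1), chance_p_value(5000, 2, 5, 2)]
    rejected, p_adjusted, _, _ = multipletests(p_values, alpha=0.05, method='fdr_bh')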
15. Even a single correct result returned for a bug is usually statistically significant
• Datatools has many queries failing the FDR test
• Certain queries have a large number of fixed files (e.g., 491 files in 2 bugs)
Record the average number of files in the repository at which the techniques break even with chance (p >= 0.05)
• Ranges from 66 in Derby (MI:A) to 158 in Datatools (CI:A)
16. RQ1: How do the following search approaches compare in terms of efficacy? Are they any better than chance?
• Search on past bug reports – Bug Index (BI)
• Search on the code repository – Code Index (CI)
• Search on the processed code repository – Meta Index (MI)
RQ2: Can we combine them to increase efficacy?
RQ3: How do different features of the source code and the bugs available in a project impact the effectiveness of search?
17. Fleiss' kappa analysis to measure the degree of agreement among the three techniques
• Each technique rates a bug: Yes if the technique covers the bug, else No
• The code-based techniques (CI, MI) are similar to each other, but quite different from the bug-based technique (BI)
Can we combine the bug-based and code-based techniques to get better results?
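A minimal sketch of this agreement analysis with statsmodels; each row rates one bug by the three techniques (1 = covers the bug, 0 = does not), and the sample matrix is illustrative:

    import numpy as np
    from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

    # Rows: bugs; columns: raters (BI, CI, MI)
    ratings = np.array([[1, 1, 1],
                        [0, 1, 1],
                        [1, 0, 0],
                        [0, 0, 0]])

    # aggregate_raters turns per-rater labels into per-category counts
    counts, _ = aggregate_raters(ratings)
    print(fleiss_kappa(counts, method='fleiss'))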
18. Fire the same query on the 3 different indices and choose the top X search results using one of the following ranking schemes (the score-based ones are sketched below):
• RankScore: rank using the absolute similarity scores returned by the search engine
• NormScore: rank using a normalized similarity score – the fraction of the maximum score returned for the query
• AggregateScore: rank by the sum of the scores from the different techniques
• Sample: pick the top 2*(X/5) search results each from BI:A and CI:A, and the remaining X/5 results from MI:A
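A minimal sketch of the three score-based schemes, assuming each index returns (file, score) pairs sorted by descending score; how scores for a file seen in several indices are pooled is an interpretation, and the Sample scheme is plain list slicing, so it is omitted:

    from collections import defaultdict

    def combine(results_per_index, scheme, top_x):
        # results_per_index: {'BI': [(file, score), ...], 'CI': ..., 'MI': ...}
        scored = defaultdict(float)
        for results in results_per_index.values():
            max_score = results[0][1] if results else 1.0
            for f, s in results:
                if scheme == 'RankScore':         # absolute engine scores
                    scored[f] = max(scored[f], s)
                elif scheme == 'NormScore':       # fraction of the query's max score
                    scored[f] = max(scored[f], s / max_score)
                elif scheme == 'AggregateScore':  # sum of scores across indices
                    scored[f] += s
        return sorted(scored, key=scored.get, reverse=True)[:top_x]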
19. RankScore works better than the best of the
individual techniques across all subjects
Improvement in bug coverage ranges from 1% to 46%
20. RQ1: How do the following search approaches compare in terms of efficacy? Are they any better than chance?
• Search on past bug reports – Bug Index (BI)
• Search on the code repository – Code Index (CI)
• Search on the processed code repository – Meta Index (MI)
RQ2: Can we combine them to increase efficacy?
RQ3: How do different features of the source code and the bugs available in a project impact the effectiveness of search?
21. Since query sizes can become very large, there may be a need to artificially boost important words – TitleWords and CodeWords
• TitleBoost helps improve bug coverage, except in Hadoop, where the fraction of titleWords that come up significant is already high even without the boost
[Charts: effect of boosting on bug coverage for BI, CI and MI across the four subjects]
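A minimal sketch of query-side boosting over a TF-IDF-style query vector; the boost factors are illustrative, not the paper's values:

    def boost_query(term_weights, title_words, code_words,
                    title_boost=2.0, code_boost=2.0):
        # Multiply the query weight of title/code words; other terms unchanged
        boosted = dict(term_weights)
        for term in boosted:
            if term in title_words:
                boosted[term] *= title_boost
            if term in code_words:
                boosted[term] *= code_boost
        return boosted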
22. Compared the efficacy of techniques that directly search the code repository with those that search over past bug reports
• No clear winner is observed
• Bug coverage ranges from 20% to 60% across the 4 subjects
• The techniques are better than chance
• Identifier splitting does not yield much benefit
The techniques are complementary
• Bug coverage improves by 1% to 46% by combining them
Favoring title words helps in most cases