5. Context: Software Project
Coming back
to the reality...
Program
Comprehension
Source Code
Maintenance
Tasks
Difficult
understanding
Developer
6. Idea
We argue that messages exchanged among contributors/developers
are a useful source of information to help understanding source code.
In such situations developers need to infer knowledge from,
the source code itself
Developer
source code descriptions in external
artifacts.
7. ..................................................
When call the method IndexSplitter.split(File
destDir, String[] segs) from the Lucene cotrib
directory(contrib/misc/src/java/org/apache/luc
ene/index) it creates an index with segments
descriptor file with among contributors/developers
We argue that messages exchanged wrong data. Namely wrong
are a useful source of information to help understanding of segment
is the number representing the name source code.
that would be created next in this index.
Idea
..................................................
In such situations developers need to infer knowledge from,
CLASS: IndexSplitter
the source code itself
Developer
METHOD: split
source code descriptions in external
artifacts.
9. Step 1: Downloading emails/bugs reports and tracing them
onto classes
Two heuristics
Developer
Discussion
The discussion contains a fully-qualified class name (e.g.,
org.apache.lucene.analysis.MappingCharFilter); or the email contains a
file name (e.g., MappingCharFilter.java)
For bug reports, we complement the above heuristic by matching the
bug ID of each closed bug to the commit notes, therefore tracing the bug
report to the files changed in that commit
When call the method IndexSplitter .split(File destDir, String[] segs) from the
Lucene cotrib directory (contrib/misc/src/java/org/apache/lucene/index) it creates
an index with segments descriptor file with wrong data. Namely wrong is the number
representing the name of segment that would be created next in this index.
public void split(File destDir, String[] segs) throws IOException {
destDir.mkdirs();
FSDirectory destFSDir = FSDirectory.open(destDir);
SegmentInfos destInfos = new SegmentInfos }
If some of the segments of the index already has this name this results either to
impossibility to create new segment or in crating of an corrupted segment.
10. Step 1: Downloading emails/bugs reports and tracing them
onto classes
Two heuristics
Developer
Discussion
The discussion contains a fully-qualified class name (e.g.,
org.apache.lucene.analysis.MappingCharFilter); or the email contains a
file name (e.g., MappingCharFilter.java)
For bug reports, we complement the above heuristic by matching the
bug ID of each closed bug to the commit notes, therefore tracing the bug
report to the files changed in that commit
When call the method IndexSplitter .split(File destDir, String[] segs) from the
Lucene cotrib directory (contrib/misc/src/java/org/apache/lucene/index) it creates
an index with segments descriptor file with wrong data. Namely wrong is the number
representing the name of segment that would be created next in this index.
CLASS: IndexSplitter
public void split(File destDir, String[] segs) throws IOException {
destDir.mkdirs();
FSDirectory destFSDir = FSDirectory.open(destDir);
SegmentInfos destInfos = new SegmentInfos }
If some of the segments of the index already has this name this results either to
impossibility to create new segment or in crating of an corrupted segment.
11. Step 2: Extracting paragraphs
We consider as paragraphs, text section separated by one or more
white lines
Two heuristics
We prune out paragraph description from source code fragments
and/or stack Traces "by using an approach inspired by the work of
Bacchelli et al.
Developer
Discussion
When call the method IndexSplitter.split(File destDir, String[] segs) from the
PAR
Lucene cotrib directory (contrib/misc/src/java/org/apache/lucene/index) it creates
1
an index with segments descriptor file with wrong data. Namely wrong is the number
representing the name of segment that would be created next in this index.
public void split(File destDir, String[] segs) throws IOException {
destDir.mkdirs();
FSDirectory destFSDir = FSDirectory.open(destDir);
SegmentInfos destInfos = new SegmentInfos }
PAR
2
If some of the segments of the index already has this name this results either to
impossibility to create new segment or in crating of an corrupted segment.
PAR
3
12. Step 2: Extracting paragraphs
We consider as paragraphs, text section separated by one or more
white lines
Two heuristics
We prune out paragraph description from source code fragments
and/or stack Traces "by using an approach inspired by the work of
Bacchelli et al.
Developer
Discussion
When call the method IndexSplitter.split(File destDir, String[] segs) from the
PAR
Lucene cotrib directory (contrib/misc/src/java/org/apache/lucene/index) it creates
1
an index with segments descriptor file with wrong data. Namely wrong is the number
representing the name of segment that would be created next in this index.
public void split(File destDir, String[] segs) throws IOException {
destDir.mkdirs();
FSDirectory destFSDir = FSDirectory.open(destDir);
SegmentInfos destInfos = new SegmentInfos }
PAR
2
If some of the segments of the index already has this name this results either to
impossibility to create new segment or in crating of an corrupted segment.
PAR
3
13. Step 3: Tracing paragraphs onto methods
A) A valid paragraph must contain the keyword “method”
These paragraphs must
respect the following
two conditions:
Developer
Discussion
A)
B) and the method name must be followed by a open parenthesis—
i.e., we match “foo(”
B)
When call the method IndexSplitter.split(File destDir, String[] segs)
PAR
from the Lucene cotrib directory it creates an index with segments1
descriptor file with wrong data. Namely wrong is the number
representing the name of segment that would be created next in this
index.
CLASS: IndexSplitter
METHOD: split(
......................................................................................
......................................................................................
......................................................................................
......................................................................................
14. Step 4: Heuristic based Filtering
We defined a set of heuristics to further filter the paragraphs associated with
methods that assign each paragraph a score:
..........................
Problem seems to come from
MainMethodeSearchEngine in
org.eclipse.jdt.internal.ui.launcher
The Method
searchMainMethods(IProgressMonitor,
IJavaSearchScope, boolean) ,there's
a call to addSubTypes(List,
IProgressMonitor, IJavaSearchScope)
Method if includesSubtypes flag is
ON. This method add all types subtypes as soon as the given scope
encloses them without testing if
sub-types have a main! After return
IType[] before the excecution
..........................
CLASS: MainMethodSearchEngine
METHOD: serachMainMethods
SCORE
15. Step 4: Heuristic based Filtering
We defined a set of heuristics to further filter the paragraphs associated with
methods that assign each paragraph a score:
..........................
Problem seems to come from
MainMethodeSearchEngine in
org.eclipse.jdt.internal.ui.launcher
The Method
searchMainMethods(IProgressMonitor,
IJavaSearchScope, boolean) ,there's
a call to addSubTypes(List,
IProgressMonitor, IJavaSearchScope)
Method if includesSubtypes flag is
ON. This method add all types subtypes as soon as the given scope
encloses them without testing if
sub-types have a main! After return
IType[] before the excecution
..........................
CLASS: MainMethodSearchEngine
METHOD: serachMainMethods
% parameter = 100% -> s1= 1
SCORE
a) Method parameters:
% of parameters
s1= mentioned in the
paragraphs. Value
between 0 and 1
1 if the
method
does not
have
parameters
16. Step 4: Heuristic based Filtering
We defined a set of heuristics to further filter the paragraphs associated with
methods that assign each paragraph a score:
..........................
Problem seems to come from
MainMethodeSearchEngine in
org.eclipse.jdt.internal.ui.launcher
The Method
searchMainMethods(IProgressMonitor,
IJavaSearchScope, boolean) ,there's
a call to addSubTypes(List,
IProgressMonitor, IJavaSearchScope)
Method if includesSubtypes flag is
ON. This method add all types subtypes as soon as the given scope
encloses them without testing if
sub-types have a main! After return
IType[] before the excecution
..........................
CLASS: MainMethodSearchEngine
METHOD: serachMainMethods
% parameter = 100% -> s1= 1
SCORE =
1+
a) Method parameters:
% of parameters
s1= mentioned in the
paragraphs. Value
between 0 and 1
1 if the
method
does not
have
parameters
b) Syntactic descriptions (mentioning return values):
check whether the
Equal to one if
paragraph contains the
the method is
s2= keyword “return”. If YES
void.
Value equal 1, 0 otherwise
17. Step 4: Heuristic based Filtering
We defined a set of heuristics to further filter the paragraphs associated with
methods that assign each paragraph a score:
..........................
Problem seems to come from
MainMethodeSearchEngine in
org.eclipse.jdt.internal.ui.launcher
The Method
searchMainMethods(IProgressMonitor,
IJavaSearchScope, boolean) ,there's
a call to addSubTypes(List,
IProgressMonitor, IJavaSearchScope)
Method if includesSubtypes flag is
ON. This method add all types subtypes as soon as the given scope
encloses them without testing if
sub-types have a main! After return
IType[] before the excecution
..........................
CLASS: MainMethodSearchEngine
METHOD: serachMainMethods
% parameter = 100% -> s1= 1
SCORE =
1+ 0+ 1 = 2
a) Method parameters:
% of parameters
s1= mentioned in the
paragraphs. Value
between 0 and 1
1 if the
method
does not
have
parameters
b) Syntactic descriptions (mentioning return values):
check whether the
Equal to one if
paragraph contains the
the method is
s2= keyword “return”. If YES
void.
Value equal 1, 0 otherwise
c) Overriding/Overloading:
1 if any of the “overload” or
s3=“override” keywords appears
in the paragraph, 0 otherwise
d) Method invocations:
1 if any of the “call” or
s4=“excecute” keywords appears
in the paragraph, 0 otherwise
18. Step 4: Heuristic based Filtering
We defined a set of heuristics to further filter the paragraphs associated with
methods that assign each paragraph a score:
We selected paragraphs that have:
1. s1 ≥ thP = 0.5
2. s2 + s3 + s4 ≥ thH = 1
a) Method parameters:
% of parameters
s1= mentioned in the
paragraphs. Value
between 0 and 1
b) Syntactic descriptions (mentioning return values):
check whether the
Equal to one if
paragraph contains the
the method is
s2= keyword “return”. If YES
void.
Value equal 1, 0 otherwise
OK
% parameter = 100% -> s1= 1 ≥ 0.5
SCORE =
1+ 0+ 1 = 2 ≥ 1
1 if the
method
does not
have
parameters
c) Overriding/Overloading:
1 if any of the “overload” or
s3=“override” keywords appears
in the paragraph, 0 otherwise
d) Method invocations:
1 if any of the “call” or
s4=“execute” keywords appears
in the paragraph, 0 otherwise
19. Step 5: Similarity based Filtering
We rank filtered paragraphs through their textual similarity with the method they
are likely describing.
METHOD
PARAGRAPH
SCORE
Similarity
Method_3
Paragraph_4
2.5
96.1%
Method_1
Paragraph_1
2.5
95.6%
Method_2
Paragraph_2
1.5
97.4%
Method_3
Paragraph_3
1.5
86.2%
Method_1
Paragraph_3
1.5
79.0%
Method_3
Paragraph_2
1.5
77.5%
Method_2
Paragraph_4
1.5
64.3%
Method_2
Paragraph_3
1.3
83.2%
Method_3
Paragraph_1
1.3
73.9%
Method_2
Paragraph_1
1.3
68.7%
Method_1
Paragraph_4
1.3
53.6%
Removing:
- English stop words;
- Programming
language keywords
Using:
- Camel Case
splitting the on
remaining words
- Vector Space Model
20. Step 5: Similarity based Filtering
We rank filtered paragraphs through their textual similarity with the method they
are likely describing.
METHOD
PARAGRAPH
SCORE
Similarity
Method_3
Paragraph_4
2.5
96.1%
Method_1
Paragraph_1
2.5
95.6%
Method_2
Paragraph_2
1.5
97.4%
Method_3
Paragraph_3
1.5
86.2%
Method_1
Paragraph_3
1.5
79.0%
Method_3
Paragraph_2
1.5
77.5%
Method_2
Paragraph_4
1.5
64.3%
Method_2
Paragraph_3
1.3
83.2%
Method_3
Paragraph_1
1.3
73.9%
Method_2
Paragraph_1
1.3
68.7%
Method_1
Paragraph_4
1.3
53.6%
Removing:
- English stop words;
- Programming
language keywords
Using:
- Camel Case
splitting the on
remaining words
- Vector Space Model
th>=0.80
21. Empirical Study
• Goal: analyze source code descriptions in developer
discussions
• Purpose: investigating how developer discussions
describe methods of Java Source Code
• Quality focus: find good method description in
messages exchanged among contributors/developers
• Context: Bug-report and mailing lists of two Java
Project
Apache Lucene and Eclipse
23. Research Questions
RQ1 (method coverage): How many methods from
the analyzed software systems are described by the
paragraphs identified by the proposed approach?
RQ2 (precision): How precise is the proposed approach
in identifying method descriptions?
RQ3 (missing descriptions): How many potentially
good method descriptions are missed by the approach?
24. RQ1: How many methods from the analyzed software
systems are described by the paragraphs identified
by the proposed approach?
25. RQ1: How many methods from the analyzed software
systems are described by the paragraphs identified
by the proposed approach?
26. RQ1: How many methods from the analyzed software
systems are described by the paragraphs identified
by the proposed approach?
27. RQ2: How precise is the proposed approach in identifying
method descriptions?
We sampled 250 descriptions from each project
28. RQ2: How precise is the proposed approach in identifying
method descriptions?
We sampled 250 descriptions from each project
29. RQ2: How precise is the proposed approach in identifying
method descriptions?
We sampled 250 descriptions from each project
30. RQ3: How many potentially good method descriptions are
missed by the approach?
We sampled 100 descriptions from each project
TABLE III
The analysis of a sample of 100 paragraphs traced to methods,
but not satisfying the Step 4 heuristic
System
True Negatives
False Negatives
Eclipse
78
22
Apache Lucene
67
33