SlideShare una empresa de Scribd logo
1 de 34
Descargar para leer sin conexión
Mining Source Code Descriptions
from Developer Communications

Sebastiano
Panichella

Jairo
Aponte

Massimiliano
Di Penta

Andrian
Marcus

Gerardo
Canfora
Context: Software Project

Sequence
diagram

Documentation

Source Code

Class
diagram

Developer

Program
Comprehension

Maintenance
Tasks
Context: Software Project

Sequence
diagram

Documentation

Source Code

Class
diagram

Difficult
understanding

Developer

Program
Comprehension

Maintenance
Tasks
Context: Software Project
describes

Sequence
diagram

Documentation

Source Code

Class
diagram

Difficult
understanding

understanding

Developer

Program
Comprehension

Maintenance
Tasks
Context: Software Project
Coming back
to the reality...

Program
Comprehension

Source Code
Maintenance
Tasks

Difficult
understanding

Developer
Idea
We argue that messages exchanged among contributors/developers
are a useful source of information to help understanding source code.

In such situations developers need to infer knowledge from,
the source code itself

Developer

source code descriptions in external
artifacts.
..................................................

When call the method IndexSplitter.split(File
destDir, String[] segs) from the Lucene cotrib
directory(contrib/misc/src/java/org/apache/luc
ene/index) it creates an index with segments
descriptor file with among contributors/developers
We argue that messages exchanged wrong data. Namely wrong
are a useful source of information to help understanding of segment
is the number representing the name source code.
that would be created next in this index.

Idea

..................................................

In such situations developers need to infer knowledge from,

CLASS: IndexSplitter
the source code itself

Developer

METHOD: split

source code descriptions in external
artifacts.
A Five Step-Approach for Mining
Method Descriptions

Developer
Step 1: Downloading emails/bugs reports and tracing them
onto classes
Two heuristics

Developer
Discussion

The discussion contains a fully-qualified class name (e.g.,
org.apache.lucene.analysis.MappingCharFilter); or the email contains a
file name (e.g., MappingCharFilter.java)
For bug reports, we complement the above heuristic by matching the
bug ID of each closed bug to the commit notes, therefore tracing the bug
report to the files changed in that commit

When call the method IndexSplitter .split(File destDir, String[] segs) from the
Lucene cotrib directory (contrib/misc/src/java/org/apache/lucene/index) it creates
an index with segments descriptor file with wrong data. Namely wrong is the number
representing the name of segment that would be created next in this index.

public void split(File destDir, String[] segs) throws IOException {
destDir.mkdirs();
FSDirectory destFSDir = FSDirectory.open(destDir);
SegmentInfos destInfos = new SegmentInfos }

If some of the segments of the index already has this name this results either to
impossibility to create new segment or in crating of an corrupted segment.
Step 1: Downloading emails/bugs reports and tracing them
onto classes
Two heuristics

Developer
Discussion

The discussion contains a fully-qualified class name (e.g.,
org.apache.lucene.analysis.MappingCharFilter); or the email contains a
file name (e.g., MappingCharFilter.java)
For bug reports, we complement the above heuristic by matching the
bug ID of each closed bug to the commit notes, therefore tracing the bug
report to the files changed in that commit

When call the method IndexSplitter .split(File destDir, String[] segs) from the
Lucene cotrib directory (contrib/misc/src/java/org/apache/lucene/index) it creates
an index with segments descriptor file with wrong data. Namely wrong is the number
representing the name of segment that would be created next in this index.

CLASS: IndexSplitter
public void split(File destDir, String[] segs) throws IOException {
destDir.mkdirs();
FSDirectory destFSDir = FSDirectory.open(destDir);
SegmentInfos destInfos = new SegmentInfos }

If some of the segments of the index already has this name this results either to
impossibility to create new segment or in crating of an corrupted segment.
Step 2: Extracting paragraphs
We consider as paragraphs, text section separated by one or more

white lines
Two heuristics

We prune out paragraph description from source code fragments
and/or stack Traces "by using an approach inspired by the work of
Bacchelli et al.

Developer
Discussion
When call the method IndexSplitter.split(File destDir, String[] segs) from the
PAR
Lucene cotrib directory (contrib/misc/src/java/org/apache/lucene/index) it creates
1
an index with segments descriptor file with wrong data. Namely wrong is the number
representing the name of segment that would be created next in this index.

public void split(File destDir, String[] segs) throws IOException {
destDir.mkdirs();
FSDirectory destFSDir = FSDirectory.open(destDir);
SegmentInfos destInfos = new SegmentInfos }

PAR
2

If some of the segments of the index already has this name this results either to
impossibility to create new segment or in crating of an corrupted segment.

PAR
3
Step 2: Extracting paragraphs
We consider as paragraphs, text section separated by one or more

white lines
Two heuristics

We prune out paragraph description from source code fragments
and/or stack Traces "by using an approach inspired by the work of
Bacchelli et al.

Developer
Discussion
When call the method IndexSplitter.split(File destDir, String[] segs) from the
PAR
Lucene cotrib directory (contrib/misc/src/java/org/apache/lucene/index) it creates
1
an index with segments descriptor file with wrong data. Namely wrong is the number
representing the name of segment that would be created next in this index.

public void split(File destDir, String[] segs) throws IOException {
destDir.mkdirs();
FSDirectory destFSDir = FSDirectory.open(destDir);
SegmentInfos destInfos = new SegmentInfos }

PAR
2

If some of the segments of the index already has this name this results either to
impossibility to create new segment or in crating of an corrupted segment.

PAR
3
Step 3: Tracing paragraphs onto methods
A) A valid paragraph must contain the keyword “method”
These paragraphs must
respect the following
two conditions:

Developer
Discussion

A)

B) and the method name must be followed by a open parenthesis—
i.e., we match “foo(”

B)

When call the method IndexSplitter.split(File destDir, String[] segs)
PAR
from the Lucene cotrib directory it creates an index with segments1
descriptor file with wrong data. Namely wrong is the number
representing the name of segment that would be created next in this
index.

CLASS: IndexSplitter
METHOD: split(
......................................................................................
......................................................................................
......................................................................................
......................................................................................
Step 4: Heuristic based Filtering
We defined a set of heuristics to further filter the paragraphs associated with
methods that assign each paragraph a score:

..........................
Problem seems to come from
MainMethodeSearchEngine in
org.eclipse.jdt.internal.ui.launcher
The Method
searchMainMethods(IProgressMonitor,
IJavaSearchScope, boolean) ,there's
a call to addSubTypes(List,
IProgressMonitor, IJavaSearchScope)
Method if includesSubtypes flag is
ON. This method add all types subtypes as soon as the given scope
encloses them without testing if
sub-types have a main! After return
IType[] before the excecution
..........................

CLASS: MainMethodSearchEngine
METHOD: serachMainMethods
SCORE
Step 4: Heuristic based Filtering
We defined a set of heuristics to further filter the paragraphs associated with
methods that assign each paragraph a score:

..........................
Problem seems to come from
MainMethodeSearchEngine in
org.eclipse.jdt.internal.ui.launcher
The Method
searchMainMethods(IProgressMonitor,
IJavaSearchScope, boolean) ,there's
a call to addSubTypes(List,
IProgressMonitor, IJavaSearchScope)
Method if includesSubtypes flag is
ON. This method add all types subtypes as soon as the given scope
encloses them without testing if
sub-types have a main! After return
IType[] before the excecution
..........................

CLASS: MainMethodSearchEngine
METHOD: serachMainMethods
% parameter = 100% -> s1= 1
SCORE

a) Method parameters:
% of parameters
s1= mentioned in the
paragraphs. Value
between 0 and 1

1 if the
method
does not
have
parameters
Step 4: Heuristic based Filtering
We defined a set of heuristics to further filter the paragraphs associated with
methods that assign each paragraph a score:

..........................
Problem seems to come from
MainMethodeSearchEngine in
org.eclipse.jdt.internal.ui.launcher
The Method
searchMainMethods(IProgressMonitor,
IJavaSearchScope, boolean) ,there's
a call to addSubTypes(List,
IProgressMonitor, IJavaSearchScope)
Method if includesSubtypes flag is
ON. This method add all types subtypes as soon as the given scope
encloses them without testing if
sub-types have a main! After return
IType[] before the excecution
..........................

CLASS: MainMethodSearchEngine
METHOD: serachMainMethods
% parameter = 100% -> s1= 1
SCORE =

1+

a) Method parameters:
% of parameters
s1= mentioned in the
paragraphs. Value
between 0 and 1

1 if the
method
does not
have
parameters

b) Syntactic descriptions (mentioning return values):
check whether the
Equal to one if
paragraph contains the
the method is
s2= keyword “return”. If YES
void.
Value equal 1, 0 otherwise
Step 4: Heuristic based Filtering
We defined a set of heuristics to further filter the paragraphs associated with
methods that assign each paragraph a score:

..........................
Problem seems to come from
MainMethodeSearchEngine in
org.eclipse.jdt.internal.ui.launcher
The Method
searchMainMethods(IProgressMonitor,
IJavaSearchScope, boolean) ,there's
a call to addSubTypes(List,
IProgressMonitor, IJavaSearchScope)
Method if includesSubtypes flag is
ON. This method add all types subtypes as soon as the given scope
encloses them without testing if
sub-types have a main! After return
IType[] before the excecution
..........................

CLASS: MainMethodSearchEngine
METHOD: serachMainMethods
% parameter = 100% -> s1= 1
SCORE =

1+ 0+ 1 = 2

a) Method parameters:
% of parameters
s1= mentioned in the
paragraphs. Value
between 0 and 1

1 if the
method
does not
have
parameters

b) Syntactic descriptions (mentioning return values):
check whether the
Equal to one if
paragraph contains the
the method is
s2= keyword “return”. If YES
void.
Value equal 1, 0 otherwise

c) Overriding/Overloading:
1 if any of the “overload” or
s3=“override” keywords appears
in the paragraph, 0 otherwise
d) Method invocations:
1 if any of the “call” or
s4=“excecute” keywords appears
in the paragraph, 0 otherwise
Step 4: Heuristic based Filtering
We defined a set of heuristics to further filter the paragraphs associated with
methods that assign each paragraph a score:

We selected paragraphs that have:
1. s1 ≥ thP = 0.5
2. s2 + s3 + s4 ≥ thH = 1

a) Method parameters:
% of parameters
s1= mentioned in the
paragraphs. Value
between 0 and 1

b) Syntactic descriptions (mentioning return values):
check whether the
Equal to one if
paragraph contains the
the method is
s2= keyword “return”. If YES
void.
Value equal 1, 0 otherwise

OK

% parameter = 100% -> s1= 1 ≥ 0.5
SCORE =

1+ 0+ 1 = 2 ≥ 1

1 if the
method
does not
have
parameters

c) Overriding/Overloading:
1 if any of the “overload” or
s3=“override” keywords appears
in the paragraph, 0 otherwise
d) Method invocations:
1 if any of the “call” or
s4=“execute” keywords appears
in the paragraph, 0 otherwise
Step 5: Similarity based Filtering
We rank filtered paragraphs through their textual similarity with the method they
are likely describing.
METHOD

PARAGRAPH

SCORE

Similarity

Method_3

Paragraph_4

2.5

96.1%

Method_1

Paragraph_1

2.5

95.6%

Method_2

Paragraph_2

1.5

97.4%

Method_3

Paragraph_3

1.5

86.2%

Method_1

Paragraph_3

1.5

79.0%

Method_3

Paragraph_2

1.5

77.5%

Method_2

Paragraph_4

1.5

64.3%

Method_2

Paragraph_3

1.3

83.2%

Method_3

Paragraph_1

1.3

73.9%

Method_2

Paragraph_1

1.3

68.7%

Method_1

Paragraph_4

1.3

53.6%

Removing:
- English stop words;
- Programming
language keywords
Using:
- Camel Case
splitting the on
remaining words
- Vector Space Model
Step 5: Similarity based Filtering
We rank filtered paragraphs through their textual similarity with the method they
are likely describing.
METHOD

PARAGRAPH

SCORE

Similarity

Method_3

Paragraph_4

2.5

96.1%

Method_1

Paragraph_1

2.5

95.6%

Method_2

Paragraph_2

1.5

97.4%

Method_3

Paragraph_3

1.5

86.2%

Method_1

Paragraph_3

1.5

79.0%

Method_3

Paragraph_2

1.5

77.5%

Method_2

Paragraph_4

1.5

64.3%

Method_2

Paragraph_3

1.3

83.2%

Method_3

Paragraph_1

1.3

73.9%

Method_2

Paragraph_1

1.3

68.7%

Method_1

Paragraph_4

1.3

53.6%

Removing:
- English stop words;
- Programming
language keywords
Using:
- Camel Case
splitting the on
remaining words
- Vector Space Model

th>=0.80
Empirical Study
• Goal: analyze source code descriptions in developer
discussions
• Purpose: investigating how developer discussions
describe methods of Java Source Code
• Quality focus: find good method description in
messages exchanged among contributors/developers
• Context: Bug-report and mailing lists of two Java
Project
 Apache Lucene and Eclipse
Context
Research Questions
 RQ1 (method coverage): How many methods from
the analyzed software systems are described by the
paragraphs identified by the proposed approach?

 RQ2 (precision): How precise is the proposed approach
in identifying method descriptions?

 RQ3 (missing descriptions): How many potentially
good method descriptions are missed by the approach?
RQ1: How many methods from the analyzed software
systems are described by the paragraphs identified
by the proposed approach?
RQ1: How many methods from the analyzed software
systems are described by the paragraphs identified
by the proposed approach?
RQ1: How many methods from the analyzed software
systems are described by the paragraphs identified
by the proposed approach?
RQ2: How precise is the proposed approach in identifying
method descriptions?
We sampled 250 descriptions from each project
RQ2: How precise is the proposed approach in identifying
method descriptions?
We sampled 250 descriptions from each project
RQ2: How precise is the proposed approach in identifying
method descriptions?
We sampled 250 descriptions from each project
RQ3: How many potentially good method descriptions are
missed by the approach?
We sampled 100 descriptions from each project
TABLE III
The analysis of a sample of 100 paragraphs traced to methods,
but not satisfying the Step 4 heuristic

System

True Negatives

False Negatives

Eclipse

78

22

Apache Lucene

67

33
Conclusion
Conclusion
Conclusion
Conclusion

Más contenido relacionado

La actualidad más candente

BENG 108 Final Project
BENG 108 Final ProjectBENG 108 Final Project
BENG 108 Final Project
Jason Trimble
 
Java session04
Java session04Java session04
Java session04
Niit Care
 

La actualidad más candente (14)

GSP 125 Enhance teaching/tutorialrank.com
 GSP 125 Enhance teaching/tutorialrank.com GSP 125 Enhance teaching/tutorialrank.com
GSP 125 Enhance teaching/tutorialrank.com
 
GSP 125 Entire Course NEW
GSP 125 Entire Course NEWGSP 125 Entire Course NEW
GSP 125 Entire Course NEW
 
GSP 125 Education Specialist / snaptutorial.com
  GSP 125 Education Specialist / snaptutorial.com  GSP 125 Education Specialist / snaptutorial.com
GSP 125 Education Specialist / snaptutorial.com
 
GSP 125 Exceptional Education - snaptutorial.com
GSP 125 Exceptional Education - snaptutorial.comGSP 125 Exceptional Education - snaptutorial.com
GSP 125 Exceptional Education - snaptutorial.com
 
Gsp 125 Education Organization -- snaptutorial.com
Gsp 125   Education Organization -- snaptutorial.comGsp 125   Education Organization -- snaptutorial.com
Gsp 125 Education Organization -- snaptutorial.com
 
Gsp 125 Enthusiastic Study / snaptutorial.com
Gsp 125 Enthusiastic Study / snaptutorial.comGsp 125 Enthusiastic Study / snaptutorial.com
Gsp 125 Enthusiastic Study / snaptutorial.com
 
GSP 125 RANK Education for Service--gsp125rank.com
GSP 125 RANK  Education for Service--gsp125rank.comGSP 125 RANK  Education for Service--gsp125rank.com
GSP 125 RANK Education for Service--gsp125rank.com
 
BENG 108 Final Project
BENG 108 Final ProjectBENG 108 Final Project
BENG 108 Final Project
 
Gsp 125 Future Our Mission/newtonhelp.com
Gsp 125 Future Our Mission/newtonhelp.comGsp 125 Future Our Mission/newtonhelp.com
Gsp 125 Future Our Mission/newtonhelp.com
 
C# Unit 2 notes
C# Unit 2 notesC# Unit 2 notes
C# Unit 2 notes
 
Git, Docker, Python Package and Module
Git, Docker, Python Package and ModuleGit, Docker, Python Package and Module
Git, Docker, Python Package and Module
 
Java session04
Java session04Java session04
Java session04
 
ClassesandObjects
ClassesandObjectsClassesandObjects
ClassesandObjects
 
Python reading and writing files
Python reading and writing filesPython reading and writing files
Python reading and writing files
 

Destacado (6)

Mahmoud CV 2015
Mahmoud CV 2015Mahmoud CV 2015
Mahmoud CV 2015
 
(42) viiolencia sexual estudios
(42) viiolencia sexual estudios(42) viiolencia sexual estudios
(42) viiolencia sexual estudios
 
Teletrade
TeletradeTeletrade
Teletrade
 
Ecoexpo
EcoexpoEcoexpo
Ecoexpo
 
Trabajo de investigacion semana 10
Trabajo de investigacion semana 10Trabajo de investigacion semana 10
Trabajo de investigacion semana 10
 
практика Scratch
практика Scratchпрактика Scratch
практика Scratch
 

Similar a 130614 sebastiano panichella - mining source code descriptions from developers communications

Nt1310 Unit 3 Language Analysis
Nt1310 Unit 3 Language AnalysisNt1310 Unit 3 Language Analysis
Nt1310 Unit 3 Language Analysis
Nicole Gomez
 
Object oriented basics
Object oriented basicsObject oriented basics
Object oriented basics
vamshimahi
 
Java căn bản - Chapter7
Java căn bản - Chapter7Java căn bản - Chapter7
Java căn bản - Chapter7
Vince Vo
 
CSCI6505 Project:Construct search engine using ML approach
CSCI6505 Project:Construct search engine using ML approachCSCI6505 Project:Construct search engine using ML approach
CSCI6505 Project:Construct search engine using ML approach
butest
 

Similar a 130614 sebastiano panichella - mining source code descriptions from developers communications (20)

Nt1310 Unit 3 Language Analysis
Nt1310 Unit 3 Language AnalysisNt1310 Unit 3 Language Analysis
Nt1310 Unit 3 Language Analysis
 
Object oriented basics
Object oriented basicsObject oriented basics
Object oriented basics
 
chapter 5 Objectdesign.ppt
chapter 5 Objectdesign.pptchapter 5 Objectdesign.ppt
chapter 5 Objectdesign.ppt
 
Abap object-oriented-programming-tutorials
Abap object-oriented-programming-tutorialsAbap object-oriented-programming-tutorials
Abap object-oriented-programming-tutorials
 
Java cheat sheet
Java cheat sheet Java cheat sheet
Java cheat sheet
 
Malware analysis
Malware analysisMalware analysis
Malware analysis
 
These questions will be a bit advanced level 2
These questions will be a bit advanced level 2These questions will be a bit advanced level 2
These questions will be a bit advanced level 2
 
Java căn bản - Chapter7
Java căn bản - Chapter7Java căn bản - Chapter7
Java căn bản - Chapter7
 
OOP, Networking, Linux/Unix
OOP, Networking, Linux/UnixOOP, Networking, Linux/Unix
OOP, Networking, Linux/Unix
 
Discover Tasty Query: The library for Scala program analysis
Discover Tasty Query: The library for Scala program analysisDiscover Tasty Query: The library for Scala program analysis
Discover Tasty Query: The library for Scala program analysis
 
Design patterns in Magento
Design patterns in MagentoDesign patterns in Magento
Design patterns in Magento
 
Chapter 7 - Defining Your Own Classes - Part II
Chapter 7 - Defining Your Own Classes - Part IIChapter 7 - Defining Your Own Classes - Part II
Chapter 7 - Defining Your Own Classes - Part II
 
LectureNotes-02-DSA
LectureNotes-02-DSALectureNotes-02-DSA
LectureNotes-02-DSA
 
What's New In Python 2.4
What's New In Python 2.4What's New In Python 2.4
What's New In Python 2.4
 
V_Sound(Visualize Softwar and Understand the Document)
V_Sound(Visualize Softwar  and Understand the Document)V_Sound(Visualize Softwar  and Understand the Document)
V_Sound(Visualize Softwar and Understand the Document)
 
CSCI6505 Project:Construct search engine using ML approach
CSCI6505 Project:Construct search engine using ML approachCSCI6505 Project:Construct search engine using ML approach
CSCI6505 Project:Construct search engine using ML approach
 
Deployment
DeploymentDeployment
Deployment
 
Annotation Processing in Android
Annotation Processing in AndroidAnnotation Processing in Android
Annotation Processing in Android
 
The smartpath information systems c plus plus
The smartpath information systems  c plus plusThe smartpath information systems  c plus plus
The smartpath information systems c plus plus
 
JDD2015: ClassIndex - szybka alternatywa dla skanowania klas - Sławek Piotrowski
JDD2015: ClassIndex - szybka alternatywa dla skanowania klas - Sławek PiotrowskiJDD2015: ClassIndex - szybka alternatywa dla skanowania klas - Sławek Piotrowski
JDD2015: ClassIndex - szybka alternatywa dla skanowania klas - Sławek Piotrowski
 

Más de Ptidej Team

Más de Ptidej Team (20)

From IoT to Software Miniaturisation
From IoT to Software MiniaturisationFrom IoT to Software Miniaturisation
From IoT to Software Miniaturisation
 
Presentation
PresentationPresentation
Presentation
 
Presentation
PresentationPresentation
Presentation
 
Presentation
PresentationPresentation
Presentation
 
Presentation by Lionel Briand
Presentation by Lionel BriandPresentation by Lionel Briand
Presentation by Lionel Briand
 
Manel Abdellatif
Manel AbdellatifManel Abdellatif
Manel Abdellatif
 
Azadeh Kermansaravi
Azadeh KermansaraviAzadeh Kermansaravi
Azadeh Kermansaravi
 
Mouna Abidi
Mouna AbidiMouna Abidi
Mouna Abidi
 
CSED - Manel Grichi
CSED - Manel GrichiCSED - Manel Grichi
CSED - Manel Grichi
 
Cristiano Politowski
Cristiano PolitowskiCristiano Politowski
Cristiano Politowski
 
Will io t trigger the next software crisis
Will io t trigger the next software crisisWill io t trigger the next software crisis
Will io t trigger the next software crisis
 
MIPA
MIPAMIPA
MIPA
 
Thesis+of+laleh+eshkevari.ppt
Thesis+of+laleh+eshkevari.pptThesis+of+laleh+eshkevari.ppt
Thesis+of+laleh+eshkevari.ppt
 
Thesis+of+nesrine+abdelkafi.ppt
Thesis+of+nesrine+abdelkafi.pptThesis+of+nesrine+abdelkafi.ppt
Thesis+of+nesrine+abdelkafi.ppt
 
Medicine15.ppt
Medicine15.pptMedicine15.ppt
Medicine15.ppt
 
Qrs17b.ppt
Qrs17b.pptQrs17b.ppt
Qrs17b.ppt
 
Icpc11c.ppt
Icpc11c.pptIcpc11c.ppt
Icpc11c.ppt
 
Icsme16.ppt
Icsme16.pptIcsme16.ppt
Icsme16.ppt
 
Msr17a.ppt
Msr17a.pptMsr17a.ppt
Msr17a.ppt
 
Icsoc15.ppt
Icsoc15.pptIcsoc15.ppt
Icsoc15.ppt
 

Último

Último (20)

The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 

130614 sebastiano panichella - mining source code descriptions from developers communications

  • 1. Mining Source Code Descriptions from Developer Communications Sebastiano Panichella Jairo Aponte Massimiliano Di Penta Andrian Marcus Gerardo Canfora
  • 2. Context: Software Project Sequence diagram Documentation Source Code Class diagram Developer Program Comprehension Maintenance Tasks
  • 3. Context: Software Project Sequence diagram Documentation Source Code Class diagram Difficult understanding Developer Program Comprehension Maintenance Tasks
  • 4. Context: Software Project describes Sequence diagram Documentation Source Code Class diagram Difficult understanding understanding Developer Program Comprehension Maintenance Tasks
  • 5. Context: Software Project Coming back to the reality... Program Comprehension Source Code Maintenance Tasks Difficult understanding Developer
  • 6. Idea We argue that messages exchanged among contributors/developers are a useful source of information to help understanding source code. In such situations developers need to infer knowledge from, the source code itself Developer source code descriptions in external artifacts.
  • 7. .................................................. When call the method IndexSplitter.split(File destDir, String[] segs) from the Lucene cotrib directory(contrib/misc/src/java/org/apache/luc ene/index) it creates an index with segments descriptor file with among contributors/developers We argue that messages exchanged wrong data. Namely wrong are a useful source of information to help understanding of segment is the number representing the name source code. that would be created next in this index. Idea .................................................. In such situations developers need to infer knowledge from, CLASS: IndexSplitter the source code itself Developer METHOD: split source code descriptions in external artifacts.
  • 8. A Five Step-Approach for Mining Method Descriptions Developer
  • 9. Step 1: Downloading emails/bugs reports and tracing them onto classes Two heuristics Developer Discussion The discussion contains a fully-qualified class name (e.g., org.apache.lucene.analysis.MappingCharFilter); or the email contains a file name (e.g., MappingCharFilter.java) For bug reports, we complement the above heuristic by matching the bug ID of each closed bug to the commit notes, therefore tracing the bug report to the files changed in that commit When call the method IndexSplitter .split(File destDir, String[] segs) from the Lucene cotrib directory (contrib/misc/src/java/org/apache/lucene/index) it creates an index with segments descriptor file with wrong data. Namely wrong is the number representing the name of segment that would be created next in this index. public void split(File destDir, String[] segs) throws IOException { destDir.mkdirs(); FSDirectory destFSDir = FSDirectory.open(destDir); SegmentInfos destInfos = new SegmentInfos } If some of the segments of the index already has this name this results either to impossibility to create new segment or in crating of an corrupted segment.
  • 10. Step 1: Downloading emails/bugs reports and tracing them onto classes Two heuristics Developer Discussion The discussion contains a fully-qualified class name (e.g., org.apache.lucene.analysis.MappingCharFilter); or the email contains a file name (e.g., MappingCharFilter.java) For bug reports, we complement the above heuristic by matching the bug ID of each closed bug to the commit notes, therefore tracing the bug report to the files changed in that commit When call the method IndexSplitter .split(File destDir, String[] segs) from the Lucene cotrib directory (contrib/misc/src/java/org/apache/lucene/index) it creates an index with segments descriptor file with wrong data. Namely wrong is the number representing the name of segment that would be created next in this index. CLASS: IndexSplitter public void split(File destDir, String[] segs) throws IOException { destDir.mkdirs(); FSDirectory destFSDir = FSDirectory.open(destDir); SegmentInfos destInfos = new SegmentInfos } If some of the segments of the index already has this name this results either to impossibility to create new segment or in crating of an corrupted segment.
  • 11. Step 2: Extracting paragraphs We consider as paragraphs, text section separated by one or more white lines Two heuristics We prune out paragraph description from source code fragments and/or stack Traces "by using an approach inspired by the work of Bacchelli et al. Developer Discussion When call the method IndexSplitter.split(File destDir, String[] segs) from the PAR Lucene cotrib directory (contrib/misc/src/java/org/apache/lucene/index) it creates 1 an index with segments descriptor file with wrong data. Namely wrong is the number representing the name of segment that would be created next in this index. public void split(File destDir, String[] segs) throws IOException { destDir.mkdirs(); FSDirectory destFSDir = FSDirectory.open(destDir); SegmentInfos destInfos = new SegmentInfos } PAR 2 If some of the segments of the index already has this name this results either to impossibility to create new segment or in crating of an corrupted segment. PAR 3
  • 12. Step 2: Extracting paragraphs We consider as paragraphs, text section separated by one or more white lines Two heuristics We prune out paragraph description from source code fragments and/or stack Traces "by using an approach inspired by the work of Bacchelli et al. Developer Discussion When call the method IndexSplitter.split(File destDir, String[] segs) from the PAR Lucene cotrib directory (contrib/misc/src/java/org/apache/lucene/index) it creates 1 an index with segments descriptor file with wrong data. Namely wrong is the number representing the name of segment that would be created next in this index. public void split(File destDir, String[] segs) throws IOException { destDir.mkdirs(); FSDirectory destFSDir = FSDirectory.open(destDir); SegmentInfos destInfos = new SegmentInfos } PAR 2 If some of the segments of the index already has this name this results either to impossibility to create new segment or in crating of an corrupted segment. PAR 3
  • 13. Step 3: Tracing paragraphs onto methods A) A valid paragraph must contain the keyword “method” These paragraphs must respect the following two conditions: Developer Discussion A) B) and the method name must be followed by a open parenthesis— i.e., we match “foo(” B) When call the method IndexSplitter.split(File destDir, String[] segs) PAR from the Lucene cotrib directory it creates an index with segments1 descriptor file with wrong data. Namely wrong is the number representing the name of segment that would be created next in this index. CLASS: IndexSplitter METHOD: split( ...................................................................................... ...................................................................................... ...................................................................................... ......................................................................................
  • 14. Step 4: Heuristic based Filtering We defined a set of heuristics to further filter the paragraphs associated with methods that assign each paragraph a score: .......................... Problem seems to come from MainMethodeSearchEngine in org.eclipse.jdt.internal.ui.launcher The Method searchMainMethods(IProgressMonitor, IJavaSearchScope, boolean) ,there's a call to addSubTypes(List, IProgressMonitor, IJavaSearchScope) Method if includesSubtypes flag is ON. This method add all types subtypes as soon as the given scope encloses them without testing if sub-types have a main! After return IType[] before the excecution .......................... CLASS: MainMethodSearchEngine METHOD: serachMainMethods SCORE
  • 15. Step 4: Heuristic based Filtering We defined a set of heuristics to further filter the paragraphs associated with methods that assign each paragraph a score: .......................... Problem seems to come from MainMethodeSearchEngine in org.eclipse.jdt.internal.ui.launcher The Method searchMainMethods(IProgressMonitor, IJavaSearchScope, boolean) ,there's a call to addSubTypes(List, IProgressMonitor, IJavaSearchScope) Method if includesSubtypes flag is ON. This method add all types subtypes as soon as the given scope encloses them without testing if sub-types have a main! After return IType[] before the excecution .......................... CLASS: MainMethodSearchEngine METHOD: serachMainMethods % parameter = 100% -> s1= 1 SCORE a) Method parameters: % of parameters s1= mentioned in the paragraphs. Value between 0 and 1 1 if the method does not have parameters
  • 16. Step 4: Heuristic based Filtering We defined a set of heuristics to further filter the paragraphs associated with methods that assign each paragraph a score: .......................... Problem seems to come from MainMethodeSearchEngine in org.eclipse.jdt.internal.ui.launcher The Method searchMainMethods(IProgressMonitor, IJavaSearchScope, boolean) ,there's a call to addSubTypes(List, IProgressMonitor, IJavaSearchScope) Method if includesSubtypes flag is ON. This method add all types subtypes as soon as the given scope encloses them without testing if sub-types have a main! After return IType[] before the excecution .......................... CLASS: MainMethodSearchEngine METHOD: serachMainMethods % parameter = 100% -> s1= 1 SCORE = 1+ a) Method parameters: % of parameters s1= mentioned in the paragraphs. Value between 0 and 1 1 if the method does not have parameters b) Syntactic descriptions (mentioning return values): check whether the Equal to one if paragraph contains the the method is s2= keyword “return”. If YES void. Value equal 1, 0 otherwise
  • 17. Step 4: Heuristic based Filtering We defined a set of heuristics to further filter the paragraphs associated with methods that assign each paragraph a score: .......................... Problem seems to come from MainMethodeSearchEngine in org.eclipse.jdt.internal.ui.launcher The Method searchMainMethods(IProgressMonitor, IJavaSearchScope, boolean) ,there's a call to addSubTypes(List, IProgressMonitor, IJavaSearchScope) Method if includesSubtypes flag is ON. This method add all types subtypes as soon as the given scope encloses them without testing if sub-types have a main! After return IType[] before the excecution .......................... CLASS: MainMethodSearchEngine METHOD: serachMainMethods % parameter = 100% -> s1= 1 SCORE = 1+ 0+ 1 = 2 a) Method parameters: % of parameters s1= mentioned in the paragraphs. Value between 0 and 1 1 if the method does not have parameters b) Syntactic descriptions (mentioning return values): check whether the Equal to one if paragraph contains the the method is s2= keyword “return”. If YES void. Value equal 1, 0 otherwise c) Overriding/Overloading: 1 if any of the “overload” or s3=“override” keywords appears in the paragraph, 0 otherwise d) Method invocations: 1 if any of the “call” or s4=“excecute” keywords appears in the paragraph, 0 otherwise
  • 18. Step 4: Heuristic based Filtering We defined a set of heuristics to further filter the paragraphs associated with methods that assign each paragraph a score: We selected paragraphs that have: 1. s1 ≥ thP = 0.5 2. s2 + s3 + s4 ≥ thH = 1 a) Method parameters: % of parameters s1= mentioned in the paragraphs. Value between 0 and 1 b) Syntactic descriptions (mentioning return values): check whether the Equal to one if paragraph contains the the method is s2= keyword “return”. If YES void. Value equal 1, 0 otherwise OK % parameter = 100% -> s1= 1 ≥ 0.5 SCORE = 1+ 0+ 1 = 2 ≥ 1 1 if the method does not have parameters c) Overriding/Overloading: 1 if any of the “overload” or s3=“override” keywords appears in the paragraph, 0 otherwise d) Method invocations: 1 if any of the “call” or s4=“execute” keywords appears in the paragraph, 0 otherwise
  • 19. Step 5: Similarity based Filtering We rank filtered paragraphs through their textual similarity with the method they are likely describing. METHOD PARAGRAPH SCORE Similarity Method_3 Paragraph_4 2.5 96.1% Method_1 Paragraph_1 2.5 95.6% Method_2 Paragraph_2 1.5 97.4% Method_3 Paragraph_3 1.5 86.2% Method_1 Paragraph_3 1.5 79.0% Method_3 Paragraph_2 1.5 77.5% Method_2 Paragraph_4 1.5 64.3% Method_2 Paragraph_3 1.3 83.2% Method_3 Paragraph_1 1.3 73.9% Method_2 Paragraph_1 1.3 68.7% Method_1 Paragraph_4 1.3 53.6% Removing: - English stop words; - Programming language keywords Using: - Camel Case splitting the on remaining words - Vector Space Model
  • 20. Step 5: Similarity based Filtering We rank filtered paragraphs through their textual similarity with the method they are likely describing. METHOD PARAGRAPH SCORE Similarity Method_3 Paragraph_4 2.5 96.1% Method_1 Paragraph_1 2.5 95.6% Method_2 Paragraph_2 1.5 97.4% Method_3 Paragraph_3 1.5 86.2% Method_1 Paragraph_3 1.5 79.0% Method_3 Paragraph_2 1.5 77.5% Method_2 Paragraph_4 1.5 64.3% Method_2 Paragraph_3 1.3 83.2% Method_3 Paragraph_1 1.3 73.9% Method_2 Paragraph_1 1.3 68.7% Method_1 Paragraph_4 1.3 53.6% Removing: - English stop words; - Programming language keywords Using: - Camel Case splitting the on remaining words - Vector Space Model th>=0.80
  • 21. Empirical Study • Goal: analyze source code descriptions in developer discussions • Purpose: investigating how developer discussions describe methods of Java Source Code • Quality focus: find good method description in messages exchanged among contributors/developers • Context: Bug-report and mailing lists of two Java Project  Apache Lucene and Eclipse
  • 23. Research Questions  RQ1 (method coverage): How many methods from the analyzed software systems are described by the paragraphs identified by the proposed approach?  RQ2 (precision): How precise is the proposed approach in identifying method descriptions?  RQ3 (missing descriptions): How many potentially good method descriptions are missed by the approach?
  • 24. RQ1: How many methods from the analyzed software systems are described by the paragraphs identified by the proposed approach?
  • 25. RQ1: How many methods from the analyzed software systems are described by the paragraphs identified by the proposed approach?
  • 26. RQ1: How many methods from the analyzed software systems are described by the paragraphs identified by the proposed approach?
  • 27. RQ2: How precise is the proposed approach in identifying method descriptions? We sampled 250 descriptions from each project
  • 28. RQ2: How precise is the proposed approach in identifying method descriptions? We sampled 250 descriptions from each project
  • 29. RQ2: How precise is the proposed approach in identifying method descriptions? We sampled 250 descriptions from each project
  • 30. RQ3: How many potentially good method descriptions are missed by the approach? We sampled 100 descriptions from each project TABLE III The analysis of a sample of 100 paragraphs traced to methods, but not satisfying the Step 4 heuristic System True Negatives False Negatives Eclipse 78 22 Apache Lucene 67 33