Learning to Spot and Refactor Inconsistent Method Names
Kui Liu¹, Dongsun Kim¹, Tegawendé F. Bissyandé¹, Taeyoung Kim², Kisub Kim¹, Anil Koyuncu¹, Suntae Kim², Yves Le Traon¹
¹ Interdisciplinary Centre for Security, Reliability and Trust (SnT), University of Luxembourg
² Department of Software Engineering, Chonbuk National University, South Korea
29th May 2019
2
Programming with Libraries
LibraryA.java
getItem()
setObject(…)
…()
Using a method relies on its name (+ API document).
Developers often do not check the inside of the method.
3
A Method can Disguise
What I expect: getPokemon( … )
What I actually get: getPokemonRealMonster( … )
4
Making the name consistent is NOT easy
https://www.itworld.com/article/2833265/don-t-go-into-programming-if-you-don-t-have-a-good-thesaurus.html
Naming Things: 49%
5
Consequence of inconsistent names
There are 5K+ questions about naming issues on stackoverflow.com.
6
Naming bugs are common
We found 183K+ commits addressing naming issues on GitHub.com through a quick search with simple queries such as “inconsistent”, “consistency”, “misleading”, …
7
Our Goals
Detect inconsistent method names.
Repair the names to be consistent.
8
Idea
Similar implementations would have similar names.
10
Idea
Similar implementations, but different names.
11
Approach
How to find similar names/implementations?
12
Sim( , ) = ?
14
What we need!
15
Autoencoder
16
Method → Autoencoder → Method
17
Program Vectors
Program → Encoder → vectors
M1: <9, 2, 3, …>
M2: <7, 1, 6, …>
M3: <2, 8, 3, …>
M4: <0, 1, 8, …>
18
Method = Name + Body
Name: getID(…)
Body: { for(…) { for(…) { for(…) …
Similar Names → N
Similar Bodies → B
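Finding the "similar names" and "similar bodies" of a method can be pictured as a nearest-neighbour lookup over embedded vectors. The sketch below is only an illustration under stated assumptions: the class and method names (`NearestMethods`, `nearest`) are hypothetical, Euclidean distance is an assumed metric (the slides do not name one), and the toy vectors reuse only the first three components shown for M1 to M4, leaving the elided components out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Nearest-neighbour lookup over embedded method vectors (illustrative sketch). */
final class NearestMethods {
    /** Euclidean distance; the metric is an assumption, the slides do not name one. */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Indices of the k vectors closest to the query, nearest first. */
    static List<Integer> nearest(double[] query, List<double[]> vectors, int k) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < vectors.size(); i++) {
            indices.add(i);
        }
        indices.sort(Comparator.comparingDouble(i -> distance(query, vectors.get(i))));
        return new ArrayList<>(indices.subList(0, Math.min(k, indices.size())));
    }

    public static void main(String[] args) {
        // Toy vectors: first three components of M1..M4 as shown on the slide.
        List<double[]> vectors = Arrays.asList(
            new double[] {9, 2, 3},   // M1
            new double[] {7, 1, 6},   // M2
            new double[] {2, 8, 3},   // M3
            new double[] {0, 1, 8});  // M4
        // Hypothetical query vector; prints [0, 1], i.e. M1 and M2 are closest.
        System.out.println(nearest(new double[] {8, 2, 4}, vectors, 2));
    }
}
```

Running the same lookup once in the name space and once in the body space would give the two neighbour sets N and B.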
19
Method Name Embedding
Method Names → Tokenized Names (camel case, underscore) → Sentence2vec (PV-DM) → Embedded Vectors
findField → find, Field
findMatchesHelper → find, Matches, Helper
containsTarget → contains, Target
containsField → contains, Field
findInstruction1 → find, Instruction1
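The tokenization step (camel case, underscore) can be sketched in a few lines. This is a minimal illustration, not the tool's implementation: the class name `NameTokenizer` and the exact splitting rule are assumptions, chosen so that the slide's examples come out as shown (digits stay attached to the preceding token, as in `Instruction1`).

```java
import java.util.ArrayList;
import java.util.List;

/** Splits a method name into tokens at camel-case boundaries and underscores. */
final class NameTokenizer {
    // Split before an upper-case letter that follows a lower-case letter or digit,
    // or at any run of underscores. Digits stay attached to the preceding token.
    private static final String BOUNDARY = "(?<=[a-z0-9])(?=[A-Z])|_+";

    static List<String> tokenize(String methodName) {
        List<String> tokens = new ArrayList<>();
        for (String part : methodName.split(BOUNDARY)) {
            if (!part.isEmpty()) {   // a leading underscore yields an empty first part
                tokens.add(part);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("findMatchesHelper")); // [find, Matches, Helper]
        System.out.println(tokenize("findInstruction1"));  // [find, Instruction1]
    }
}
```

The resulting token lists are what a sentence-level embedding such as PV-DM would then consume; that training step is outside this sketch.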
20
Method Body Embedding
Preprocessing: Program Serialization
Method Body:
return (String[]) list.toArray(new String[0]);
Serialized AST:
[ReturnStatement, return, ArrayType, String[], Variable, listVar, Method, toArray, ArrayCreation, new, ArrayType, String[], NumberLiteral, “0”]
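One way to picture the serialization is a depth-first, pre-order walk that emits each node's type followed by its label. The sketch below is an assumption-laden illustration: `AstNode` is a hypothetical toy class, and the tree is built by hand (no real Java parser is used) with a shape chosen so that the walk reproduces the token order on the slide.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy AST node: a node type, its label, and child nodes (hypothetical structure). */
final class AstNode {
    final String type;
    final String label;
    final List<AstNode> children;

    AstNode(String type, String label, AstNode... children) {
        this.type = type;
        this.label = label;
        this.children = Arrays.asList(children);
    }

    /** Depth-first, pre-order walk emitting (type, label) for every node. */
    static List<String> serialize(AstNode root) {
        List<String> out = new ArrayList<>();
        collect(root, out);
        return out;
    }

    private static void collect(AstNode node, List<String> out) {
        out.add(node.type);
        out.add(node.label);
        for (AstNode child : node.children) {
            collect(child, out);
        }
    }

    /** Hand-built tree for: return (String[]) list.toArray(new String[0]); */
    static AstNode example() {
        return new AstNode("ReturnStatement", "return",
            new AstNode("ArrayType", "String[]"),
            new AstNode("Variable", "listVar"),
            new AstNode("Method", "toArray",
                new AstNode("ArrayCreation", "new",
                    new AstNode("ArrayType", "String[]"),
                    new AstNode("NumberLiteral", "\"0\""))));
    }

    public static void main(String[] args) {
        System.out.println(serialize(example()));
    }
}
```

Note that the variable name `list` appears as `listVar` in the slide's token sequence; the sketch simply copies that label rather than deriving it.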
24
Method Body Embedding
Vectorization
Serialized AST:
[ReturnStatement, return, ArrayType, String[], Variable, listVar, Method, toArray, ArrayCreation, new, ArrayType, String[], NumberLiteral, “0”]
Token Embedding (Word2Vec):
<2, 6, 9, …> <8, 4, 1, …> <9, 0, 7, …> <2, 3, 0, …> … <7, 1, 2, …> …
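The vectorization step can be read as a table lookup: each token of the serialized AST is replaced by its embedding, and shorter sequences are padded with zero rows so that every method body becomes an n × k matrix. The sketch below assumes exactly that; the class `TokenVectorizer`, the truncation of overlong sequences, and the zero row for unknown tokens are illustrative choices, and the embedding table is a toy stand-in for a trained Word2Vec model.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Turns a token sequence into a fixed-size n x k matrix (illustrative sketch). */
final class TokenVectorizer {
    /** Maps each token to its k-dimensional vector and zero-pads to n rows. */
    static double[][] vectorize(List<String> tokens, Map<String, double[]> embedding,
                                int n, int k) {
        double[][] matrix = new double[n][k];    // rows start as all zeros
        int rows = Math.min(n, tokens.size());   // assumed: truncate longer sequences
        for (int i = 0; i < rows; i++) {
            double[] vec = embedding.get(tokens.get(i));
            if (vec != null) {                   // assumed: unknown tokens keep a zero row
                System.arraycopy(vec, 0, matrix[i], 0, Math.min(k, vec.length));
            }
        }
        return matrix;
    }

    public static void main(String[] args) {
        // Toy embedding table; values are placeholders, not trained vectors.
        Map<String, double[]> embedding = new HashMap<>();
        embedding.put("ReturnStatement", new double[] {2, 6, 9});
        embedding.put("return", new double[] {8, 4, 1});
        double[][] m = vectorize(Arrays.asList("ReturnStatement", "return"), embedding, 3, 3);
        // prints [[2.0, 6.0, 9.0], [8.0, 4.0, 1.0], [0.0, 0.0, 0.0]]
        System.out.println(Arrays.deepToString(m));
    }
}
```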
25
Encoding (CNN-based)
[Figure: CNN architecture for extracting features, reproduced from Fig. 8 of an IEEE Transactions on Software Engineering article (DOI 10.1109/TSE.2018.2884955). The embedded tokens of the serialized AST (ReturnStatement, return, ArrayType, String[], …), padded with zero rows, form the input layer: an n × k two-dimensional numeric vector. It is followed by convolutional layer C1 (4 feature maps), subsampling layer S1 (4 feature maps), convolutional layer C2 (6 feature maps), subsampling layer S2 (6 feature maps), fully connected layers, a dense layer, and the output layer. The output of the dense layer is taken as the extracted features.]
27
Inconsistency detection
For a given method (e.g., findField { for(…) { … ), look up its adjacent methods: N, those with similar names, and B, those with similar bodies.
If N and B overlap: The method is likely to have a consistent name.
If N and B do not overlap: The method name could be inconsistent with the implementation. → Suggest a new name.
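The decision on this slide can be sketched as a set-overlap test. This is an assumption about how the two neighbour sets are compared: the sketch treats N as the names of methods with similar names, B as the names of methods with similar bodies, and flags the method when the two share no name. The class `InconsistencyCheck` is hypothetical, and exact string matching is a simplification.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Flags a method name as possibly inconsistent (illustrative sketch). */
final class InconsistencyCheck {
    /**
     * Returns true when no name found among the body-neighbours also occurs
     * among the name-neighbours, i.e. the two sets are disjoint.
     */
    static boolean isInconsistent(Set<String> similarNames, Set<String> namesOfSimilarBodies) {
        return Collections.disjoint(similarNames, namesOfSimilarBodies);
    }

    public static void main(String[] args) {
        // Hypothetical neighbour sets for a method called findField.
        Set<String> n = new HashSet<>(Arrays.asList("findField", "findMethod"));
        Set<String> b = new HashSet<>(Arrays.asList("containsField", "containsTarget"));
        System.out.println(isInconsistent(n, b)); // true: suggest a new name
    }
}
```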
33
Name Suggestion
Suggestion = sorting the names of similar implementations.
Four ranking strategies:
R1: Sort names by distance, without merging identical names.
R2: Group identical names first, then sort groups by size.
R3: Group identical names first, then sort groups by average distance.
R4: Same as R3, but penalize groups with size=1.
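The four ranking strategies can be sketched over a list of (name, distance) neighbours. Everything named here (`NameRanker`, `Neighbor`, `r1` to `r4`) is hypothetical, and the R4 penalty is an assumption: the slide does not say how singleton groups are penalized, so the sketch simply moves them behind all larger groups.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Ranks candidate names taken from methods with similar bodies (illustrative sketch). */
final class NameRanker {
    static final class Neighbor {
        final String name;
        final double distance;
        Neighbor(String name, double distance) { this.name = name; this.distance = distance; }
    }

    /** Groups the distances of identical names, keeping first-seen order. */
    private static Map<String, List<Double>> group(List<Neighbor> ns) {
        Map<String, List<Double>> groups = new LinkedHashMap<>();
        for (Neighbor n : ns) {
            groups.computeIfAbsent(n.name, key -> new ArrayList<>()).add(n.distance);
        }
        return groups;
    }

    private static double average(List<Double> ds) {
        double sum = 0.0;
        for (double d : ds) sum += d;
        return sum / ds.size();
    }

    /** R1: names ordered by distance; identical names are not merged. */
    static List<String> r1(List<Neighbor> ns) {
        List<Neighbor> sorted = new ArrayList<>(ns);
        sorted.sort((a, b) -> Double.compare(a.distance, b.distance));
        List<String> names = new ArrayList<>();
        for (Neighbor n : sorted) names.add(n.name);
        return names;
    }

    /** R2: merge identical names, larger groups first. */
    static List<String> r2(List<Neighbor> ns) {
        Map<String, List<Double>> g = group(ns);
        List<String> names = new ArrayList<>(g.keySet());
        names.sort((a, b) -> Integer.compare(g.get(b).size(), g.get(a).size()));
        return names;
    }

    /** R3: merge identical names, smaller average distance first. */
    static List<String> r3(List<Neighbor> ns) {
        Map<String, List<Double>> g = group(ns);
        List<String> names = new ArrayList<>(g.keySet());
        names.sort(Comparator.comparingDouble(name -> average(g.get(name))));
        return names;
    }

    /** R4: as R3, but singleton groups go behind all larger groups (assumed penalty). */
    static List<String> r4(List<Neighbor> ns) {
        Map<String, List<Double>> g = group(ns);
        List<String> names = r3(ns);
        // List.sort is stable, so the R3 order is kept within each class.
        names.sort((a, b) -> Boolean.compare(g.get(a).size() == 1, g.get(b).size() == 1));
        return names;
    }

    /** Hypothetical neighbours of one method body. */
    static List<Neighbor> example() {
        return Arrays.asList(
            new Neighbor("contains", 0.10), new Neighbor("find", 0.20),
            new Neighbor("find", 0.30), new Neighbor("get", 0.35),
            new Neighbor("get", 0.40), new Neighbor("get", 0.45));
    }

    public static void main(String[] args) {
        System.out.println(r1(example())); // [contains, find, find, get, get, get]
        System.out.println(r2(example())); // [get, find, contains]
        System.out.println(r3(example())); // [contains, find, get]
        System.out.println(r4(example())); // [find, get, contains]
    }
}
```

With this toy input each strategy yields a different top suggestion, which is the point of comparing them in the evaluation.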
34
Evaluation
35
Research Questions
RQ1: Inconsistency Identification
RQ2: Suggestion Precision
RQ1, RQ2: Training/testing data from open-source projects.
RQ3: Comparative Study → Comparing with an approach* based on a convolutional attention network.
RQ4: Live Study → Submitting our suggestion results as pull requests to open-source projects.
[*] M. Allamanis, H. Peng, and C. Sutton, “A convolutional attention network for extreme summarization of source code,” in Proceedings of the 33rd International Conference on Machine Learning. JMLR.org, 2016, pp. 2091–2100.
36
Training/Testing Set
Total: 430 projects
Training Data: 2,116,413 methods
37
Training/Testing Set
→ Testing: 2,805 methods (name pairs + bodies)
38
RQ1: Inconsistency Identification
# of neighbors to look up (k)      1      5      10     30
Inconsistent (%)   Precision       56.8   53.7   53.3   49.9
Inconsistent (%)   Recall          84.5   55.9   46.7   28.8
Inconsistent (%)   F1              67.9   54.8   49.7   36.5
Consistent (%)     Precision       72.0   55.9   54.2   51.4
Consistent (%)     Recall          38.2   53.7   60.7   72.2
Consistent (%)     F1              49.9   54.8   57.3   60.0
Accuracy (%)                       60.9   54.8   53.8   50.9
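The F1 rows in the RQ1 table follow from the precision and recall rows through the usual harmonic mean, F1 = 2PR / (P + R). The short check below recomputes two of the k=1 cells; the class name `F1Score` is hypothetical, and only values already in the table are used.

```java
import java.util.Locale;

/** Recomputes F1 from precision and recall (worked check of the RQ1 table). */
final class F1Score {
    /** Harmonic mean of precision and recall, both given in percent. */
    static double f1(double precision, double recall) {
        if (precision + recall == 0.0) {
            return 0.0;   // avoid division by zero when both are 0
        }
        return 2.0 * precision * recall / (precision + recall);
    }

    public static void main(String[] args) {
        // k=1, inconsistent class: P=56.8, R=84.5 -> prints 67.9
        System.out.println(String.format(Locale.ROOT, "%.1f", f1(56.8, 84.5)));
        // k=1, consistent class: P=72.0, R=38.2 -> prints 49.9
        System.out.println(String.format(Locale.ROOT, "%.1f", f1(72.0, 38.2)));
    }
}
```

Both results match the reported 67.9 and 49.9, which also confirms how the flattened table columns line up.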
39
RQ2: Name Suggestion
Accuracy (%)
                        R1 (k=thr)   R2 (k=10)   R3 (k=10)   R4 (k=10)
First Token   thr=1     23.4         23.2        23.0        24.1
First Token   thr=5     35.7         39.4        39.4        39.7
Full Name     thr=1     10.7         11.0        10.9        10.9
Full Name     thr=5     17.0         18.7        19.0        19.2
40
RQ3: Comparison (Name Suggestion)
Accuracy (%)
                    First Token        Full Name
                    thr=1    thr=5     thr=1    thr=5
R1                  36.4     47.2      16.5     22.9
R2                  34.8     50.2      17.0     25.4
R3                  34.7     50.3      16.9     25.5
R4                  35.4     50.5      16.0     25.7
conv_attention      22.3     33.6      0.3      0.6     (state-of-the-art)
copy_attention      23.5     44.7      0.4      1.1     (state-of-the-art)
42
RQ4: Live Study (Setup)
Training Data: 10%
→ Identify inconsistent names and suggest new names (sampled 100 cases).
→ Create a pull request.
→ Ask a maintainer to refactor the method names.
43
RQ4: Live Study
Agreed: Merged 40, Approved 26, Improved 4
Agreed but not fixed: Cannot 1, Won’t 2
Disagreed: 9
Ignored: 18
Total: 100
Half of them are public methods.
Developer feedback includes:
* It should follow project-specific naming conventions.
* Some method names should consider class names.
  e.g., in “XXXBuilder”, many methods cannot be named “build()” even though they return “XXXBuilder” objects.
44
Summary
Making the name consistent is NOT easy
Encoding (CNN-based)
RQ3: Comparison
RQ4: Live Study
45
Tool and Data
https://github.com/SerVal-DTF/debug-method-name
46
Hire me! https://www.darkrsw.net
Hiring: http://wwwen.uni.lu/snt/research/serval

CMPT470-usask-guest-lecture
 
Language Integrated Query - LINQ
Language Integrated Query - LINQLanguage Integrated Query - LINQ
Language Integrated Query - LINQ
 
Seq2Seq (encoder decoder) model
Seq2Seq (encoder decoder) modelSeq2Seq (encoder decoder) model
Seq2Seq (encoder decoder) model
 
11.secure compressed image transmission using self organizing feature maps
11.secure compressed image transmission using self organizing feature maps11.secure compressed image transmission using self organizing feature maps
11.secure compressed image transmission using self organizing feature maps
 
Terence Barr - jdk7+8 - 24mai2011
Terence Barr - jdk7+8 - 24mai2011Terence Barr - jdk7+8 - 24mai2011
Terence Barr - jdk7+8 - 24mai2011
 
Automatic Task-based Code Generation for High Performance DSEL
Automatic Task-based Code Generation for High Performance DSELAutomatic Task-based Code Generation for High Performance DSEL
Automatic Task-based Code Generation for High Performance DSEL
 
Skiron - Experiments in CPU Design in D
Skiron - Experiments in CPU Design in DSkiron - Experiments in CPU Design in D
Skiron - Experiments in CPU Design in D
 
EVIL: Exploiting Software via Natural Language
EVIL: Exploiting Software via Natural LanguageEVIL: Exploiting Software via Natural Language
EVIL: Exploiting Software via Natural Language
 

Último

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 

Último (20)

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 

Learning to Spot and Refactor Inconsistent Method Names

  • 1. Learning to Spot and Refactor Inconsistent Method Names. Kui Liu (1), Dongsun Kim (1), Tegawendé F. Bissyandé (1), Taeyoung Kim (2), Kisub Kim (1), Anil Koyuncu (1), Suntae Kim (2), Yves Le Traon (1). (1) Interdisciplinary Centre for Security, Reliability and Trust (SnT), University of Luxembourg; (2) Department of Software Engineering, Chonbuk National University, South Korea. 29th May 2019.
  • 9. Programming with Libraries: LibraryA.java exposes getItem(), setObject(…), …(). Using a method relies on its name (plus the API documentation). Developers often do not check the inside of the method.
  • 15. A Method can Disguise: getPokemon( … ), getPokemonRealMonster( … ). What I expect vs. what I actually get.
  • 16. Making the name consistent is NOT easy. Naming Things: 49%. Source: https://www.itworld.com/article/2833265/don-t-go-into-programming-if-you-don-t-have-a-good-thesaurus.html
  • 18. Consequence of inconsistent names: there are 5K+ questions about naming issues on stackoverflow.com.
  • 19. Naming bugs are common: we found 183K+ commits addressing naming issues on GitHub.com through a quick search with simple queries such as “inconsistent”, “consistency”, “misleading”, …
  • 20. Our Goals: detect inconsistent method names; repair the names to be consistent.
  • 40. How to find similar names/implementations? Sim( , ) = ?
  • 52. Program Encoder: methods M1, M2, M3, M4 are encoded as program vectors <9, 2, 3, …>, <7, 1, 6, …>, <2, 8, 3, …>, <0, 1, 8, …>.
  • 53. Method = Name (N) + Body (B). Example: getID(…) { for(…) { for(…) { for(…) … Similar Names; Similar Bodies.
  • 54. Method Name Embedding: method names are tokenized (camel case, underscore) and the tokenized names are embedded as vectors with Sentence2vec (PV-DM).
      Method name         Tokenized name
      findField           find, Field
      findMatchesHelper   find, Matches, Helper
      containsTarget      contains, Target
      containsField       contains, Field
      findInstruction1    find, Instruction1
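The tokenization step on this slide (camel case, underscore) can be sketched as follows. This is a minimal illustration rather than the authors' tool: the function name `tokenize_method_name` is hypothetical, and the handling of acronyms (left unsplit) is an assumption the slide does not address.

```python
import re
from typing import List


def tokenize_method_name(name: str) -> List[str]:
    """Split a method name on underscores and camel-case boundaries.

    Hypothetical helper for illustration; acronyms such as "XML" are
    deliberately left as part of the token they start.
    """
    tokens: List[str] = []
    for part in name.split("_"):
        if not part:
            continue  # skip empty pieces from leading or doubled underscores
        # Put a space between a lowercase letter or digit and a following
        # uppercase letter, e.g. "findField" -> "find Field".
        spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", part)
        tokens.extend(spaced.split())
    return tokens


if __name__ == "__main__":
    for method in ["findField", "findMatchesHelper", "findInstruction1"]:
        print(method, "->", tokenize_method_name(method))
```

The resulting token lists are what a paragraph-vector model would then receive as one "sentence" per method name.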
  • 55. Method Body Embedding, Preprocessing: Program Serialization.
      Method body: return (String[]) list.toArray(new String[0]);
      Serialized AST: [ReturnStatement, return, ArrayType, String[], Variable, listVar, Method, toArray, ArrayCreation, new, ArrayType, String[], NumberLiteral, “0”]
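One way to obtain the serialized token sequence shown on this slide is a depth-first, pre-order walk that emits each node's type followed by its label. The sketch below assumes a hand-built tree; the `Node` class and the exact parent/child shape are illustrative assumptions, not the output of a real Java parser.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical AST node: a node type plus the source label it carries."""
    node_type: str
    label: str
    children: List["Node"] = field(default_factory=list)


def serialize(node: Node) -> List[str]:
    """Pre-order traversal emitting [type, label] for every node."""
    tokens = [node.node_type, node.label]
    for child in node.children:
        tokens.extend(serialize(child))
    return tokens


# Assumed tree for: return (String[]) list.toArray(new String[0]);
tree = Node("ReturnStatement", "return", [
    Node("ArrayType", "String[]"),
    Node("Variable", "listVar"),
    Node("Method", "toArray", [
        Node("ArrayCreation", "new", [
            Node("ArrayType", "String[]"),
            Node("NumberLiteral", "0"),
        ]),
    ]),
])

if __name__ == "__main__":
    print(serialize(tree))
```

Note that the variable name `list` appears as `listVar` in the slide's sequence; the tree above simply stores that label directly.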
  • 62. Method Body Embedding, Vectorization: each token of the serialized AST [ReturnStatement, return, ArrayType, String[], Variable, listVar, Method, toArray, ArrayCreation, new, ArrayType, String[], NumberLiteral, “0”] is mapped by a token embedding (Word2Vec) to a numeric vector: <2, 6, 9, …>, <8, 4, 1, …>, <9, 0, 7, …>, <2, 3, 0, …>, …, <7, 1, 2, …>, …
  • 63. Encoding (CNN-based). The input layer is an n × k two-dimensional numeric vector: one embedded row per token (ReturnStatement, return, ArrayType, String[], Variable, listVar, Method, toArray, ArrayCreation, new, ArrayType, String[], NumberLiteral, “0”), with the remaining rows filled with zeros. Layers: convolutional layer C1 (4 feature maps), subsampling layer S1 (4 feature maps), convolutional layer C2 (6 feature maps), subsampling layer S2 (6 feature maps), fully connected layers, and a dense output layer whose output is the extracted features. Figure source: Fig. 8, “CNN architecture for extracting clustering features”, IEEE Transactions on Software Engineering, DOI 10.1109/TSE.2018.2884955.
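The n × k input to the CNN can be illustrated with a small sketch: each token is looked up in an embedding table and the sequence is padded with zero rows up to a fixed length n. The function name, the toy 3-dimensional vectors (borrowed from the truncated values on the vectorization slide), and the choice to map unknown tokens to zero vectors are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence


def to_input_matrix(tokens: Sequence[str],
                    embeddings: Dict[str, Sequence[float]],
                    n: int, k: int) -> List[List[float]]:
    """Build an n x k matrix: one embedding row per token, zero rows as padding.

    Sequences longer than n are truncated; tokens missing from the table
    become zero rows (an assumption, not taken from the slides).
    """
    rows: List[List[float]] = []
    for token in tokens[:n]:
        vector = embeddings.get(token, [0.0] * k)
        if len(vector) != k:
            raise ValueError(f"embedding for {token!r} has length {len(vector)}, expected {k}")
        rows.append([float(x) for x in vector])
    while len(rows) < n:
        rows.append([0.0] * k)  # zero padding up to the fixed length n
    return rows


# Toy embedding table; real vectors would come from a trained Word2Vec model.
toy_embeddings = {
    "ReturnStatement": [2, 6, 9],
    "return": [8, 4, 1],
    "ArrayType": [9, 0, 7],
    "String[]": [2, 3, 0],
}

if __name__ == "__main__":
    matrix = to_input_matrix(["ReturnStatement", "return", "ArrayType"], toy_embeddings, n=5, k=3)
    for row in matrix:
        print(row)
```

Fixing n lets methods of different lengths share one input shape, which is what the convolutional layers expect.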
  • 69. Inconsistency detection: the check evaluates to True or False. Either the method is likely to have a consistent name, or the method name could be inconsistent with the implementation, in which case a new name is suggested.
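The transcript does not preserve the expression that this slide evaluates. One plausible reading, stated here as an assumption, is that a name counts as consistent when the names nearest to it in the name-embedding space overlap with the names of the methods whose bodies are nearest in the body-embedding space. The sketch below implements that reading with Euclidean distance; all function names and the toy vectors are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Entry = Tuple[str, Sequence[float]]  # (method name, embedding vector)


def nearest_names(query: Sequence[float], corpus: List[Entry], k: int) -> List[str]:
    """Names of the k corpus entries closest to the query vector."""
    ranked = sorted(corpus, key=lambda entry: math.dist(query, entry[1]))
    return [name for name, _ in ranked[:k]]


def looks_consistent(name_vec: Sequence[float], body_vec: Sequence[float],
                     name_space: List[Entry], body_space: List[Entry], k: int) -> bool:
    """Assumed check: consistent if the two neighbour name sets intersect."""
    similar_names = set(nearest_names(name_vec, name_space, k))
    names_of_similar_bodies = set(nearest_names(body_vec, body_space, k))
    return bool(similar_names & names_of_similar_bodies)


# Toy two-dimensional embeddings, for illustration only.
name_space = [("findField", [0.0, 0.0]), ("getField", [0.0, 1.0]), ("containsTarget", [5.0, 5.0])]
body_space = [("findField", [1.0, 1.0]), ("getField", [1.0, 2.0]), ("containsTarget", [9.0, 9.0])]

if __name__ == "__main__":
    print(looks_consistent([0.0, 0.1], [1.0, 1.1], name_space, body_space, k=1))
    print(looks_consistent([0.0, 0.1], [9.0, 9.0], name_space, body_space, k=1))
```

A larger k widens both neighbour sets, so the check flags fewer names as inconsistent.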
  • 76. Name Suggestion: suggestion = sorting the names of similar implementations. Four ranking strategies:
      R1: Sort names by distance, without grouping identical names.
      R2: Group identical names first, then sort the groups by size.
      R3: Group identical names first, then sort the groups by average distance.
      R4: Same as R3, but penalize groups of size 1.
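The four ranking strategies can be sketched over a list of (name, distance) candidates taken from the methods with similar implementations. The function names are hypothetical, and the form of the R4 penalty (singleton groups are placed after all larger groups) is an assumption, since the slide only says that such groups are penalized.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Candidate = Tuple[str, float]  # (method name, distance of its body to the query body)


def _group(cands: List[Candidate]) -> Dict[str, List[float]]:
    """Collect the distances of identical names into one group per name."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for name, dist in cands:
        groups[name].append(dist)
    return groups


def _avg(values: List[float]) -> float:
    return sum(values) / len(values)


def rank_r1(cands: List[Candidate]) -> List[str]:
    """R1: sort by distance; identical names are not merged."""
    return [name for name, _ in sorted(cands, key=lambda c: c[1])]


def rank_r2(cands: List[Candidate]) -> List[str]:
    """R2: group identical names, largest group first."""
    groups = _group(cands)
    return sorted(groups, key=lambda n: -len(groups[n]))


def rank_r3(cands: List[Candidate]) -> List[str]:
    """R3: group identical names, smallest average distance first."""
    groups = _group(cands)
    return sorted(groups, key=lambda n: _avg(groups[n]))


def rank_r4(cands: List[Candidate]) -> List[str]:
    """R4: like R3, but singleton groups go last (assumed penalty)."""
    groups = _group(cands)
    return sorted(groups, key=lambda n: (len(groups[n]) == 1, _avg(groups[n])))


candidates = [("getName", 0.10), ("findName", 0.20), ("findName", 0.30),
              ("loadName", 0.05), ("findName", 0.25)]

if __name__ == "__main__":
    for rank in (rank_r1, rank_r2, rank_r3, rank_r4):
        print(rank.__name__, rank(candidates))
```

On this toy input, R1 and R3 put the single closest name first, while R2 and R4 favour the name that recurs among the neighbours.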
  • 81. Research Questions: RQ1: Inconsistency Identification; RQ2: Suggestion Precision; RQ3: Comparative Study; RQ4: Live Study. Training/testing data come from open-source projects. The comparative study compares with an approach [*] based on a convolutional attention network. The live study submits our suggestion results as pull requests to open-source projects. [*] M. Allamanis, H. Peng, and C. Sutton, “A convolutional attention network for extreme summarization of source code,” in Proceedings of the 33rd International Conference on Machine Learning. JMLR.org, 2016, pp. 2091–2100.
  • 86. Training/Testing Set: 430 projects in total; training data: 2,116,413 methods.
  • 92. RQ1: Inconsistency Identification (k = number of neighbors to look up; all values in %)
      Class         Metric      k=1    k=5    k=10   k=30
      Inconsistent  Precision   56.8   53.7   53.3   49.9
      Inconsistent  Recall      84.5   55.9   46.7   28.8
      Inconsistent  F1          67.9   54.8   49.7   36.5
      Consistent    Precision   72.0   55.9   54.2   51.4
      Consistent    Recall      38.2   53.7   60.7   72.2
      Consistent    F1          49.9   54.8   57.3   60.0
      Overall       Accuracy    60.9   54.8   53.8   50.9
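The F1 values in the RQ1 results are the harmonic mean of the precision and recall reported alongside them. The short sketch below recomputes the k=1 entries from the rounded figures on the slide; the function name is illustrative, and recomputing from rounded inputs can differ from the reported value in the last digit for other columns.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same unit as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # k=1, inconsistent class: precision 56.8, recall 84.5
    print(round(f1_score(56.8, 84.5), 1))  # 67.9
    # k=1, consistent class: precision 72.0, recall 38.2
    print(round(f1_score(72.0, 38.2), 1))  # 49.9
```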
  • 94. RQ2: Name Suggestion, accuracy (%); the slide heads the result columns with “k=thr” and “k=10”
      Target       Threshold   R1     R2     R3     R4
      First Token  thr=1       23.4   23.2   23.0   24.1
      First Token  thr=5       35.7   39.4   39.4   39.7
      Full Name    thr=1       10.7   11.0   10.9   10.9
      Full Name    thr=5       17.0   18.7   19.0   19.2
  • 96. RQ3: Comparison (Name Suggestion), accuracy (%); conv_attention and copy_attention are the state-of-the-art baselines
      Approach         First Token       Full Name
                       thr=1   thr=5     thr=1   thr=5
      R1               36.4    47.2      16.5    22.9
      R2               34.8    50.2      17.0    25.4
      R3               34.7    50.3      16.9    25.5
      R4               35.4    50.5      16.0    25.7
      conv_attention   22.3    33.6      0.3     0.6
      copy_attention   23.5    44.7      0.4     1.1
  • 106. RQ4: Live Study, Setup: training data; 10%; identify inconsistent names and suggest new names (sampled 100 cases); create a pull request; ask a maintainer to refactor the method names.
  • 111. RQ4: Live Study
      Agreed: Merged 40, Approved 26, Improved 4
      Agreed but not fixed: Cannot 1, Won’t 2
      Disagreed: 9
      Ignored: 18
      Total: 100
      Half of them are public methods.
      Developer feedback includes:
      * It should follow project-specific naming conventions.
      * Some method names should consider class names, e.g., in “XXXBuilder”, many methods cannot be named “build()” even though they return “XXXBuilder” objects.
  • 112. Summary: recap of four earlier slides. Making the name consistent is NOT easy (https://www.itworld.com/article/2833265/don-t-go-into-programming-if-you-don-t-have-a-good-thesaurus.html). Encoding (CNN-based), with the figure taken from Fig. 8, “CNN architecture for extracting clustering features”, IEEE Transactions on Software Engineering, DOI 10.1109/TSE.2018.2884955. RQ3: Comparison. RQ4: Live Study.
  • 114. https://www.darkrsw.net http://wwwen.uni.lu/snt/research/serval Hire me! Hiring.