To ensure code readability and facilitate software maintenance, program methods must be named properly. In particular, method names must be consistent with the corresponding method implementations. Debugging method names remains an important topic in the literature, where various approaches analyze commonalities among method names in a large dataset to detect inconsistent method names and suggest better ones. We note that the state-of-the-art does not analyze the implemented code itself to assess consistency. We thus propose a novel automated approach to debugging method names based on the analysis of consistency between method names and method code. The approach leverages deep feature representation techniques adapted to the nature of each artifact. Experimental results on over 2.1 million Java methods show that we can achieve up to 15 percentage points improvement over the state-of-the-art, establishing a record performance of 67.9% F1-measure in identifying inconsistent method names. We further demonstrate that our approach yields up to 25% accuracy in suggesting full names, while the state-of-the-art lags far behind at 1.1% accuracy. Finally, we report on our success in fixing 66 inconsistent method names in a live study on projects in the wild.
Learning to Spot and Refactor Inconsistent Method Names
Kui Liu1, Dongsun Kim1, Tegawendé F. Bissyandé1, Taeyoung Kim2,
Kisub Kim1, Anil Koyuncu1, Suntae Kim2, Yves Le Traon1
1 Interdisciplinary Centre for Security, Reliability and Trust (SnT), University of Luxembourg
2 Department of Software Engineering, Chonbuk National University, South Korea
29th May 2019
A Method Can Disguise Itself

What I expect: getPokemon( … )
What I actually get: getPokemonRealMonster( … )
Making the name consistent is NOT easy

"Naming Things": 49%
https://www.itworld.com/article/2833265/don-t-go-into-programming-if-you-don-t-have-a-good-thesaurus.html
Naming bugs are common

We found 183K+ commits addressing naming issues on GitHub.com
via a quick search with simple queries such as
"inconsistent", "consistency", "misleading", …
Name Suggestion

Suggestion = sorting similar implementations. Four ranking strategies:
R1: Sort names by distance; identical names are not grouped.
R2: Group identical names first; sort groups by size.
R3: Group identical names first; sort groups by average distance.
R4: Same as R3, but penalize groups of size 1.
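The four ranking strategies can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes the query method's nearest neighbors arrive as hypothetical (name, distance) pairs, where distances come from the learned code representations.

```python
from collections import defaultdict

def rank_names(neighbors):
    """Rank candidate names for a method given its nearest neighbors.

    `neighbors` is a list of (name, distance) pairs for methods whose
    implementations are most similar to the query method (hypothetical
    input format). Returns one ranked name list per strategy R1-R4.
    """
    # R1: sort by distance; identical names are not grouped.
    r1 = [n for n, _ in sorted(neighbors, key=lambda p: p[1])]

    # Group identical names together for R2-R4.
    groups = defaultdict(list)
    for name, dist in neighbors:
        groups[name].append(dist)

    # R2: larger groups of identical names come first.
    r2 = sorted(groups, key=lambda n: -len(groups[n]))

    # R3: groups with smaller average distance come first.
    r3 = sorted(groups, key=lambda n: sum(groups[n]) / len(groups[n]))

    # R4: like R3, but singleton groups are pushed to the back.
    r4 = sorted(groups, key=lambda n: (len(groups[n]) == 1,
                                       sum(groups[n]) / len(groups[n])))
    return r1, r2, r3, r4
```

The size penalty in R4 is encoded as a sort key whose first component is True only for singleton groups, so any repeated name outranks any name seen once.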
Research Questions

RQ1: Inconsistency Identification
RQ2: Suggestion Precision
(RQ1 and RQ2 use training/testing data from open-source projects.)

RQ3: Comparative Study: comparing with an approach* based on a convolutional attention network.

RQ4: Live Study: submitting our suggestion results as pull requests to open-source projects.

[*] M. Allamanis, H. Peng, and C. Sutton, "A convolutional attention network for extreme summarization of source code," in Proceedings of the 33rd International Conference on Machine Learning. JMLR.org, 2016, pp. 2091–2100.
RQ4: Live Study (Setup)

Training data: identify inconsistent names and suggest new names (10%; sampled 100 cases).
Create a pull request.
Ask a maintainer to refactor the method names.
RQ4: Live Study

            Agreed                 Agreed but not fixed
  Merged  Approved  Improved        Cannot    Won't       Disagreed  Ignored  Total
    40       26        4               1        2              9       18      100

Half of them are public methods.

Developer feedback includes:
* It should follow project-specific naming conventions.
* Some method names should consider class names.
  e.g., in "XXXBuilder", many methods cannot be named "build()"
  even though they return "XXXBuilder" objects.
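As a quick sanity check on the breakdown above, the categories can be tallied directly (values transcribed from the slide; the grouping into "agreed" follows the table's column headers):

```python
# RQ4 live-study outcomes, transcribed from the slide's table.
results = {
    "Merged": 40, "Approved": 26, "Improved": 4,   # agreed
    "Cannot": 1, "Won't": 2,                       # agreed but not fixed
    "Disagreed": 9, "Ignored": 18,
}

agreed = results["Merged"] + results["Approved"] + results["Improved"]
total = sum(results.values())
print(f"{agreed}/{total} suggestions agreed with")  # 70/100
```

So developers agreed with 70 of the 100 submitted suggestions, 66 of which (Merged + Approved) were accepted as fixes.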
Summary

RQ4: Live Study
            Agreed                 Agreed but not fixed
  Merged  Approved  Improved        Cannot    Won't       Disagreed  Ignored  Total
    40       26        4               1        2              9       18      100
Half of them are public methods.
* It should follow project-specific naming conventions.
* Some method names should consider class names.
  e.g., in "XXXBuilder", many methods cannot be named "build()"
  even though they return "XXXBuilder" objects.

RQ3: Comparison
Accuracy (%)
                 First Token           Full Name
                thr=1    thr=5        thr=1    thr=5
R1              36.4     47.2         16.5     22.9
R2              34.8     50.2         17.0     25.4
R3              34.7     50.3         16.9     25.5
R4              35.4     50.5         16.0     25.7
conv_attention  22.3     33.6          0.3      0.6
copy_attention  23.5     44.7          0.4      1.1
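The accuracy-at-threshold metric in the table above can be sketched as follows (a hypothetical reconstruction of the evaluation, not the authors' script): a method counts as correctly suggested if its true name appears among the top `thr` ranked candidates.

```python
def accuracy_at(suggestions, truths, thr):
    """Fraction of methods whose ground-truth name appears in the
    top-`thr` suggested names. `suggestions` is a list of ranked name
    lists, `truths` the corresponding true names (hypothetical format).
    """
    hits = sum(truth in ranked[:thr]
               for ranked, truth in zip(suggestions, truths))
    return hits / len(truths)

# Toy usage with two methods and two ranked suggestion lists:
sugg = [["getName", "toString"], ["size", "length"]]
truth = ["toString", "size"]
assert accuracy_at(sugg, truth, 1) == 0.5   # only "size" is ranked first
assert accuracy_at(sugg, truth, 2) == 1.0   # both true names in top 2
```

With thr=1 only an exact top-ranked match counts, which is why the full-name accuracies at thr=1 are so much lower than at thr=5 in the table.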
Encoding (CNN-based)
Fig. 8: CNN architecture for extracting clustering features. Input layer: an n × k two-dimensional numeric vector encoding tokens such as ReturnStatement return, ArrayType String[], Variable listVar, Method toArray, ArrayCreation new, ArrayType String[], NumberLiteral "0" (zero-padded). C1 is the first convolutional layer (4 feature maps) and C2 the second (6 feature maps); S1 is the first subsampling layer (4 feature maps) and S2 the second (6 feature maps), followed by fully connected layers and a dense layer. The output of the dense layer is taken as the extracted features of code fragments and is used for clustering.
2.4.4 Code Patterns Mining
Although violations can be parsed and converted into two-dimensional numeric vectors, it is still challenging to mine code patterns, given that noisy information (e.g., specific meaningless identifiers) can interfere with identifying similar violations. Deep learning has recently shown promise in various software engineering tasks [18], [47], [49]. In particular, it offers the major advantage of requiring less prior knowledge and human effort for feature design in machine learning applications. Consequently, our method is designed to deeply learn discriminating features for mining code patterns of violations. We leverage CNNs to perform deep learning of violation features with embedded violations, and use the X-means clustering algorithm to cluster violations with the learned features.

Feature learning with CNNs
Figure 8 shows the CNN architecture for learning violation features. The input is the two-dimensional numeric vectors of preprocessed violations. The alternating locally connected convolutional and subsampling layers capture the local features of violations. The dense layer compresses all local features captured by the former layers. We select the output of the dense layer as the learned violation features for clustering violations. Note that our approach uses CNNs to … of violations from clustered similar code fragments of violations to show patterns clearly. Note that the whole process of mining patterns is automated.

2.5 Mining Common Fix Patterns
Our goal in this step is to summarize how a violation is resolved by developers. To achieve this goal, we collect violation-fixing changes and proceed to identify their common fix patterns. The approach to mining common fix patterns is similar to that of mining common code patterns. The differences lie in the data collection and tokenization process. Before describing our approach to mining common fix patterns, we formalize the definitions of patch and fix pattern.

2.5.1 Preliminaries
A patch represents a modification carried out on program source code to repair a program that was brought to an erroneous state at runtime. A patch thus captures some knowledge of modification behavior, and similar patches may be associated with similar behavioral changes.

Definition 4. Patch (P): A patch is a pair of source code fragments, one representing a buggy version and the other its updated (i.e., bug-fixing) version. In the traditional GNU diff representation of patches, the buggy version is …
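Definition 4 can be sketched as a simple data type. The field names and the example fragments below are hypothetical; the definition only requires a pair of code fragments, buggy and fixed.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Patch:
    """A patch per Definition 4: a pair of source code fragments,
    a buggy version and its bug-fixing update (hypothetical encoding)."""
    buggy: str   # fragment before the fix
    fixed: str   # fragment after the fix

# Hypothetical example: strengthening a null check.
p = Patch(buggy="if (s == null) return;",
          fixed="if (s == null || s.isEmpty()) return;")
```

Similar patches, under this view, are pairs whose buggy-to-fixed changes exhibit similar modification behavior, which is what the fix-pattern mining step groups together.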