Paper: Mining Java Class Naming Conventions
Authors: Simon Butler, Michel Wermelinger, Yijun Yu and Helen Sharp
Session: Research Track 4 - Natural Language Analysis
1. Mining Java Class Naming Conventions
Simon Butler, Michel Wermelinger, Yijun Yu & Helen Sharp
Centre for Research in Computing
The Open University
27 September 2011
m.a.wermelinger@open.ac.uk
Butler et al. (The Open University) Mining Java Class Naming Conventions 27 September 2011 1/7
2. Class Identifier Names
Despite the importance of class identifier names, knowledge of their structure is limited
The adjective* noun+ approximation has been found to be useful, but not universal
What other part-of-speech patterns are commonly used?
How are component words repeated? How often?
Are there project-specific naming conventions?
[Figure: class diagram showing AbstractCollection, Set, AbstractSet, EnumSet, HashSet and TreeSet]
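The questions above all depend on breaking a class identifier name into its component words. The following is a minimal sketch of one way to do that for camel-case names; the class `IdentifierSplitter` and its boundary rules are illustrative assumptions, not the tokenisation used in the study.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits a camel-case class name into component words.
final class IdentifierSplitter {

    // Boundary 1: lower-case letter or digit followed by an upper-case letter (Hash|Set).
    // Boundary 2: end of an acronym run, just before a capitalised word (XML|Parser).
    private static final String BOUNDARY =
            "(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])";

    static List<String> split(String className) {
        List<String> words = new ArrayList<>();
        for (String part : className.split(BOUNDARY)) {
            if (!part.isEmpty()) {
                words.add(part);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        // The class names shown on the slide.
        for (String name : new String[] {"AbstractSet", "EnumSet", "HashSet", "TreeSet"}) {
            System.out.println(name + " -> " + split(name));
        }
    }
}
```

With this sketch, `AbstractSet` yields the words `Abstract` and `Set`, which could then be passed to a part-of-speech tagger.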
3. Distribution of Java Classes in Inheritance Categories
[Figure: proportion of inheritance categories per project (y-axis, 0.0 to 0.7) for the categories E0I0, E0I1, E0In, E1I0, E1I1 and E1In (x-axis)]
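The slide does not define the category labels. The sketch below assumes that E records whether a class extends a superclass other than `Object` (E0 or E1) and that I records how many interfaces it implements directly (I0, I1, or In for more than one). Under that assumption a class could be placed in a category with standard reflection; the class name `InheritanceCategory` is hypothetical.

```java
// Hypothetical helper: derives a label such as "E1I1" for a loaded class,
// under the assumed reading of the category names described above.
final class InheritanceCategory {

    static String of(Class<?> type) {
        Class<?> superclass = type.getSuperclass();
        // Extending java.lang.Object implicitly is not counted as inheritance here.
        boolean extendsClass = superclass != null && superclass != Object.class;
        // getInterfaces() returns only the directly implemented interfaces.
        int interfaces = type.getInterfaces().length;

        String e = extendsClass ? "E1" : "E0";
        String i = interfaces == 0 ? "I0" : interfaces == 1 ? "I1" : "In";
        return e + i;
    }

    public static void main(String[] args) {
        System.out.println(of(java.util.AbstractCollection.class));
        System.out.println(of(java.util.AbstractSet.class));
        System.out.println(of(java.util.HashSet.class));
    }
}
```

Counting the categories over every class in a project and dividing by the number of classes would give per-project proportions of the kind plotted in the figure.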
4. Part-of-Speech Patterns
Relative frequency of most common PoS patterns

Category  noun+  adjective+ noun+  noun+ adjective+ noun+  verb+ noun+
E0I0      0.85   0.08              0.01                    0.01
E0I1      0.73   0.15              0.02                    0.02
E0In      0.75   0.15              0.03                    0.01
E1I0      0.68   0.12              0.04                    0.03
E1I1      0.70   0.15              0.04                    0.02
E1In      0.75   0.14              0.04                    0.02

4 basic patterns account for 90% of class identifier names
85% of E0 I0 class identifier names are composed of nouns
The adjective* noun+ approximation includes 85% of class identifier names
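The PoS patterns are written like regular expressions over tags, so a tagged name can be checked against them with an ordinary regex once each tag is encoded as a single character. The sketch below assumes the four common patterns are noun+, adjective+ noun+, noun+ adjective+ noun+ and verb+ noun+. The tags are supplied by hand; the `PosPatterns` class and its `Tag` enum are hypothetical, and no real tagger is involved.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper: matches a sequence of PoS tags against named patterns.
final class PosPatterns {

    enum Tag {
        NOUN('N'), ADJECTIVE('A'), VERB('V'), DETERMINER('D');

        final char code;

        Tag(char code) {
            this.code = code;
        }
    }

    // Assumed set of common patterns, each encoded over the one-letter tag codes.
    private static final Map<String, Pattern> COMMON = new LinkedHashMap<>();
    static {
        COMMON.put("noun+", Pattern.compile("N+"));
        COMMON.put("adjective+ noun+", Pattern.compile("A+N+"));
        COMMON.put("noun+ adjective+ noun+", Pattern.compile("N+A+N+"));
        COMMON.put("verb+ noun+", Pattern.compile("V+N+"));
    }

    // The adjective* noun+ approximation from the slides.
    private static final Pattern APPROXIMATION = Pattern.compile("A*N+");

    static String encode(List<Tag> tags) {
        StringBuilder encoded = new StringBuilder();
        for (Tag tag : tags) {
            encoded.append(tag.code);
        }
        return encoded.toString();
    }

    // Returns the name of the first common pattern that matches, or "uncommon".
    static String classify(List<Tag> tags) {
        String encoded = encode(tags);
        for (Map.Entry<String, Pattern> entry : COMMON.entrySet()) {
            if (entry.getValue().matcher(encoded).matches()) {
                return entry.getKey();
            }
        }
        return "uncommon";
    }

    static boolean matchesApproximation(List<Tag> tags) {
        return APPROXIMATION.matcher(encode(tags)).matches();
    }

    public static void main(String[] args) {
        // Hand-assigned tags for AbstractSet (adjective noun).
        List<Tag> abstractSet = Arrays.asList(Tag.ADJECTIVE, Tag.NOUN);
        System.out.println(classify(abstractSet) + " " + matchesApproximation(abstractSet));
    }
}
```

A name tagged verb determiner noun, such as SelectAllAction, matches none of the four patterns in this sketch and is reported as uncommon.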
5. Component Word Inheritance
Relative frequency distribution of name inheritance

          Super Class Name   Interface Name
Category  All    Fragment    All    Fragment   Both
E0I1      -      -           0.39   0.37       -
E0In      -      -           0.38   0.40       -
E1I0      0.23   0.58        -      -          -
E1I1      0.14   0.53        0.24   0.21       0.27
E1In      0.11   0.50        0.15   0.25       0.18

Fragments of the super class name are most commonly repeated
Most common patterns:
E0I1 & E0In: noun+ interface name, noun+ interface fragment
E1I0: noun+ super class fragment, noun+ super class name
E1I1 & E1In: noun+ super class fragment, interface name super class fragment, noun+ super class name
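The distinction between repeating a whole supertype name and repeating only a fragment can be illustrated with a small word-level comparison. This sketch assumes that "All" means the supertype's words appear as a contiguous run in the class name and "Fragment" means at least one word is shared. Both definitions, and the `NameReuse` class, are assumptions made for illustration rather than the study's procedure.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: classifies how a class name reuses a supertype's name.
final class NameReuse {

    enum Kind { ALL, FRAGMENT, NONE }

    // Splits a camel-case name into its component words.
    static List<String> words(String name) {
        return Arrays.asList(
                name.split("(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])"));
    }

    static Kind classify(String className, String superTypeName) {
        List<String> own = words(className);
        List<String> inherited = words(superTypeName);

        // Whole supertype name repeated as a contiguous run of words.
        if (Collections.indexOfSubList(own, inherited) >= 0) {
            return Kind.ALL;
        }
        // Otherwise, any shared word counts as a repeated fragment.
        for (String word : inherited) {
            if (own.contains(word)) {
                return Kind.FRAGMENT;
            }
        }
        return Kind.NONE;
    }

    public static void main(String[] args) {
        System.out.println(classify("AbstractSet", "Set"));
        System.out.println(classify("HashSet", "AbstractSet"));
    }
}
```

Here `AbstractSet` repeats the whole interface name `Set`, while `HashSet` repeats only the fragment `Set` of its super class name `AbstractSet`.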
6. Case Study - FreeMind
652 class identifier names
53 (8%) with uncommon PoS patterns
Each class inspected with questions:
1. Is the class identifier name a clear description of the class?
2. Can the class identifier name be refactored to a more common PoS pattern?
3. Can the class be refactored into classes that could be more conventionally named?
We found:
Class identifier names describing GUI actions initiated by the user, e.g. SelectAllAction (verb determiner noun)
Class identifier names that conform to local naming conventions
7 class identifier names were candidates for name refactoring
1 class was a candidate for refactoring
7. Conclusions
Contributions
Identification of common PoS structures found in praxis
Identification of common patterns of component word repetition
Unconventional class names:
may conform to local naming conventions
may be candidates for refactoring
may indicate smells
Practical Applications
Recovery of class naming conventions
Identification of unconventionally named classes
Class identifier name recommendation systems