Not Only Statements:
The Role of Textual Analysis in Software Quality
Rocco Oliveto
rocco.oliveto@unimol.it
University of Molise
2nd Workshop on Mining Unstructured Data

October 17th, 2012 - Kingston, Canada
Textual analysis is...
...the process of deriving high-quality information from text

Text is Software Too
Alexander Dekhtyar
Dept. Computer Science
University of Kentucky
dekhtyar@cs.uky.edu
Jane Huffman Hayes
Dept. Computer Science
University of Kentucky
hayes@cs.uky.edu
Tim Menzies
Dept. Computer Science,
Portland State University,
tim@menzies.us
Abstract
Software compiles and therefore is characterized by a
parseable grammar. Natural language text rarely conforms
to prescriptive grammars and therefore is much harder to
parse. Mining parseable structures is easier than mining
less structured entities. Therefore, most work on mining
repositories focuses on software, not natural language text.
Here, we report experiments with mining natural language
text (requirements documents) suggesting that: (a) mining
natural language is not too difficult, so (b) software repositories
should routinely be augmented with all the natural
language text used to develop that software.
1 Introduction
“I have seen the future of software engineering, and it
is......Text?”
Much of the work done in the past has focused on the
mining of software repositories that contain structured, easily
parseable artifacts. Even when non-structured artifacts
existed (or portions of structured artifacts that were non-structured),
researchers ignored them. These items tended
to be “exclusions from consideration” in research papers.
We argue that these non-structured artifacts are rich
in semantic information that cannot be extracted from
the nice-to-parse syntactic structures such as source code.
Much useful information can be obtained by treating text
as software, or at least, as part of the software repository,
and by developing techniques for its efficient mining.
To date, we have found that information retrieval (IR)
methods can be used to support the processing of textual
software artifacts. Specifically, these methods can be used
to facilitate the tracing of software artifacts to each other
(such as tracing design elements to requirements). We have
found that we can generate candidate links in an automated
fashion faster than humans; we can retrieve more true links
than humans; and we can allow the analyst to participate
in the process in a limited way and realize vast results improvements [10,11].
In this paper, we discuss:
• The kinds of text seen in software;
• Problems with using non-textual methods;
• The importance of early life cycle artifacts;
• The mining of software repositories with an emphasis
on natural language text; and
• Results from work that we have performed thus far on
mining of textual artifacts.
2 Text in Software Engineering
Textual artifacts associated with software can roughly
be partitioned into two large categories:
1. Text produced during the initial development and then
maintained, such as requirements, design specifications,
user manuals and comments in the code;
2. Text produced after the software is fielded, such as
problem reports, reviews, messages posted to on-line
software user group forums, modification requests, etc.
Both categories of artifacts can help us analyze software
itself, although different approaches may be employed. In
this paper, we discuss how lifecycle development documents
can be used to mine traceability information for Independent
Validation & Verification (IV&V) analysts and how
artifacts (e.g., textual interface requirements) can be used
to study and predict software faults.
3 If not text..
One way to assess our proposal would be to assess what
can be learned from alternative representations. In the software
verification world, reasoning about two representations
is common: formal models and static code measures.
A formal model has two parts: a system model and a
properties model. The system model describes how the program
can change the values of variables while the properties
model describes global invariants that must be maintained
when the system executes. Often, a temporal logic¹ is used [...]
¹ Temporal logic is classical logic augmented with some temporal
operators such as □X (always X is true); ◇X (eventually
X is true); ○X (X is true at the next time point); X U Y (X is
true until Y is true).
Non-structured artifacts are
rich in semantic information that
cannot be extracted from the
nice-to-parse syntactic
structures such as source code
...TA in SE...
traceability recovery (Antoniol et al. TSE 2002, Marcus and Maletic ICSE 2003)
change impact analysis (Canfora et al. Metrics 2005)
feature location (Poshyvanyk et al. TSE 2007)
program comprehension (Haiduc et al. ICSE 2010, Hindle et al. MSR 2011)
bug localization (Lo et al. ICSE 2012)
clone detection (Marcus et al. ASE 2001)
...
Textual Analysis Applications
Why Textual Analysis for Software Quality
lightweight (as it does not require parsing)
provides complementary information to what
traditional code analysis could provide
Textual analysis for software quality
...process overview...
source code entities -> text normalization -> identifier normalization -> term weighting -> application of NLP/IR -> new knowledge
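The identifier normalization step in the overview can be illustrated with a minimal sketch. It assumes that normalization means splitting identifiers on underscores and camel-case boundaries, lowercasing the parts, and dropping stop words; the class name IdentifierNormalizer and the tiny stop list are hypothetical and do not come from the slides.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class IdentifierNormalizer {

    // Illustrative stop list (an assumption, not taken from the slides).
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("the", "a", "in", "of"));

    /** Splits on underscores and camel-case boundaries, lowercases, drops stop words. */
    static List<String> normalize(String identifier) {
        List<String> terms = new ArrayList<>();
        // Boundaries: "_", lower/digit followed by upper, and the end of an acronym run.
        String[] parts = identifier.split(
                "_|(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])");
        for (String part : parts) {
            String term = part.toLowerCase(Locale.ROOT);
            if (!term.isEmpty() && !STOP_WORDS.contains(term)) {
                terms.add(term);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(normalize("checkMandatoryFieldsUser"));
        System.out.println(normalize("TABLE_USER"));
    }
}
```

A real pipeline would typically add stemming, abbreviation expansion, and a term weighting scheme on top of this step.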
Textual Analysis to...
...measure class cohesion
Given a class
1. compute the textual similarity between all the
pairs of methods
2. compute the average textual similarity (value
between 0 and 1)
3. the higher the similarity, the higher the
cohesion
A. Marcus, D. Poshyvanyk, R. Ferenc: Using the Conceptual Cohesion of Classes for Fault Prediction in Object-Oriented
Systems. IEEE Transactions on Software Engineering 34(2): 287-300 (2008)
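The three steps above can be sketched as follows. This is a simplified illustration, not the metric from the cited paper: it swaps in cosine similarity over raw term frequencies as the textual similarity, and the class name TextualCohesion, the tokenizer, and the convention of returning 0 for classes with fewer than two methods are all assumptions.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class TextualCohesion {

    /** Builds a term-frequency vector from the words of one method's text. */
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String term : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!term.isEmpty()) {
                tf.merge(term, 1, Integer::sum);
            }
        }
        return tf;
    }

    /** Cosine similarity of two term-frequency vectors (0 when either is empty). */
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0);
            normA += e.getValue() * e.getValue();
        }
        for (int v : b.values()) {
            normB += v * v;
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / Math.sqrt(normA * normB);
    }

    /** Steps 1 and 2: average similarity over all unordered pairs of methods. */
    static double cohesion(List<String> methodTexts) {
        int n = methodTexts.size();
        if (n < 2) {
            return 0.0; // assumed convention for classes with fewer than two methods
        }
        double sum = 0.0;
        int pairs = 0;
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                sum += cosine(termFrequencies(methodTexts.get(i)),
                              termFrequencies(methodTexts.get(j)));
                pairs++;
            }
        }
        return sum / pairs;
    }

    public static void main(String[] args) {
        List<String> methods = Arrays.asList(
                "insert user table user",
                "update user table user",
                "delete teaching table teaching");
        // Step 3: a higher value is read as higher cohesion.
        System.out.println(cohesion(methods));
    }
}
```

Each string stands for the normalized identifiers and comment words of one method.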
Textual Analysis to...
...measure class coupling
Given two classes A and B
1. compute the textual similarity between all
unordered pairs of methods from class A and
class B
2. compute the average textual similarity (value
between 0 and 1)
3. the higher the similarity, the higher the coupling
D. Poshyvanyk, A. Marcus, R. Ferenc, T. Gyimóthy: Using information retrieval based coupling measures for impact
analysis. Empirical Software Engineering 14(1): 5-32 (2009)
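A minimal sketch of the coupling computation, under the same kind of simplification: cosine similarity over raw term frequencies stands in for the textual similarity, and the class name TextualCoupling and its helpers are hypothetical. The only structural difference from the cohesion measure is that pairs are formed across the two classes rather than within one.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class TextualCoupling {

    /** Term-frequency vector of one method's text. */
    static Map<String, Integer> tf(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String term : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!term.isEmpty()) {
                counts.merge(term, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Cosine similarity of two term-frequency vectors (0 when either is empty). */
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0);
            normA += e.getValue() * e.getValue();
        }
        for (int v : b.values()) {
            normB += v * v;
        }
        return (normA == 0.0 || normB == 0.0) ? 0.0 : dot / Math.sqrt(normA * normB);
    }

    /** Average similarity over every pair (method of A, method of B). */
    static double coupling(List<String> methodsOfA, List<String> methodsOfB) {
        if (methodsOfA.isEmpty() || methodsOfB.isEmpty()) {
            return 0.0; // assumed convention for empty classes
        }
        double sum = 0.0;
        for (String a : methodsOfA) {
            for (String b : methodsOfB) {
                sum += cosine(tf(a), tf(b));
            }
        }
        return sum / (methodsOfA.size() * methodsOfB.size());
    }

    public static void main(String[] args) {
        List<String> classA = Arrays.asList("insert user", "update user");
        List<String> classB = Arrays.asList("check user role", "delete teaching");
        System.out.println(coupling(classA, classB));
    }
}
```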
Yet another metric?
            PC1    PC2    PC3    PC4    PC5    PC6
Proportion  29.6   20.9   10.1   10     17     8.5
Cumulative  29.6   50.5   60.6   70.7   87.7   96.2
C3          -0.06  -0.03  -0.01  0.99   -0.04  0
LCOM1       0.92   0      0.05   -0.03  0.31   -0.01
LCOM2       0.91   -0.01  0.04   -0.02  0.33   0
LCOM3       0.6    -0.12  0.05   -0.04  0.73   -0.13
LCOM4       0.2    -0.19  0      -0.03  0.93   -0.1
LCOM5       0.08   0.03   0.99   -0.01  0.01   -0.04
ICH         0.91   0.05   0.06   -0.05  -0.06  -0.14
TCC         -0.02  0.93   -0.03  0      -0.11  0.28
LCC         0.04   0.96   0.07   -0.05  -0.13  0.09
Coh         -0.11  0.47   -0.06  0.01   -0.17  0.84
So what?
Improve defect prediction
...some numbers...
Metrics  Precision  Correctness  R² value
LCOM1    61.9       74.39        0.1
LCOM3    62.59      70.55        0.1
LCOM2    62.05      75.93        0.1
LCOM4    59.75      66.36        0.07
C3       62.05      61.35        0.07
ICH      60.92      73.52        0.06
Coh      61.21      59.33        0.03
LCOM5    56.56      54.48        0.03
...some numbers...
Metrics      Precision  Correctness  R² value
C3+LCOM3     66.2       68.47        0.16
C3+LCOM1     65.23      68.23        0.15
C3+LCOM2     64.88      67.54        0.15
C3+LCOM4     64.98      66.2         0.14
C3+ICH       63.71      64.74        0.12
LCOM4+ICH    63.32      72.87        0.11
LCOM3+ICH    63.46      72.61        0.11
LCOM1+LCOM3  63.27      74.16        0.11
The use of C3 improves the
prediction accuracy of models
based only on structural
metrics
But also refactoring...
...the approach...
Class C (n methods) -> method-by-method matrix construction over m1, m2, ..., mn, based on:
SSM: Structural Similarity between Methods
CIM: Call-based Interaction between Methods
CSM: Conceptual Similarity between Methods
G. Bavota, A. De Lucia, A. Marcus, R. Oliveto: A two-step technique for extract class refactoring. ASE 2010: 151-154
G. Bavota, A. De Lucia, R. Oliveto: Identifying Extract Class refactoring opportunities using structural and semantic cohesion measures.
Journal of Systems and Software 84(3): 397-414 (2011)
public class UserManagement {

    // String representing the table user in the database
    private static final String TABLE_USER = "user";

    // String representing the table teaching in the database
    private static final String TABLE_TEACHING = "teaching";

    /* Insert a new user in TABLE_USER */
    public void insertUser(User pUser){
        boolean check = checkMandatoryFieldsUser(pUser);
        ...
        String sql = "INSERT INTO " + UserManagement.TABLE_USER + " ... ";
        ...
    }

    /* Update an existing user in TABLE_USER */
    public void updateUser(User pUser){
        boolean check = checkMandatoryFieldsUser(pUser);
        ...
        String sql = "UPDATE " + UserManagement.TABLE_USER + " ... ";
        ...
    }

    /* Delete an existing user in TABLE_USER */
    public void deleteUser(User pUser){
        ...
        String sql = "DELETE FROM " + UserManagement.TABLE_USER + " ... ";
        ...
    }

    /* Verify if in TABLE_USER exists the user pUser */
    public void existsUser(User pUser){
        ...
        String sql = "SELECT FROM " + UserManagement.TABLE_USER + " ... ";
        ...
    }

    /* Check the mandatory fields in pUser */
    public boolean checkMandatoryFieldsUser(User pUser){
        ...
    }

    /* Insert a new teaching in TABLE_TEACHING */
    public void insertTeaching(Teaching pTeaching){
        boolean check = checkMandatoryFieldsTeaching(pTeaching);
        ...
        String sql = "INSERT INTO " + UserManagement.TABLE_TEACHING + " ... ";
        ...
    }

    /* Update an existing teaching in TABLE_TEACHING */
    public void updateTeaching(Teaching pTeaching){
        boolean check = checkMandatoryFieldsTeaching(pTeaching);
        ...
        String sql = "UPDATE " + UserManagement.TABLE_TEACHING + " ... ";
        ...
    }

    /* Delete an existing teaching in TABLE_TEACHING */
    public void deleteTeaching(Teaching pTeaching){
        ...
        String sql = "DELETE FROM " + UserManagement.TABLE_TEACHING + " ... ";
        ...
    }

    /* Check the mandatory fields in pTeaching */
    public boolean checkMandatoryFieldsTeaching(Teaching pTeaching){
        ...
    }
}
[Figure: method-by-method matrix for UserManagement, obtained by combining the CDM similarity, SSM similarity, and CSM similarity matrices with the weights below.]
wCDM = 0.2
wSSM = 0.5
wCSM = 0.3
IU = insertUser - UU = updateUser - DU = deleteUser - EU = existsUser - CU = checkMandatoryFieldsUser
IT = insertTeaching - UT = updateTeaching - DT = deleteTeaching - CT = checkMandatoryFieldsTeaching
[Figure: method-by-method relationships before filtering, method-by-method relationships after filtering, and the proposed refactoring. Labels: Candidate Chain C1, Candidate Chain C2, Trivial Chain T1, Candidate Class C1, Candidate Class C2. Panels: method-by-method matrix, after transitive closure, proposed refactoring.]
Conceptual cohesion plays a crucial role
Refactoring operations make
sense for developers
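The approach can be sketched in three steps: combine the three similarity matrices with weights, filter out weak relationships, and group the methods that remain connected. The weights 0.2, 0.5, and 0.3 come from the slides; the threshold of 0.4, the class name ExtractClassSketch, and the toy matrices are hypothetical. The sketch also leaves out how trivial chains are handled, so it illustrates the idea rather than the published technique.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;

final class ExtractClassSketch {

    /** Weighted sum of the three method-by-method matrices. */
    static double[][] combine(double[][] cdm, double[][] ssm, double[][] csm,
                              double wCdm, double wSsm, double wCsm) {
        int n = cdm.length;
        double[][] m = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                m[i][j] = wCdm * cdm[i][j] + wSsm * ssm[i][j] + wCsm * csm[i][j];
            }
        }
        return m;
    }

    /** Drops links below the threshold and returns the groups of connected methods. */
    static List<List<Integer>> chains(double[][] m, double threshold) {
        int n = m.length;
        boolean[] seen = new boolean[n];
        List<List<Integer>> result = new ArrayList<>();
        for (int start = 0; start < n; start++) {
            if (seen[start]) {
                continue;
            }
            List<Integer> chain = new ArrayList<>();
            Deque<Integer> stack = new ArrayDeque<>();
            stack.push(start);
            seen[start] = true;
            while (!stack.isEmpty()) {
                int u = stack.pop();
                chain.add(u);
                for (int v = 0; v < n; v++) {
                    boolean linked = m[u][v] >= threshold || m[v][u] >= threshold;
                    if (!seen[v] && linked) {
                        seen[v] = true;
                        stack.push(v);
                    }
                }
            }
            Collections.sort(chain);
            result.add(chain);
        }
        return result;
    }

    public static void main(String[] args) {
        // Toy data for four methods (hypothetical values).
        double[][] cdm = {{1, 1, 0, 0}, {1, 1, 0, 0}, {0, 0, 1, 0}, {0, 0, 0, 1}};
        double[][] ssm = {{1, 1, 0, 0}, {1, 1, 0, 0}, {0, 0, 1, 1}, {0, 0, 1, 1}};
        double[][] csm = {{1, 0.5, 0.1, 0}, {0.5, 1, 0, 0}, {0.1, 0, 1, 1}, {0, 0, 1, 1}};
        double[][] m = combine(cdm, ssm, csm, 0.2, 0.5, 0.3);
        System.out.println(chains(m, 0.4));
    }
}
```

Each returned group is a candidate class; a group with a single method would correspond to a trivial chain.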
The developer point of view...
Do measures reflect the quality perceived by developers?
...the study...
How does class coupling align
with developers’ perception of coupling?
Four types of information sources
structural
dynamic
semantic
historical
The study involved 90 subjects
G. Bavota, B. Dit, R. Oliveto, M. Di Penta, D. Poshyvanyk, A. De Lucia. An Empirical Study on the Developers'
Perception of Software Coupling. Submitted to ICSE 2013.
...take away...
Coupling cannot be captured and measured using only
structural information, such as method calls
Different sources of information are needed
Semantic coupling seems to reflect the developers’ mental
model when identifying interactions between entities
Semantic coupling is able to capture “latent coupling
relationships” encapsulated in identifiers and comments
Not only quality measure...
Inconsistency between code and comments...
...the study...
QALP Score: the similarity between a module’s
comment and its code
Used to evaluate the quality of source code, but it can
also be used to predict faults
[Figure 2. Maximum QALP score per defect count for both programs (Mozilla and MP). Axes: Defect Count, QALP Score.]
D. Binkley, H. Feild, D. Lawrie, and M. Pighin, “Software fault prediction using language processing,” in Proceedings
of the Testing: Academic and Industrial Conference Practice and Research Techniques, 2007, pp. 99–110.
Inconsistent naming...
path? Is it a relative path or an absolute path?
And what if it is used as both relative and absolute?
...the study...
Term entropy: the physical dispersion of terms in a
program. The higher the entropy, the more scattered
the terms are across the program
Context coverage: the conceptual dispersion of terms.
The higher their context coverage, the more unrelated the
methods using them
The use of identical terms in different
contexts may increase the risk of faults
V. Arnaoudova, L. M. Eshkevari, R. Oliveto, Y.-G. Guéhéneuc, G. Antoniol: Physical and conceptual identifier
dispersion: Measures and relation to fault proneness. ICSM 2010: 1-5
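Term entropy can be illustrated with a minimal sketch, assuming it is the Shannon entropy of a term's occurrence counts across program entities (methods or attributes). The class name TermEntropy and the choice of base-2 logarithms are assumptions, and the cited paper may normalize the value differently. Context coverage is not covered here, since it additionally needs a similarity measure between the entities.

```java
final class TermEntropy {

    /** Shannon entropy, in bits, of one term's occurrence counts across entities. */
    static double entropy(int[] occurrencesPerEntity) {
        long total = 0;
        for (int count : occurrencesPerEntity) {
            total += count;
        }
        if (total == 0) {
            return 0.0; // the term never occurs
        }
        double h = 0.0;
        for (int count : occurrencesPerEntity) {
            if (count > 0) {
                double p = (double) count / total;
                h -= p * (Math.log(p) / Math.log(2));
            }
        }
        return h;
    }

    public static void main(String[] args) {
        // A term confined to one entity versus a term spread evenly over four.
        System.out.println(entropy(new int[] {4, 0, 0, 0}));
        System.out.println(entropy(new int[] {1, 1, 1, 1}));
    }
}
```

A term concentrated in a single entity has the lowest entropy, while a term spread evenly over many entities has the highest, which matches the "more scattered" reading above.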
...take away...
Term entropy and context coverage only partially correlate with size
The number of high entropy and high context coverage
terms contained in a method or attribute helps to explain
the probability of it being faulty
If a Rhino (ArgoUML) method contains an identifier with a
term having high entropy and high context coverage, its probability of
being faulty is six (two) times higher
see also
S. Lemma Abebe, V. Arnaoudova, P. Tonella, G. Antoniol and Y.-G. Guéhéneuc.
Can Lexicon Bad Smells improve fault prediction? WCRE 2012
Challenges...
Source code vocabulary...
How to induce developers to use meaningful identifiers?
Reverse engineering, used with
evolving software development
technologies, will provide
significant incremental
enhancements to our productivity
Continuous
Textual Analysis
COCONUT...
1. The Administrator activates the add member function in the terminal of the system
and correctly enters his login and password identifying him as an Administrator.
2. The system responds by presenting a form to the Administrator on a terminal
screen. The form includes the first and last name, the address, and contact
information (phone, email and fax) of the customer, as well as the fidelity index.
The fidelity index can be: New Member, Silver Member, and Gold Member. After
50 rentals the member is considered as Silver Member, while after 150 rentals the
member becomes a Gold Member. The system also displays the membership fee
to be paid.
3. The Administrator fills the form and then confirms all the requested form
information is correct.
addmember.txt
What if traceability links are not available?
Query assessment...
[Diagram: CONCEPT LOCATION. INPUT: textual query and source code -> IR engine -> OUTPUT: relevant classes]
[Diagram: QUERY ASSESSMENT. INPUT: textual query and source code -> IR engine -> OUTPUT: query quality]
Good Query
#  Method      Class        Score
1  insertUser  ManagerUser  0.99
2  deleteUser  ManagerUser  0.95
3  assignUser  ManagerRole  0.88
4  util        Utility      0.84
5  getUsers    ManagerUser  0.79
Useful results on top of the list

Bad Query
#  Method      Class        Score
1  util        Utility      0.93
2  dbConnect   ManagerDb    0.90
3  insertUser  ManagerUser  0.86
4  networking  Utility      0.76
5  loadRs      ManagerDb    0.73
False positives on top of the list
How to use query assessment for improving code vocabulary?
[Diagram: INPUT: textual query and source code -> IR engine -> OUTPUT: query quality]
[Diagram: INPUT: source code and documents -> IR engine -> OUTPUT: code quality]
What about comments?
Automatic generation...
Giriprasad Sridhara, Emily Hill, Divya Muppaneni, Lori L. Pollock, K. Vijay-Shanker: Towards automatically generating
summary comments for Java methods. ASE 2010: 43-52
Source code pre-processing...
...problems...
how to remove the noise in source code?
which elements should be indexed?
identifier splitting and expansion
task-based pre-processing
NLP/IR techniques...
...problems...
how to set the parameters of some
techniques (e.g., LSI)?
do we need customized versions of NLP/IR
techniques?
are the different techniques equivalent?
task-specific techniques?
New horizons...
Linguistic antipatterns...
Common practices, from a linguistic perspective, in the source code that
decrease the quality of the software (Arnaoudova WCRE 2010)
How to define linguistic antipatterns?
How to identify them?
What is the impact of linguistic antipatterns
on software development and maintenance?
How to prevent linguistic antipatterns?
Software testing...
Can textual analysis be used during test case selection?
Can textual analysis be used to improve search-based test case generation?
Can textual analysis be used to capture testing complexity of source code?
Empirical studies...
When and why does textual analysis complement
traditional source code analysis techniques?
Are studies with users needed?
Conclusion...
CYTOGENETIC MAP................ ppt.pptxCYTOGENETIC MAP................ ppt.pptx
CYTOGENETIC MAP................ ppt.pptx
 
biology HL practice questions IB BIOLOGY
biology HL practice questions IB BIOLOGYbiology HL practice questions IB BIOLOGY
biology HL practice questions IB BIOLOGY
 
PATNA CALL GIRLS 8617370543 LOW PRICE ESCORT SERVICE
PATNA CALL GIRLS 8617370543 LOW PRICE ESCORT SERVICEPATNA CALL GIRLS 8617370543 LOW PRICE ESCORT SERVICE
PATNA CALL GIRLS 8617370543 LOW PRICE ESCORT SERVICE
 
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and SpectrometryFAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
 
Terpineol and it's characterization pptx
Terpineol and it's characterization pptxTerpineol and it's characterization pptx
Terpineol and it's characterization pptx
 
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
 
Role of AI in seed science Predictive modelling and Beyond.pptx
Role of AI in seed science  Predictive modelling and  Beyond.pptxRole of AI in seed science  Predictive modelling and  Beyond.pptx
Role of AI in seed science Predictive modelling and Beyond.pptx
 
Dr. E. Muralinath_ Blood indices_clinical aspects
Dr. E. Muralinath_ Blood indices_clinical  aspectsDr. E. Muralinath_ Blood indices_clinical  aspects
Dr. E. Muralinath_ Blood indices_clinical aspects
 
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIA
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIACURRENT SCENARIO OF POULTRY PRODUCTION IN INDIA
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIA
 
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
 
Plasmid: types, structure and functions.
Plasmid: types, structure and functions.Plasmid: types, structure and functions.
Plasmid: types, structure and functions.
 
Human genetics..........................pptx
Human genetics..........................pptxHuman genetics..........................pptx
Human genetics..........................pptx
 
GBSN - Microbiology (Unit 3)Defense Mechanism of the body
GBSN - Microbiology (Unit 3)Defense Mechanism of the body GBSN - Microbiology (Unit 3)Defense Mechanism of the body
GBSN - Microbiology (Unit 3)Defense Mechanism of the body
 
Factory Acceptance Test( FAT).pptx .
Factory Acceptance Test( FAT).pptx       .Factory Acceptance Test( FAT).pptx       .
Factory Acceptance Test( FAT).pptx .
 

Not Only Statements: The Role of Textual Analysis in Software Quality

  • 1. Not Only Statements: 
 The Role of Textual Analysis in Software Quality Rocco Oliveto rocco.oliveto@unimol.it University of Molise 2nd Workshop on Mining Unstructured Data
 October 17th, 2012 - Kingston, Canada
  • 2.
  • 3.
  • 4.
  • 5.
  • 6.
  • 7. Textual analysis is... ...the process of deriving high-quality information from text

  • 8. ...TA in SE... Non-structured artifacts are rich in semantic information that cannot be extracted from the nice-to-parse syntactic structures such as source code.
    Text is Software Too. Alexander Dekhtyar (Dept. Computer Science, University of Kentucky, dekhtyar@cs.uky.edu), Jane Huffman Hayes (Dept. Computer Science, University of Kentucky, hayes@cs.uky.edu), Tim Menzies (Dept. Computer Science, Portland State University, tim@menzies.us).
    Abstract. Software compiles and therefore is characterized by a parseable grammar. Natural language text rarely conforms to prescriptive grammars and therefore is much harder to parse. Mining parseable structures is easier than mining less structured entities. Therefore, most work on mining repositories focuses on software, not natural language text. Here, we report experiments with mining natural language text (requirements documents) suggesting that: (a) mining natural language is not too difficult, so (b) software repositories should routinely be augmented with all the natural language text used to develop that software.
    1 Introduction. “I have seen the future of software engineering, and it is......Text?” Much of the work done in the past has focused on the mining of software repositories that contain structured, easily parseable artifacts. Even when non-structured artifacts existed (or portions of structured artifacts that were non-structured), researchers ignored them. These items tended to be “exclusions from consideration” in research papers. We argue that these non-structured artifacts are rich in semantic information that cannot be extracted from the nice-to-parse syntactic structures such as source code. Much useful information can be obtained by treating text as software, or at least, as part of the software repository, and by developing techniques for its efficient mining. To date, we have found that information retrieval (IR) methods can be used to support the processing of textual software artifacts. Specifically, these methods can be used to facilitate the tracing of software artifacts to each other (such as tracing design elements to requirements). We have found that we can generate candidate links in an automated fashion faster than humans; we can retrieve more true links than humans; and we can allow the analyst to participate in the process in a limited way and realize vast results improvements [10,11]. In this paper, we discuss:
      • The kinds of text seen in software;
      • Problems with using non-textual methods;
      • The importance of early life cycle artifacts;
      • The mining of software repositories with an emphasis on natural language text; and
      • Results from work that we have performed thus far on mining of textual artifacts.
    2 Text in Software Engineering. Textual artifacts associated with software can roughly be partitioned into two large categories: 1. Text produced during the initial development and then maintained, such as requirements, design specifications, user manuals and comments in the code; 2. Text produced after the software is fielded, such as problem reports, reviews, messages posted to on-line software user group forums, modification requests, etc. Both categories of artifacts can help us analyze software itself, although different approaches may be employed. In this paper, we discuss how lifecycle development documents can be used to mine traceability information for Independent Validation & Verification (IV&V) analysts and how artifacts (e.g., textual interface requirements) can be used to study and predict software faults.
    3 If not text.. One way to assess our proposal would be to assess what can be learned from alternative representations. In the software verification world, reasoning about two representations is common: formal models and static code measures. A formal model has two parts: a system model and a properties model. The system model describes how the program can change the values of variables while the properties model describes global invariants that must be maintained when the system executes. Often, a temporal logic¹ is used
    ¹ Temporal logic is classical logic augmented with some temporal operators such as □X (always X is true); ◇X (eventually X is true); ○X (X is true at the next time point); X U Y (X is true until Y is true).
  • 9. Textual Analysis Applications: traceability recovery (Antoniol et al. TSE 2002, Marcus and Maletic ICSE 2003); change impact analysis (Canfora et al. Metrics 2005); feature location (Poshyvanyk et al. TSE 2007); program comprehension (Haiduc et al. ICSE 2010, Hindle et al. MSR 2011); bug localization (Lo et al. ICSE 2012); clone detection (Marcus et al. ASE 2001); ...
  • 10. Why Textual Analysis for Software Quality
  • 11. Why Textual Analysis for Software Quality: it is lightweight (it does not require parsing), and it provides information complementary to what traditional code analysis could provide
  • 12. Textual analysis for software quality
  • 13. ...process overview... source code entities → text normalization → identifier normalization → term weighting → application of NLP/IR → new knowledge
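The first three steps of this process can be sketched in a few lines of Python. This is only a minimal illustration, not the tooling used in the talk: the stop-word list, the splitting rules, and the choice of tf-idf as the weighting scheme are all assumptions, and the final "application of NLP/IR" step is left out.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop list (an assumption; real lists are much longer).
STOP_WORDS = {"the", "a", "an", "in", "of", "to", "is", "new"}


def split_identifier(identifier):
    """Identifier normalization: split on underscores and camelCase boundaries."""
    terms = []
    for part in identifier.split("_"):
        terms.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part))
    return [t.lower() for t in terms]


def normalize(text):
    """Text normalization: tokenize, split identifiers, drop stop words and 1-char terms."""
    terms = []
    for token in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", text):
        terms.extend(split_identifier(token))
    return [t for t in terms if len(t) > 1 and t not in STOP_WORDS]


def tf_idf(documents):
    """Term weighting: weight each term by tf * log(N / df)."""
    bags = [Counter(normalize(doc)) for doc in documents]
    n_docs = len(bags)
    df = Counter(term for bag in bags for term in bag)  # document frequency
    return [
        {term: count * math.log(n_docs / df[term]) for term, count in bag.items()}
        for bag in bags
    ]


# Two made-up source code entities (comment plus signature).
entities = [
    "/* Insert a new user in TABLE_USER */ void insertUser(User pUser)",
    "/* Update an existing teaching */ void updateTeaching(Teaching pTeaching)",
]
weights = tf_idf(entities)
```

Terms that occur in every entity (here `void`) receive weight zero, while terms specific to one entity (`user`, `teaching`) dominate its vector, which is what the later similarity computations rely on.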
  • 14. Textual Analysis to... ...measure class cohesion. Given a class: 1. compute the textual similarity between all the pairs of methods; 2. compute the average textual similarity (value between 0 and 1); 3. the higher the similarity, the higher the cohesion. A. Marcus, D. Poshyvanyk, R. Ferenc: Using the Conceptual Cohesion of Classes for Fault Prediction in Object-Oriented Systems. IEEE Transactions on Software Engineering 34(2): 287-300 (2008)
  • 15. Textual Analysis to... ...measure class coupling. Given two classes A and B: 1. compute the textual similarity between all unordered pairs of methods from class A and class B; 2. compute the average textual similarity (value between 0 and 1); 3. the higher the similarity, the higher the coupling. D. Poshyvanyk, A. Marcus, R. Ferenc, T. Gyimóthy: Using information retrieval based coupling measures for impact analysis. Empirical Software Engineering 14(1): 5-32 (2009)
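The two recipes on slides 14 and 15 differ only in which method pairs are averaged, which a short Python sketch makes explicit. It is a simplification under stated assumptions: similarity here is the cosine of raw term-frequency vectors, whereas the cited papers rely on an IR model such as LSI; all function names are hypothetical, and returning 0.0 when there are no pairs is an arbitrary choice.

```python
import math
import re
from collections import Counter
from itertools import combinations, product


def term_vector(text):
    """Bag of lowercase terms; camelCase and UPPER_CASE identifiers are split."""
    words = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", text)
    return Counter(w.lower() for w in words)


def cosine(u, v):
    """Cosine similarity of two term-frequency vectors (0.0 if either is empty)."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm = math.sqrt(sum(c * c for c in u.values()) * sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


def average_similarity(pairs):
    """Average textual similarity over an iterable of (text, text) pairs."""
    sims = [cosine(term_vector(a), term_vector(b)) for a, b in pairs]
    return sum(sims) / len(sims) if sims else 0.0


def conceptual_cohesion(methods):
    """Slide 14: average similarity over all pairs of methods of one class."""
    return average_similarity(combinations(methods, 2))


def conceptual_coupling(methods_a, methods_b):
    """Slide 15: average similarity over all pairs (method of A, method of B)."""
    return average_similarity(product(methods_a, methods_b))


# Made-up method texts (identifiers and string literals only).
user_methods = [
    "insertUser pUser INSERT INTO TABLE_USER",
    "updateUser pUser UPDATE TABLE_USER",
]
teaching_methods = ["insertTeaching pTeaching INSERT INTO TABLE_TEACHING"]

cohesion = conceptual_cohesion(user_methods)
coupling = conceptual_coupling(user_methods, teaching_methods)
```

Because term frequencies are non-negative, both values fall between 0 and 1, matching the range stated on the slides; with this toy input the cohesion of the user methods is higher than their coupling to the teaching method.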
  • 16. Yet another metric?
                  PC1     PC2     PC3     PC4     PC5     PC6
    Proportion    29.6    20.9    10.1    10      17      8.5
    Cumulative    29.6    50.5    60.6    70.7    87.7    96.2
    C3           -0.06   -0.03   -0.01    0.99   -0.04    0
    LCOM1         0.92    0       0.05   -0.03    0.31   -0.01
    LCOM2         0.91   -0.01    0.04   -0.02    0.33    0
    LCOM3         0.6    -0.12    0.05   -0.04    0.73   -0.13
    LCOM4         0.2    -0.19    0      -0.03    0.93   -0.1
    LCOM5         0.08    0.03    0.99   -0.01    0.01   -0.04
    ICH           0.91    0.05    0.06   -0.05   -0.06   -0.14
    TCC          -0.02    0.93   -0.03    0      -0.11    0.28
    LCC           0.04    0.96    0.07   -0.05   -0.13    0.09
    Coh          -0.11    0.47   -0.06    0.01   -0.17    0.84
  • 20. ...some numbers...
    Metrics    Precision    Correctness    R² value
    LCOM1      61.9         74.39          0.1
    LCOM3      62.59        70.55          0.1
    LCOM2      62.05        75.93          0.1
    LCOM4      59.75        66.36          0.07
    C3         62.05        61.35          0.07
    ICH        60.92        73.52          0.06
    Coh        61.21        59.33          0.03
    LCOM5      56.56        54.48          0.03
  • 22. ...some numbers...
    Metrics        Precision    Correctness    R² value
    C3+LCOM3       66.2         68.47          0.16
    C3+LCOM1       65.23        68.23          0.15
    C3+LCOM2       64.88        67.54          0.15
    C3+LCOM4       64.98        66.2           0.14
    C3+ICH         63.71        64.74          0.12
    LCOM4+ICH      63.32        72.87          0.11
    LCOM3+ICH      63.46        72.61          0.11
    LCOM1+LCOM3    63.27        74.16          0.11
    The use of C3 improves the prediction accuracy of models based only on structural metrics.
  • 24. ...the approach... Method-by-method matrix construction: for a class C with n methods (m1, m2, ..., mn), an n × n method-by-method matrix is built from three measures: SSM (Structural Similarity between Methods), CIM (Call-based Interaction between Methods) and CSM (Conceptual Similarity between Methods). G. Bavota, A. De Lucia, A. Marcus, R. Oliveto: A two-step technique for extract class refactoring. ASE 2010: 151-154. G. Bavota, A. De Lucia, R. Oliveto: Identifying Extract Class refactoring opportunities using structural and semantic cohesion measures. Journal of Systems and Software 84(3): 397-414 (2011)
  • 25. public class UserManagement {

        // String representing the table user in the database
        private static final String TABLE_USER = "user";

        // String representing the table teaching in the database
        private static final String TABLE_TEACHING = "teaching";

        /* Insert a new user in TABLE_USER */
        public void insertUser(User pUser){
            boolean check = checkMandatoryFieldsUser(pUser);
            ...
            String sql = "INSERT INTO " + UserManagement.TABLE_USER + " ... ";
            ...
        }

        /* Update an existing user in TABLE_USER */
        public void updateUser(User pUser){
            boolean check = checkMandatoryFieldsUser(pUser);
            ...
            String sql = "UPDATE " + UserManagement.TABLE_USER + " ... ";
            ...
        }

        /* Delete an existing user in TABLE_USER */
        public void deleteUser(User pUser){
            ...
            String sql = "DELETE FROM " + UserManagement.TABLE_USER + " ... ";
            ...
        }

        /* Verify if in TABLE_USER exists the user pUser */
        public void existsUser(User pUser){
            ...
            String sql = "SELECT FROM " + UserManagement.TABLE_USER + " ... ";
            ...
        }

        /* Check the mandatory fields in pUser */
        public boolean checkMandatoryFieldsUser(User pUser){
            ...
        }

        /* Insert a new teaching in TABLE_TEACHING */
        public void insertTeaching(Teaching pTeaching){
            boolean check = checkMandatoryFieldsTeaching(pTeaching);
            ...
            String sql = "INSERT INTO " + UserManagement.TABLE_TEACHING + " ... ";
            ...
        }

        /* Update an existing teaching in TABLE_TEACHING */
        public void updateTeaching(Teaching pTeaching){
            boolean check = checkMandatoryFieldsTeaching(pTeaching);
            ...
            String sql = "UPDATE " + UserManagement.TABLE_TEACHING + " ... ";
            ...
        }

        /* Delete an existing teaching in TABLE_TEACHING */
        public void deleteTeaching(Teaching pTeaching){
            ...
            String sql = "DELETE FROM " + UserManagement.TABLE_TEACHING + " ... ";
            ...
        }

        /* Check the mandatory fields in pTeaching */
        public boolean checkMandatoryFieldsTeaching(Teaching pTeaching){
            ...
        }
    }

    [Figure: the CDM similarity, SSM similarity and CSM similarity matrices for the nine methods of UserManagement, and the resulting method-by-method matrix obtained with weights wCDM = 0.2, wSSM = 0.5, wCSM = 0.3.]
    IU = insertUser, UU = updateUser, DU = deleteUser, EU = existsUser, CU = checkMandatoryFieldsUser, IT = insertTeaching, UT = updateTeaching, DT = deleteTeaching, CT = checkMandatoryFieldsTeaching
  • 27. ...the approach... [Figure: method-by-method relationships before filtering and after filtering; candidate chains C1 and C2 and trivial chain T1; method-by-method matrix after transitive closure; proposed refactoring into Candidate Class C1 and Candidate Class C2.] Conceptual cohesion plays a crucial role. Refactoring operations make sense for developers.
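The core of the approach (weighted combination of the three matrices, filtering of weak relationships, and grouping of what remains connected) can be sketched as follows. The weights are the ones shown on slide 25; everything else is an assumption made for illustration: the function names, the `min_coupling` threshold of 0.4, the four-method example and its matrix values are invented, and the handling of trivial chains in the cited papers is not reproduced.

```python
def combine(matrices, weights):
    """Weighted sum of equally sized square matrices."""
    n = len(matrices[0])
    return [
        [sum(w * m[i][j] for m, w in zip(matrices, weights)) for j in range(n)]
        for i in range(n)
    ]


def candidate_chains(matrix, names, min_coupling):
    """Drop relationships below min_coupling, then group the methods that
    remain connected (the transitive closure of the filtered relation)."""
    n = len(names)
    seen = set()
    chains = []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack, chain = [start], []
        while stack:
            i = stack.pop()
            chain.append(names[i])
            for j in range(n):
                strong = max(matrix[i][j], matrix[j][i]) >= min_coupling
                if j not in seen and strong:
                    seen.add(j)
                    stack.append(j)
        chains.append(sorted(chain))
    return chains


# Invented values for four methods; only the weights come from slide 25.
names = ["IU", "UU", "IT", "UT"]
cdm = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
ssm = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
csm = [
    [1.0, 0.6, 0.3, 0.1],
    [0.6, 1.0, 0.1, 0.3],
    [0.3, 0.1, 1.0, 0.6],
    [0.1, 0.3, 0.6, 1.0],
]
combined = combine([cdm, ssm, csm], [0.2, 0.5, 0.3])  # wCDM, wSSM, wCSM
chains = candidate_chains(combined, names, min_coupling=0.4)
```

With these toy values the user methods and the teaching methods end up in two separate chains, which is the kind of split the proposed Extract Class refactoring suggests for `UserManagement`.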
  • 28. The developer point of view... Do measures reflect the quality perceived by developers?

  • 29. ...the study... How does class coupling align with developers’ perception of coupling? Four types of information sources: structural, dynamic, semantic, historical. The study involved 90 subjects. G. Bavota, B. Dit, R. Oliveto, M. Di Penta, D. Poshyvanyk, A. De Lucia. An Empirical Study on the Developers' Perception of Software Coupling. Submitted to ICSE 2013.
  • 30. ...take away... Coupling cannot be captured and measured using only structural information, such as method calls. Different sources of information are needed. Semantic coupling seems to reflect the developers’ mental model when identifying interaction between entities. Semantic coupling is able to capture “latent coupling relationships” encapsulated in identifiers and comments.
  • 31. Inconsistency between code and comments... Not only quality measures...
  • 32. Inconsistency between code and comments...
  • 33. ...the study... QALP Score: the similarity between a module’s comment and its code. Used to evaluate the quality of source code, but it can also be used to predict faults. [Figure 2. Maximum QALP score per defect count for both programs (Mozilla and MP); x-axis: Defect Count, y-axis: QALP Score.] D. Binkley, H. Feild, D. Lawrie, and M. Pighin, “Software fault prediction using language processing,” in Proceedings of the Testing: Academic and Industrial Conference Practice and Research Techniques, 2007, pp. 99–110.
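A comment/code similarity in the spirit of the QALP score can be approximated in Python as below. This is a rough sketch, not the measure from the cited paper: it uses raw term frequencies with no stop-word removal or term weighting, the function names are hypothetical, and the regular expression for comments is a simplification that would misread `//` inside string literals.

```python
import math
import re
from collections import Counter

# Block comments and line comments of a Java-like language (simplified).
COMMENT = re.compile(r"/\*.*?\*/|//[^\n]*", re.DOTALL)
# Terms: runs of capitals, or an optional capital followed by lowercase letters.
TERM = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+")


def bag_of_terms(text):
    """Lowercased term frequencies, with camelCase identifiers split."""
    return Counter(t.lower() for t in TERM.findall(text))


def comment_code_similarity(source):
    """Cosine similarity between the comments and the code of one module."""
    comments = " ".join(COMMENT.findall(source))
    code = COMMENT.sub(" ", source)
    u, v = bag_of_terms(comments), bag_of_terms(code)
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm = math.sqrt(sum(c * c for c in u.values()) * sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


consistent = "/* insert user */ void insertUser(User user) {}"
inconsistent = "/* whitespace test */ void insertUser(User user) {}"
```

A module whose comment shares vocabulary with its code (`consistent`) scores higher than one whose comment talks about something else (`inconsistent`), which is the signal the slide proposes for quality assessment and fault prediction.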
  • 34. Inconsistent naming... path? Is it a relative path or an absolute path? And what if it is used as both relative and absolute?
  • 35. ...the study... Term entropy: the physical dispersion of terms in a program. The higher the entropy, the more scattered the terms are across the program. Context coverage: the conceptual dispersion of terms. The higher their context coverage, the more unrelated the methods using them. The use of identical terms in different contexts may increase the risk of faults. V. Arnaoudova, L. M. Eshkevari, R. Oliveto, Y.-G. Guéhéneuc, G. Antoniol: Physical and conceptual identifier dispersion: Measures and relation to fault proneness. ICSM 2010: 1-5
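One way to read these two definitions is sketched below in Python. It is an interpretation under assumptions, not the exact formulation of the cited paper: term entropy is taken as the Shannon entropy of how a term's occurrences are distributed over entities, and context coverage as one minus the average cosine similarity between the entities that use the term. The entity names and counts are invented.

```python
import math
from collections import Counter
from itertools import combinations


def term_entropy(term, entities):
    """Shannon entropy (in bits) of the term's occurrences across entities.
    `entities` maps an entity name to a Counter of its terms."""
    counts = [bag[term] for bag in entities.values() if bag[term] > 0]
    total = sum(counts)
    return sum((c / total) * math.log2(total / c) for c in counts)


def cosine(u, v):
    """Cosine similarity of two term-frequency Counters."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm = math.sqrt(sum(c * c for c in u.values()) * sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


def context_coverage(term, entities):
    """1 - average similarity between the entities that use the term
    (0.0 when fewer than two entities use it; an arbitrary choice)."""
    contexts = [bag for bag in entities.values() if bag[term] > 0]
    pairs = list(combinations(contexts, 2))
    if not pairs:
        return 0.0
    return 1.0 - sum(cosine(u, v) for u, v in pairs) / len(pairs)


# Invented example: "path" is used in two rather different methods.
entities = {
    "readFile": Counter(path=2, file=1, read=1),
    "joinUrl": Counter(path=1, url=2, join=1),
    "sendMail": Counter(mail=1, send=1),
}
```

A term confined to a single entity (`mail`) has zero entropy, while a term such as `path`, spread over entities with little shared vocabulary, gets both a positive entropy and a high context coverage, the combination the slide associates with a higher risk of faults.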
  • 36. ...take away... Term entropy and context coverage only partially correlate with size. The number of high entropy and high context coverage terms contained in a method or attribute helps to explain the probability of it being faulty. If a Rhino (ArgoUML) method contains an identifier with a term having high entropy and high context coverage, its probability of being faulty is six (two) times higher. See also S. Lemma Abebe, V. Arnaoudova, P. Tonella, G. Antoniol and Y.-G. Guéhéneuc. Can Lexicon Bad Smells improve fault prediction? WCRE 2012
  • 39. How to induce developers to use meaningful identifiers?
  • 40. Reverse engineering, used with evolving software development technologies, will provide significant incremental enhancements to our productivity
  • 41. Reverse engineering, used ... evolving software development technologies ... significant incremental enhancements to our productivity. Continuous Textual Analysis
  • 42. COCONUT... (addmember.txt)
    1. The Administrator activates the add member function in the terminal of the system and correctly enters his login and password identifying him as an Administrator.
    2. The system responds by presenting a form to the Administrator on a terminal screen. The form includes the first and last name, the address, and contact information (phone, email and fax) of the customer, as well as the fidelity index. The fidelity index can be: New Member, Silver Member, and Gold Member. After 50 rentals the member is considered as Silver Member, while after 150 rentals the member becomes a Gold Member. The system also displays the membership fee to be paid.
    3. The Administrator fills the form and then confirms all the requested form information is correct.
  • 46. What if traceability links are not available?
  • 48. Concept location: a textual query and the source code are the inputs of an IR engine, whose output is the list of relevant classes.
  • 49. Query assessment: a textual query and the source code are the inputs of an IR engine, whose output is the query quality.
  • 50. Good Query Bad Query
  • 53. Good Query: useful results on top of the list
    #    Method        Class           Score
    1    insertUser    Manager User    0.99
    2    deleteUser    Manager User    0.95
    3    assignUser    Manager Role    0.88
    4    util          Utility         0.84
    5    getUsers      Manager User    0.79
    Bad Query: false positives on top of the list
    #    Method        Class           Score
    1    util          Utility         0.93
    2    dbConnect     Manager Db      0.90
    3    insertUser    Manager User    0.86
    4    networking    Utility         0.76
    5    loadRs        Manager Db      0.73
  • 54. How to use query assessment for improving code vocabulary?
  • 58. Automatic generation... Giriprasad Sridhara, Emily Hill, Divya Muppaneni, Lori L. Pollock, K. Vijay-Shanker: Towards automatically generating summary comments for Java methods. ASE 2010: 43-52
  • 60. ...problems... How to remove the noise in source code? Which elements should be indexed? Identifier splitting and expansion. Task-based pre-processing.
  • 62. ...problems... How to set the parameters of some techniques (e.g., LSI)? Do we need customized versions of NLP/IR techniques? Are the different techniques equivalent? Task-specific techniques?
  • 65. Linguistic antipatterns... Common practices, from a linguistic aspect, in the source code that decrease the quality of the software (Arnaoudova WCRE 2010). How to define linguistic antipatterns? How to identify them? What is the impact of linguistic antipatterns on software development and maintenance? How to prevent linguistic antipatterns?
  • 66. Software testing...
  • 67. Software testing... Can textual analysis be used during test case selection? Can textual analysis be used to improve search-based test case generation? Can textual analysis be used to capture testing complexity of source code?
  • 69. Empirical studies... When and why does textual analysis complement traditional source code analysis techniques? Are studies with users needed?