Late Propagation in Software Clones
1. Late Propagation in Software Clones
Liliane Barbour, Foutse Khomh, and Ying Zou
2. Late Propagation (LP)
• Definition: an inconsistent change that causes a clone pair to diverge, later followed by a re-synchronizing change that makes the pair consistent again.
• It can be risky because failing to propagate changes between the clones in a clone pair can lead to faults.
• In our work, we found that 8-21% of genealogies contain a late propagation.
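The definition above can be sketched as a check over the history of a clone pair. This is a minimal illustration, not the detection procedure used in the study: it assumes the history is already reduced to one boolean per revision (True when the two clones are consistent), and the function name `has_late_propagation` is hypothetical.

```python
from typing import List


def has_late_propagation(consistent: List[bool]) -> bool:
    """Return True if the history goes consistent -> inconsistent -> consistent."""
    seen_consistent = False
    diverged = False
    for state in consistent:
        if state:
            if diverged:
                # A re-synchronizing change follows an earlier divergence.
                return True
            seen_consistent = True
        elif seen_consistent:
            # A diverging change: the pair was consistent and no longer is.
            diverged = True
    return False


# ArgoUML example from this deck: consistent at revision 595,
# diverged at revision 602, re-synchronized at revision 604.
print(has_late_propagation([True, False, True]))
```

A pair that diverges and never re-synchronizes (for example `[True, False]`) is not counted as a late propagation under this sketch.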
3. LP With Propagation Example from ArgoUML
//Clone A, Revision 595
addField(new UMLComboBox(typeModel),1,0,0);
//Clone B, Revision 595
addField(new UMLComboBox(classifierModel),2,0,0);
//Diverging Change: Clone A, Revision 602
addField(new UMLComboBoxNavigator(this,"NavClass",
    new UMLComboBox(typeModel)),1,0,0);
//Re-synchronizing Change: Clone B, Revision 604
addField(new UMLComboBoxNavigator(this,"NavClass",
    new UMLComboBox(classifierModel)),2,0,0);
[Figure: timeline of Clone A and Clone B. Revision 595: clones consistent. Revision 602: diverging change. Revision 604: re-synchronizing change.]
4. LP Without Propagation Example from Ant
//Clone A, Revision 270250
if( destFile == null )
{
    destFile = new File(destDir,file.getName());
}
//Clone B, Revision 270250
if (destFile == null ) {
    destFile = new File(destDir,file.getName());
}
// Diverging Change: Clone A, Revision 270264
if ( m_destFile == null )
{
    m_destFile = new File(m_destDir,m_file.getName());
}
//Re-synchronizing Change: Clone A, Revision 271109
if ( destFile == null ) {
    destFile = new File(destDir,file.getName());
}
[Figure: timeline of Clone A and Clone B. Revision 270250: clones consistent. Revision 270264: diverging change. Revision 271109: re-synchronizing change.]
5. Types of Late Propagation
Columns give the clone(s) modified during the diverging change, during the period of divergence, and during the re-synchronizing change.

Propagation Category              LP Type   Diverging Change   Period of Divergence   Re-synchronizing Change
Propagation Always Occurs         LP1       A                  A                      B
                                  LP2       A                  A and B                B
                                  LP3       A                  A                      A and B
Propagation May or May Not Occur  LP4       A                  A and B                A
                                  LP5       A                  A and B                A and B
                                  LP6       A and B            A and B                A or B
                                  LP7       A and B            A and B                A and B
Propagation Never Occurs          LP8       A                  A                      A
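The table above can be read as a lookup from three sets of modified clones to an LP type. The sketch below encodes it that way. The names `LP_TYPES`, `CATEGORIES`, and `classify_lp` are hypothetical, and it assumes clone "A" is the clone touched by the diverging change whenever only one clone is touched.

```python
from typing import Iterable, Optional

A = frozenset({"A"})
B = frozenset({"B"})
AB = frozenset({"A", "B"})

# Key: (diverging change, period of divergence, re-synchronizing change).
LP_TYPES = {
    (A, A, B): "LP1",
    (A, AB, B): "LP2",
    (A, A, AB): "LP3",
    (A, AB, A): "LP4",
    (A, AB, AB): "LP5",
    (AB, AB, A): "LP6",  # "A or B": exactly one clone is re-synchronized
    (AB, AB, B): "LP6",
    (AB, AB, AB): "LP7",
    (A, A, A): "LP8",
}

CATEGORIES = {
    "LP1": "Propagation Always Occurs",
    "LP2": "Propagation Always Occurs",
    "LP3": "Propagation Always Occurs",
    "LP4": "Propagation May or May Not Occur",
    "LP5": "Propagation May or May Not Occur",
    "LP6": "Propagation May or May Not Occur",
    "LP7": "Propagation May or May Not Occur",
    "LP8": "Propagation Never Occurs",
}


def classify_lp(diverging: Iterable[str], during: Iterable[str],
                resync: Iterable[str]) -> Optional[str]:
    """Return the LP type for the given sets of modified clones, or None."""
    key = (frozenset(diverging), frozenset(during), frozenset(resync))
    return LP_TYPES.get(key)


# Ant example from this deck: only Clone A changes, in both revisions.
lp_type = classify_lp({"A"}, {"A"}, {"A"})
print(lp_type, "-", CATEGORIES[lp_type])
```

Combinations that do not appear in the table return `None` rather than a guessed type.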
6. Research Questions
RQ1: Are there different types of LP?
RQ2: Are some types of LP more fault-prone than others?
RQ3: Which type of LP experiences the highest proportion of faults?
7. Subject Systems
System    # LOC   # Revisions   # Genealogies (CCFinder)   # LP (CCFinder)   # Genealogies (Simian)   # LP (Simian)
ArgoUML   3.1M    18k           14k                        1.1k              111                      23
Ant       2.3M    1.0M          30k                        4.7k              461                      80
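As a quick arithmetic check, the share of genealogies that contain a late propagation can be computed from the counts on this slide. The sketch below uses the rounded figures as shown (14k, 1.1k, and so on), so the percentages are approximate; the helper name `lp_percentage` is hypothetical.

```python
# (number of genealogies, number of LP genealogies), rounded as on the slide.
counts = {
    ("ArgoUML", "CCFinder"): (14_000, 1_100),
    ("ArgoUML", "Simian"): (111, 23),
    ("Ant", "CCFinder"): (30_000, 4_700),
    ("Ant", "Simian"): (461, 80),
}


def lp_percentage(genealogies: int, lp: int) -> float:
    """Percentage of genealogies that contain a late propagation."""
    return 100.0 * lp / genealogies


for (system, tool), (genealogies, lp) in counts.items():
    print(f"{system} - {tool}: {lp_percentage(genealogies, lp):.1f}%")
```

The smallest and largest values come out at roughly 8% and 21%, consistent with the 8-21% range stated earlier in the deck.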
9. Mining the SVN
• Use J-Rex to mine the SVN repository
• Heuristics are used to identify the reason for each commit (Mockus et al., 2000)
• Snapshots of all revisions of each Java file are stored in an XML file
• Test files are removed
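Two of the steps above, classifying commits by their message and dropping test files, can be sketched with simple string heuristics. The keyword list and the test-file naming rule below are illustrative assumptions, not the heuristics actually used in the study or in J-Rex; the function names are hypothetical.

```python
import re

# Assumed keywords hinting at a fault-fixing commit (illustrative only).
FIX_KEYWORDS = re.compile(r"\b(fix(es|ed)?|bug|defect|fault)\b", re.IGNORECASE)


def is_fault_fix(message: str) -> bool:
    """Guess whether a commit message describes a fault fix."""
    return FIX_KEYWORDS.search(message) is not None


def is_test_file(path: str) -> bool:
    """Guess whether a Java file is a test, based on its file name alone."""
    name = path.rsplit("/", 1)[-1]
    return name.endswith(".java") and (
        name.startswith("Test") or name.endswith("Test.java")
    )


paths = ["src/Copy.java", "src/CopyTest.java"]
print([p for p in paths if not is_test_file(p)])
print(is_fault_fix("Fixed NPE when destFile is null"))
```

Keyword matching of this kind misclassifies some commits, so any real analysis would need to validate the heuristic against a manually labeled sample.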
10. Clone Detection
• Contents of each method revision are extracted into individual files
• Perform clone detection once on all snapshots
• Two existing clone detection tools are used
– Simian (text-based) and CCFinder (token-based)
11. Building Clone Genealogies
• Build clone genealogies using the existing clone list
• Query the SVN using diff to track changes to each clone in a clone pair over time
• If a change modifies one of the clones in a clone pair, query the clone list for a matching clone
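Deciding whether a change modifies a clone comes down to an overlap test between the line ranges reported by a diff and the clone's own line range. A minimal sketch follows, assuming inclusive (start, end) line ranges have already been parsed from the diff output; `touches_clone` is a hypothetical helper, not part of the study's tooling.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # inclusive (start_line, end_line)


def touches_clone(changed: List[Range], clone: Range) -> bool:
    """Return True if any changed line range overlaps the clone's line range."""
    start, end = clone
    return any(c_start <= end and start <= c_end for c_start, c_end in changed)


# A clone spanning lines 40-44 and a diff that changed lines 10-12 and 42-43.
print(touches_clone([(10, 12), (42, 43)], (40, 44)))
```

Only when this check succeeds for one clone of a pair would the clone list need to be queried for a matching clone.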
13. RQ1: Are there different types of LP?
[Figure: Breakdown of LP Type by System. Bar chart; x-axis: LP Types (LP1 to LP8); y-axis: Percentage of All LP Occurrences (0% to 80%); series: ArgoUML - Simian, ArgoUML - CCFinder, Ant - Simian, Ant - CCFinder.]
Multiple types of LP are represented, across all categories of LP.
14. RQ2: Are some types of LP more fault-prone than others?
Part 1: Is Late Propagation fault-prone?
Part 2: Are specific types of late propagation more fault-prone?
15. Part 1: Is Late Propagation Fault-prone?
[Figure: LP vs. Non-LP Odds Ratios. Bar chart; y-axis: Odds Ratio (0 to 4); bars: Ant - Simian, ArgoUML - CCFinder, Ant - CCFinder. ArgoUML - Simian is omitted because it is not statistically significant.]
In all significant cases, the odds ratio is greater than 1. Therefore, LP genealogies are more fault-prone than non-LP genealogies.
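For reference, an odds ratio compares the odds of a fault in LP genealogies with the odds of a fault in non-LP genealogies. The sketch below shows the calculation on a 2x2 table. The counts are made up purely for illustration and are not taken from the study; `odds_ratio` is a hypothetical helper.

```python
def odds_ratio(lp_faulty: int, lp_clean: int,
               non_lp_faulty: int, non_lp_clean: int) -> float:
    """Odds of a fault in LP genealogies divided by the odds in non-LP ones."""
    return (lp_faulty / lp_clean) / (non_lp_faulty / non_lp_clean)


# Hypothetical counts: 30 of 100 LP genealogies are faulty,
# and 20 of 100 non-LP genealogies are faulty.
value = odds_ratio(30, 70, 20, 80)
print(f"odds ratio = {value:.2f}")
```

A value above 1 means faults are more likely in the LP group, and a value below 1 means they are less likely. Whether the difference is statistically significant is a separate test that this sketch does not cover.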
16. Part 2: Are specific types of late propagation more fault-prone?
[Figure: Odds Ratios Between Each LP Type and Non-LP Genealogies. Bar chart; x-axis: LP Type (LP1 to LP8); y-axis: Odds Ratio (0 to 16); series: Ant - Simian, ArgoUML - CCFinder, Ant - CCFinder.]
Note: ArgoUML - Simian is omitted because it is not statistically significant.
17. RQ2 Observations
• Some LP types are no more fault-prone than non-LP genealogies (i.e., odds ratio < 1)
• Some types that make up a small proportion of LP instances have a very high odds ratio
• LP7 and LP8 occur frequently but have low odds ratios
Each type of LP has a different level of fault-proneness.
19. RQ3: Which type of LP experiences the highest proportion of faults?
[Figure: Percentage of Fault Occurrences Broken Down by LP Type. Bar chart; x-axis: LP Type (LP1 to LP8); y-axis: Percentage of Fault Occurrences (0% to 80%); series: Ant - Simian, ArgoUML - CCFinder, Ant - CCFinder.]
Note: ArgoUML - Simian is omitted because it is not statistically significant.
20. RQ3 Observations
• LP7 and LP8 contribute a large proportion of the
faults but have lower odds ratios (RQ2)
– When faults occur, they occur in large numbers
• Overall, LP7 and LP8 are the most dangerous; the fault-proneness of the other types is system-dependent.
The proportion of faults is different for each LP type.
21. Conclusion
• In general, LP genealogies are more fault-prone than non-LP genealogies
• LP7 and LP8 are the riskiest, in terms of their fault-proneness and magnitude of faults
– LP8 contains no propagation of changes
– LP7 may or may not contain any propagation of changes
• Fault-proneness and fault occurrence depend on the LP type and on the system