Late Propagation in Software Clones
Liliane Barbour, Foutse Khomh, and Ying Zou

Late Propagation (LP)
• Definition: An inconsistent change that causes the two clones in a
  clone pair to diverge, followed later by a re-synchronizing change
  that makes them consistent again.
• It can be risky because failing to propagate a change between the
  clones in a clone pair can lead to faults.
• In our work, we found that 8-21% of clone genealogies contain a
  late propagation.

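The definition above can be sketched as a check over the history of one clone pair. This is only an illustration, not the tooling used in the study: the class name, the `PairState` enum, and the idea of recording one consistency state per change are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: flags a late propagation in the history of one clone pair. */
final class LatePropagationDetector {

    /** State of the clone pair after a change (illustrative names). */
    enum PairState { CONSISTENT, INCONSISTENT }

    /**
     * Returns true if the history contains a diverging change
     * (CONSISTENT to INCONSISTENT) that is later followed by a
     * re-synchronizing change (back to CONSISTENT).
     */
    static boolean hasLatePropagation(List<PairState> history) {
        boolean diverged = false;
        PairState previous = PairState.CONSISTENT; // assumption: a clone pair starts out consistent
        for (PairState current : history) {
            if (previous == PairState.CONSISTENT && current == PairState.INCONSISTENT) {
                diverged = true;   // diverging change
            } else if (diverged && current == PairState.CONSISTENT) {
                return true;       // re-synchronizing change
            }
            previous = current;
        }
        return false;
    }

    public static void main(String[] args) {
        List<PairState> history = Arrays.asList(
                PairState.CONSISTENT, PairState.INCONSISTENT, PairState.CONSISTENT);
        System.out.println(hasLatePropagation(history));
    }
}
```

A pair that diverges and never re-synchronizes is not reported, which matches the definition: late propagation requires both the diverging and the re-synchronizing change.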
LP With Propagation Example from ArgoUML
// Clone A, Revision 595
addField(new UMLComboBox(typeModel), 1, 0, 0);

// Clone B, Revision 595
addField(new UMLComboBox(classifierModel), 2, 0, 0);

// Diverging Change: Clone A, Revision 602
addField(new UMLComboBoxNavigator(this, "NavClass",
         new UMLComboBox(typeModel)), 1, 0, 0);

// Re-synchronizing Change: Clone B, Revision 604
addField(new UMLComboBoxNavigator(this, "NavClass",
         new UMLComboBox(classifierModel)), 2, 0, 0);

[Figure: timeline of Clone A and Clone B at Revision 595, Revision 602 (diverging change), and Revision 604 (re-synchronizing change).]

LP Without Propagation Example from Ant
// Clone A, Revision 270250
if( destFile == null )
{
   destFile = new File(destDir,file.getName());
}

// Clone B, Revision 270250
if (destFile == null ) {
   destFile = new File(destDir,file.getName());
}

// Diverging Change: Clone A, Revision 270264
if ( m_destFile == null )
{
   m_destFile = new File(m_destDir,m_file.getName());
}

// Re-synchronizing Change: Clone A, Revision 271109
if ( destFile == null ) {
   destFile = new File(destDir,file.getName());
}

[Figure: timeline of Clone A and Clone B at Revision 270250, Revision 270264 (diverging change), and Revision 271109 (re-synchronizing change).]

Types of Late Propagation
                                             Modified During    Modified During the    Modified During
Propagation Category               LP Type   Diverging Change   Period of Divergence   Re-synchronizing Change
Propagation Always Occurs          LP1       A                  A                      B
Propagation Always Occurs          LP2       A                  A and B                B
Propagation Always Occurs          LP3       A                  A                      A and B
Propagation May or May Not Occur   LP4       A                  A and B                A
Propagation May or May Not Occur   LP5       A                  A and B                A and B
Propagation May or May Not Occur   LP6       A and B            A and B                A or B
Propagation May or May Not Occur   LP7       A and B            A and B                A and B
Propagation Never Occurs           LP8       A                  A                      A

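The table above can be read as a lookup from three observations (which clone is modified in the diverging change, during the period of divergence, and in the re-synchronizing change) to an LP type. The following is a minimal sketch of that lookup, assuming that "A" always denotes the clone touched by the diverging change. The class, enum, and method names are made up for the example, and combinations that do not appear in the table yield an empty result.

```java
import java.util.Optional;

/** Hypothetical sketch: maps the three table columns to an LP type. */
final class LpTypeClassifier {

    /** Which clone(s) of the pair a change modified (illustrative names). */
    enum Modified { A, B, BOTH }

    enum LpType { LP1, LP2, LP3, LP4, LP5, LP6, LP7, LP8 }

    static Optional<LpType> classify(Modified diverging, Modified during, Modified resync) {
        if (diverging == Modified.A && during == Modified.A) {
            switch (resync) {
                case B:    return Optional.of(LpType.LP1);
                case BOTH: return Optional.of(LpType.LP3);
                default:   return Optional.of(LpType.LP8); // only A is ever modified
            }
        }
        if (diverging == Modified.A && during == Modified.BOTH) {
            switch (resync) {
                case B:    return Optional.of(LpType.LP2);
                case A:    return Optional.of(LpType.LP4);
                default:   return Optional.of(LpType.LP5); // both clones in the re-sync
            }
        }
        if (diverging == Modified.BOTH && during == Modified.BOTH) {
            // LP7 if both clones are modified in the re-sync, LP6 if only one is (A or B).
            return Optional.of(resync == Modified.BOTH ? LpType.LP7 : LpType.LP6);
        }
        return Optional.empty(); // combination not listed in the table
    }

    public static void main(String[] args) {
        System.out.println(classify(Modified.A, Modified.A, Modified.B));
    }
}
```

Under this reading, the ArgoUML example (Clone A diverges, Clone B re-synchronizes) falls into the "Propagation Always Occurs" category, while the Ant example (only Clone A ever changes) corresponds to LP8.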
Research Questions
RQ1: Are there different types of LP?

RQ2: Are some types of LP more fault-prone than others?

RQ3: Which type of LP experiences the highest proportion of faults?

Subject Systems
System    # LOC   # Revisions   # Genealogies   # LP         # Genealogies   # LP
                                (CCFinder)      (CCFinder)   (Simian)        (Simian)
ArgoUML   3.1M    18k           14k             1.1k         111             23
Ant       2.3M    1.0M          30k             4.7k         461             80

Our Approach

Mining the SVN
• Use J-Rex to mine the SVN repository
• Heuristics are used to identify the reason for each commit
  (Mockus et al., 2000)
• Snapshots of all revisions of each Java file are stored
  in an XML file
• Test files are removed

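The slides do not list the heuristics themselves. As a rough illustration of the general idea of inferring the reason for a commit from its log message, the sketch below flags a commit as a likely fault fix when a word in the message starts with one of a few keywords. The keyword list, class name, and method name are assumptions made for this example and are not the heuristics used in the study.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch of a keyword heuristic over commit messages. */
final class CommitReasonSketch {

    // Illustrative keywords only; the study's actual heuristics are not given on the slides.
    private static final List<String> FAULT_KEYWORDS =
            Arrays.asList("fix", "bug", "fault", "defect", "error");

    /** Returns true if any word of the message starts with a fault-related keyword. */
    static boolean looksLikeFaultFix(String commitMessage) {
        // Split into lowercase alphabetic words so that "prefix" does not match "fix".
        String[] words = commitMessage.toLowerCase(Locale.ROOT).split("[^a-z]+");
        for (String word : words) {
            for (String keyword : FAULT_KEYWORDS) {
                if (word.startsWith(keyword)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(looksLikeFaultFix("Fixed NPE in UMLComboBox"));
        System.out.println(looksLikeFaultFix("Add new property panel"));
    }
}
```

Matching on word prefixes catches variants such as "fixed" or "bugs" while avoiding some accidental substring hits. A heuristic of this kind can still misclassify commits, so its output is best treated as an approximation.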
Clone Detection
• The contents of each method revision are extracted into
  individual files
• Clone detection is performed once on all snapshots
• Two existing clone detection tools are used
   – Simian (text-based) and CCFinder (token-based)

Building Clone Genealogies
• Build clone genealogies using the existing clone list
• Query the SVN using diff to track changes to each
  clone in a clone pair over time
• If a change modifies one of the clones in a clone
  pair, query the clone list for a matching clone

RQ1: Are there different types of LP?
[Figure: Breakdown of LP Type by System. Bar chart; x-axis: LP Types (LP1 to LP8); y-axis: Percentage of All LP Occurrences (0% to 80%); series: ArgoUML - Simian, ArgoUML - CCFinder, Ant - Simian, Ant - CCFinder.]

There is representation from multiple types of LP and across all categories of LP.

RQ2: Are some types of LP more fault-prone than others?
Part 1: Is Late Propagation fault-prone?
Part 2: Are specific types of late propagation more fault-prone?

Part 1: Is Late Propagation Fault-prone?
[Figure: LP vs. Non-LP Odds Ratios. Bar chart; y-axis: Odds Ratio (0 to 4); bars: Ant - Simian, ArgoUML - CCFinder, Ant - CCFinder.]
Note: ArgoUML - Simian is omitted because it is not statistically significant.

In all significant cases, the odds ratio is greater than 1. Therefore, LP genealogies are more fault-prone than non-LP genealogies.

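For readers unfamiliar with the measure, an odds ratio compares the odds of a fault among LP genealogies with the odds of a fault among non-LP genealogies. The sketch below uses the standard 2x2 definition; the slides do not spell out the exact computation or the statistical test used, and the class name, parameter names, and counts in `main` are invented for illustration.

```java
/** Hypothetical sketch: standard odds ratio from a 2x2 table of genealogy counts. */
final class OddsRatioSketch {

    /**
     * Odds ratio = (lpFaulty / lpClean) / (nonLpFaulty / nonLpClean).
     * A value above 1 means faults are more likely among LP genealogies.
     */
    static double oddsRatio(long lpFaulty, long lpClean, long nonLpFaulty, long nonLpClean) {
        if (lpClean == 0 || nonLpFaulty == 0 || nonLpClean == 0) {
            throw new IllegalArgumentException("odds ratio is undefined for these counts");
        }
        double lpOdds = (double) lpFaulty / lpClean;
        double nonLpOdds = (double) nonLpFaulty / nonLpClean;
        return lpOdds / nonLpOdds;
    }

    public static void main(String[] args) {
        // Made-up counts, not taken from the study.
        System.out.println(oddsRatio(30, 70, 100, 900));
    }
}
```

An odds ratio of exactly 1 would mean that LP and non-LP genealogies are equally likely to experience a fault, which is why the comparison against 1 is the relevant threshold.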
Part 2: Are specific types of late propagation more fault-prone?
[Figure: Odds Ratios Between Each LP Type and Non-LP Genealogies. Bar chart; x-axis: LP Type (LP1 to LP8); y-axis: Odds Ratio (0 to 16); series: Ant - Simian, ArgoUML - CCFinder, Ant - CCFinder.]
Note: ArgoUML - Simian is omitted because it is not statistically significant.

RQ2 Observations
• Some LP types are not more fault-prone than non-LP
  genealogies (i.e., odds ratio < 1)
• Some types that make up a small proportion of LP
  instances have a very high odds ratio
• LP7 and LP8 occur frequently but have low odds
  ratios

Each type of LP has a different level of fault-proneness.

RQ3: Which type of LP experiences the highest proportion of faults?
[Figure: Percentage of Fault Occurrences Broken Down by LP Type. Bar chart; x-axis: LP Type (LP1 to LP8); y-axis: Percentage of Fault Occurrences (0% to 80%); series: Ant - Simian, ArgoUML - CCFinder, Ant - CCFinder.]
Note: ArgoUML - Simian is omitted because it is not statistically significant.

RQ3 Observations
• LP7 and LP8 contribute a large proportion of the
  faults but have lower odds ratios (RQ2)
   – When faults occur, they occur in large numbers
• Overall, LP7 and LP8 are the most dangerous; the
  fault-proneness of the other types is system-dependent.

The proportion of faults is different for each LP type.

Conclusion
• In general, LP genealogies are more fault-prone than
  non-LP genealogies
• LP7 and LP8 are the riskiest in terms of their
  fault-proneness and magnitude of faults
   – LP8 contains no propagation of changes
   – LP7 may or may not contain any propagation of
     changes
• Fault-proneness and fault occurrence depend on
  both the LP type and the system