Rongxin Wu, Hongyu Zhang, Sunghun Kim, Shing-Chi Cheung
Tsinghua University, China
The Hong Kong University of Science and Technology, Hong Kong
• The links between fixed bugs and committed
  changes are important:
  – for measuring software quality
  – for constructing defect prediction models

[Figure: fixed bugs (BugZilla) linked to committed changes (CVS/SVN)]
• To discover the links:
        Mining software repositories!
• Heuristics traditionally used to collect links
  between bugs and changes:
   – Searching for keywords (such as “Fixed” or
     “Bug”) and bug IDs
[Figure: software repository data sources, labelled Bugzilla, Mailings, Source Code, CVS/SVN, Execution traces, Crash, Requirements, Developer, Logs, …]
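The traditional heuristic above can be sketched as a keyword-plus-ID search over change logs. This is a minimal sketch, assuming change logs are plain strings; the keyword list, the regular expression, and the function name `find_bug_ids` are illustrative assumptions, not the patterns used by any particular tool.

```python
import re
from typing import List

# Assumed pattern: a keyword ("fix", "fixed", "fixes", "bug", "bugs"),
# then up to 15 non-digit characters (e.g. " for #"), then a numeric bug ID.
LINK_PATTERN = re.compile(
    r"\b(?:fix(?:ed|es)?|bugs?)\b[^\d\n]{0,15}?(\d+)",
    re.IGNORECASE,
)

def find_bug_ids(change_log: str) -> List[int]:
    """Return bug IDs that appear shortly after a keyword in one change log."""
    return [int(bug_id) for bug_id in LINK_PATTERN.findall(change_log)]
```

A search of this kind finds references such as "Fixed for #239" but returns nothing for a log with no keyword or with a misspelled one, which is where links go missing.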
Defective




Missing Links!




Bird et al. “Fair and Balanced? Bias in Bug-Fix Datasets”, FSE 2009
• Missing bug reference in change log




• Irregular bug reference formats
   – “issue 681”, “bug 232”, “Fixed for #239”, “see #149”, “solve problem 681”
   – Typos: “Fic 239”
• To recover the missing links, we studied many
  bug reports (including comments) and change
  logs
• We have identified the following features of links:
   – Time interval: the bug-fix time and change committed
     time are close




• Time interval between bug-fix time and
  change committed time




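The time interval feature can be sketched as a simple closeness check. This is a minimal sketch, assuming the bug-fix time is the timestamp at which the bug report was marked fixed and that "close" means an absolute gap no larger than a threshold; the function name and the `max_gap` parameter are hypothetical.

```python
from datetime import datetime, timedelta

def within_time_interval(bug_fix_time: datetime,
                         commit_time: datetime,
                         max_gap: timedelta) -> bool:
    """True if the change was committed within max_gap of the bug-fix time."""
    return abs(bug_fix_time - commit_time) <= max_gap
```

The threshold itself is not fixed in advance; the slides describe learning it from the explicit links.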
• Through empirical studies, we have identified
  the following features of links:


  – Bug owner and change committer: they are often
    the same person, or have mapping relationships




• Bug owner and change committer: mapping relationship

  Bug Owner                   Change Committer      Project
  dswitkin@gmail.com          dswitkin              ZXing
  dswitkin@google.com         dswitkin@google.com   ZXing
  srowen@gmail.com            srowen                ZXing
  pelili0101@googlemail.com   peli0101              OpenIntents
  Will Rowe                   Wrowe                 Apache
  Erik Abele                  Erikabele             Apache
Bug owner and change committer




• Through empirical studies, we have identified
  the following features of links:




  – Text similarity: the textual descriptions in the bug
    report are often similar to those in the change
    logs.

• Text similarity
  – Texts are similar!
  – Using IR technology to measure similarity
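One common IR measure for this kind of comparison is cosine similarity over term vectors. The slides only say "IR technology", so the following is a minimal sketch under that assumption: it uses raw term frequencies and a simple alphanumeric tokenizer, and the function names are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text: str) -> list:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between the term-frequency vectors of two texts."""
    a, b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0
```

A bug report and a change log that share most of their vocabulary score close to 1, while texts with no terms in common score 0.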
• To determine the criteria for the features, we learn
  from the explicit links that can be identified
  through traditional heuristics:
  – For the time interval feature and the text similarity
    feature, we exhaustively search for the combination
    of the two threshold values that achieves the
    maximum F-measure.
  – For the mappings between bug owners and
    change committers, we also learn them from the
    explicit links.
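Learning owner-to-committer mappings from explicit links, as described above, can be sketched as counting co-occurring pairs. This is a minimal sketch: the input format (a list of owner/committer pairs taken from explicit links), the function name, and the `min_support` cutoff are all assumptions for illustration.

```python
from collections import Counter

def learn_owner_committer_mappings(explicit_links, min_support=1):
    """Map each bug owner to the committers seen with it in explicit links.

    explicit_links: iterable of (bug_owner, change_committer) pairs.
    min_support: minimum number of times a pair must occur to be kept.
    """
    counts = Counter(explicit_links)
    mappings = {}
    for (owner, committer), n in counts.items():
        if n >= min_support:
            mappings.setdefault(owner, set()).add(committer)
    return mappings
```

Raising `min_support` trades coverage for confidence: pairs seen only once are dropped, which removes accidental pairings but also rare contributors.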
• Determine the time interval and similarity thresholds
  – Search step by step for the optimal similarity
    threshold and time interval values
• Determine the mapping relationship between bug
  owners and change committers
  – Find the possible mappings from the explicit links
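The step-by-step threshold search can be sketched as a grid search that keeps the pair of thresholds with the highest F-measure on the explicit links. This is a minimal sketch: the candidate tuple layout, the grids, and the function names are hypothetical, and the time gap is assumed to be a plain number (for example, hours).

```python
def f_measure(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall; 0.0 when undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0

def search_thresholds(candidates, true_links, sim_grid, gap_grid):
    """Try every (similarity, time gap) threshold pair on the given grids.

    candidates: list of (link_id, similarity, time_gap) tuples.
    true_links: set of link_ids known to be correct (the explicit links).
    Returns (best_f_measure, similarity_threshold, time_gap_threshold).
    """
    best = (-1.0, None, None)
    for sim_t in sim_grid:
        for gap_t in gap_grid:
            predicted = {lid for lid, sim, gap in candidates
                         if sim >= sim_t and gap <= gap_t}
            tp = len(predicted & true_links)
            fp = len(predicted - true_links)
            fn = len(true_links - predicted)
            score = f_measure(tp, fp, fn)
            if score > best[0]:  # keep the first best pair on ties
                best = (score, sim_t, gap_t)
    return best
```

The cost is the product of the two grid sizes times the number of candidates, so a coarse grid is a reasonable starting point before refining around the best pair.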
• To obtain the ground truth (“golden set” of links)
  • For ZXing and OpenIntents, we manually identify the links
  • For Apache, we use the data provided by Bird et al. (annotated
    by an Apache core developer)
• Four possible outcomes
  – A link we identify is a true link → TP
  – A link we identify is not a true link → FP
  – A true link that we fail to identify → FN
  – A non-link that we correctly do not identify → TN
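• Evaluation Metrics

$$\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}$$

$$\text{F-measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$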
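As a worked example of these metrics, take hypothetical counts (not results from the study) of TP = 8, FP = 2 and FN = 4:

$$\text{Precision} = \frac{8}{8 + 2} = 0.8, \qquad \text{Recall} = \frac{8}{8 + 4} \approx 0.667$$

$$\text{F-measure} = \frac{2 \times 0.8 \times 0.667}{0.8 + 0.667} \approx 0.727$$

The F-measure sits closer to the lower of the two values, so a method cannot score well by maximizing precision or recall alone.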
[Figure: Performance of ReLink in Apache Project. F-measure, recall, and precision for ReLink vs. Traditional; axis range 0.65 to 0.9]
• What can we do with the recovered links?
  – Improving maintainability measurement:
    · The percentage of bug-fixing changes
    · The percentage of buggy files
    · Mean time to fix
  – Constructing better software defect
    prediction models
• Maintainability Measurement:




• Defect Prediction




  ReLink can improve the performance of defect prediction!
• The quality of the golden set of links cannot be
  completely assured

• All the datasets are collected from open source
  projects

• The approach needs to be verified in more
  projects

• We propose ReLink to recover the missing
  links
• The recovered links have a positive impact on
  follow-up software maintenance studies,
  including defect prediction and maintainability
  measurement.
• Future work:
   – Further improving the performance of ReLink
   – Applying ReLink to more projects, including industrial
     projects
Thank you!

Dr Hongyu Zhang
School of Software, Tsinghua University
Beijing 100084, China
Email: hongyu@tsinghua.edu.cn
Web: http://sites.google.com/site/hongyujohn/



ReLink: Recovering Links between Bugs and Changes (ESEC/FSE 2011)

  • 1. Rongxin Wu, Hongyu Zhang, Sunghun Kim, Shing-Chi Cheung. Tsinghua University, China; The Hong Kong University of Science and Technology, Hong Kong
  • 2. • The links between fixed bugs and committed changes are important: – for measuring software quality – for constructing defect prediction models [Diagram: fixed bugs (Bugzilla) linked to committed changes (CVS/SVN)]
  • 3. • To discover the links: mining software repositories! • Heuristics traditionally used to collect links between bugs and changes: searching for keywords (such as “Fixed” or “Bug”) and bug IDs [Diagram of repository data sources: Bugzilla, mailings, source code, CVS/SVN, execution traces, requirements, developers, crash logs, …]
  • 5. Missing Links! Bird et al. “Fair and Balanced? Bias in Bug-Fix Datasets”, FSE 2009
  • 6. • Missing bug reference in change log • Irregular bug reference formats: “issue 681”, “bug 232”, “Fixed for #239”, “see #149”, “solve problem 681” • Typos: “Fic 239”
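The traditional keyword-and-ID heuristic can be sketched in a few lines, which also shows why it misses links. The pattern below is a hypothetical illustration built only from the reference formats quoted on this slide; it is not the pattern used by ReLink or by any particular mining tool.

```python
import re

# Hypothetical pattern (an assumption for illustration): a known keyword,
# then an optional '#', then the numeric bug ID.
BUG_REF = re.compile(
    r"\b(?:bug|issue|problem|fix(?:ed|es)?(?:\s+for)?|see)\s*#?\s*(\d+)",
    re.IGNORECASE,
)


def extract_bug_ids(change_log: str) -> list[int]:
    """Return the bug IDs explicitly referenced in a change log message."""
    return [int(bug_id) for bug_id in BUG_REF.findall(change_log)]


if __name__ == "__main__":
    for log in ["issue 681", "Fixed for #239", "see #149", "Fic 239"]:
        print(repr(log), extract_bug_ids(log))
```

The typo “Fic 239” yields no ID because “Fic” is not a recognized keyword, and a change log with no reference at all yields nothing either. Both cases are the missing links the talk sets out to recover.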
  • 7. • To recover the missing links, we studied many bug reports (including comments) and change logs • We have identified the following features of links: – Time interval: the bug-fix time and change committed time are close
  • 8. • Time interval between bug-fix time and change committed time
  • 9. • Through empirical studies, we have identified the following features of links: – Bug owner and change committer: they are often the same person, or have mapping relationships
  • 10. • Mapping relationship between bug owner and change committer:
        Bug Owner                  | Change Committer     | Project
        dswitkin@gmail.com         | dswitkin             | ZXing
        dswitkin@google.com        | dswitkin@google.com  | ZXing
        srowen@gmail.com           | srowen               | ZXing
        pelili0101@googlemail.com  | peli0101             | OpenIntents
        Will Rowe                  | Wrowe                | Apache
        Erik Abele                 | Erikabele            | Apache
  • 11. Bug owner and change committer
  • 12. • Through empirical studies, we have identified the following features of links: – Text similarity: the textual descriptions in the bug report are often similar to those in the change logs.
  • 13. • Text similarity: the texts are similar! Using IR technology to measure similarity
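The slide says only that IR technology is used to measure similarity. As a minimal sketch of one common choice, the example below computes cosine similarity over raw term-count vectors. The tokenizer and the unweighted counts are assumptions made for illustration; the slides do not specify the weighting scheme or preprocessing.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens (an assumed, simplistic tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(bug_text: str, change_log: str) -> float:
    """Cosine similarity between the term-count vectors of two texts."""
    va, vb = Counter(tokenize(bug_text)), Counter(tokenize(change_log))
    dot = sum(va[term] * vb[term] for term in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(count * count for count in va.values()))
    norm_b = math.sqrt(sum(count * count for count in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    bug = "Scanner crashes when decoding rotated QR code"
    log = "Avoid crash in scanner when QR code is rotated"
    print(round(cosine_similarity(bug, log), 3))
```

A score near 1 means the bug report and change log share most of their vocabulary, and a score of 0 means they share none.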
  • 15. • To determine the criteria of features, we learn from the explicit links that can be identified through traditional heuristics: – For the time interval feature and the text similarity feature, we exhaustively search for the optimal combination of these two values so that the maximum F-measure can be achieved. – For the mappings between bug owners and change committers, we also learn them from the explicit links.
  • 16. • Determine the time interval and similarity threshold: search step by step for the optimal similarity threshold and time interval values
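The exhaustive search described here can be sketched as a small grid search that keeps the threshold pair with the highest F-measure. The candidate tuples, the grid values, and the function names are all hypothetical; the slides do not give the step sizes or the exact decision rule, so the rule below (similarity at or above one threshold and interval at or below the other) is an assumption.

```python
from itertools import product


def f_measure(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall; 0.0 when there are no true positives."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def search_thresholds(candidates, sim_grid, interval_grid):
    """Try every (similarity, interval) pair and return (best_f, sim_threshold, interval_threshold).

    Each candidate is (similarity, interval_hours, is_true_link).
    """
    best = (0.0, None, None)
    for sim_t, int_t in product(sim_grid, interval_grid):
        tp = fp = fn = 0
        for sim, interval, is_link in candidates:
            predicted = sim >= sim_t and interval <= int_t  # assumed decision rule
            if predicted and is_link:
                tp += 1
            elif predicted:
                fp += 1
            elif is_link:
                fn += 1
        score = f_measure(tp, fp, fn)
        if score > best[0]:
            best = (score, sim_t, int_t)
    return best


if __name__ == "__main__":
    # Made-up candidate links: (similarity, hours between fix and commit, true link?)
    candidates = [
        (0.9, 1.0, True), (0.8, 2.0, True), (0.7, 30.0, False),
        (0.2, 1.0, False), (0.6, 3.0, True),
    ]
    print(search_thresholds(candidates, [0.1, 0.5, 0.75], [2.0, 24.0, 48.0]))
```

On this toy data the pair (0.5, 24.0) separates the true links from the false ones perfectly, so it is the one returned.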
  • 17. • Determine the mapping relationship between bug owners and change committers: find the possible mappings from the explicit links
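One simple way to picture this step is to collect, for every bug owner, the committers seen on that owner's explicitly linked changes. This is a sketch under assumptions: the function names are hypothetical, and the slides do not say whether the real procedure filters or weights the observed pairs.

```python
from collections import defaultdict


def learn_owner_committer_map(explicit_links):
    """Build owner -> set of committers from (bug_owner, change_committer) pairs."""
    mapping = defaultdict(set)
    for owner, committer in explicit_links:
        mapping[owner].add(committer)
    return dict(mapping)


def committer_matches(mapping, owner: str, committer: str) -> bool:
    """True if owner and committer are identical or were paired in an explicit link."""
    return owner == committer or committer in mapping.get(owner, set())


if __name__ == "__main__":
    # Pairs taken from the mapping examples shown in the talk.
    links = [
        ("dswitkin@gmail.com", "dswitkin"),
        ("srowen@gmail.com", "srowen"),
        ("Will Rowe", "Wrowe"),
    ]
    mapping = learn_owner_committer_map(links)
    print(committer_matches(mapping, "srowen@gmail.com", "srowen"))
    print(committer_matches(mapping, "srowen@gmail.com", "dswitkin"))
```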
  • 18. • To obtain the ground truth (“golden set” of links): – For ZXing and OpenIntents, we manually identify the links – For Apache, we use the data provided by Bird et al. (annotated by an Apache core developer)
  • 19. • Four possible outcomes: – A link we identify is a true link → TP – A link we identify is not a true link → FP – A link we miss is a true link → FN – A link we miss is not a true link → TN • Evaluation metrics: $\text{Precision} = \frac{TP}{TP + FP}$, $\text{Recall} = \frac{TP}{TP + FN}$, $\text{F-measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$
  • 20. [Figure: Performance of ReLink in the Apache project, comparing ReLink with the traditional heuristics on precision, recall, and F-measure]
  • 22. • What can we do with the recovered links? – Improving maintainability measurement: the percentage of bug-fixing changes, the percentage of buggy files, and the mean time to fix – Constructing better software defect prediction models
  • 25. • Defect prediction: ReLink can improve the performance of defect prediction!
  • 26. • The quality of the golden set of links can’t be completely assured • All the datasets are collected from open source projects • The approach needs to be verified in more projects
  • 27. • We propose ReLink to recover the missing links • The recovered links have a positive impact on follow-up software maintenance studies, including defect prediction and maintainability measurement. • Future work: – Further improving the performance of ReLink – Applying it to more projects, including industrial projects
  • 28. Thank you! Dr Hongyu Zhang, School of Software, Tsinghua University, Beijing 100084, China. Email: hongyu@tsinghua.edu.cn Web: http://sites.google.com/site/hongyujohn/