Automatically Generated Patches as Debugging Aids: A Human Study (FSE 2014)
Yida Tao, Jindae Kim, Sunghun Kim 
Dept. of CSE, The Hong Kong University of Science and Technology 
Chang Xu 
State Key Lab for Novel Software Technology, Nanjing University
Automatic Program Repair
• Promising research progress
  • ClearView [1]: Prevent all 10 Firefox exploits
  • GenProg [2]: Fix 55/105 real bugs
[1] Automatically Patching Errors in Deployed Software. Perkins et al. SOSP’09
[2] A systematic study of automated program repair: fixing 55 out of 105 bugs for $8 each. Le Goues et al. ICSE’12
“It won't get your bug patched any quicker. You’ll just have shifted the coders' attention away from their own app's bugs, and onto the repair tool’s bugs.”
- Slashdot discussion: http://science.slashdot.org/story/09/10/29/2248246/Fixing-Bugs-But-Bypassing-the-Source-Code
#what-could-possibly-go-wrong
#program-out-of-control
• Blackbox repair
• Increasing maintenance cost
• Vulnerable to attack
- Slashdot discussion: http://science.slashdot.org/story/09/10/29/2248246/Fixing-Bugs-But-Bypassing-the-Source-Code
- A human study of patch maintainability. ISSTA’12
- Automatic patch generation learned from human-written patches. ICSE’13
Use automatically generated patches as debugging aids
Our Human Study
• Investigate the usefulness of generated patches as debugging aids
• Discuss the impact of patch quality on debugging performance
• Explore practitioners’ feedback on adopting automatic program repair
Methodology
Debugging aid → (is given to) → Participants → (debug) → Bugs
Debugging aid
• Low-quality generated patch
• High-quality generated patch
• Buggy method location
Participants: 95 in total
• Grad: 44 CS graduate students
• MTurk: 23 Amazon Mechanical Turk workers
• Engr: 28 industrial software engineers
44 Graduate students
• Between-group design
  • Low-quality generated patch
  • High-quality generated patch
  • Buggy method location
  • Group sizes: 14, 15, and 15 students
• Onsite setting
  • Eclipse IDE
  • Supervised session
Remote participants (28 Engr + 23 MTurk)
• Within-group design
  • Low-quality generated patch
  • High-quality generated patch
  • Buggy method location
• Online debugging system
Bug Selection Criteria 
• Real bugs 
• The bug has accepted patches written by developers 
• Proper number of bugs 
• The bug has generated patches with different quality 
Automatic patch generation learned from human-written patches. Kim et al. ICSE’13

Auto-generated patch A: High-Quality Patch (avg. ranking 1.6)
    for (int i=0; i<parenCount; i++) {
        SubString sub = (SubString)parens.get(i);
        if(sub!=null){
            args[i+1] = sub.toString();
        }
    }

Auto-generated patch B: Low-Quality Patch (avg. ranking 2.8)
    for (int i=0; i<parenCount; i++) {
        SubString sub = (SubString)parens.get(i);
        args[parenCount+1] =
            new Integer(reImpl.leftContext.length);
    }

avg. ranking from 85 devs and students
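The null guard in patch A can be tried in isolation with a small stand-alone sketch. Everything here is a hypothetical stand-in and not Rhino code: the class name NullGuardSketch, the method fillArgs, and the use of plain Object values in place of SubString are assumptions made so the example compiles on its own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class NullGuardSketch {
    // Hypothetical stand-in for the loop on the slide: copies each captured
    // group into args[i+1] and skips groups that are null, as patch A does.
    static String[] fillArgs(List<Object> parens) {
        int parenCount = parens.size();
        String[] args = new String[parenCount + 1];
        for (int i = 0; i < parenCount; i++) {
            Object sub = parens.get(i);
            if (sub != null) {
                args[i + 1] = sub.toString();
            }
        }
        return args;
    }

    public static void main(String[] unused) {
        // Illustrative input with a null entry in the middle.
        List<Object> parens = new ArrayList<>();
        parens.add("a");
        parens.add(null);
        parens.add("c");
        System.out.println(Arrays.toString(fillArgs(parens)));
    }
}
```

Without the guard, the call to toString() on the null entry would throw a NullPointerException; with it, that slot is simply left unset.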
Participants submit 337 patches as their debugging outcome
# submitted patches w.r.t. debugging aid
• Location: 109
• LowQ: 112
• HighQ: 116
# submitted patches w.r.t. bugs
• Bug1: 66
• Bug2: 74
• Bug3: 59
• Bug4: 76
• Bug5: 62
Evaluation of debugging performance
Patch Correctness
• Passing test cases
• Matching the semantics of original accepted patches
• 3 evaluators
Debugging Time
• Eclipse Plug-in
• Website Timer
Multiple Regression Analysis
• Independent variables
  • Debugging aids
  • Bugs
  • Participant types
  • Programming experience
correctness = α₀ + α₁ ∙ x₁ + α₂ ∙ x₂ + α₃ ∙ x₃ + α₄ ∙ x₄
debugging time = β₀ + β₁ ∙ x₁ + β₂ ∙ x₂ + β₃ ∙ x₃ + β₄ ∙ x₄
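A minimal sketch of how a dummy-coded independent variable enters a regression of this form, reduced to a single 0/1 predictor so that the least-squares fit has a closed form. The class name, the toy data, and the one-predictor simplification are all assumptions for illustration; the study's model has four independent variables, and its data and exact fitting procedure are not reproduced here.

```java
class DummyRegressionSketch {
    // Ordinary least squares for y = a0 + a1 * x with one predictor.
    // Returns {a0, a1}.
    static double[] fit(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0.0, meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;
        double sxy = 0.0, sxx = 0.0;
        for (int i = 0; i < n; i++) {
            sxy += (x[i] - meanX) * (y[i] - meanY);
            sxx += (x[i] - meanX) * (x[i] - meanX);
        }
        double a1 = sxy / sxx;          // slope
        double a0 = meanY - a1 * meanX; // intercept
        return new double[] { a0, a1 };
    }

    public static void main(String[] unused) {
        // Made-up toy data, not the study's: x = 1 if a (hypothetical) aid
        // was given, y = 1 if the submitted patch was judged correct.
        double[] x = {0, 0, 0, 0, 1, 1, 1, 1};
        double[] y = {1, 0, 0, 0, 1, 1, 1, 0};
        double[] c = fit(x, y);
        System.out.printf("a0=%.2f a1=%.2f%n", c[0], c[1]);
    }
}
```

In this simplified one-predictor case the intercept is the mean outcome of the x = 0 group and the slope is the difference between the two group means, which gives one way to read the sign of a coefficient on a debugging-aid variable.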
Post-study Survey
• Helpfulness of debugging aids
• Difficulty of bugs
• Opinions on using generated patches as debugging aids
Evaluation measures: correctness, debugging time, survey feedback
Results
1. High-quality patches significantly improve debugging correctness
% of correct patches: Location 48%, LowQ 33%, HighQ 71%
HighQ (71%) vs. Location (48%): positive coefficient = 1.25, p-value = 0.00 < 0.05
2. Low-quality patches can undermine debugging correctness
% of correct patches: Location 48%, LowQ 33%, HighQ 71%
Slight negative effect: negative coefficient = -0.55, p-value = 0.09
3. High-quality patches are more useful for difficult bugs
[Figure: % of correct patches (axis 0% to 90%) per bug for Location, LowQ, and HighQ, plotted with bug difficulty (axis 2 to 5)]
Bugs: Bug1 Math-280, Bug2 Rhino-114493, Bug3 Rhino-192226, Bug4 Rhino-217379, Bug5 Rhino-76683
4. The type of debugging aid does not affect debugging time
[Figure: debugging time (min), axis 0 to 80, for Location, LowQ, and HighQ]
5. Other factors’ impact on debugging performance
• Difficult bugs significantly slow down debugging
• Engr and MTurk are more likely to debug correctly
• Novices tend to benefit more from HighQ patches
6. Participants consider high-quality generated patches much more helpful than low-quality patches
[Figure: helpfulness of debugging aids, rated Not Helpful, Slightly Helpful, Medium, Helpful, or Very helpful, for low-quality and high-quality generated patches]
Mann-Whitney U test: p-value = 0.001
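For readers unfamiliar with the test, here is a minimal sketch of the U statistic that a Mann-Whitney comparison of two rating groups is built on. The ratings are made up, the 1-to-5 coding of the helpfulness scale is an assumption, and the p-value computation is deliberately left out; none of this reproduces the study's data or analysis script.

```java
class MannWhitneySketch {
    // U statistic for sample a versus sample b: every pair (x from a, y from b)
    // contributes 1 when x > y and 0.5 when the two values tie.
    static double uStatistic(int[] a, int[] b) {
        double u = 0.0;
        for (int x : a) {
            for (int y : b) {
                if (x > y) {
                    u += 1.0;
                } else if (x == y) {
                    u += 0.5;
                }
            }
        }
        return u;
    }

    public static void main(String[] unused) {
        // Made-up ratings, assumed coding: 1 = Not Helpful ... 5 = Very helpful.
        int[] highQ = {5, 4, 4};
        int[] lowQ = {2, 3, 4};
        double uHigh = uStatistic(highQ, lowQ);
        double uLow = uStatistic(lowQ, highQ);
        // The two statistics always add up to the number of pairs.
        System.out.println("U(highQ) = " + uHigh + ", U(lowQ) = " + uLow);
    }
}
```

A U value close to the total number of pairs means one group's ratings are almost always higher than the other's; the test then asks how likely such an imbalance would be by chance.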
Feedback
Quick starting point
• Point to the buggy area
• Brainstorm
“They would seem to be useful in helping find various ideas around fixing the issue, even if the patch isn’t always correct on its own.”
Confusing, incomplete, misleading
• Wrong lead, especially for novices
• Require further human perfection
“Generated patches would be good at recognizing obvious problems”
“…but may not recognize more involved defects.”
“Generated patches simplify the problem”
“…but they may over-simplify it by not addressing the root cause.”
“I would use generated patches as debugging aids, as they provide extra diagnostic information”
“…along with access to standard debugging tools.”
Threats to Validity
• Bugs and generated patches may not be representative
• Quality measure of generated patches may not generalize
• May not generalize to domain experts
• Possibility of blindly reusing generated patches
  • Patches submitted in less than 1 minute are removed
Takeaway
• Auto-generated patches can be useful as debugging aids
  • Participants fix bugs more correctly with auto-generated patches
• Quality control is required
  • Participants’ debugging correctness is compromised with low-quality generated patches
• Maximize the benefits
  • Difficult bugs
  • Novice developers

No.1 Amil baba in Pakistan amil baba in Lahore amil baba in KarachiAmil Baba Naveed Bangali
 
Amil baba kala jadu expert asli ilm ka malik
Amil baba kala jadu expert asli ilm ka malikAmil baba kala jadu expert asli ilm ka malik
Amil baba kala jadu expert asli ilm ka malikamil baba kala jadu
 
A Costly Interruption: The Sermon On the Mount, pt. 2 - Blessed
A Costly Interruption: The Sermon On the Mount, pt. 2 - BlessedA Costly Interruption: The Sermon On the Mount, pt. 2 - Blessed
A Costly Interruption: The Sermon On the Mount, pt. 2 - BlessedVintage Church
 
No.1 Amil baba in Pakistan amil baba in Lahore amil baba in Karachi
No.1 Amil baba in Pakistan amil baba in Lahore amil baba in KarachiNo.1 Amil baba in Pakistan amil baba in Lahore amil baba in Karachi
No.1 Amil baba in Pakistan amil baba in Lahore amil baba in KarachiAmil Baba Mangal Maseeh
 
A357 Hate can stir up strife, but love can cover up all mistakes. hate, love...
A357 Hate can stir up strife, but love can cover up all mistakes.  hate, love...A357 Hate can stir up strife, but love can cover up all mistakes.  hate, love...
A357 Hate can stir up strife, but love can cover up all mistakes. hate, love...franktsao4
 
No.1 Amil baba in Pakistan amil baba in Lahore amil baba in Karachi
No.1 Amil baba in Pakistan amil baba in Lahore amil baba in KarachiNo.1 Amil baba in Pakistan amil baba in Lahore amil baba in Karachi
No.1 Amil baba in Pakistan amil baba in Lahore amil baba in KarachiAmil Baba Mangal Maseeh
 
The Chronological Life of Christ part 097 (Reality Check Luke 13 1-9).pptx
The Chronological Life of Christ part 097 (Reality Check Luke 13 1-9).pptxThe Chronological Life of Christ part 097 (Reality Check Luke 13 1-9).pptx
The Chronological Life of Christ part 097 (Reality Check Luke 13 1-9).pptxNetwork Bible Fellowship
 
Topmost Kala ilam expert in UK Or Black magic specialist in UK Or Black magic...
Topmost Kala ilam expert in UK Or Black magic specialist in UK Or Black magic...Topmost Kala ilam expert in UK Or Black magic specialist in UK Or Black magic...
Topmost Kala ilam expert in UK Or Black magic specialist in UK Or Black magic...baharayali
 
Repentance involves Faith Powerpoint presentation
Repentance involves Faith Powerpoint presentationRepentance involves Faith Powerpoint presentation
Repentance involves Faith Powerpoint presentationcorderos484
 
Asli amil baba in Karachi Pakistan and best astrologer Black magic specialist
Asli amil baba in Karachi Pakistan and best astrologer Black magic specialistAsli amil baba in Karachi Pakistan and best astrologer Black magic specialist
Asli amil baba in Karachi Pakistan and best astrologer Black magic specialistAmil Baba Mangal Maseeh
 
Dubai Call Girls Skinny Mandy O525547819 Call Girls Dubai
Dubai Call Girls Skinny Mandy O525547819 Call Girls DubaiDubai Call Girls Skinny Mandy O525547819 Call Girls Dubai
Dubai Call Girls Skinny Mandy O525547819 Call Girls Dubaikojalkojal131
 
Amil baba in uk amil baba in Australia amil baba in canada
Amil baba in uk amil baba in Australia amil baba in canadaAmil baba in uk amil baba in Australia amil baba in canada
Amil baba in uk amil baba in Australia amil baba in canadaamil baba kala jadu
 

Último (20)

Understanding Jainism Beliefs and Information.pptx
Understanding Jainism Beliefs and Information.pptxUnderstanding Jainism Beliefs and Information.pptx
Understanding Jainism Beliefs and Information.pptx
 
Asli amil baba in Karachi asli amil baba in Lahore
Asli amil baba in Karachi asli amil baba in LahoreAsli amil baba in Karachi asli amil baba in Lahore
Asli amil baba in Karachi asli amil baba in Lahore
 
The King 'Great Goodness' Part 1 Mahasilava Jataka (Eng. & Chi.).pptx
The King 'Great Goodness' Part 1 Mahasilava Jataka (Eng. & Chi.).pptxThe King 'Great Goodness' Part 1 Mahasilava Jataka (Eng. & Chi.).pptx
The King 'Great Goodness' Part 1 Mahasilava Jataka (Eng. & Chi.).pptx
 
No 1 astrologer amil baba in Canada Usa astrologer in Canada
No 1 astrologer amil baba in Canada Usa astrologer in CanadaNo 1 astrologer amil baba in Canada Usa astrologer in Canada
No 1 astrologer amil baba in Canada Usa astrologer in Canada
 
Topmost Black magic specialist in Saudi Arabia Or Bangali Amil baba in UK Or...
Topmost Black magic specialist in Saudi Arabia  Or Bangali Amil baba in UK Or...Topmost Black magic specialist in Saudi Arabia  Or Bangali Amil baba in UK Or...
Topmost Black magic specialist in Saudi Arabia Or Bangali Amil baba in UK Or...
 
Deerfoot Church of Christ Bulletin 4 21 24
Deerfoot Church of Christ Bulletin 4 21 24Deerfoot Church of Christ Bulletin 4 21 24
Deerfoot Church of Christ Bulletin 4 21 24
 
No.1 Amil baba in Pakistan amil baba in Lahore amil baba in Karachi
No.1 Amil baba in Pakistan amil baba in Lahore amil baba in KarachiNo.1 Amil baba in Pakistan amil baba in Lahore amil baba in Karachi
No.1 Amil baba in Pakistan amil baba in Lahore amil baba in Karachi
 
Amil baba kala jadu expert asli ilm ka malik
Amil baba kala jadu expert asli ilm ka malikAmil baba kala jadu expert asli ilm ka malik
Amil baba kala jadu expert asli ilm ka malik
 
St. Louise de Marillac: Animator of the Confraternities of Charity
St. Louise de Marillac: Animator of the Confraternities of CharitySt. Louise de Marillac: Animator of the Confraternities of Charity
St. Louise de Marillac: Animator of the Confraternities of Charity
 
A Costly Interruption: The Sermon On the Mount, pt. 2 - Blessed
A Costly Interruption: The Sermon On the Mount, pt. 2 - BlessedA Costly Interruption: The Sermon On the Mount, pt. 2 - Blessed
A Costly Interruption: The Sermon On the Mount, pt. 2 - Blessed
 
No.1 Amil baba in Pakistan amil baba in Lahore amil baba in Karachi
No.1 Amil baba in Pakistan amil baba in Lahore amil baba in KarachiNo.1 Amil baba in Pakistan amil baba in Lahore amil baba in Karachi
No.1 Amil baba in Pakistan amil baba in Lahore amil baba in Karachi
 
A357 Hate can stir up strife, but love can cover up all mistakes. hate, love...
A357 Hate can stir up strife, but love can cover up all mistakes.  hate, love...A357 Hate can stir up strife, but love can cover up all mistakes.  hate, love...
A357 Hate can stir up strife, but love can cover up all mistakes. hate, love...
 
No.1 Amil baba in Pakistan amil baba in Lahore amil baba in Karachi
No.1 Amil baba in Pakistan amil baba in Lahore amil baba in KarachiNo.1 Amil baba in Pakistan amil baba in Lahore amil baba in Karachi
No.1 Amil baba in Pakistan amil baba in Lahore amil baba in Karachi
 
The Chronological Life of Christ part 097 (Reality Check Luke 13 1-9).pptx
The Chronological Life of Christ part 097 (Reality Check Luke 13 1-9).pptxThe Chronological Life of Christ part 097 (Reality Check Luke 13 1-9).pptx
The Chronological Life of Christ part 097 (Reality Check Luke 13 1-9).pptx
 
Topmost Kala ilam expert in UK Or Black magic specialist in UK Or Black magic...
Topmost Kala ilam expert in UK Or Black magic specialist in UK Or Black magic...Topmost Kala ilam expert in UK Or Black magic specialist in UK Or Black magic...
Topmost Kala ilam expert in UK Or Black magic specialist in UK Or Black magic...
 
Repentance involves Faith Powerpoint presentation
Repentance involves Faith Powerpoint presentationRepentance involves Faith Powerpoint presentation
Repentance involves Faith Powerpoint presentation
 
Asli amil baba in Karachi Pakistan and best astrologer Black magic specialist
Asli amil baba in Karachi Pakistan and best astrologer Black magic specialistAsli amil baba in Karachi Pakistan and best astrologer Black magic specialist
Asli amil baba in Karachi Pakistan and best astrologer Black magic specialist
 
Dubai Call Girls Skinny Mandy O525547819 Call Girls Dubai
Dubai Call Girls Skinny Mandy O525547819 Call Girls DubaiDubai Call Girls Skinny Mandy O525547819 Call Girls Dubai
Dubai Call Girls Skinny Mandy O525547819 Call Girls Dubai
 
Amil baba in uk amil baba in Australia amil baba in canada
Amil baba in uk amil baba in Australia amil baba in canadaAmil baba in uk amil baba in Australia amil baba in canada
Amil baba in uk amil baba in Australia amil baba in canada
 
Top 8 Krishna Bhajan Lyrics in English.pdf
Top 8 Krishna Bhajan Lyrics in English.pdfTop 8 Krishna Bhajan Lyrics in English.pdf
Top 8 Krishna Bhajan Lyrics in English.pdf
 

Automatically Generated Patches as Debugging Aids: A Human Study (FSE 2014)

  • 1. Automatically Generated Patches as Debugging Aids: A Human Study • Yida Tao, Jindae Kim, Sunghun Kim (Dept. of CSE, The Hong Kong University of Science and Technology) • Chang Xu (State Key Lab for Novel Software Technology, Nanjing University)
  • 2. Automatic Program Repair • Promising research progress • ClearView [1]: Prevent all 10 Firefox exploits • GenProg [2]: Fix 55/105 real bugs • [1] Automatically Patching Errors in Deployed Software. Perkins et al. SOSP’09 • [2] A systematic study of automated program repair: fixing 55 out of 105 bugs for $8 each. Le Goues et al. ICSE’12
  • 4. Automatic Program Repair • “It won't get your bug patched any quicker. You’ll just have shifted the coders' attention away from their own app's bugs, and onto the repair tool’s bugs.” - Slashdot discussion: http://science.slashdot.org/story/09/10/29/2248246/Fixing-Bugs-But-Bypassing-the-Source-Code
  • 5. #what-could-possibly-go-wrong • Blackbox repair • Increasing maintenance cost • Vulnerable to attack - Slashdot discussion: http://science.slashdot.org/story/09/10/29/2248246/Fixing-Bugs-But-Bypassing-the-Source-Code - A human study of patch maintainability. ISSTA’12 - Automatic patch generation learned from human-written patches. ICSE’13
  • 6. #what-could-possibly-go-wrong #program-out-of-control • Blackbox repair • Increasing maintenance cost • Vulnerable to attack - Slashdot discussion: http://science.slashdot.org/story/09/10/29/2248246/Fixing-Bugs-But-Bypassing-the-Source-Code - A human study of patch maintainability. ISSTA’12 - Automatic patch generation learned from human-written patches. ICSE’13
  • 7. Use automatically generated patches as debugging aids
  • 8. Use automatically generated patches as debugging aids • Our Human Study • Investigate the usefulness of generated patches as debugging aids • Discuss the impact of patch quality on debugging performance • Explore practitioners’ feedback on adopting automatic program repair
  • 10. Debugging aid is given to Participants, who Debug Bugs
  • 12. Low-quality generated patch
  • 13. Low-quality generated patch • High-quality generated patch
  • 14. Low-quality generated patch • High-quality generated patch • Buggy method location
  • 15. 95 Participants • Grad: 44 CS graduate students • Engr: 28 industrial software engineers • MTurk: 23 Amazon Mechanical Turk workers
  • 17. 44 Graduate students • Between-group design • 14 students, 15 students, 15 students
  • 18. 44 Graduate students • Between-group design • Low-quality generated patch, High-quality generated patch, Buggy method location • 14 students, 15 students, 15 students
  • 19. 44 Graduate students • Between-group design • Onsite setting • Eclipse IDE • Supervised session • Low-quality generated patch, High-quality generated patch, Buggy method location • 14 students, 15 students, 15 students
  • 20. Remote participants (28 Engr + 23 MTurk) • Within-group design • Low-quality generated patch, High-quality generated patch, Buggy method location
  • 21. Remote participants (28 Engr + 23 MTurk) • Within-group design • Online debugging system • Low-quality generated patch, High-quality generated patch, Buggy method location
  • 23. Bug Selection Criteria • Real bugs • The bug has accepted patches written by developers • Proper number of bugs • The bug has generated patches with different quality
  • 24. Automatic patch generation learned from human-written patches. Kim et al. ICSE’13
  • 25. Automatic patch generation learned from human-written patches. Kim et al. ICSE’13 • Auto-generated patch A: for (int i=0; i<parenCount; i++) { SubString sub = (SubString)parens.get(i); if(sub!=null){ args[i+1] = sub.toString(); } } • Auto-generated patch B: for (int i=0; i<parenCount; i++) { SubString sub = (SubString)parens.get(i); args[parenCount+1] = new Integer(reImpl.leftContext.length); }
  • 26. Automatic patch generation learned from human-written patches. Kim et al. ICSE’13 • Auto-generated patch A (avg. ranking 1.6): for (int i=0; i<parenCount; i++) { SubString sub = (SubString)parens.get(i); if(sub!=null){ args[i+1] = sub.toString(); } } • Auto-generated patch B (avg. ranking 2.8): for (int i=0; i<parenCount; i++) { SubString sub = (SubString)parens.get(i); args[parenCount+1] = new Integer(reImpl.leftContext.length); } • avg. ranking from 85 devs and students
  • 27. Automatic patch generation learned from human-written patches. Kim et al. ICSE’13 • Auto-generated patch A (High-Quality Patch, avg. ranking 1.6): for (int i=0; i<parenCount; i++) { SubString sub = (SubString)parens.get(i); if(sub!=null){ args[i+1] = sub.toString(); } } • Auto-generated patch B (Low-Quality Patch, avg. ranking 2.8): for (int i=0; i<parenCount; i++) { SubString sub = (SubString)parens.get(i); args[parenCount+1] = new Integer(reImpl.leftContext.length); } • avg. ranking from 85 devs and students
  • 29. Participants submit 337 patches as their debugging outcome
  • 30. Participants submit 337 patches as their debugging outcome • # submitted patches w.r.t. debugging aid: Location 109, LowQ 112, HighQ 116
  • 31. Participants submit 337 patches as their debugging outcome • # submitted patches w.r.t. debugging aid: Location 109, LowQ 112, HighQ 116 • # submitted patches w.r.t. bugs: Bug1 66, Bug2 74, Bug3 59, Bug4 76, Bug5 62
  • 32. Evaluation of debugging performance
  • 34. Patch Correctness • Passing test cases
  • 35. Patch Correctness • Passing test cases • Matching the semantics of original accepted patches
  • 36. Patch Correctness • Passing test cases • Matching the semantics of original accepted patches • 3 evaluators
  • 37. Debugging Time • Eclipse Plug-in • Website Timer
  • 38. Outcomes: Correctness, Debugging time • Independent variables: Debugging aids, Bugs, Participant types, Programming experience
  • 39. Multiple Regression Analysis • Outcomes: Correctness, Debugging time • Independent variables: Debugging aids, Bugs, Participant types, Programming experience • correctness = α0 + α1 ∙ x1 + α2 ∙ x2 + α3 ∙ x3 + α4 ∙ x4 • debugging time = β0 + β1 ∙ x1 + β2 ∙ x2 + β3 ∙ x3 + β4 ∙ x4
  • 40. Post-study Survey • Helpfulness of debugging aids • Difficulty of bugs • Opinions on using generated patches as debugging aids
  • 42. (1) High-quality patches significantly improve debugging correctness • 48%, 33%, 71%
  • 43. (1) High-quality patches significantly improve debugging correctness • % of correct patches: Location 48%, LowQ 33%, HighQ 71%
  • 44. (1) High-quality patches significantly improve debugging correctness • % of correct patches: Location 48%, HighQ 71% • Positive coefficient = 1.25, p-value = 0.00 < 0.05
  • 45. (2) Low-quality patches slightly undermine debugging correctness • % of correct patches: Location 48%, LowQ 33%, HighQ 71%
  • 46. (2) Low-quality patches slightly undermine debugging correctness • % of correct patches: Location 48%, LowQ 33%, HighQ 71% • Negative coefficient = -0.55, p-value = 0.09
  • 47. (2) Low-quality patches can undermine debugging correctness • % of correct patches: Location 48%, LowQ 33%, HighQ 71% • Negative coefficient = -0.55, p-value = 0.09
  • 48. (3) High-quality patches are more useful for difficult bugs
  • 49. (3) High-quality patches are more useful for difficult bugs • [Chart: bug difficulty, axis 2 to 5, for Bug1 Math-280, Bug2 Rhino-114493, Bug3 Rhino-192226, Bug4 Rhino-217379, Bug5 Rhino-76683]
  • 50. (3) High-quality patches are more useful for difficult bugs • [Chart: % of correct patches, axis 0% to 90%, for Bug1 to Bug5, grouped by Location, LowQ, HighQ] • [Chart: bug difficulty, axis 2 to 5, for Bug1 Math-280, Bug2 Rhino-114493, Bug3 Rhino-192226, Bug4 Rhino-217379, Bug5 Rhino-76683]
  • 51. (4) The type of debugging aid does not affect debugging time
  • 52. (4) The type of debugging aid does not affect debugging time • [Chart: debugging time (min), axis 0 to 80, for Location, LowQ, HighQ]
  • 53. (5) Other factors’ impact on debugging performance • Difficult bugs significantly slow down debugging • Engr and MTurk are more likely to debug correctly • Novices tend to benefit more from HighQ patches
  • 54. (6) Participants consider high-quality generated patches much more helpful than low-quality patches • Helpfulness of debugging aids, rated Very helpful / Helpful / Medium / Slightly helpful / Not helpful, for the low-quality and high-quality generated patch • Mann-Whitney U test p-value = 0.001
  • 57. Quick starting point • Point to the buggy area • Brainstorm • “They would seem to be useful in helping find various ideas around fixing the issue, even if the patch isn’t always correct on its own.”
  • 58. Quick starting point • Point to the buggy area • Brainstorm • Confusing, incomplete, misleading • Wrong lead, especially for novices • Require further human perfection • “They would seem to be useful in helping find various ideas around fixing the issue, even if the patch isn’t always correct on its own.”
  • 59. “Generated patches would be good at recognizing obvious problems” “…but may not recognize more involved defects.”
  • 60. “Generated patches would be good at recognizing obvious problems” “…but may not recognize more involved defects.” • “Generated patches simplify the problem” “…but they may over-simplify it by not addressing the root cause.”
  • 61. “I would use generated patches as debugging aids, as they provide extra diagnostic information”
  • 62. “I would use generated patches as debugging aids, as they provide extra diagnostic information” “…along with access to standard debugging tools.”
  • 64. Threats to Validity • Bugs and generated patches may not be representative • Quality measure of generated patches may not generalize • May not generalize to domain experts • Possibility of blindly reusing generated patches • Remove patches that are submitted in less than 1 minute
  • 65. Takeaway • Auto-generated patches can be useful as debugging aids • Participants fix bugs more correctly with auto-generated patches • Quality control is required • Participants’ debugging correctness is compromised with low-quality generated patches • Maximize the benefits • Difficult bugs • Novice developers
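
As a reading aid for the regression slides (39, 44, 46): the transcript gives only the linear predictor, not the link function. Assuming, purely for illustration, that the binary correctness outcome was fitted with a logistic model, each coefficient converts to an odds ratio by exponentiation. The last line computes the unadjusted odds ratios from the raw percentages on slide 43 for comparison; these ignore the other independent variables, so they are not expected to match the fitted coefficients exactly.

```latex
% Assumption: logistic link for the correctness model (not stated in the slides).
\begin{align*}
\text{OR}_{\text{HighQ vs. Location}} &= e^{1.25} \approx 3.49 \\
\text{OR}_{\text{LowQ vs. Location}}  &= e^{-0.55} \approx 0.58 \\
\text{unadjusted:}\quad
\frac{0.71/0.29}{0.48/0.52} &\approx 2.65, \qquad
\frac{0.33/0.67}{0.48/0.52} \approx 0.53
\end{align*}
```

Under that assumption, both views point the same way: a high-quality patch raises the odds of a correct fix, and a low-quality patch lowers them.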

Editor's notes

  1. This is a work with …
  2. Automatic program repair has been a very hot topic in recent years. We’ve seen quite promising research progress in this area. For example, Perkins et al. proposed ClearView, a self-defending software system, which successfully prevented all 10 Firefox exploits created by a red team and generated patches for 7 of them. As another successful example, Le Goues et al. proposed GenProg and used it to fix 55 out of 105 real bugs.
  3. However, there is also skepticism and worry about automatic program repair. Here is a quote from an online discussion.
  4. Here is a quote from an online discussion.
  5. Following this general concern, we’ve observed worries in the online community and the literature about things that could possibly go wrong with program repair techniques. For example, whether it creates a sort of blackbox repair that hardly makes sense, whether it increases maintenance cost, and whether machine-generated patches are vulnerable to attack.
  6. In general, people are worried about whether a program, after being repaired automatically, still works as intended, or becomes unpredictable and out of control. Because of these concerns, direct deployment of automatic program repair seems problematic at this point. But can we still benefit from this technique?
  7. How about using ..? In this case, developers can refer to generated patches when they debug, but they don’t necessarily have to use them. In other words, they still have full control over the content of the patch. This sounds like a more comfortable usage scenario.
  8. Which is also the focus of our human study. First … And because some of the controversy of program repair comes from the quality of automatically generated patches, we also want to disc… Finally, we explore…
  9. Here is our methodology
  10. Which is actually quite intuitive. Basically, we conducted controlled experiments, where we give a certain type of debugging aid to participants, who use it to debug. Next, I’ll introduce these 3 parts in detail.
  11. First, we have 3 different types of debugging aids.
  12. And for the last type of debugging aid, we need some kind of baseline, because the first two debugging aids already suggest a candidate fix.
  13. For fair comparison, for the baseline, or the control group, we provide only the buggy method location as the debugging aid. This is common in practice, where developers typically know the general buggy area from bug reports or stack traces before they start to debug. Those are the 3 types of debugging aids we give to participants.
  14. We recruited 95 participants from a wide population, which includes 44 CS graduate students, 28 software engineers from industry, and 23 workers from Amazon Mechanical Turk, a crowdsourcing marketplace. Average years: Grad: 4.1, Engr: 2.4 (1-10), MTurk: 5.7 (1-14)
  15. Now the question is, how do we assign debugging aids to participants?
  16. For the 44 graduate students, we adopt a between-group design by evenly dividing students into 3 groups of similar programming experience.
  17. Each group is given only one of the debugging aids.
  18. These students use Eclipse to debug in a supervised session.
  19. For remote participants, namely the 28 engineers and 23 MTurk workers, we could not determine their numbers and expertise beforehand, so a between-group design is not appropriate if we want to ensure fair group division. Instead, we adopt a within-group design, so that participants can be exposed to different debugging aids. To balance the experimental conditions, whenever participants select a bug, we assign the type of debugging aid for this particular bug in a round-robin fashion, such that each aid is equally likely to be given to each bug.
  20. We developed an online … for them to complete debugging tasks.
  21. Next, how do we select bugs?
  22. Accordingly, we selected all 5 bugs reported in this work…..
  23. For each of the 5 bugs, this work reported two patches generated by different program repair techniques,
  24. And they presented these different patches of the same bug to 85 … , and asked them to rank the patches based on the question, “which one is more acceptable?” In the end, this work reported this ranking of different patches for the same bug.
  25. And, for the purpose of our human study, we label the patch with the higher ranking as the “high-quality” patch, and its peer patch for the same bug, with the lower ranking, as the “low-quality” patch.
  26. That’s basically how we design this debugging human study.
  27. In total, participants submit 337 patches ……
  28. Here is the # of submitted patches that are created with each of the debugging aids.
  29. And here is the # of submitted patches that are created for each bug. Our design basically ensures that these two distributions are well balanced.
  30. Next, I’ll describe how we evaluate participants’ debugging performance.
  31. First, we evaluate the correctness of participants’ submitted patches.
  32. A patch is labeled correct only if it passes our test cases
  33. … and match the …
  34. For this part we have 3 evaluators to check and discuss the semantic matching.
  35. We also measure participants’ debugging time by developing an Eclipse plug-in and a website timer to record the time they spent on each bug.
  36. Up to this point, several factors can affect debugging correctness and time. For example, the type of debugging aids, of course, and also bugs, participant types, and their expertise.
  37. So, we use multiple regression analysis to quantify the relation between these independent variables and the outcome. That is, we use multiple regression to compute the coefficient values and statistical significance, so that we can understand whether the corresponding factors really have positive or negative impact on debugging performance, and if so, how much the impact is.
  38. Our evaluation also includes a post-study survey, in which we asked participants to rate the …, the …, and offer opinions.
  39. Results
  40. First, high-q patches DO improve debugging correctness, SIGNIFICANTLY
  41. Here is the % of correct patches made by these two groups. It’s clear that the group with high-q patches made a MUCH higher % of correct patches.
  42. The regression analysis also shows that the high-q patch has a statistically significant positive coefficient on debugging correctness.
  43. Surprisingly, the group with low-quality patches made fewer correct patches, EVEN when compared to the control group.
  44. Regression also shows negative coefficient for low-quality patches, although it’s not statistically significant.
  45. But we do observe that low… can indeed …
  46. Next, we find…
  47. Here’s participants’ survey feedback on bug difficulty. We can see that they consider the third bug, Rhino … to be the most difficult one to debug.
  48. And when we check, for each bug, the percentage of correct patches made by each group, we observe an obvious trend. For the 3rd bug, no one except the participants using high-quality patches can fix the bug correctly.
  49. On the other hand, we also found that …
  50. We can see from this figure, that the debugging time of these three groups is not that different. And regression analysis also suggests the same.
  51. We also found other … . For example, the last bullet: we found that novices, whose programming experience is below the average among all participants, tend to
  52. Next, when we analyze the survey results, where we ask participants to rate how helpful each debugging aid is, we found that they consider highQ generated patches much more helpful than lowQ generated patches.
  53. Now let’s listen to what participants said about their human study experience in using generated patches in debugging.
  54. As usual, things always have a positive and a negative side.
  55. Quote…
  56. But, on the other hand, such a quick starting point may be confusing… And, they might require further perfection from human developers
  57. Since we distinguish highQ and lowQ patches based on their acceptability ranking reported in another work, this may not generalize to other quality measures, such as metric-based ones. Another threat is that participants may blindly… Actually we took several measures to prevent such behaviors. … When participants submit their patches, we ask them to justify their patches in an input box.
  58. Finally, the take-away of this work. BUT, strict quality … If we gave …, it could be misleading and indeed compromise their debugging performance. Finally, the benefits of using auto-generated patches as debugging aids could be much more obvious for difficult debugging tasks, or for novice developers
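
The round-robin assignment described in note 19 can be sketched as follows. This is a minimal illustration, not the study's actual online debugging system: the class name `AidAssigner`, the `Aid` enum, and the choice to keep one rotating counter per bug are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: rotate the three debugging aids per bug, round-robin. */
public class AidAssigner {
    /** The three aid types from the study; the enum itself is illustrative. */
    public enum Aid { LOCATION, LOW_QUALITY_PATCH, HIGH_QUALITY_PATCH }

    // Next position in the rotation, tracked separately for each bug
    // (assumption: the rotation is per bug, not per participant).
    private final Map<String, Integer> nextIndexPerBug = new HashMap<>();

    /** Returns the aid for the next participant who selects the given bug. */
    public Aid assign(String bugId) {
        Aid[] aids = Aid.values();
        int index = nextIndexPerBug.getOrDefault(bugId, 0);
        nextIndexPerBug.put(bugId, (index + 1) % aids.length);
        return aids[index];
    }

    public static void main(String[] args) {
        AidAssigner assigner = new AidAssigner();
        // Four successive selections of the same bug cycle through all aids.
        for (int i = 0; i < 4; i++) {
            System.out.println("Rhino-114493 -> " + assigner.assign("Rhino-114493"));
        }
    }
}
```

Keeping a separate counter for each bug means that, over many selections, every aid is handed out about equally often for every bug, which is the balance the note describes.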