Bug fixing is a time-consuming and tedious task. To reduce the manual effort in bug fixing, researchers have presented automated approaches to software repair. Unfortunately, recent studies have shown that state-of-the-art automated repair techniques generate patches for only a small number of bugs, and even those patches often have quality issues (e.g., incorrect behavior and nonsensical changes). To improve automated program repair (APR) techniques, the community should deepen its knowledge of repair actions from real-world patches, since most of the techniques rely on patches written by human developers. Previous investigations of real-world patches are limited to the statement level, which is not sufficiently fine-grained to build this knowledge. In this work, we contribute to building this knowledge via a systematic and fine-grained study of 16,450 bug fix commits from seven Java open-source projects. We find that there are opportunities for APR techniques to improve their effectiveness by looking at code elements that have not yet been investigated. We also discuss nine insights into tuning automated repair tools. For example, a small number of statement and expression types are recurrently impacted by real-world patches, and expression-level granularity, which previous studies never explored, could reduce the search space for finding fix ingredients.
A Closer Look at Real-World Patches
1. A Closer Look at Real-world Patches
Kui Liu, Dongsun Kim, Anil Koyuncu, Tegawendé F. Bissyandé, and Yves Le Traon
Interdisciplinary Centre for Security, Reliability and Trust (SnT), University of Luxembourg, Luxembourg
Li Li
Monash Software Force (MSF), Monash University, Melbourne, Australia
34th ICSME 2018, Madrid, Spain, September 27, 2018
2. 1
> Basic Process of Automated Program Repair (APR)
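[Figure: basic APR pipeline. Fault localization uses the passing and failing tests to identify suspicious buggy code; APR tools generate patch candidates for it; each candidate is validated against the tests (pass or fail).]
Where is the code to be fixed? How to generate patches? Is the patch correct?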
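The three questions on this slide correspond to the stages of a generate-and-validate loop. The sketch below is only an illustration of that loop, not the implementation of any tool named in this deck: the class `RepairLoopSketch`, the method `repair`, and the stand-in "program" (a binary operator that should multiply) are all hypothetical, and fault localization is assumed to have already happened.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.IntBinaryOperator;
import java.util.function.Predicate;

/** Hypothetical generate-and-validate loop; all names are illustrative. */
final class RepairLoopSketch {

    /** Returns the first candidate that passes the whole test suite, if any. */
    static <P> Optional<P> repair(List<P> patchCandidates, Predicate<P> testSuite) {
        for (P candidate : patchCandidates) {
            if (testSuite.test(candidate)) {   // every test passes: plausible patch
                return Optional.of(candidate);
            }
        }
        return Optional.empty();               // no candidate survived validation
    }

    public static void main(String[] args) {
        // Stand-in buggy program: it adds where it is supposed to multiply.
        IntBinaryOperator buggy = (a, b) -> a + b;

        // Stand-in patch candidates produced by some patch generator.
        List<IntBinaryOperator> candidates = List.of(
                (a, b) -> a - b,
                (a, b) -> a * b);

        // Stand-in test suite: (3, 4) fails on the buggy program, (2, 2) passes on it.
        Predicate<IntBinaryOperator> tests =
                op -> op.applyAsInt(3, 4) == 12 && op.applyAsInt(2, 2) == 4;

        System.out.println("buggy program passes: " + tests.test(buggy));
        System.out.println("plausible patch found: " + repair(candidates, tests).isPresent());
    }
}
```

A candidate that passes all tests is only plausible, not necessarily correct, which is why the table on the next slide separates fixed bugs from correctly fixed bugs.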
3. 2
> How many bugs are fixed by existing APR tools?
Benchmark: Defects4J [42] (395 bugs).
APR Tool      # Fixed bugs   # Correctly fixed bugs
jGenProg      29             5
jKali         22             1
jMutRepair    17             3
Nopol         35             5
HDRepair      23             6
ACS           23             18
ssFix         60             20
ELIXIR        41             26
JAID          26             9
CapGen        25             21
SketchFix     26             19
SimFix        56             34
Why are both the number of bugs that APR tools can fix
and the quality of the patches they generate so low?
4. 3
> Scope Limitation of APR Tools
Fixing bugs at the statement level.
Bug Chart_1 in Defects4J fixed by jMutRepair, ELIXIR, ssFix,
JAID, SketchFix, CapGen, SimFix.
5. 4
> Are Non-Statement Code Entities Bug-free?
Bug located in Type Declaration (Math-12 in Defects4J).
Bug located in Method Declaration (Lang-29 in Defects4J). Bug located in Field Declaration (Lang-56 in Defects4J).
Bugs located in non-statement code entities.
None of the existing APR tools can fix these bugs.
6. 5
> Statement Level VS. Finer Granularity Level
Statement level: UPD ReturnStatement.
This repair action is hard to reuse for fixing similar bugs.
Expression level: dim / 2 → 0.5 * dim.
Project: Commons-math.
Bug Report ID: MATH-29, “Fix truncated value.”
Commit cedf0d27f9e9341a9e9fa8a192735a0c2e11be40
--- a/src/main/java/org/apache/commons/math3/distribution/MultivariateNormalDistribution.java
+++ b/src/main/java/org/apache/commons/math3/distribution/MultivariateNormalDistribution.java
@@ -895,1 +895,1 @@
- return FastMath.pow(2 * FastMath.PI, -dim / 2) *
+ return FastMath.pow(2 * FastMath.PI, -0.5 * dim) *
FastMath.pow(covarianceMatrixDeterminant, -0.5) * getExponentTerm(vals);
The fix pattern could be used to fix similar bugs.
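The truncation behind this patch can be reproduced in isolation. The following sketch assumes that `dim` is an `int` (which the "truncated value" bug report implies) and uses an assumed odd dimension of 3; the class and method names are hypothetical and not part of Commons Math.

```java
/** Illustrates the truncation fixed by the patch above, with an assumed odd dimension. */
final class TruncatedExponentDemo {

    /** Buggy form: -dim / 2 is evaluated in int arithmetic and truncates toward zero. */
    static double buggyExponent(int dim) {
        return -dim / 2;
    }

    /** Fixed form: -0.5 * dim promotes the whole computation to double. */
    static double fixedExponent(int dim) {
        return -0.5 * dim;
    }

    public static void main(String[] args) {
        int dim = 3; // assumed value, chosen to be odd
        System.out.println(buggyExponent(dim)); // prints -1.0
        System.out.println(fixedExponent(dim)); // prints -1.5
    }
}
```

For even dimensions both forms agree, so a test suite that only uses even dimensions would not expose the bug.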
7. 6
> Objective
Deepen knowledge on repair ingredients
from real-world patches in a fine-grained
way for automated program repair.
9. 8
> Research Questions.
RQ1. Do patches impact some specific statement types?
RQ2. Are there code elements in statements that are prone to be faulty?
RQ3. Which expression types are most impacted by patches?
RQ4. Which parts of buggy expressions are prone to be buggy?
In APR, fault localization techniques (e.g., Tarantula [31], Ochiai [32], Ochiai2 [33], Zoltar [34], and DStar [35]) are used to identify bug positions at the code line level.
[Figure: code elements of a statement: data type, variable name, operator, and the expression being assigned.]
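Spectrum-based fault localization techniques such as Tarantula and Ochiai rank code lines by a suspiciousness score computed from test coverage. The sketch below shows the commonly cited forms of these two formulas; the class name, method names, and the coverage counts in `main` are made up for illustration and do not come from the study.

```java
/** Sketch of two spectrum-based suspiciousness formulas for a single code line. */
final class SuspiciousnessSketch {

    /**
     * Ochiai: ef / sqrt((ef + nf) * (ef + ep)).
     * ef = failing tests that execute the line, nf = failing tests that do not,
     * ep = passing tests that execute the line.
     */
    static double ochiai(int ef, int ep, int nf) {
        double denominator = Math.sqrt((double) (ef + nf) * (ef + ep));
        return denominator == 0 ? 0 : ef / denominator;
    }

    /** Tarantula: failRatio / (failRatio + passRatio). */
    static double tarantula(int ef, int ep, int totalFailed, int totalPassed) {
        double failRatio = totalFailed == 0 ? 0 : (double) ef / totalFailed;
        double passRatio = totalPassed == 0 ? 0 : (double) ep / totalPassed;
        double sum = failRatio + passRatio;
        return sum == 0 ? 0 : failRatio / sum;
    }

    public static void main(String[] args) {
        // Assumed coverage: the line is executed by 2 of 2 failing and 1 of 8 passing tests.
        System.out.println("Ochiai:    " + ochiai(2, 1, 0));
        System.out.println("Tarantula: " + tarantula(2, 1, 2, 8));
    }
}
```

Both scores work at line granularity, which is coarser than the expression-level code elements this study examines.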
14. 13
> RQ1: Root AST Nodes Impacted by Patches
• Statements are the main buggy code entities.
• Declaration entities (~27%) could be buggy.
None of the existing APR tools can fix declaration-related bugs in Defects4J.
Distribution of root AST node types impacted by patches:
Statement, 73.29%
MethodDeclaration, 15.95%
FieldDeclaration, 9.32%
TypeDeclaration, 1.41%
EnumDeclaration, 0.03%
15. 14
> RQ1: Statements Recurrently Impacted by Patches.
5 out of 22 statement types account for 88% of buggy statements.
APR tools could focus on fixing some specific statements.
16. 15
> RQ1: Adoption of Update
Supports the investigation of repair
ingredients in a fine-grained way.
“Update” accounts for half of the repair actions.
1. double d = FastMath.pow(2 * FastMath.PI, -dim / 2);
2. double d = FastMath.pow(2 * FastMath.PI, -dim / 3);
Update:
- a = a + b;
+ a = a * b;
Delete:
int a = 0;
- a = a + b;
Move:
- a = a + b;
sum(a,b);
+ a = a + b;
Insert:
int a = 0;
+ a = a + b;
17. 16
> Search Space at Statement Level VS. Expression Level
Expression-level granularity could reduce search space.
Number of buggy
ExpressionStatements: ~40,000.
Commit log: added protection against infinite loops by
setting a maximal number of valuations.
Number of buggy
PrefixExpression: 1,362.
18. 17
> RQ2: Buggy Modifier.
Three kinds of repair actions for modifier-related bugs:
1) Add a missing modifier.
2) Delete an inappropriate modifier.
3) Replace an inappropriate modifier.
None of the existing APR tools can fix modifier-related bugs in Defects4J.
Distribution of inner-statement elements impacted by patches:
Expression, 82.40%
Type, 8.70%
Identifier, 5.50%
Modifier, 3.30%
Commit log: LANG-334: To avoid exposing a mutating map.
19. 18
> RQ2: Buggy Type Usage.
Buggy Types:
1. Buggy primitive types.
2. Buggy non-primitive types.
It is a new challenge for APR tools to fix non-primitive type related
bugs.
Commit log: Fix integer overflow.
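The patch behind this commit log is not reproduced in the transcript. As a hypothetical illustration of a buggy primitive type, the sketch below accumulates a sum in an `int`, which wraps around, and repairs it by changing only the declared type to `long`; the class and method names are invented for this example.

```java
/** Hypothetical example of a bug that is fixed by changing a primitive type. */
final class PrimitiveTypeOverflowDemo {

    /** Buggy: the accumulator is an int, so large sums wrap around. */
    static long buggySum(int[] values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    /** Fixed: only the declared type of the accumulator changes. */
    static long fixedSum(int[] values) {
        long total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        int[] values = {Integer.MAX_VALUE, 1};
        System.out.println(buggySum(values)); // prints -2147483648
        System.out.println(fixedSum(values)); // prints 2147483648
    }
}
```

The repair touches a Type node rather than an expression, which is why it falls outside what expression-only or statement-only repair operators would generate.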
20. 19
> RQ2: Buggy Identifiers.
APR tools do not fix buggy identifiers.
Developers also label changes to inconsistent identifiers as bug fixes.
Debugging buggy names [58, 59, 60, 61, 62].
21. 20
> RQ3: Expressions Recurrently Impacted by Patches
5 out of 34 expression types account for 80% of buggy expressions.
APR tools could focus on fixing some specific expressions.
Distributions of repair actions at the expression level.
22. 21
> RQ3: Buggy Literal Expressions.
Buggy Literal Expressions raise a new challenge for APR tools.
Commit log: SOLR-6959, fix incorrect base url for PDFs.
23. 22
> RQ4: Fault-prone Parts in Expressions.
The non-buggy parts of expressions could provide context for fix
pattern mining at the expression level.
Distribution of whole VS. sub-element changes in some buggy expressions.
Expression              % whole exp   % each sub-exp
Assignment              18.1%         Left_Hand_Exp (13.3%), Operator (0.8%), Right_Hand_Exp (73.5%)
CastExpression          45.8%         Type (11.9%), Exp (42.9%)
ClassInstanceCreation   15.5%         Pre_Exp (9.2%), ClassType (19.7%), Arguments (63%)
ConditionalExpression   22.9%         Condition_Exp (24.1%), Then_Exp (33%), Else_Exp (49.5%)
InfixExpression         27.3%         Left_Hand_Exp (35%), Operator (5.6%), Right_Hand_Exp (68.7%)
MethodInvocation        14.7%         MethodName (22.1%), Arguments (79.8%)
24. 23
> Fix Pattern Mining at Expression Level
Commit 44854912194177d67cdfa1dc765ba684eb013a4c
--- a/src/main/java/org/apache/commons/lang3/time/FastDateParser.java
+++ b/src/main/java/org/apache/commons/lang3/time/FastDateParser.java
@@ -895,1 +895,1 @@
- final TimeZone tz = TimeZone.getTimeZone(value.toUpperCase());
+ final TimeZone tz = TimeZone.getTimeZone(value.toUpperCase(Locale.ROOT));
Fix pattern:
- value.toUpperCase()
+ value.toUpperCase(Locale.ROOT)
Commit log: use toUpperCase(Locale) internally to avoid i18n issues.
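The i18n issue this pattern avoids comes from `String.toUpperCase()` using the JVM's default locale. The sketch below simulates a Turkish default locale by passing it explicitly instead of changing the global default; the class name, the `upper` helper, and the input string are chosen only for illustration.

```java
import java.util.Locale;

/** Shows why toUpperCase() without a locale can break lookups of ASCII identifiers. */
final class LocaleUpperCaseDemo {

    /** Upper-cases a value under an explicit locale. */
    static String upper(String value, Locale locale) {
        return value.toUpperCase(locale);
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr"); // stands in for a Turkish default locale

        String underRoot = upper("pacific", Locale.ROOT);
        String underTurkish = upper("pacific", turkish);

        // Under Turkish casing rules, 'i' maps to dotted capital I (U+0130), not 'I'.
        System.out.println(underRoot.equals("PACIFIC"));
        System.out.println(underTurkish.equals("PACIFIC"));
    }
}
```

Because the fix only adds an argument to a method invocation, it is a natural candidate for an expression-level fix pattern: the receiver and method name stay untouched and serve as context.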
25. 24
> Take-away
RQ1:
1. APR scope should be extended to declaration entities.
2. APR changes can be prioritized on a few specific statement types.
3. Move action can be ignored by APR tools.
4. Real-world patches support further investigation in a fine-grained way.
RQ2:
1. APR scope should be extended to modifiers.
2. Buggy non-primitive types could be a new direction for APR.
RQ3:
1. APR changes can be prioritized on a few specific expression types.
2. Buggy literal expressions raise a new challenge for APR.
RQ4:
1. The non-buggy parts of expressions could provide context for fix pattern mining at the expression level.
26. 25
> Summary
> Patch Differencing at AST Node Level
[Figure: the buggy and fixed versions of a patch are differenced with GumTree [25], and the resulting code change actions are regrouped into a hierarchical construct.]
https://github.com/AutoProRepair/PatchParser
Editor's notes
Chart_17, Lang_4: none of the APR tools can fix bugs related to non-primitive types.
Some bugs are also related to literal expressions.