Software systems change continuously, and developers spend a large portion of their time keeping track of changes and understanding them and their effects. Current development tools provide only limited support. Most notably, they track changes in source files only at the level of textual lines, lacking semantic and context information on changes. Developers frequently need to reconstruct this information manually, which is a time-consuming and error-prone task. In this talk, I present three techniques that address this problem by extracting detailed syntactic information from changes in various kinds of source files. I start by introducing ChangeDistiller, a tool and approach to extract information on source code changes at the level of abstract syntax trees (ASTs). Next, I present the WSDLDiff approach to extract information on changes in web service interface description (WSDL) files. Finally, I present FMDiff, an approach to extract changes from feature models defined with the Linux Kconfig language. For each approach, I report on case studies and experiments that highlight the benefits of our techniques. I also point out several research opportunities opened up by our techniques and tools and by the detailed change data they extract.
1. Analyzing Changes in Software Systems
From ChangeDistiller to FMDiff
Martin Pinzger
Software Engineering Research Group
University of Klagenfurt
http://serg.aau.at/bin/view/MartinPinzger
8. Understanding changes and their impact
Current tools lack support for comprehending changes
“How do software engineers understand code changes? - an exploratory study in industry”, Tao et al. 2012
Developers need to reconstruct the detailed context and impact of each change, which is time-consuming and error-prone
“An exploratory study of awareness interests about software modifications”, Kim 2011
9. We need better support to analyze and comprehend changes and their impact
10. Overview of (my) tools
ChangeDistiller: fine-grained evolution of Java classes
WSDLDiff: evolution of service-oriented systems
FMDiff: evolution of feature models
15. Extracting source code changes using ASTs
Using tree differencing, we can determine
Enclosing entity (root node)
Kind of statement which changed (node information)
Kind of change (tree edit operation)
public void method(D d) {
    if (d != null) {
        d.foo();
        d.bar();
    }
}
public void method(D d) {
    d.foo();
    d.bar();
}
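The three kinds of information can be made concrete with a deliberately naive tree differencer in Python. This is not ChangeDistiller's algorithm (Fluri et al. match nodes by string similarity and then derive an edit script); it only matches nodes whose label and value are identical. The node labels METHOD, IF_STATEMENT, and METHOD_INVOCATION are hypothetical, and the sketch assumes the first listing above is the version before the change.

```python
class Node:
    """AST node: label = node kind (hypothetical labels), value = source text."""
    def __init__(self, label, value, *children):
        self.label, self.value, self.children = label, value, list(children)
        self.parent = None
        for child in self.children:
            child.parent = self

def walk(node):
    """Pre-order traversal of a tree."""
    yield node
    for child in node.children:
        yield from walk(child)

def tree_diff(old_root, new_root):
    """Match nodes with equal (label, value), then report deleted, inserted,
    and moved (re-parented) nodes together with the enclosing entity."""
    new_nodes = list(walk(new_root))
    match, used = {}, set()  # id(old node) -> matched new node
    for old in walk(old_root):
        for new in new_nodes:
            if id(new) not in used and (new.label, new.value) == (old.label, old.value):
                match[id(old)] = new
                used.add(id(new))
                break
    entity = new_root.value  # enclosing entity = root node
    changes = []
    for old in walk(old_root):
        new = match.get(id(old))
        if new is None:
            changes.append((entity, "DELETE", old.label, old.value))
        elif old.parent is not None and match.get(id(old.parent)) is not new.parent:
            changes.append((entity, "MOVE", old.label, old.value))
    for new in new_nodes:
        if id(new) not in used:
            changes.append((entity, "INSERT", new.label, new.value))
    return changes

old = Node("METHOD", "method(D d)",
           Node("IF_STATEMENT", "d != null",
                Node("METHOD_INVOCATION", "d.foo();"),
                Node("METHOD_INVOCATION", "d.bar();")))
new = Node("METHOD", "method(D d)",
           Node("METHOD_INVOCATION", "d.foo();"),
           Node("METHOD_INVOCATION", "d.bar();"))

for change in tree_diff(old, new):
    print(change)
```

Each reported tuple carries the enclosing entity (method(D d)), the tree edit operation, and the kind of statement: here a DELETE of the if statement and a MOVE of each invocation to a new parent. Swapping the arguments yields two MOVEs and an INSERT of the if statement instead.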
18. ChangeDistiller references
“Change distilling: Tree differencing for fine-grained source code change extraction”, Fluri et al. 2007
Diffing UML diagrams
“UMLDiff: An Algorithm for Object-Oriented Design Differencing”, Xing et al. 2005
Recording changes in the IDE
“Mining Fine-grained Code Changes to Detect Unknown Change Patterns”, Negara et al. 2014
19. ChangeDistiller improved -> gumtree
gumtree (sources available at GitHub)
“Fine-grained and accurate source code differencing”, Falleri et al. 2014
21. Predicting bug-prone files
AUC values of E1 using logistic regression with LM (code churn) and SCC (fine-grained source code changes) as predictors to classify files as bug-prone or not bug-prone
Eclipse Project   AUC LM   AUC SCC
Compare           0.84     0.85
jFace             0.90     0.90
JDT Debug         0.83     0.95
Resource          0.87     0.93
Runtime           0.83     0.91
Team Core         0.62     0.87
CVS Core          0.80     0.90
Debug Core        0.86     0.94
jFace Text        0.87     0.87
Update Core       0.78     0.85
Debug UI          0.85     0.93
JDT Debug UI      0.90     0.91
Help              0.75     0.70
JDT Core          0.86     0.87
OSGI              0.88     0.88
Median            0.85     0.90
Overall           0.85     0.89
SCC outperforms LM: higher AUC in 11 of the 15 projects, equal in 3, and lower only for Help (median 0.90 vs. 0.85)
More info: “Comparing Fine-Grained Source Code Changes And Code Churn For Bug Prediction”, Giger et al. 2011
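AUC (area under the ROC curve) can be read as the probability that a model ranks a randomly chosen bug-prone file above a randomly chosen not bug-prone file; 0.5 corresponds to random guessing and 1.0 to a perfect ranking. A minimal Python sketch of this pairwise (Mann-Whitney) formulation, using made-up scores rather than data from the study:

```python
def auc(scores, labels):
    """AUC as the fraction of (bug-prone, not bug-prone) file pairs in which
    the bug-prone file gets the higher score; ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one bug-prone and one not bug-prone file")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical predicted probabilities for five files (1 = bug-prone).
scores = [0.9, 0.4, 0.7, 0.3, 0.6]
labels = [1, 0, 0, 0, 1]
print(round(auc(scores, labels), 3))
```

Five of the six pairs are ranked correctly (the bug-prone file scored 0.6 falls below one scored 0.7), so the sketch prints 0.833.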
22. Predicting bug-prone methods
Large files are typically the most bug-prone files
Retrieving bug-prone methods saves manual inspection and testing effort
11 methods per class on average
4 of these methods are bug-prone (ca. 36%)
23. Predicting bug-prone methods
Models computed with change metrics (CM) perform best
The change metrics authors and methodHistories are the most important measures
More info: “Method-Level Bug Prediction”, Giger et al. 2012
Table 4: Median classification results over all projects per classifier and per model (CM = change metrics, SCM = source code metrics, P = precision, R = recall)
             CM               SCM              CM&SCM
Classifier   AUC  P    R      AUC  P    R      AUC  P    R
RndFor       .95  .84  .88    .72  .5   .64    .95  .85  .95
SVM          .96  .83  .86    .7   .48  .63    .95  .8   .96
BN           .96  .82  .86    .73  .46  .73    .96  .81  .96
J48          .95  .84  .82    .69  .56  .58    .91  .83  .89
The AUC values of the code metrics model are approximately 0.7 for each classifier, which Lessmann et al. define as "promising". However, the source code metrics suffer from considerably low precision values. The highest median precision value for the code metrics model is obtained in the case of J48 (.56).
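Precision (P) is the share of methods predicted as bug-prone that actually are bug-prone; recall (R) is the share of actually bug-prone methods that the model finds. A small Python sketch with hypothetical predictions (the 0.0 fallback for an empty denominator is a convention chosen here):

```python
def precision_recall(predicted, actual):
    """Precision and recall for binary bug-proneness predictions."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)      # flagged and bug-prone
    fp = sum(1 for p, a in pairs if p and not a)  # flagged, but not bug-prone
    fn = sum(1 for p, a in pairs if not p and a)  # bug-prone, but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical predictions for six methods (True = bug-prone).
predicted = [True, True, True, False, False, True]
actual = [True, False, True, True, False, False]
print(precision_recall(predicted, actual))
```

Here two of the four flagged methods are bug-prone (P = 0.5) and two of the three bug-prone methods are found (R is about 0.67). A model can therefore rank well overall (high AUC) and still flag many methods that are not bug-prone (low P), which is the pattern of the SCM column.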
24. Research opportunities
Extract details on statement changes (-> solved by gumtree?)
  Argument changes in method invocations
  Nesting of statements
  Expression changes
Extract context information
  Consider call, access, inheritance, and type dependencies
  Combine with task context; see, e.g., Mylyn [Kersten and Murphy, 2005]
25. Research opportunities (cont.)
Support multiple programming languages
  Currently only Java is supported
Extract changes from “other” source files
  Configuration files, project and build files, etc.
Extract changes of whole software ecosystems
34. What we learned from WSDL evolution
Users of the AmazonEC2 service
  New operations are continuously added
  Data types change frequently, adding new elements
Users of the FedEx service
  Data types change frequently
  Operations are more stable
More info: “Analyzing the Evolution of Web Services using Fine-Grained Changes”, Romano and Pinzger 2012
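The following simplified, set-based Python sketch (not WSDLDiff's implementation) illustrates the kind of fine-grained change extraction behind these observations. It assumes WSDL 1.1 documents with inline XML Schema types and reports added and removed operations as well as added and removed elements of named complex types; the service, operation, and type names are hypothetical.

```python
import xml.etree.ElementTree as ET

WSDL = "{http://schemas.xmlsoap.org/wsdl/}"
XSD = "{http://www.w3.org/2001/XMLSchema}"

def operations(wsdl_text):
    """Names of all operations declared in portType elements."""
    root = ET.fromstring(wsdl_text)
    return {op.get("name")
            for port_type in root.iter(WSDL + "portType")
            for op in port_type.findall(WSDL + "operation")}

def type_elements(wsdl_text):
    """Map each named complexType to the names of the elements it contains."""
    root = ET.fromstring(wsdl_text)
    return {ct.get("name"): {el.get("name") for el in ct.iter(XSD + "element")}
            for ct in root.iter(XSD + "complexType") if ct.get("name")}

def wsdl_diff(old, new):
    """List operation and data type changes between two WSDL versions."""
    changes = []
    old_ops, new_ops = operations(old), operations(new)
    changes += [("ADD_OPERATION", op) for op in sorted(new_ops - old_ops)]
    changes += [("REMOVE_OPERATION", op) for op in sorted(old_ops - new_ops)]
    old_types, new_types = type_elements(old), type_elements(new)
    for name in sorted(old_types.keys() & new_types.keys()):
        changes += [("ADD_ELEMENT", f"{name}.{e}")
                    for e in sorted(new_types[name] - old_types[name])]
        changes += [("REMOVE_ELEMENT", f"{name}.{e}")
                    for e in sorted(old_types[name] - new_types[name])]
    return changes

TEMPLATE = """<definitions xmlns="http://schemas.xmlsoap.org/wsdl/"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <types><xsd:schema>
    <xsd:complexType name="Order"><xsd:sequence>{elements}</xsd:sequence></xsd:complexType>
  </xsd:schema></types>
  <portType name="OrderService">{operations}</portType>
</definitions>"""

v1 = TEMPLATE.format(elements='<xsd:element name="id" type="xsd:string"/>',
                     operations='<operation name="getOrder"/>')
v2 = TEMPLATE.format(elements='<xsd:element name="id" type="xsd:string"/>'
                              '<xsd:element name="status" type="xsd:string"/>',
                     operations='<operation name="getOrder"/><operation name="cancelOrder"/>')
print(wsdl_diff(v1, v2))
```

For the two hypothetical versions, the sketch reports one added operation (cancelOrder) and one new element in the Order data type, the two change kinds observed for the AmazonEC2 service.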
35. Research opportunities
Perform the study with “real” industrial systems
Analyze co-evolution
  between WSDL interfaces
  between WSDL interfaces and their implementation
Use changes for test selection/prioritization
Etc.
The major challenge is finding available case studies of real, industrial service-oriented systems
36. FMDiff
Evolution of the Linux kernel feature model
Nicolas Dintzner, Arie van Deursen, and Martin Pinzger
38. Main motivation
Identify co-evolution patterns (common changes in the different artifacts implementing a feature)
Local validation of changes to prevent inconsistencies
Facilitate test selection
Prevent variability-related implementation bugs
  Implementation of features is intermixed, leading to undesired interactions
  Interactions occur between features from different subsystems, demanding cross-subsystem knowledge
More info: “42 variability bugs in the linux kernel: a qualitative analysis”, Abal et al. 2014
39. Extracting feature model changes from the Linux kernel with FMDiff - Nicolas Dintzner
A feature in a Kconfig file
Parts of a feature: name, type & prompt, default, depends, select, (help text), and additional structures such as an enclosing if ... endif block
if ACPI
config ACPI_AC
    tristate "AC Adapter"
    default y if ACPI
    depends on X86
    select POWER_SUPPLY
    help
      This driver supports the AC Adapter object, (...).
endif
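A minimal Python sketch of how such an entry can be split into these parts. It assumes only the Kconfig subset shown here (if/endif, config, a type with a prompt, default, depends on, select, help); menus, choices, and source statements are ignored, and help text is taken to be the indented, non-blank lines directly after help. It is an illustration, not the parser that FMDiff or Undertaker uses.

```python
TYPES = {"bool", "tristate", "string", "int", "hex"}

def parse_kconfig(text):
    """Parse 'config' entries of a small Kconfig subset into dictionaries."""
    features, ifs, current, in_help = {}, [], None, False
    for raw in text.splitlines():
        line = raw.strip()
        if in_help and line and raw[0] in " \t":
            current["help"].append(line)  # indented lines after 'help'
            continue
        in_help = False
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        keyword, rest = parts[0], (parts[1].strip() if len(parts) > 1 else "")
        if keyword == "if":
            ifs.append(rest)  # enclosing condition ("additional structure")
        elif keyword == "endif":
            ifs.pop()
        elif keyword == "config":
            current = {"name": rest, "type": None, "prompt": None, "default": [],
                       "depends": [], "select": [], "help": [], "ifs": list(ifs)}
            features[rest] = current
        elif current is None:
            continue
        elif keyword in TYPES:
            current["type"] = keyword
            if rest.startswith('"'):
                current["prompt"] = rest[1:rest.index('"', 1)]
        elif keyword == "default":
            value, _, cond = rest.partition(" if ")
            current["default"].append((value, cond or None))
        elif keyword == "depends":
            current["depends"].append(rest.removeprefix("on ").strip())
        elif keyword == "select":
            target, _, cond = rest.partition(" if ")
            current["select"].append((target, cond or None))
        elif keyword == "help":
            in_help = True
    return features

SAMPLE = """\
if ACPI
config ACPI_AC
    tristate "AC Adapter"
    default y if ACPI
    depends on X86
    select POWER_SUPPLY
    help
      This driver supports the AC Adapter object.
endif
"""
print(parse_kconfig(SAMPLE)["ACPI_AC"])
```

Recording the enclosing if conditions per feature keeps the information needed for the hierarchy flattening step of the translation that follows. The sketch needs Python 3.9 or later for str.removeprefix.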
40. FMDiff: Extracting changes from Kconfig files
Linux repository
-> Feature model translation (X86 v1, X86 v2, X86 v…)
-> Feature model reconstruction (EMF)
-> Feature model comparison (EMF Compare)
-> Feature change classification
-> Feature change repository
41. Feature model translation: from Kconfig to Kdump
1- Kconfig (original)
if ACPI
config ACPI_AC
    tristate "AC Adapter"
    default y if ACPI
    depends on X86
    select POWER_SUPPLY
endif
2- Hierarchy flattening
config ACPI_AC
    tristate "AC Adapter"
    default y if ACPI
    depends on X86 && ACPI
    select POWER_SUPPLY
3- Depends propagation
config ACPI_AC
    tristate "AC Adapter"
    default y if X86 && ACPI
    depends on X86 && ACPI
    select POWER_SUPPLY if X86 && ACPI
4- Kdump format (what we use)
Item ACPI_AC tristate
Prompt ACPI_AC 1
Default ACPI_AC "y" "X86 && ACPI"
Depends ACPI_AC "X86 && ACPI"
ItemSelects ACPI_AC POWER_SUPPLY "X86 && ACPI"
Credits to for Undertaker and the translation process
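Steps 2 to 4 can be approximated in Python as follows. This is a hedged sketch, not Undertaker's implementation: it assumes every condition is a conjunction of symbols joined by top-level && (no && inside parentheses), folds the enclosing if conditions into depends, propagates the result into default and select conditions while keeping each symbol once, and writes lines in the Kdump layout shown above. Writing Prompt as 1 when the feature has a prompt is an assumption based on this one example, and the function names conj and to_kdump are hypothetical.

```python
def conj(*exprs):
    """Conjoin expressions that are plain '&&' conjunctions of symbols,
    keeping each symbol once (first occurrence wins)."""
    atoms = []
    for expr in exprs:
        for atom in (expr or "").split("&&"):
            atom = atom.strip()
            if atom and atom not in atoms:
                atoms.append(atom)
    return " && ".join(atoms)

def to_kdump(name, ftype, has_prompt, depends, defaults, selects, enclosing_ifs):
    # Step 2, hierarchy flattening: fold enclosing 'if' conditions into depends.
    dep = conj(depends, *enclosing_ifs)
    lines = [f"Item {name} {ftype}", f"Prompt {name} {1 if has_prompt else 0}"]
    # Step 3, depends propagation: conjoin depends with default/select conditions.
    for value, cond in defaults:
        lines.append(f'Default {name} "{value}" "{conj(dep, cond)}"')
    lines.append(f'Depends {name} "{dep}"')
    for target, cond in selects:
        lines.append(f'ItemSelects {name} {target} "{conj(dep, cond)}"')
    # Step 4: the returned lines follow the Kdump layout of the example.
    return lines

for line in to_kdump("ACPI_AC", "tristate", True, "X86",
                     defaults=[("y", "ACPI")], selects=[("POWER_SUPPLY", None)],
                     enclosing_ifs=["ACPI"]):
    print(line)
```

Dropping duplicate symbols is what turns "default y if ACPI" into "y if X86 && ACPI" rather than a redundant "X86 && ACPI && ACPI". A real translation has to handle arbitrary Boolean expressions, which this sketch deliberately does not.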
42. FMDiff meta model
FeatureModel: Architecture (string), Revision (string); "contains" 0..* Feature
Feature: Type (string), Prompt (boolean), Depends (string), DependsReferences (list of strings); "contains" 0..* Select Statement and 0..* Default Statement
Select Statement: Target (string), Condition (string), SelectConditionReferences (list of strings)
Default Statement: DefaultValue (string), Condition (string), DefaultValueReferences (list of strings), DefaultValueConditionReferences (list of strings)
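The meta model maps naturally onto plain data classes. A Python sketch, with attribute names transliterated to snake_case and a hypothetical references() helper that collects the symbols an expression refers to (the tristate constants y, m, and n are not counted); FMDiff itself defines this model in EMF, and make_feature is likewise a hypothetical convenience function.

```python
import re
from dataclasses import dataclass, field

def references(expr):
    """Symbols an expression refers to, in order of first occurrence."""
    refs = []
    for sym in re.findall(r"[A-Za-z0-9_]+", expr or ""):
        if sym not in ("y", "m", "n") and sym not in refs:
            refs.append(sym)
    return refs

@dataclass
class SelectStatement:
    target: str
    condition: str
    select_condition_references: list[str] = field(default_factory=list)

@dataclass
class DefaultStatement:
    default_value: str
    condition: str
    default_value_references: list[str] = field(default_factory=list)
    default_value_condition_references: list[str] = field(default_factory=list)

@dataclass
class Feature:
    name: str
    type: str
    prompt: bool
    depends: str
    depends_references: list[str] = field(default_factory=list)
    selects: list[SelectStatement] = field(default_factory=list)    # 0..*
    defaults: list[DefaultStatement] = field(default_factory=list)  # 0..*

@dataclass
class FeatureModel:
    architecture: str
    revision: str
    features: list[Feature] = field(default_factory=list)  # 0..*

def make_feature(name, ftype, prompt, depends, selects=(), defaults=()):
    """Build a Feature and fill in all reference lists from its expressions."""
    return Feature(
        name, ftype, prompt, depends, references(depends),
        [SelectStatement(t, c, references(c)) for t, c in selects],
        [DefaultStatement(v, c, references(v), references(c)) for v, c in defaults])

v1 = FeatureModel("X86", "1", [make_feature(
    "ACPI_AC", "tristate", True, "X86 && ACPI",
    selects=[("POWER_SUPPLY", "X86 && ACPI")], defaults=[("y", "X86 && ACPI")])])
print(v1.features[0].depends_references)
```

Storing the references next to each expression is what later lets a comparison report a newly referenced symbol separately from a changed condition. The list[str] annotations need Python 3.9 or later.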
43. Example
V1 (Kdump):
Item ACPI_AC tristate
Depends ACPI_AC "X86 && ACPI"
…
V2 (Kdump):
Item ACPI_AC tristate
Depends ACPI_AC "(X86 || AMD) && ACPI"
…
V1 feature model: Architecture: X86, Revision: 1
  Feature Name: ACPI_AC, Depends: "X86 && ACPI", Depends references: [X86, ACPI]
V2 feature model: Architecture: X86, Revision: 2
  Feature Name: ACPI_AC, Depends: "(AMD || X86) && ACPI", Depends references: [X86, ACPI, AMD]
Extracted changes:
Rev.  Feature  Change type   Category        Subcategory       Old value  New value
2     ACPI_AC  Modification  modify depends  modify condition  X86        X86||AMD
2     ACPI_AC  Modification  modify depends  add depends ref.  -          AMD
44. Change classification
Change operations: Add, Remove, Modify
Feature
  Attribute: Type, Prompt
  Depends: Expression, References
  Default: Default Value, Condition, References
  Select: Target, Condition, References
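A hedged Python sketch of this classification, covering feature additions and removals, attribute (type, prompt) modifications, and depends changes; default and select statements are left out for brevity. The category labels other than "modify depends", "modify condition", and "add depends ref." (which appear in the example on slide 43) are hypothetical. Unlike that example, the sketch reports whole old and new expressions rather than only the changed sub-expression, and it compares plain dictionaries instead of EMF models.

```python
import re

def references(expr):
    """Symbols referenced by an expression (tristate constants excluded)."""
    refs = []
    for sym in re.findall(r"[A-Za-z0-9_]+", expr or ""):
        if sym not in ("y", "m", "n") and sym not in refs:
            refs.append(sym)
    return refs

def classify(revision, old_model, new_model):
    """Compare two model versions given as {feature: {"type", "prompt",
    "depends"}} and emit (rev, feature, change type, category, subcategory,
    old value, new value) rows."""
    rows = []
    for name in sorted(new_model.keys() - old_model.keys()):
        rows.append((revision, name, "Addition", "feature", "add feature", None, None))
    for name in sorted(old_model.keys() - new_model.keys()):
        rows.append((revision, name, "Removal", "feature", "remove feature", None, None))
    for name in sorted(old_model.keys() & new_model.keys()):
        old, new = old_model[name], new_model[name]
        for attr in ("type", "prompt"):
            if old[attr] != new[attr]:
                rows.append((revision, name, "Modification", "modify attribute",
                             f"modify {attr}", old[attr], new[attr]))
        if old["depends"] != new["depends"]:
            rows.append((revision, name, "Modification", "modify depends",
                         "modify condition", old["depends"], new["depends"]))
            old_refs, new_refs = references(old["depends"]), references(new["depends"])
            rows += [(revision, name, "Modification", "modify depends",
                      "add depends ref.", None, r) for r in new_refs if r not in old_refs]
            rows += [(revision, name, "Modification", "modify depends",
                      "remove depends ref.", r, None) for r in old_refs if r not in new_refs]
    return rows

v1 = {"ACPI_AC": {"type": "tristate", "prompt": True, "depends": "X86 && ACPI"}}
v2 = {"ACPI_AC": {"type": "tristate", "prompt": True, "depends": "(X86 || AMD) && ACPI"}}
for row in classify(2, v1, v2):
    print(row)
```

Applied to the ACPI_AC example, the sketch produces one "modify condition" row and one "add depends ref." row for AMD, mirroring the two changes listed for revision 2.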
45. Study with 14 releases of the Linux kernel
[Figure: Feature change category distribution (ADDED, REMOVED, MODIFIED; 0% to 100%) per Linux kernel release, v2.6.39 to v3.8; bar labels: 493, 772, 397, 1740, 612, 493, 750, 609, 1068, 544]
46. Impact on architectures
Which architectures are affected and need to be tested?
Release   # changed features   % affecting all architectures
2.6.39    1016                 26.47
3.0       1020                 58.43
3.2       2361                 39.00
3.4       778                  32.39
3.6       823                  34.14
3.8       963                  29.38
47. What we learned from feature evolution
Existing features are modified frequently
  This should be considered when studying the Linux kernel (existing studies mainly focused on the addition and removal of features)
Changes affecting all architectures vary between 10% and 50%
  Future studies should state clearly which architectures they study
More info: “Analysing the Linux kernel feature model changes using FMDiff”, Dintzner et al. 2015
48. Research opportunities
Link changes in the three implementation spaces
  Kconfig, Kbuild, source code
Mine co-evolution patterns
Increase the level of detail of changes
  E.g., consider changes in the conditional statements
Study the evolution of other systems
  E.g., toybox, ecos, BusyBox
Consider frameworks and other highly configurable systems
49. Conclusions
ChangeDistiller, WSDLDiff, FMDiff
Enrich changes to comprehend them
Martin Pinzger
martin.pinzger@aau.at