Source Code Comprehension on Evolving Software:
A Literature Survey
Yida Tao
Supervisor: Sunghun Kim

Motivation
Code Change Comprehension
Tao et al., FSE’12
Code change comprehension is
• Frequently required
• In major development activities, in particular the code-review process
• How do software engineers understand code changes? An exploratory study in industry. Tao et al., FSE’12
• Expectations, outcomes, and challenges of modern code review. Bacchelli and Bird, ICSE’13
Bacchelli & Bird, ICSE’13
• “…review and understand code they have not seen before may be more common than a developer working on new code”
• “From interviews, no other code review challenge emerged as clearly as understanding the submitted change”

Outline
• Program Differencing: describing code changes
• Code Change Summarization: explaining code changes
• Querying and Filtering: customization

Program Differencing
• Text Differencing
• Syntactic Differencing
• Semantic Differencing

Text Differencing
• Flat representation of a program
  • Sequence of strings
• Unix diff
  • Outputs only added and deleted lines; cannot detect modified lines
  • Hard to determine when a code fragment has been moved upward or downward
• Ldiff (Canfora et al., ICSE’09)
  • An enhanced line differencing tool
• Limitations
  • Changes are expressed in terms of *characters*
  • No syntactic-structure information
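The difficulty with moved code can be illustrated with a minimal sketch using Python's standard difflib module. It is a line-based differ in the same family as Unix diff, not Unix diff itself. The two sample versions and the helper name line_diff are assumptions made for illustration: two functions swap places and nothing else changes.

```python
import difflib

# Hypothetical old and new versions: the two functions only swap places.
old = ["def a():", "    return 1", "def b():", "    return 2"]
new = ["def b():", "    return 2", "def a():", "    return 1"]


def line_diff(old_lines, new_lines):
    """Return (tag, line) pairs, where tag is '-' (deleted) or '+' (added)."""
    changes = []
    for line in difflib.ndiff(old_lines, new_lines):
        if line.startswith("- "):
            changes.append(("-", line[2:]))
        elif line.startswith("+ "):
            changes.append(("+", line[2:]))
    return changes


changes = line_diff(old, new)
deleted = sorted(text for tag, text in changes if tag == "-")
added = sorted(text for tag, text in changes if tag == "+")
```

Every line reported as deleted is also reported as added, so the move appears as an unrelated deletion plus insertion rather than as a single "moved" change.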

Syntactic Differencing
• Structured representation of a program
  • Abstract syntax tree; XML
• ChangeDistiller (Fluri et al., TSE’07)
  • Tree differencing
  • Node: bigram string similarity
  • Control structure: subtree similarity
  • Output: tree edit script (insert, delete, move, update)
• XML differencing
  • srcML (Maletic & Collard, ICSM’04): embeds abstract syntax and structure within the source code
  • diffX (Al-Ekram et al., CASCON’05)
• Limitations
  • Cannot describe how the behavior of a program has changed
  • Still reports differences for behavior-preserving changes
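The "bigram string similarity" used to match tree nodes can be sketched with one common formulation, the Dice coefficient over character bigrams. Whether this matches ChangeDistiller's exact formula and thresholds is not stated on the slide; the function names and sample strings below are illustrative assumptions.

```python
from collections import Counter


def bigrams(text):
    """All overlapping two-character substrings of text."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


def bigram_similarity(a, b):
    """Dice coefficient over character bigrams, a value in [0, 1]."""
    ba, bb = bigrams(a), bigrams(b)
    if not ba and not bb:
        # Strings too short to have bigrams: fall back to exact equality.
        return 1.0 if a == b else 0.0
    # Multiset intersection counts each shared bigram at most as often
    # as it occurs in both strings.
    shared = sum((Counter(ba) & Counter(bb)).values())
    return 2.0 * shared / (len(ba) + len(bb))


# Hypothetical statement texts taken from two versions of a file.
score = bigram_similarity("int count = 0;", "int counter = 0;")
```

Two nodes would be treated as a match when the score exceeds a chosen threshold; the threshold value is a tuning decision and is left out here.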

Semantic Differencing
• Semantic diff (Jackson and Ladd, ICSM’94)
  • Method-level
  • Variable dependencies comparison
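The idea of comparing variable dependencies at method level can be sketched as follows. This is a simplified illustration, not the Semantic diff tool: it assumes straight-line Python functions made only of assignments and a return (no branches or loops), and the name output_dependencies and the sample functions are hypothetical.

```python
import ast


def output_dependencies(source):
    """Return the set of parameters that the returned value depends on."""
    func = ast.parse(source).body[0]
    # Each parameter depends on itself.
    deps = {arg.arg: {arg.arg} for arg in func.args.args}

    def deps_of(expr):
        used = set()
        for node in ast.walk(expr):
            if isinstance(node, ast.Name):
                used |= deps.get(node.id, set())
        return used

    for stmt in func.body:
        if isinstance(stmt, ast.Assign):
            value_deps = deps_of(stmt.value)
            for target in stmt.targets:
                if isinstance(target, ast.Name):
                    deps[target.id] = value_deps
        elif isinstance(stmt, ast.Return) and stmt.value is not None:
            return deps_of(stmt.value)
    return set()


original = "def f(a, b):\n    t = a + 1\n    return t\n"
renamed = "def f(a, b):\n    tmp = a + 1\n    return tmp\n"
changed = "def f(a, b):\n    t = a + b\n    return t\n"
```

Under this comparison the renamed version has the same dependencies as the original, while the changed version does not, because its result now also depends on b.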

Semantic Differencing (cont.)
• JDiff (Apiwattanapong et al., ASE’04, ’06)
  • Extended control-flow graph (ECFG)
  • Dynamic binding, class hierarchy, exception handling, etc.

Semantic Differencing (cont.)
• Differential symbolic execution (Person et al., FSE’08)
  • “Executing” a program using symbolic values

Outline
Text Differencing
Syntactic differencing
Semantic differencing
Explaining code changes
Customization

Code Change Summarization
• LSdiff (Kim and Notkin, ICSE’09)
  • Groups related changes
  • Detects potential inconsistencies in a code change
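Grouping related changes under a rule and flagging the places that break it can be sketched as below. LSdiff itself infers logic rules over program facts; this is a much simplified stand-in, and the function summarize, the min_support parameter, and the method names are all hypothetical.

```python
def summarize(rule_name, candidates, changed, min_support=0.75):
    """Report a rule if most candidates changed; list the rest as exceptions."""
    matches = sorted(c for c in candidates if c in changed)
    exceptions = sorted(c for c in candidates if c not in changed)
    if not candidates or len(matches) / len(candidates) < min_support:
        return None  # too few matches to call it a systematic change
    return {"rule": rule_name, "matches": matches, "exceptions": exceptions}


# Hypothetical facts: methods that call an old API, and methods edited
# in this change to call its replacement instead.
callers_of_old_api = {"Bar.draw", "Line.draw", "Pie.draw", "Area.draw"}
edited_methods = {"Bar.draw", "Line.draw", "Pie.draw"}

summary = summarize(
    "callers of the old API now call the new API",
    callers_of_old_api,
    edited_methods,
)
```

The exceptions list is where a potential inconsistency shows up: a method that fits the rule's precondition but was not updated.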

Code Change Summarization (cont.)
• DeltaDoc (Buse and Weimer, ASE’10)
  • Symbolic execution: obtain path predicates for each statement in both versions
  • Identify statements that are added, deleted, or have changed predicates
  • Summarization
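The second step above, comparing statements and their path predicates across versions, can be sketched as follows. The sketch assumes the predicates have already been obtained (the symbolic execution step is not shown); the function describe_change, the output wording, and the sample statements are illustrative assumptions, not DeltaDoc's actual output format.

```python
def describe_change(old_predicates, new_predicates):
    """Each argument maps a statement to the path predicate guarding it."""
    lines = []
    for stmt in sorted(set(old_predicates) | set(new_predicates)):
        before = old_predicates.get(stmt)
        after = new_predicates.get(stmt)
        if before is None:
            lines.append(f"If {after}, do {stmt}")
        elif after is None:
            lines.append(f"If {before}, no longer do {stmt}")
        elif before != after:
            lines.append(f"Do {stmt} if {after}, instead of if {before}")
    return lines


# Hypothetical statement-to-predicate maps for the two versions.
old_version = {
    "log(key)": "true",
    "return cache.get(key)": "key in cache",
}
new_version = {
    "refresh(key)": "expired(key)",
    "return cache.get(key)": "key in cache and not expired(key)",
}

summary = describe_change(old_version, new_version)
```

Unchanged statements with unchanged predicates produce no output, which is what keeps the description shorter than a raw diff.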

Code Change Summarization (cont.)
• Multi-document summarization (Rastkar and Murphy, ICSE’13)
  • Linking evolutionary documents (commit logs, issue-tracking entries)
  • Finding the most informative sentences to extract to form a summary
    • Similarity between a sentence and the title of the enclosing document
    • Overlap between a sentence and the adjacent document
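The two sentence features listed above can be sketched with a simple word-overlap score. How the original approach combines its features is not described on the slide; summing two Jaccard overlaps here is an assumption, as are the function names and the sample text.

```python
import re


def words(text):
    """Lower-cased alphanumeric tokens of text, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap(a, b):
    """Jaccard overlap between the word sets of two texts."""
    wa, wb = words(a), words(b)
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0


def rank_sentences(sentences, title, adjacent_document):
    """Order sentences by overlap with the title and the adjacent document."""
    scored = [
        (overlap(s, title) + overlap(s, adjacent_document), s)
        for s in sentences
    ]
    return [s for _, s in sorted(scored, key=lambda pair: pair[0], reverse=True)]


# Hypothetical issue title, linked commit message, and issue sentences.
title = "Null pointer exception in renderer"
commit_message = "Renderer crashes with null pointer when dataset is empty"
sentences = [
    "Thanks for the quick review",
    "Fix null pointer in chart renderer",
]

ranked = rank_sentences(sentences, title, commit_message)
```

The top-ranked sentences would then be extracted to form the summary.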

• Challenges
  • Evolutionary documents
    • Linkage might not be found (Bachmann et al., FSE’10; Wu et al., FSE’11)
    • Human-written documents may be unavailable or uninformative (Buse and Weimer, ASE’10; Tao et al., FSE’12)
  • Automatically generated documents
    • Verbosity
    • Uninteresting changes are identified, e.g., “all types that declared toString() added constructors” (Kim and Notkin, ICSE’09)
[Figures: LSdiff, DeltaDoc]

Outline
Text Differencing
Rules and exceptions
Control-flow changes
Evolutionary documentation
Customization

Querying and Filtering
• Specifying and detecting meaningful changes (Yu et al., ASE’11)
  • Normalize the program (as specified by the user) before differencing
  • Non-trivial to construct the query
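Normalizing before differencing can be sketched with Python's standard ast module. Parsing already discards whitespace and comments; the extra docstring-dropping step stands in for a user-specified normalization. This is not the query language of the cited work, and the names DropDocstrings, normalized, and is_meaningful_change are hypothetical.

```python
import ast


class DropDocstrings(ast.NodeTransformer):
    """One possible user-chosen normalization: ignore docstring edits."""

    def visit_FunctionDef(self, node):
        self.generic_visit(node)
        first = node.body[0] if node.body else None
        if (isinstance(first, ast.Expr)
                and isinstance(first.value, ast.Constant)
                and isinstance(first.value.value, str)):
            # Keep the body non-empty so the tree stays valid.
            node.body = node.body[1:] or [ast.Pass()]
        return node


def normalized(source):
    """Structural dump of the normalized tree, without line/column data."""
    tree = DropDocstrings().visit(ast.parse(source))
    return ast.dump(tree)


def is_meaningful_change(old_source, new_source):
    return normalized(old_source) != normalized(new_source)


old = 'def area(w, h):\n    """Area."""\n    return w * h\n'
doc_only = (
    'def area(w, h):\n'
    '    """Return the area of a rectangle."""\n'
    '    # clarified wording\n'
    '    return w  *  h\n'
)
behavior = 'def area(w, h):\n    return w + h\n'
```

What counts as meaningful depends entirely on the chosen normalization, which is why constructing the right specification is the hard part.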

Querying and Filtering (cont.)
• Filtering non-essential changes (Kawrykow and Robillard, ICSE’11)
  • Non-essential changes: rename-induced modifications, local variable extraction, trivial keyword modification, whitespace and documentation updates
  • ChangeDistiller (Fluri et al., TSE’07) + partial program analysis (Dagenais and Robillard, ICSE’08)
  • Goal: improving mining and recommendation accuracy rather than developers’ comprehension
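Two of the non-essential categories above, whitespace updates and rename-induced modifications, can be sketched at the text level. The cited approach works on syntax trees with type information; this line-based version is a deliberate simplification, and the function names and sample lines are assumptions.

```python
import re


def is_rename_induced(old_line, new_line, old_name, new_name):
    """True if applying the rename to old_line yields exactly new_line."""
    pattern = r"\b" + re.escape(old_name) + r"\b"
    return re.sub(pattern, lambda match: new_name, old_line) == new_line


def essential_changes(line_pairs, renames):
    """Drop whitespace-only and rename-induced line modifications."""
    kept = []
    for old_line, new_line in line_pairs:
        if old_line.split() == new_line.split():
            continue  # only whitespace differs
        if any(is_rename_induced(old_line, new_line, o, n) for o, n in renames):
            continue  # differs only by a known rename
        kept.append((old_line, new_line))
    return kept


# Hypothetical modified lines and one known rename (count -> size).
modified = [
    ("total = count(items)", "total = size(items)"),
    ("x = 1", "x  =  1"),
    ("return a", "return a + 1"),
]

remaining = essential_changes(modified, [("count", "size")])
```

Only the last pair survives the filter, which is the kind of reduction that benefits mining tools working on change histories.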

Outline
Text Differencing
Meaningful changes
Non-essential changes

Research Directions
Text Differencing
Meaningful changes
Source Code Changes
Work-item-based changes?

Work-item-based Changes
• Multiple work-items in a single code change (e.g., a bug fix + code cleanup + a new feature)
  • Very difficult to understand (Tao et al., FSE’12)
[Figure: JFreeChart revision 1083, annotated with trivial keyword removal, bug fix, and formatting]

Work-item-based Change Detection
• Multiple work-items in a single code change (e.g., a bug fix + code cleanup + a new feature)
  • Very difficult to understand (Tao et al., FSE’12)
  • Change decomposition
    • Program slicing (entity dependencies)
    • Pattern matching (similarities)
• A single work-item spreads across multiple code changes (e.g., 5 changes to finally fix a bug completely)
  • Change aggregation
    • Linkage to the same issue
    • Heuristics such as time duration, commit authors, program dependencies, etc.
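Change decomposition can be sketched as partitioning the changed hunks of one commit into connected groups. The sketch assumes the relations between hunks (from dependencies or from similarity) are already known; how to compute them is the open research question. The function decompose and the hunk names are hypothetical.

```python
def decompose(hunks, related):
    """Partition hunks into groups connected by the given related pairs."""
    parent = {h: h for h in hunks}

    def find(h):
        # Union-find lookup with path halving.
        while parent[h] != h:
            parent[h] = parent[parent[h]]
            h = parent[h]
        return h

    for a, b in related:
        parent[find(a)] = find(b)

    groups = {}
    for h in hunks:
        groups.setdefault(find(h), []).append(h)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical hunks of one mixed commit; only the two bug-fix hunks
# depend on each other.
hunks = ["fix_null_check", "fix_caller", "remove_final", "reindent"]
related = [("fix_null_check", "fix_caller")]

work_items = decompose(hunks, related)
```

Each resulting group approximates one work-item, so a reviewer could inspect the bug fix separately from the keyword removal and the formatting.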

Research Directions
Text Differencing
Syntax differencing
Meaningful changes
Work-item change detection
Change decomposition
Change aggregation

Research Directions
Text Differencing
Syntax differencing
Meaningful changes
Work-item-specific changes
Change aggregation

Research Directions
Text Differencing
Syntax differencing
Meaningful changes
Work-item-specific changes
Concrete Execution
Change aggregation

Explaining code changes with executions of co-changed test cases
• Test cases
  • Best documentation for source code
• Test cases co-changed with source code
  • Documentation for code changes?
  • Mostly synchronous co-evolution of production and test code (Zaidman et al., Empirical Software Engineering’11)
• Differential test executions
  • Co-changed test cases T
  • Executing T on the old version P and the new version P’
  • Comparing the executions to explain changed behavior
From StackExchange
http://programmers.stackexchange.com/questions/154439/quality-of-code-in-unit-tests?newsletter=1&nlcode=67628%7c1a35
• “Unit tests are one of the best sources of documentation for your system, and arguably the most reliable form”
• “Unit tests are often the first thing you look at when trying to grasp what some piece of code does”
• “They can also serve as a starting point for people new to the code base”

Research Directions
Text Differencing
Syntax differencing
Meaningful changes
Work-item-specific changes
Concrete Execution
• Co-changed test cases
• Differential test execution
Change aggregation
