1. Debugging of Software Regressions
Abhik Roychoudhury
National University of Singapore
Intl. Seminar on Program Verification, Automated
Debugging and Symbolic Computation (PAS) 2012
Organized by Beihang U. and Chinese Acad. of Sciences
2. Software is Evolving
1329662 versions in
About 270 changes per day, i.e., roughly one change every 6 minutes
More than 250 billion lines of code have been created
More than 53116 issues publicly reported in Apache BugZilla
Maintaining software quality in this evolving process is challenging:
testing, debugging and bug-fixing after code changes.
3. Outline: Debugging software regressions
Describe intended behavior of program changes
Change Contract language (later part of the talk)
OR
Extract actual behavior resulting from program changes
Symbolic execution
Novel usage of symbolic execution, beyond guiding search.
4. Debugging vs. Bug Hunting
(a) Debugging: running program P on input = 0 produces output = 0, whereas we should have output > input.
(b) Model Checking: a model checker checks P against the property G(pc = end ⇒ output > input) and returns the counter-example input = 0, output = 0.
5. Debugging vs. Bug Hunting
Debugging
Have a problematic input i, or a "counter-example" trace.
The output does not match the expected output for i.
Not sure what desired "property" is violated.
Amounts to implicitly alerting the programmer about the program's intended specification as well.
Bug Hunting via Model Checking
Have a desired "property".
Tries to find a counter-example trace, and hence an input which violates the property.
6. Regression Debugging
Test input t passes on the old stable program P but fails on the new buggy program P'. Why?
7. Contributions
Debugging evolving programs
Introduction of formal techniques into debugging
Traditional: input mutation, trace comparison …
Ours: symbolic execution, SMT solving, dependency analysis, …
New usage of symbolic execution
From guiding search to extracting a glimpse of program semantics
8. Adapting Trace Comparison
The test input t follows path σ in the old stable program P and path π in the new buggy program P'. Directly comparing σ and π does not work (✗). We need a new input t' to compare against.
9. How to obtain the new test?
The new test input is derived from the buggy input, the old program P, and the new program P'.
10. Path Condition
input in;
if (in >= 0)      // Yes-branch taken for in == 0
    a = in;
else
    a = -1;
return a;

The path condition is useful to find "the set of all inputs which trace a given path". For the path taking the Yes-branch above, the path condition is in ≥ 0; the path is feasible since the formula ∃in. in ≥ 0 is satisfiable.
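As an illustrative sketch (not the talk's binary-level tooling), a path condition can be collected by instrumenting each branch of a concrete run; run_with_path_condition is a hypothetical helper:

```python
# Minimal sketch: record, at every branch of a concrete run, the
# constraint on the symbolic input 'in' that the taken direction implies.
# This mirrors the example program above.

def run_with_path_condition(in_val):
    """Execute the example and collect the conjuncts of the path condition."""
    pc = []                   # branch constraints accumulated along the run
    if in_val >= 0:
        pc.append("in >= 0")  # Yes-branch: constraint as taken
        a = in_val
    else:
        pc.append("in < 0")   # No-branch: negated branch condition
        a = -1
    return a, pc

result, path_cond = run_with_path_condition(0)
print(result, path_cond)  # in == 0 takes the Yes-branch: path condition in >= 0
```

Any solution of the collected conjunction (here, any in ≥ 0) retraces the same path.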
11. Our Approach
The test input t follows path σ (path condition f) in the old stable program P, and path π (path condition f') in the new buggy program P'.
1. Solve f ∧ ¬f' to get another input t', which follows path π' in P'.
2. Compare π and π' to get the bug report.
12. Generating New Input
1. Compute f, the path condition of t in P.
2. Compute f', the path condition of t in P'.
3. Solve f ∧ ¬f'.
   Many solutions: compare the trace of each t' in P' with the trace of t in P'. Return a bug report from P'.
   No solution: go to the next step.
4. Solve f' ∧ ¬f.
   Many solutions: compare the trace of each t' in P with the trace of t in P. Return a bug report from P.
   No solution: impossible, since then f ⇔ f'.
13. Simple Example
Old program P:
    int inp, outp;
    scanf("%d", &inp);
    if (inp >= 1) {
        outp = g(inp);
        if (inp > 9) {
            outp = g1(inp);
        }
    } else {
        outp = h(inp);
    }
    printf("%d", outp);

New program P' (the inner if is commented out):
    int inp, outp;
    scanf("%d", &inp);
    if (inp >= 1) {
        outp = g(inp);
        /* if (inp > 9) {
            outp = g1(inp);
        } */
    } else {
        outp = h(inp);
    }
    printf("%d", outp);

P partitions the inputs into {1, 2, ..., 9}, {10, 11, ...} and {0, -1, -2, ...}; P' merges the first two partitions into {1, 2, ..., 9, 10, 11, ...}. Task: explain the failing input inp == 100 using the alternative input 9.
14. Path Conditions in the Example
For the failing input inp == 100 on the two programs above:
Path condition f in P: (inp >= 1) && (inp > 9)
Path condition f' in P': (inp >= 1)
STP Solver: f ∧ ¬f' = (inp >= 1) && (inp > 9) && (inp < 1) has no solution.
STP Solver: f' ∧ ¬f = (inp >= 1) && (inp <= 9) yields inp == 9.
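The two solver queries above can be mimicked with a brute-force search over a small integer domain standing in for STP (g, g1 and h do not affect the path conditions and are omitted); this is a sketch, not the DARWIN implementation:

```python
# Path conditions of the failing input inp == 100, written as predicates.
def f(inp):      # path condition in the old program P
    return inp >= 1 and inp > 9

def f_new(inp):  # path condition in the changed program P'
    return inp >= 1

def solve(formula, domain=range(-50, 51)):
    """Return some model of the formula over the domain, or None if unsat."""
    for v in domain:
        if formula(v):
            return v
    return None

# f ∧ ¬f' is unsatisfiable: no input keeps the old path but deviates in P'.
assert solve(lambda v: f(v) and not f_new(v)) is None

# f' ∧ ¬f simplifies to (inp >= 1) ∧ (inp <= 9): any of 1..9 is a model
# (the STP run on the slide returned inp == 9).
t_alt = solve(lambda v: f_new(v) and not f(v))
print(t_alt)
```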
16. Choosing Alternative Inputs
Solve f ∧ ¬f' by decomposing f' into the branch constraints ψ1, ..., ψm collected at branches b1, ..., bm along the path in P': f' = (ψ1 ∧ ψ2 ∧ ... ∧ ψm).
Check for satisfiability of:
f ∧ ¬ψ1
f ∧ ψ1 ∧ ¬ψ2
f ∧ ψ1 ∧ ψ2 ∧ ¬ψ3
...
At most m alternate inputs!
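The decomposition above can be sketched as follows; constraints are plain Python predicates, and the toy f, ψ1, ψ2 are illustrative, not taken from the talk:

```python
# Build the m solver queries f ∧ ψ1 ∧ … ∧ ψ(i-1) ∧ ¬ψi, one per branch
# constraint along the path of t in P'.

def alternate_input_queries(f, psis):
    """Return the m sub-formulae whose models are candidate alternate inputs."""
    queries = []
    for i, psi in enumerate(psis):
        prefix = psis[:i]
        queries.append(lambda v, p=prefix, n=psi:
                       f(v) and all(q(v) for q in p) and not n(v))
    return queries

# Toy instance: f ≡ x >= 0 and f' ≡ (x >= 0) ∧ (x > 5), so m == 2.
f = lambda x: x >= 0
psis = [lambda x: x >= 0, lambda x: x > 5]
qs = alternate_input_queries(f, psis)

# Brute-force models over a small domain (a stand-in for the SMT solver).
models = [next((v for v in range(-10, 11) if q(v)), None) for q in qs]
print(models)  # [None, 0]: the first query is unsat, the second yields x == 0
```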
17. Bug report for one alternate input
tnew = the input obtained by solving f ∧ ψ1 ∧ ψ2 ∧ ¬ψ3.
The traces of tbug and tnew agree at branches b1 and b2 and deviate exactly at b3, so the bug report obtained by comparing the two traces should be the branch b3!
At most m alternate inputs ⇒ at most m lines in the bug report.
Since the traces compared deviate in only one branch, trace comparison can be removed altogether.
18. DARWIN: Putting Everything Together
Concrete and symbolic execution of test input t yields f, the path condition of t in the old stable program P, and f', the path condition of t in the new buggy program P'. The STP solver checks satisfiable sub-formulae from f ∧ ¬f' and validates the resulting inputs, giving alternative inputs t'. The output is a bug report at the assembly level, which is then mapped to a bug report at the source level.
19. Implementation
TEMU (Windows/Linux OS, x86 binaries): produces the assembly-level execution trace.
VINE: symbolic execution, path condition extraction, dynamic slicing, and other analyses on the binary trace.
http://bitblaze.cs.berkeley.edu/
20. Results
Buggy Program               Stable Program               Time taken   Bug report size
LibPNG v1.0.7 (31164 loc)   LibPNG v1.2.21 (36776 loc)   13m 34s      9
TCPflow (patched)           TCPflow (unpatched)          31m          6
Miniweb (2838 loc)          Apache (358379 loc)          14s          5
Savant (8730 loc)           Apache httpd (358379 loc)    9m           46

If we require the alternative input to behave the same in the buggy program and the reference program (a passing test), the bug report size is 1 in all three cases.
21. An experiment we tried
Validate embedded Linux (Busybox)
AGAINST
Linux (GNU Coreutils, net-tools)
The Busybox distribution is 121 KLOC.
Various errors to be root-caused in tr, arp, top, printf.
22. Trying on Embedded Linux
• The concept
– Golden: GNU Coreutils, net-tools
– Buggy: Busybox
– The de-facto distribution for embedded devices.
– Aims for low code size.
– Fewer checks, hence more errors.
– Try DARWIN!
• The practice
– The failing input takes logically equivalent paths in Busybox and Coreutils.
23. Going beyond
Program P:
    input x;
    y = 2 * x;
    output y
Program P':
    input x;
    y = 2 * x + 1; // bug
    output y
Observable error: for input x == 0, the expected output is y == 0, but the observed output is y == 1.
Employ DARWIN:
In program P, path condition f == true; in program P', path condition f' == true.
f ∧ ¬f' == false, and also f' ∧ ¬f == false.
No bug report generated!
24. A more direct approach
Program P:
    input x;
    y = 2 * x;
    output y
Program P':
    input x;
    y = 2 * x + 1; // bug
    output y
• Characterize the observable error (obs)
– y != 0
• Compute the weakest pre-condition (WP) along the failing path w.r.t. obs
– in P: 2*x != 0
– in P': 2*x + 1 != 0
• Compare the WPs and find the differing constraints.
• Map the differing constraints to the lines contributing them.
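For straight-line assignments, the WP step is just substitution of the right-hand side for the assigned variable. Here is a sketch on the example above using textual substitution; wp_assign is a hypothetical helper, whereas real tools work on a binary-level intermediate representation:

```python
import re

def wp_assign(post, var, expr):
    """WP of the assignment 'var = expr' w.r.t. post: post[expr / var]."""
    return re.sub(r"\b%s\b" % re.escape(var), "(%s)" % expr, post)

obs = "y != 0"                            # the observable error
wp_old = wp_assign(obs, "y", "2*x")       # along the failing path of P
wp_new = wp_assign(obs, "y", "2*x + 1")   # along the failing path of P'
print(wp_old, "|", wp_new)  # (2*x) != 0 | (2*x + 1) != 0

# The differing constraints point back at the changed assignment to y.
assert wp_old != wp_new
```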
25. Approach 2 - summary
• Set the observable error: x < 0
• Set the slicing criterion: value of x at line 8
• Simultaneously perform
– Slicing: control and data dependencies
– Symbolic execution along the slice
– WP computation along the slice
• The above is performed on both P and P'
– Produces WP, WP': conjunctions of constraints
– Find the differing constraints in WP, WP'
– Map the differing constraints to the contributing LOC; this is the bug report.
26. Approach 2 – in action
1. ...                // input inp1, inp2
2. if (inp1 > 0)      // WP: inp1 - 1 < 0 ∧ inp1 > 0   (control dep.)
3.     x = inp1 - 1;  // bug; WP: inp1 - 1 < 0         (data dep.)
4. else x = inp1 + 1;
5. if (inp2 > 0)
6.     y = inp2 - 2;
7. else y = inp2 + 2;
8. ...                // output x, y
We observe the unexpected x < 0 for inp1 == inp2 == 1; working backwards along the slice yields WP = inp1 - 1 < 0 ∧ inp1 > 0.
27. Comparing WP, WP’
WP = (φ1 ∧ φ2 ∧ ... ∧ φn)
WP' = (φ'1 ∧ φ'2 ∧ ... ∧ φ'm)
A solver may choke in trying to check WP ⇒ φ'1, ..., WP' ⇒ φ1, ...
Instead, we perform a pair-wise comparison of the constraints.
Tautology elimination during the WP computation along the slice gives a lot of reduction:
    X = 1;
    ...               // WP: 1 > 0 ∧ Y < 0   (due to the assignment of X)
    if (X > 0) {
        ...           // WP: X > 0 ∧ Y < 0   (due to the branch)
        printf("%d", Y);
    }                 // WP: Y < 0           (the constraint we start with)
The tautology 1 > 0 is eliminated.
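Tautology elimination can be sketched with a crude finite-domain validity check; a real implementation would ask the solver, and is_tautology with its small domain bound is illustrative only:

```python
# Drop conjuncts of a WP that hold for every input, such as the 1 > 0
# introduced by substituting the constant assignment X = 1 into X > 0.

def is_tautology(constraint, free_vars, domain=range(-5, 6)):
    """Crude check: the constraint evaluates to True under every assignment."""
    if not free_vars:
        return eval(constraint)
    var = free_vars[0]
    return all(is_tautology(constraint.replace(var, str(v)),
                            free_vars[1:], domain)
               for v in domain)

wp = ["1 > 0", "Y < 0"]  # WP conjuncts from the example above
reduced = [c for c in wp if not is_tautology(c, ["Y"])]
print(reduced)  # ['Y < 0']: only the genuine constraint survives
```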
28. Experiments on Embedded Linux
Utility   Trace Size      Slice Size       WP terms     WP terms (after elim.)   LOC in Bug Report   Time taken
arp       5039 : 4764     56524 : 51448    722 : 434    27 : 34                  1 : 3               1m30s
top       1637 : 3921     34523 : 332281   566 : 2501   8 : 6                    2 : 0               1m28s
printf    3702 : 3633     27781 : 40403    241 : 414    21 : 35                  1 : 3               1m20s
tr        5474 : 138538   85047 : 29375    445 : 280    9 : 9                    1 : 0               2m28s

• Each ":"-separated tuple in Columns 2-6 refers to data from embedded Linux and GNU Coreutils, in that order.
• Trace Size refers to the number of assembly/intermediate-level instructions.
• Tautology elimination removes a significant part of the WP analysis overhead.
• The bug report size is quite small in each of the cases.
29. Retrospective
Symbolic execution –
Test generation [ e.g. DART, KLEE, … ]
Path traversal in path sensitive searches such as model checkers
[e.g. JPF]
Debugging – some milestones
Visualization [e.g. Tarantula ]
Dynamic Slicing [e.g. JSlice]
Trace comparison and Delta Debugging
Symbolic Techniques
30. Perspective: Symbolic Execution
Guiding search: test generation, model checking, ...
Uncovering what went wrong: debugging! Summarization and semantics extraction.
31. Our use of the symbolic techniques
• Debugging evolving programs (code evolution)
– Program Versions
– Embedded SW against non-embedded version
– Two implementations of same specification
– Web-servers implementing http
• Related works in our group using symbolic techniques
– Test generation to stress a given program change
– Test suite augmentation and Program Path Partitioning
32. Where are we?
Debugging is aided by specification discovery.
Intended program behavior: what the program should do!
Actual program behavior: what the program is actually doing!
Symbolic execution can extract specifications; symbolic execution of a reference program gives a hint of the intended behavior!
One possible take from today's discussion: it is possible to bridge the gulf.
How can we directly specify the intended behavior of changes?
33. Outline: Debugging software regressions
Describe intended behavior of program changes
Change Contract language (now!)
OR
Extract actual behavior resulting from program changes
Symbolic execution
Novel usage of symbolic execution, beyond guiding search.
34. Programmer Intention
Run test input t on the old stable program P and on the new program P'.
Output ≠ Output'. Bug, or programmer intention?
35. Change Contract Example
Old version:
    Set m(String s){
        if (/*complex predicate on s*/)
            return new HashSet();
        else
            return new TreeSet();
    }
New version:
    Set m(String s){
        if (/*complex predicate on s*/)
            return new HashSet();
        else
            return new TreeSet().add(s);
    }
Change contract:
    /*@ changed_behavior
      @ when_ensured result instanceof TreeSet;
      @ ensures result.size() == prev(result).size() + 1;
      @*/
36. Change Contract
A specification language for program changes.
This is the intended change, not the actual change!
Two aspects:
Under which condition is the program semantics changed? (sub input space)
How is the program semantics changed?
Realized for Java, based on JML (the Java Modeling Language).
37. Why not Program Contract?
Old version:
    Set m(String s){
        if (/*complex predicate on s*/)
            return new HashSet();
        else
            return new TreeSet();
    }
New version:
    Set m(String s){
        if (/*complex predicate on s*/)
            return new HashSet();
        else
            return new TreeSet().add(s);
    }
Change contract:
    /*@ changed_behavior
      @ when_ensured result instanceof TreeSet;
      @ ensures result.size() == prev(result).size() + 1;
      @*/
Note: a full program contract would need to specify the complex predicate on s, whereas the change contract describes only the changed behavior.
38. Default Equal Assumption
For inputs not specified in the change contract, the program behavior remains the same.
/*@ changed_behavior
  @ when_ensured result instanceof TreeSet;
  @ ensures result.size() == prev(result).size() + 1;
  @*/
When the return value is not an instance of TreeSet, the previous method and the current method should behave the same.
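As an illustration of this semantics (outside the talk's JML tooling), the contract can be checked at run time in a Python sketch; set and the SortedStrings class stand in for HashSet and TreeSet, and complex_predicate is a placeholder for the slide's predicate:

```python
# Runtime-check sketch of the change contract, including the default
# equal assumption for inputs the contract does not mention.

class SortedStrings(list):           # illustrative stand-in for TreeSet
    def added(self, s):
        return SortedStrings(sorted(self + [s]))

def complex_predicate(s):            # placeholder predicate on s
    return s.startswith("#")

def m_old(s):                        # previous version of method m
    return set() if complex_predicate(s) else SortedStrings()

def m_new(s):                        # current version of method m
    return set() if complex_predicate(s) else SortedStrings().added(s)

def check_change_contract(s):
    old, new = m_old(s), m_new(s)
    if isinstance(new, SortedStrings):   # when_ensured clause holds
        assert len(new) == len(old) + 1  # ensures: size grew by one
    else:                                # default equal assumption
        assert new == old
    return True

print(check_change_contract("abc"))  # TreeSet case: ensures clause checked
print(check_change_contract("#x"))   # other case: old and new must agree
```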
39. Empty change contract
//@ changed_behavior
Old version:
    void m1() {
        x.f = x.f - 1;
    }
New version:
    void m2() {
        x.f = x.f + 1 - 2;
    }
The empty change contract asserts equivalent behavior: pre ≡ pre' and post ≡ post'.
40. Change contract checker
Versions V1 and V2, together with the change contract, are fed to the change contract (C.C.) checker, built on OpenJML, which comprises a parser, runtime assertion checking (RAC), and extended static checking (ESC). Inputs are generated automatically via test suite augmentation [ASE10].
41. Change Contract Language Expressiveness
User study on 3 open-source Java projects: Ant, JMeter, Log4j.
Change contracts were written for real changes by two MS students (CS majors):
not related to our research, to give veracity to the user study;
without a background on contracts.
52 changes in total; all of them could be expressed using change contracts.
42. User study
Proj.    Total changes   Refactoring   Diff   ??   n/a
Ant      43              4             28     3    8
JMeter   17              1             11     1    4
log4j    20              2             13     1    4

• ??: failed to be understood due to, e.g., a 3rd-party library
• n/a: e.g., multi-threading, non-program changes
43. Detecting Incorrect Changes through Change Contracts
V1 → V2: a buggy change. V2 → V3: a bugfix to the previous bug. A change contract describes the intended change.
Detected all incorrect changes for 10 studied cases in Ant, JMeter, Log4j, via randomly generated test cases from Randoop.
44. Change contract can ...
detect incorrect program changes.
serve as program change requirement.
substitute for ambiguous and often incorrect change logs.
guide more efficient test suite augmentation.
45. Wrap-up: Debugging software regressions
Describe intended behavior of program changes
Change Contract language
Extract actual behavior resulting from program changes
Symbolic execution
Novel usage of symbolic execution, beyond guiding search.
46. "Our dilemma is that we hate change and love it at the same time; what we really want is for things to remain the same but get better." – Sydney J. Harris
47. Acknowledgements
• Co-Authors
• Dawei Qi, Zhenkai Liang, Jooyong Yi, … – NUS
• Kapil Vaswani – Microsoft Research India
• Ansuman Banerjee – Indian Statistical Institute Kolkata
• Funding from
• Defense Research and Technology Office.
• Ministry of Education, Singapore.
• Papers: FSE 09, FSE 10, FSE 11, FSE 12, ASE10, TOSEM