A Survey on Dynamic Symbolic Execution for Automatic Test Generation
Jan. 6 2014
PQE
Hyunmin Seo
2. Motivation
• Testing is a practical way to verify software
• Testing accounts for more than 50% of total software development costs [Tassey ‘02]
• Effective, efficient and scalable automatic
testing is required [Bounimova ‘13, Kim ‘12]
5. Random Testing
• Random Testing
– Randomly generate test inputs
• Adaptive Random Testing (ART)
– Spread test cases evenly over the input domain [Chen ’04]
– Based on the observation that failure-causing inputs form contiguous regions [White ‘80, Chan ‘96]
• Feedback-Directed Random Testing
– Randoop [Pacheco ‘07]
– Unit testing
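The ART idea of spreading inputs evenly can be sketched with the fixed-size-candidate-set variant, one common way to implement ART: draw k random candidates and keep the one farthest from every input chosen so far. This is a minimal sketch assuming a one-dimensional numeric input domain; `art_inputs` and its parameters are hypothetical names, not part of any cited tool.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>
#include <random>
#include <vector>

// Fixed-size-candidate-set ART over a 1-D input domain [lo, hi).
// n: number of test inputs to generate; k: random candidates drawn per step.
std::vector<double> art_inputs(int n, int k, double lo, double hi, unsigned seed) {
    std::mt19937 gen(seed);
    std::uniform_real_distribution<double> dist(lo, hi);
    std::vector<double> executed;
    if (n <= 0 || k <= 0) return executed;
    executed.push_back(dist(gen));  // the first test input is purely random
    while (static_cast<int>(executed.size()) < n) {
        double best = 0.0;
        double best_gap = -1.0;
        for (int c = 0; c < k; ++c) {
            double cand = dist(gen);
            // Distance from this candidate to its nearest already-chosen input.
            double gap = std::numeric_limits<double>::infinity();
            for (double e : executed) gap = std::min(gap, std::fabs(cand - e));
            // Keep the candidate farthest from existing inputs (spreads them out).
            if (gap > best_gap) { best_gap = gap; best = cand; }
        }
        executed.push_back(best);
    }
    return executed;
}
```

Plain random testing corresponds to k = 1; larger k spends more computation per input to obtain a more even spread.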
6. Random Testing Summary
• One of the most fundamental and well-studied approaches [Hamlet ‘94, Loo ‘88]
– Many variations
• Pros
– Efficient, Scalable
– No source code requirement
• Cons
– Low coverage [Burnim ’08]
8. Combinatorial Testing
• Find a subset of input parameter combinations that satisfies a certain mathematical property [Cohen ‘13]
9. Configuration Settings
Factor | Values
Vertical Ruler | Visible, Invisible
Ruler Units | Inches, Centimeters, Points, Picas
Default View | Normal, Slide, Outline
SS Navigation | Pop-up, None
End with Black | Yes, No
Always Mirror | Yes, No
Warn Before | Yes, No
Total # of configuration settings = 2 × 4 × 3 × 2 × 2 × 2 × 2 = 384
10. N-way Covering Array
• A subset of configurations that includes every possible combination of values of any N factors at least once [Cohen ‘13]
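To make the definition concrete for N = 2, the hypothetical helper below checks whether a set of rows is a pairwise covering array, with each factor's values encoded as level indices 0, 1, …. It only checks the property; it does not construct an array, and the name `is_pairwise_covering` is illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// rows[r][f] is the level (0 .. levels[f]-1) that row r assigns to factor f.
// Returns true if, for every pair of factors (i, j), every combination of
// their levels appears in at least one row: a 2-way (pairwise) covering array.
bool is_pairwise_covering(const std::vector<std::vector<int>>& rows,
                          const std::vector<int>& levels) {
    const std::size_t k = levels.size();
    for (std::size_t i = 0; i < k; ++i) {
        for (std::size_t j = i + 1; j < k; ++j) {
            std::set<std::pair<int, int>> seen;  // level pairs covered for (i, j)
            for (const auto& r : rows) {
                bool in_range = r[i] >= 0 && r[i] < levels[i] &&
                                r[j] >= 0 && r[j] < levels[j];
                if (in_range) seen.insert({r[i], r[j]});
            }
            if (static_cast<int>(seen.size()) < levels[i] * levels[j]) return false;
        }
    }
    return true;
}
```

For three binary factors, the four rows 000, 011, 101, 110 already cover every pair, whereas exhaustive testing needs 2 × 2 × 2 = 8 rows.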
11. 2-Way Covering Array $\mathrm{CA}(12; 2, (2^5, 3^1, 4^1))$
No | Vertical Ruler | Ruler Units | Default View | SS Navigation | End with Black | Always Mirror | Warn Before
1 | Visible | Centimeters | Outline | Pop-up | No | No | Yes
2 | Invisible | Inches | Outline | Pop-up | No | No | No
3 | Invisible | Centimeters | Slide | None | Yes | Yes | Yes
4 | Visible | Picas | Outline | Pop-up | Yes | Yes | No
5 | Invisible | Centimeters | Normal | Pop-up | Yes | Yes | No
6 | Visible | Points | Outline | None | Yes | No | Yes
7 | Invisible | Points | Slide | Pop-up | No | No | No
8 | Invisible | Picas | Slide | Pop-up | No | Yes | Yes
9 | Invisible | Points | Normal | None | No | Yes | No
10 | Visible | Inches | Normal | None | Yes | No | Yes
11 | Visible | Inches | Slide | Pop-up | No | Yes | Yes
12 | Invisible | Picas | Normal | None | Yes | No | No
12 rows, strength 2, five 2-level factors, one 3-level factor and one 4-level factor (12 of the 384 configurations)
12. Combinatorial Testing Summary
• Research Direction
– How to find the minimum size array
• Greedy [Tung ‘00, Colbourn ‘04]
• Meta-heuristics [Cohen ‘03, Stardom ‘01]
– Application to different domains
• Software Product Lines [McGregor ‘01, Perrouin ‘10]
• Pros
– Systematic testing with a mathematical property [Cohen ‘13]
– Sample configurations to be tested [Qu ’08]
• Cons
– Too many combinations for program inputs
14. Search-Based Testing
• A branch of Search-Based Software Engineering (SBSE) in which meta-heuristics are used to guide the search [McMinn ‘04]
• Typical process
– Start with a random input
– Search nearby locations for a better solution
– Evaluate candidates with a fitness function
– Replace the current solution with a better one
– Search is guided by meta-heuristics
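The typical process above can be sketched as a plain hill climber, one of the simplest meta-heuristics. The sketch assumes integer inputs, a ±1 neighborhood for each variable, a random start in [-1000, 1000], and a caller-supplied fitness where 0 means the target is reached; `hill_climb`, `Input`, and `Fitness` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <functional>
#include <initializer_list>
#include <random>
#include <vector>

using Input = std::vector<long>;
// Fitness to minimize; 0 means the target branch is taken.
using Fitness = std::function<long(const Input&)>;

// Hill climbing: start from a random input, evaluate the +1/-1 neighbors of
// every variable, move to the best improving neighbor, and stop when fitness
// reaches 0, no neighbor improves (local optimum), or max_steps is exhausted.
Input hill_climb(const Fitness& f, std::size_t n_vars, unsigned seed, int max_steps) {
    std::mt19937 gen(seed);
    std::uniform_int_distribution<long> dist(-1000, 1000);
    Input cur(n_vars);
    for (long& v : cur) v = dist(gen);  // random starting point
    long cur_fit = f(cur);
    for (int step = 0; step < max_steps && cur_fit > 0; ++step) {
        Input best = cur;
        long best_fit = cur_fit;
        for (std::size_t i = 0; i < n_vars; ++i) {
            for (long delta : {-1L, 1L}) {
                Input nb = cur;
                nb[i] += delta;
                long fit = f(nb);
                if (fit < best_fit) { best = nb; best_fit = fit; }
            }
        }
        if (best_fit >= cur_fit) break;  // local optimum: no better neighbor
        cur = best;
        cur_fit = best_fit;
    }
    return cur;
}
```

For a hypothetical target branch `x == 2 * y + 10`, the fitness `labs(x - (2 * y + 10))` steers the search to an input that takes the branch. With a multimodal fitness the same loop can stop at a local optimum, which is why practical tools add restarts or richer meta-heuristics.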
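16. Search-Based Testing Example [McMinn ’11]
Input: a string
count: the number of digits in the string
    if (count >= 4)
        if (count <= 10)
            if (checksum % 10 == checkdigit)
                // Target: TRUE branch of the innermost condition
Executions: π1, π2 (count = 20), π3 (count = 11)
π2 and π3 both take the FALSE branch of count <= 10, but π3 is closer to satisfying it, so the search should prefer π3.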
17. Fitness Function
• Combination of approach level and branch distance
• Approach level
– The number of the target's control-dependent nodes not executed by the current input
• Branch distance [Tracey ‘98] (K is a positive constant)
Element | Value
Boolean | if TRUE then 0 else K
a = b | if abs(a - b) = 0 then 0 else abs(a - b) + K
a ≠ b | if abs(a - b) ≠ 0 then 0 else K
a < b | if a - b < 0 then 0 else (a - b) + K
a ≤ b | if a - b ≤ 0 then 0 else (a - b) + K
a > b | if b - a < 0 then 0 else (b - a) + K
a ≥ b | if b - a ≤ 0 then 0 else (b - a) + K
a ∨ b | min(cost(a), cost(b))
a ∧ b | cost(a) + cost(b)
¬a | move negation inward and propagate
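The branch-distance rules in the table can be written out directly and combined with the approach level into a single fitness to minimize. This is a sketch: K = 1 and the d / (d + 1) normalization are assumptions rather than values from the slides, the function names are hypothetical, and negations are assumed to have been pushed inward already (for example, ¬(a < b) becomes a ≥ b).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Positive constant added whenever the condition is false (assumed K = 1).
constexpr double K = 1.0;

// Branch distance of each relational predicate, following the table.
double bd_eq(double a, double b)  { return std::fabs(a - b) == 0 ? 0.0 : std::fabs(a - b) + K; }
double bd_neq(double a, double b) { return std::fabs(a - b) != 0 ? 0.0 : K; }
double bd_lt(double a, double b)  { return a - b < 0 ? 0.0 : (a - b) + K; }
double bd_le(double a, double b)  { return a - b <= 0 ? 0.0 : (a - b) + K; }
double bd_gt(double a, double b)  { return b - a < 0 ? 0.0 : (b - a) + K; }
double bd_ge(double a, double b)  { return b - a <= 0 ? 0.0 : (b - a) + K; }

// Composite conditions: cost_a and cost_b are the operands' branch distances.
double bd_or(double cost_a, double cost_b)  { return std::min(cost_a, cost_b); }
double bd_and(double cost_a, double cost_b) { return cost_a + cost_b; }

// Fitness to minimize: approach level plus a normalized branch distance.
// The d / (d + 1) normalization (an assumption, one common choice) maps d into
// [0, 1), so a lower approach level always beats a smaller branch distance.
double fitness(int approach_level, double branch_distance) {
    return approach_level + branch_distance / (branch_distance + 1.0);
}
```

In the example on slide 16, π2 (count = 20) and π3 (count = 11) both stop at `count <= 10`, one control-dependent node short of the target (approach level 1). Their branch distances are 11 and 2, so π3 gets the lower fitness and the search moves toward it.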
18. Search-Based Testing Summary
• A branch of SBSE
– Different search heuristics
– Different domains [Harman ’13]
• Pros
– Guide the execution toward a specific branch
– Non-functional testing (e.g., longest execution time) [Wegener ’98]
• Cons
– Search space challenge
– Design of fitness functions [Arcuri ‘10]
20. Symbolic Execution-Based Testing
• Use symbolic values to represent program
variables and path conditions [King ‘76, Clarke
‘76]
• Find precise constraints for each execution path and generate test inputs by solving the constraints
21. Symbolic Execution
    x = sym_input();
    y = sym_input();
    z = sym_input();
    a = x + y;
    if (z > a)
        b = x - y;
    else
        b = 2 * y;
    ...
Then-branch state (PC: s3 > s1 + s2)
Var | Value
x | s1
y | s2
z | s3
a | s1 + s2
b | s1 - s2
Else-branch state (PC: s3 <= s1 + s2)
Var | Value
x | s1
y | s2
z | s3
a | s1 + s2
b | 2s2
22. Test Generation
Path conditions → SMT solver → test inputs
π1 : PC1 → π1 : x = 1, y = 2, ...
π2 : PC2 → π2 : x = 1, y = 5, ...
π3 : PC3 → π3 : x = -5, y = 0, ...
...
πn : PCn → πn : x = …, y = …
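A minimal sketch of the pipeline on slides 21–22, assuming integer inputs and expressions that stay linear in s1, s2, s3: symbolic execution forks one path per branch outcome, and each path condition is then solved for a concrete test input. The brute-force `solve` is a stand-in for the SMT solver, not how real tools solve path conditions; `Lin`, `Path`, `symbolic_execute`, and `solve` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <optional>
#include <vector>

// Linear expression c[0]*s1 + c[1]*s2 + c[2]*s3 + c[3] over symbolic inputs.
struct Lin {
    std::array<long, 4> c{};
};

Lin sym(int i) { Lin e; e.c[i] = 1; return e; }  // the i-th symbolic input
Lin add(Lin a, const Lin& b) { for (int i = 0; i < 4; ++i) a.c[i] += b.c[i]; return a; }
Lin sub(Lin a, const Lin& b) { for (int i = 0; i < 4; ++i) a.c[i] -= b.c[i]; return a; }
Lin scale(long k, Lin a) { for (long& v : a.c) v *= k; return a; }

long eval(const Lin& e, const std::array<long, 3>& in) {
    return e.c[0] * in[0] + e.c[1] * in[1] + e.c[2] * in[2] + e.c[3];
}

// One explored path: path condition "pc_lhs > 0" (greater) or "pc_lhs <= 0",
// plus the symbolic value of b at the end of the path.
struct Path {
    Lin pc_lhs;
    bool greater;
    Lin b;
};

// Symbolic execution of the example: x, y, z get s1, s2, s3 and the branch on
// z > a forks two paths, each with its own path condition and value of b.
std::vector<Path> symbolic_execute() {
    Lin x = sym(0), y = sym(1), z = sym(2);
    Lin a = add(x, y);     // a = s1 + s2
    Lin cond = sub(z, a);  // z > a  <=>  s3 - (s1 + s2) > 0
    return {
        {cond, true, sub(x, y)},     // PC: s3 >  s1 + s2, b = s1 - s2
        {cond, false, scale(2, y)},  // PC: s3 <= s1 + s2, b = 2 * s2
    };
}

// Stand-in for the SMT solver: brute-force search for a model in [lo, hi]^3.
std::optional<std::array<long, 3>> solve(const Path& p, long lo, long hi) {
    for (long s1 = lo; s1 <= hi; ++s1)
        for (long s2 = lo; s2 <= hi; ++s2)
            for (long s3 = lo; s3 <= hi; ++s3) {
                std::array<long, 3> in{s1, s2, s3};
                long v = eval(p.pc_lhs, in);
                if (p.greater ? v > 0 : v <= 0) return in;
            }
    return std::nullopt;
}
```

Each generated input is guaranteed to follow its own path, which is where the "no redundant inputs" benefit of symbolic execution comes from.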
23. Symbolic Execution-Based Testing Summary
• Pros
– No redundant inputs taking the same path
– High Coverage
• Cons
– Low efficiency
– Depends on constraint solving techniques
– External library calls
– State explosion
– Imprecision
28. Benefit
• Based on symbolic execution
– No redundant inputs taking the same path
– High coverage
• Reach deep program states by starting from well-formed, user-provided inputs
• Use concrete values to overcome limitations
– External library calls
– Complicated constraints
• Many tools
– CREST, CUTE, JCUTE, PEX, SAGE, EXE, KLEE
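The core DSE loop can be sketched as follows: run the program on a concrete input while recording the branch constraints it satisfies, then negate one constraint at a time (keeping the prefix before it) and solve for an input that flips that branch. This is a minimal sketch, not the algorithm of any listed tool: the program under test, the linear constraint form, and all names are hypothetical, a brute-force search over a small box stands in for the SMT solver, and negating every constraint of each new path is only one simple search order.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <optional>
#include <set>
#include <utility>
#include <vector>

// Branch constraint c[0]*x + c[1]*y + c[2] > 0 over integer inputs x, y.
using Cond = std::array<long, 3>;

bool holds(const Cond& c, long x, long y) { return c[0] * x + c[1] * y + c[2] > 0; }

// Over the integers, !(e > 0) <=> e <= 0 <=> -e + 1 > 0, so a negated
// constraint keeps the same "> 0" form.
Cond negation(const Cond& c) { return {-c[0], -c[1], -c[2] + 1}; }

// Hypothetical program under test. Each branch is instrumented to append the
// constraint the concrete run actually satisfied to the path condition pc.
int program(long x, long y, std::vector<Cond>& pc) {
    auto branch = [&](const Cond& c) {
        bool taken = holds(c, x, y);
        pc.push_back(taken ? c : negation(c));
        return taken;
    };
    if (branch({1, 0, -10})) {                 // x > 10
        if (branch({1, -2, 0})) {              // x > 2 * y
            if (branch({0, 1, -3})) return 3;  // y > 3
            return 2;
        }
        return 1;
    }
    return 0;
}

// Stand-in for the SMT solver: brute-force search of the box [lo, hi]^2.
std::optional<std::pair<long, long>> solve(const std::vector<Cond>& q, long lo, long hi) {
    for (long x = lo; x <= hi; ++x)
        for (long y = lo; y <= hi; ++y) {
            bool ok = true;
            for (const Cond& c : q) ok = ok && holds(c, x, y);
            if (ok) return std::make_pair(x, y);
        }
    return std::nullopt;
}

// Concolic loop: run concretely, then for each prefix of the recorded path
// condition negate its last constraint and solve for a new input.
// Returns the set of program outcomes (one per path) that were reached.
std::set<int> concolic(long x0, long y0, long lo, long hi) {
    std::set<int> reached;
    std::set<std::vector<Cond>> tried;  // queries already sent to the solver
    std::vector<std::pair<long, long>> work{{x0, y0}};
    while (!work.empty()) {
        auto [x, y] = work.back();
        work.pop_back();
        std::vector<Cond> pc;
        reached.insert(program(x, y, pc));
        for (std::size_t i = 0; i < pc.size(); ++i) {
            std::vector<Cond> q(pc.begin(), pc.begin() + static_cast<std::ptrdiff_t>(i));
            q.push_back(negation(pc[i]));
            if (!tried.insert(q).second) continue;  // skip repeated queries
            if (auto m = solve(q, lo, hi)) work.push_back(*m);
        }
    }
    return reached;
}
```

Starting from (0, 0), which only reaches the `x <= 10` path, the loop derives inputs for all four paths of the hypothetical program.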
29. Comparison
Technique | Source code requirement | Other
Random | No |
Combinatorial | No | Combine with other techniques
Search-Based | Yes/No | Non-functional testing
Symbolic Execution | Yes |
DSE | Yes | Concretization
[Efficiency and Coverage columns: ratings not recoverable]
31. Imprecision
• When symbolic execution cannot represent the exact semantics of the program [Elkarablieh ’09]
– E.g., modeling a 4-byte integer as a mathematical integer
• Imprecision may manifest as divergence [Godefroid ’08]
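A small illustration of this imprecision, using the hypothetical path condition x + 1 > x: a symbolic executor that models x as a mathematical integer treats it as always true, so it would never generate an input for the false side, while the concrete program with a 4-byte int takes that side when x = INT32_MAX. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// "x + 1 > x" as a solver using mathematical integers sees it
// (64-bit arithmetic is wide enough for every int32 value of x).
bool holds_math(std::int64_t x) { return x + 1 > x; }

// "x + 1 > x" as a 4-byte machine integer evaluates it. The addition is done
// on uint32_t (wraps mod 2^32, no undefined behavior); converting back to
// int32_t wraps on mainstream compilers and is guaranteed to since C++20.
bool holds_int32(std::int32_t x) {
    std::int32_t x1 = static_cast<std::int32_t>(static_cast<std::uint32_t>(x) + 1u);
    return x1 > x;
}
```

The two models disagree only at x = INT32_MAX, which is exactly the kind of corner case a mathematical-integer encoding can never produce.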
33. Proposed solutions
• Integer size, Bit operations
– BitVector [SAGE ’08]
• Symbolic pointer dereferencing
– Array Theory of SMT solvers [Elkarablieh ‘09]
• Floating-point operations
– Combined static and dynamic analysis [Godefroid ‘10]
• Interaction with environment
– Modeling [KLEE ‘08]
– Reporting [Xiao ‘11]
34. BitVector
• Use bitvectors in SMT solvers
– Fixed-size integers
– Bit operations on integer variables
• a & b
• a << 4
• Slower than integer arithmetic
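Counting the solutions of the hypothetical constraint (a << 4) == 0 over 8-bit values shows why fixed-size bitvector semantics matter: with truncation, any a whose low four bits are zero satisfies it, while the same constraint read as a * 16 == 0 over mathematical integers has only a = 0. Brute force is used purely for illustration; a bitvector solver reasons about these semantics symbolically, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Number of 8-bit values a with (a << 4) == 0 when the result is truncated
// to 8 bits, as in fixed-size (bitvector) arithmetic.
int count_bv8_solutions() {
    int n = 0;
    for (int a = 0; a < 256; ++a)
        if (static_cast<std::uint8_t>(a << 4) == 0) ++n;
    return n;
}

// The same constraint read over mathematical integers: a * 16 == 0.
int count_int_solutions() {
    int n = 0;
    for (int a = 0; a < 256; ++a)
        if (a * 16 == 0) ++n;
    return n;
}
```

The bitvector reading admits 16 solutions (0, 16, ..., 240) against 1 for the integer reading, so an integer encoding would miss inputs the real program accepts.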
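37. Array Theory of SMT Solvers [Elkarablieh ‘09]
    #include <cassert>
    typedef unsigned char BYTE;

    void single_array(BYTE x, BYTE y) {
        BYTE *a = new BYTE[4];
        a[0] = x;
        a[1] = 0;
        a[2] = 1;
        a[3] = 2;

        if (a[x] == a[y] + 2)
            assert(false);

        delete [] a;
    }
Con = concrete value, Sym = symbolic value
Var | Con | Sym | Con (next run)
x | 0 | S0 | 2
y | 1 | S1 | 1
a[0] | 0 | S0 | 2
a[1] | 0 | 0 | 0
a[2] | 1 | 1 | 1
a[3] | 2 | 2 | 2
a[x] | 0 | S0 | 1
a[y] | 0 | 0 | 0
Concretizing the indices gives a[x] = S0 and a[y] = 0, so solving S0 = 0 + 2 yields x = 2; the next run reads a[2] = 1 and again misses the assertion (divergence).
With array theory, the indices stay symbolic:
$a[x]: 0 \le x \le 3 \land a[x] \in \{0, 1, 2\}$
$a[y]: 0 \le y \le 3 \land a[y] \in \{0, 1, 2, x\}$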
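The array-theory encoding can be imitated by treating a[i] as a case split on the symbolic index over the stores a[0] = x, a[1] = 0, a[2] = 1, a[3] = 2, and searching the in-bounds inputs for one that reaches assert(false). The helper names are hypothetical and brute force replaces the SMT solver; the point is that, unlike concretizing the index, this reasoning finds the failing input.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Read a[i] for the array {x, 0, 1, 2} built in single_array, expressed as a
// case split on the (symbolic) index i. Only in-bounds indices 0..3 are modeled.
int select_a(int x, int i) {
    switch (i) {
        case 0: return x;
        case 1: return 0;
        case 2: return 1;
        default: return 2;  // i == 3
    }
}

// All in-bounds inputs (x, y) with a[x] == a[y] + 2, i.e. assert(false) is reached.
std::vector<std::pair<int, int>> failing_inputs() {
    std::vector<std::pair<int, int>> out;
    for (int x = 0; x <= 3; ++x)
        for (int y = 0; y <= 3; ++y)
            if (select_a(x, x) == select_a(x, y) + 2) out.push_back({x, y});
    return out;
}
```

Within the bounds 0 ≤ x, y ≤ 3 the only failing input is x = 3, y = 1 (a[3] = 2 and a[1] + 2 = 2).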
38. Floating Point Operation
• [Godefroid ’10]
• FP code should only perform memory-safe data processing
– Payload of an image or video file
• Non-FP code should deal with buffer
allocations and memory address computations
• Lightweight local path-insensitive “may”
analysis + precise “must” dynamic analysis
39. Interaction With Environment
• Modeling [KLEE ‘08]
– System Calls
– int fd = open(argv[1], O_RDONLY);
• Precise Identification and Report
– [Xiao ’11]
40. Imprecision Summary
Reason | Proposed Solutions
Fixed-size integer | BitVector [SAGE ‘08]
Symbolic pointer dereferencing | Array theory [Elkarablieh ’09]
Floating-point operations | Combined static and dynamic analysis [Godefroid ‘10]
Interaction with environment | Modeling [KLEE ‘08]; precise identification and report [Xiao ’11]
Remaining Challenges: Precise reasoning about floating-point operations, Interaction with Environment, External Library Calls, Concurrent programs
45. Meta-Heuristic Approach
• SMT solvers may not support
– Non-linear constraints
– Floating-point expressions
– Very complex constraints
• Use Meta-Heuristic Approaches
[Borges ‘12, Souza ‘11, Lakhotia ’10]
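As an illustration of the meta-heuristic route, the sketch below minimizes a branch-distance-style objective for the hypothetical nonlinear floating-point constraint x * x == 2 with a one-variable local search in the style of the alternating variable method: exploratory steps, stride-doubling pattern moves, then step refinement. It is not the algorithm of any cited work; `avm_minimize` and its default parameters are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <functional>
#include <initializer_list>

// Minimize f (a distance that is 0 when the constraint holds) starting at x.
// Stops when f reaches 0 or the step size falls below min_step.
double avm_minimize(const std::function<double(double)>& f, double x,
                    double step = 1.0, double min_step = 1e-12) {
    double fx = f(x);
    while (fx > 0.0 && step >= min_step) {
        bool moved = false;
        for (double dir : {-1.0, 1.0}) {
            double s = step;
            // Pattern move: keep going in an improving direction, doubling the stride.
            while (f(x + dir * s) < fx) {
                x += dir * s;
                fx = f(x);
                s *= 2.0;
                moved = true;
            }
            if (moved) break;
        }
        if (!moved) step /= 2.0;  // no improving neighbor: refine the step
    }
    return x;
}
```

Here the objective `fabs(x * x - 2.0)` plays the role of the branch distance for `x * x == 2`; the search needs no solver support for nonlinear or floating-point arithmetic, only the ability to evaluate the expression.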
46. Hybrid Approach [Garg ’13]
• Apply concretization first and solve the resulting constraints quickly with an off-the-shelf SMT solver
• If divergence occurs, use ICP (Interval Constraint Propagation) to solve the constraints
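The ICP step can be sketched, in simplified form, as interval branch-and-prune: evaluate the constraint over an interval, discard the interval if the constraint cannot hold anywhere in it, otherwise bisect. Real ICP solvers also narrow intervals with contractors and use outward rounding; this sketch omits both, handles only the hypothetical constraint x * x == c, and all names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <optional>

struct Interval { double lo, hi; };

// Interval product: the range of a*b for a in A, b in B.
Interval mul(Interval a, Interval b) {
    double p[] = {a.lo * b.lo, a.lo * b.hi, a.hi * b.lo, a.hi * b.hi};
    return {*std::min_element(p, p + 4), *std::max_element(p, p + 4)};
}

// Branch-and-prune for x * x - c == 0 on the box x: prune the box when the
// interval of x * x - c cannot contain 0, otherwise bisect until the box is
// narrower than eps. Returns the first surviving box, or nullopt if none.
// (Plain floating-point bounds; a sound solver would round outward.)
std::optional<Interval> solve_sq(double c, Interval x, double eps) {
    Interval sq = mul(x, x);
    if (sq.lo - c > 0 || sq.hi - c < 0) return std::nullopt;  // 0 not reachable
    if (x.hi - x.lo < eps) return x;
    double mid = x.lo + (x.hi - x.lo) / 2;
    if (auto left = solve_sq(c, {x.lo, mid}, eps)) return left;
    return solve_sq(c, {mid, x.hi}, eps);
}
```

On [0, 4] with c = 2 the surviving box encloses √2, while for c = -1 the whole box is pruned at once, which is how interval reasoning reports infeasibility.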
52. Pruning Redundant Paths
• RWset ‘08
– If an execution reaches a program point in the same state as some previous execution, then it will produce the same results
– If two states differ only in program values that are not subsequently read, then the two states will produce the same results
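The second RWset rule can be sketched as a cache keyed by program point and the values of variables that may still be read. This assumes a hypothetical interface in which the explorer already knows the live (later-read) variables at each point; computing those read sets is the hard part in a real tool and is not shown.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>

using State = std::map<std::string, long>;

// A path reaching program point pp is redundant if an earlier path reached pp
// with the same values for every variable in `live` (variables that may still
// be read). Values of variables never read again are dropped before comparing.
class RedundancyCache {
    std::set<std::pair<int, State>> seen_;

public:
    // Returns true if (pp, live part of s) is new, i.e. the path should continue.
    bool should_explore(int pp, const State& s, const std::set<std::string>& live) {
        State projected;
        for (const auto& [var, val] : s)
            if (live.count(var)) projected[var] = val;
        return seen_.insert({pp, projected}).second;
    }
};
```

Two paths that differ only in a dead temporary collapse to one entry, so the second one can be pruned without being explored further.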
53. Pruning Redundant Paths
• Interpolant [Jaffar ’13]
• A succinct representation of the core reason why a branch cannot be covered
55. Function Summary
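• A function summary $\phi_f = \bigvee_w \phi_w$ is a disjunction of path summaries $\phi_w = pre_w \land post_w$ [Godefroid ‘07, ‘10]
• $pre_w$ is a conjunction of constraints on the inputs to the function
• $post_w$, the effect, is a conjunction of constraints on the outputs from the function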
56. Function Summary
Assume foo(x, y) has 10 execution paths and N paths reach the call to foo(x, y).
• Without summary: N × 10 paths
• With summary: N paths
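A summary written as a disjunction over paths w of pre_w ∧ post_w can be made concrete for a small function. The sketch uses a hypothetical foo(x, y) with three paths (not the ten assumed on the slide), one summary entry per path; a caller could use `satisfies_summary` as a single constraint at each call site instead of re-exploring foo's paths. All names besides foo are hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical foo with three execution paths.
long foo(long x, long y) {
    if (x > y) return x - y;
    if (x == y) return 0;
    return 2 * y;
}

// One entry per path w: pre_w constrains the inputs, post_w relates the inputs
// to the return value r.
struct SummaryEntry {
    std::function<bool(long, long)> pre;
    std::function<bool(long, long, long)> post;
};

const std::vector<SummaryEntry> foo_summary = {
    {[](long x, long y) { return x > y; },  [](long x, long y, long r) { return r == x - y; }},
    {[](long x, long y) { return x == y; }, [](long, long, long r) { return r == 0; }},
    {[](long x, long y) { return x < y; },  [](long, long y, long r) { return r == 2 * y; }},
};

// The summary formula: OR over all paths w of (pre_w AND post_w).
bool satisfies_summary(long x, long y, long r) {
    for (const auto& e : foo_summary)
        if (e.pre(x, y) && e.post(x, y, r)) return true;
    return false;
}
```

Every real result of foo satisfies the summary, and a wrong return value for given inputs does not, so the summary can replace foo's body during constraint solving.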
61. Limitations of Search Heuristics
• Does not consider how the execution reached the branch
• Does not handle non-symbolic path constraints
– pc = 3 > 0
– pc’ = !(3 > 0) = 3 ≤ 0 = UNSAT
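One way a search strategy could avoid the wasted query in this example is to skip path-constraint entries that mention no symbolic input before choosing a branch to negate. This is a hypothetical sketch, assuming branch constraints are linear predicates c0*x + c1*y + c2 > 0 over two symbolic inputs; it does not address the first limitation (how the execution reached the branch).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Branch constraint c[0]*x + c[1]*y + c[2] > 0 over symbolic inputs x, y.
using Cond = std::array<long, 3>;

// A constraint is non-symbolic when no input appears in it (e.g. "3 > 0").
bool is_symbolic(const Cond& c) { return c[0] != 0 || c[1] != 0; }

// Indices of path-constraint entries worth negating. Negating a true
// non-symbolic constraint yields a constant-false (UNSAT) query, so those
// entries can be skipped without calling the solver.
std::vector<std::size_t> negatable(const std::vector<Cond>& pc) {
    std::vector<std::size_t> out;
    for (std::size_t i = 0; i < pc.size(); ++i)
        if (is_symbolic(pc[i])) out.push_back(i);
    return out;
}
```

For a path condition x > 10, 3 > 0, y > 0, only the first and third entries are candidates for negation.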