A Survey on Dynamic Symbolic Execution for Automatic Test Generation

Jan. 6 2014
PQE
Hyunmin Seo

Motivation
•  Testing is a practical way to verify software
•  Testing accounts for more than 50% of total
software development costs [Tassey ‘02]
•  Effective, efficient and scalable automatic
testing is required [Bounimova ‘13, Kim ‘12]

Outline
•  Automatic Test Generation
–  Random Testing
–  Combinatorial Testing
–  Search-Based Testing
–  Symbolic Execution-Based Testing
–  Dynamic Symbolic Execution
•  Challenges in DSE (SE)
–  Imprecision
–  Constraint Solving
–  Path Explosion

Random Testing
•  Random Testing
– Randomly generate test inputs
•  Adaptive Random Testing (ART)
– Spread test cases evenly over input domain [Chen
’04]
– Failure-causing inputs form contiguous region
[White ‘80, Chan ‘96]
•  Feedback-Directed Random Testing
– Randoop [Pacheco ‘07]
– Unit testing
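The adaptive random testing idea above can be sketched as follows. This is only a minimal illustration, assuming a one-dimensional numeric input domain and a fixed-size candidate set per step; the name `art_inputs` and the parameters `k` and `seed` are hypothetical and not taken from [Chen ’04].

```python
import random

def art_inputs(n, lo, hi, k=10, seed=0):
    """Pick n test inputs from [lo, hi], spreading them over the domain.

    For each new input, draw k random candidates and keep the one whose
    nearest already-chosen input is farthest away.
    """
    rng = random.Random(seed)
    chosen = [rng.uniform(lo, hi)]  # the first input is purely random
    while len(chosen) < n:
        candidates = [rng.uniform(lo, hi) for _ in range(k)]
        best = max(candidates, key=lambda c: min(abs(c - t) for t in chosen))
        chosen.append(best)
    return chosen
```

Plain random testing corresponds to `k=1`; larger `k` trades generation time for a more even spread.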

Random Testing Summary
•  One of the most fundamental and well-studied
approaches [Hamlet ‘94, Loo ‘88]
–  Many variations
•  Pros
–  Efficient, Scalable
–  No source code requirement
•  Cons
–  Low coverage [Burnim ’08]

Combinatorial Testing
•  Find a subset of input parameters satisfying a
certain property [Cohen ‘13]
•  Mathematical property

Vertical Ruler | Ruler Units | Default View | SS Navigation | End with Black | Always Mirror | Warn Before
Visible        | Inches      | Normal       | Pop-up        | Yes            | Yes           | Yes
Invisible      | Centimeters | Slide        | None          | No             | No            | No
               | Points      | Outline      |               |                |               |
               | Picas       |              |               |                |               |
Total # of configuration settings = 2*4*3*2*2*2*2 = 384

N-way Covering Array
•  A subset of configurations in which every possible
combination of values of any N factors appears at
least once [Cohen ‘13]
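The covering-array property defined above can be checked mechanically. The sketch below is an illustration under simple assumptions (factors are positions in a tuple, values are arbitrary hashable objects); `is_covering_array` and the small three-factor example are hypothetical and not taken from [Cohen ‘13].

```python
from itertools import combinations, product

def is_covering_array(rows, levels, n):
    """Return True if every value combination of every n factors occurs in rows.

    rows   -- list of tuples, one value per factor
    levels -- list holding the allowed values of each factor
    """
    for factors in combinations(range(len(levels)), n):
        needed = set(product(*(levels[f] for f in factors)))
        seen = {tuple(row[f] for f in factors) for row in rows}
        if not needed <= seen:
            return False
    return True

# Three binary factors: 4 rows cover all pairs, although 2*2*2 = 8 settings exist.
levels = [(0, 1), (0, 1), (0, 1)]
pairwise = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
```

Here `is_covering_array(pairwise, levels, 2)` holds, while the same rows do not form a 3-way covering array.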

2-Way Covering Array
CA(12; 2, 2^5 3^1 4^1)
No | Vertical Ruler | Ruler Units | Default View | SS Navigation | End with Black | Always Mirror | Warn Before
1  | Visible        | Centimeters | Outline      | Pop-up        | No             | No            | Yes
2  | Invisible      | Inches      | Outline      | Pop-up        | No             | No            | No
3  | Invisible      | Centimeters | Slide        | None          | Yes            | Yes           | Yes
4  | Visible        | Picas       | Outline      | Pop-up        | Yes            | Yes           | No
5  | Invisible      | Centimeters | Normal       | Pop-up        | Yes            | Yes           | No
6  | Visible        | Points      | Outline      | None          | Yes            | No            | Yes
7  | Invisible      | Points      | Slide        | Pop-up        | No             | No            | No
8  | Invisible      | Picas       | Slide        | Pop-up        | No             | Yes           | Yes
9  | Invisible      | Points      | Normal       | None          | No             | Yes           | No
10 | Visible        | Inches      | Normal       | None          | Yes            | No            | Yes
11 | Visible        | Inches      | Slide        | Pop-up        | No             | Yes           | Yes
12 | Invisible      | Picas       | Normal       | None          | Yes            | No            | No

Combinatorial Testing Summary
•  Research Direction
–  How to find the minimum size array
•  Greedy [Tung ‘00, Colbourn ‘04]
•  Meta-heuristics [Cohen ‘03, Stardom ‘01]
–  Application to different domain
•  Software Product Line [McGregor ‘01, Perrouin ‘10]
•  Pros
–  Systematic testing with mathematical property [Cohen ‘13]
–  Sample configurations to be tested [Qu ‘08]
•  Cons
–  Too many combinations for program inputs

Search-Based Testing
•  A branch of search-based software engineering (SBSE)
in which meta-heuristics are used to guide the search [McMinn ‘04]
•  Typical process
–  Start with a random input
–  Search nearby locations for better solution
–  Evaluate with fitness function
–  Update the current solution with a better solution
–  Search is guided by meta-heuristics
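The typical process above can be sketched with the simplest meta-heuristic, hill climbing. This is a minimal illustration assuming a single integer input and a fitness function to be minimised (0 means the target is reached); `hill_climb` is a hypothetical name, and the starting input is passed in by the caller instead of being drawn at random so that runs are repeatable.

```python
def hill_climb(fitness, start, max_steps=10_000):
    """Minimise fitness over the integers by moving to the better neighbour."""
    current = start
    for _ in range(max_steps):
        if fitness(current) == 0:
            break                      # target reached
        neighbour = min((current - 1, current + 1), key=fitness)
        if fitness(neighbour) >= fitness(current):
            break                      # local optimum: no better neighbour
        current = neighbour
    return current
```

For example, `hill_climb(lambda x: abs(x - 42), 0)` walks from 0 to 42. On a fitness landscape with several local optima the same loop can get stuck, which is the motivation for simulated annealing and genetic algorithms.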

Meta-Heuristics
[Figure: fitness value plotted over the input domain for (a) Hill climbing, (b) Simulated Annealing, (c) Genetic Algorithm] [McMinn ‘11]

Search-Based Testing Example
[McMinn ’11]
Input: a string
count: the number of digits in the string

    if (count >= 4)
        if (count <= 10)
            if (checksum % 10 == checkdigit)
                Target

[Figure: control-flow graph with TRUE/FALSE edges at each condition and three example paths π1, π2 (count = 20), π3 (count = 11)]

Fitness Function
•  Combination of approach level and branch distance
•  Approach level
–  The number of the target’s control-dependent nodes not executed by the current
input
•  Branch distance [Tracey ‘98]
Element | Value
Boolean | if TRUE then 0 else K
a = b   | if abs(a-b) = 0 then 0 else abs(a-b) + K
a ≠ b   | if abs(a-b) ≠ 0 then 0 else K
a < b   | if a-b < 0 then 0 else (a-b) + K
a ≤ b   | if a-b ≤ 0 then 0 else (a-b) + K
a > b   | if b-a < 0 then 0 else (b-a) + K
a ≥ b   | if b-a ≤ 0 then 0 else (b-a) + K
a ∨ b   | min(cost(a), cost(b))
a ∧ b   | cost(a) + cost(b)
!a      | move negation inward and propagate
Search-Based Testing Summary
•  A branch of SBSE
–  Different search heuristics
–  Different domain [Harman ’13]
•  Pros
–  Guide the execution toward a specific branch
–  Non-functional testing (e.g., longest execution time)
[Wegener ’98]
•  Cons
–  Search space challenge
–  Design of fitness functions [Arcuri ‘10]

Symbolic Execution-Based Testing
•  Use symbolic values to represent program
variables and path conditions [King ‘76, Clarke
‘76]
•  Find precise constraints for each execution
path and generate test input by solving the
constraints.
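As a rough illustration of the two steps above, the sketch below symbolically executes a tiny three-input program (a = x + y, then a branch on z > a) and derives one test input per path. Symbolic values are plain strings, and a brute-force search over a small integer range stands in for a real constraint solver; all names (`symbolic_paths`, `solve`, `s1`..`s3`) are assumptions made for this example.

```python
from itertools import product

def symbolic_paths():
    """Symbolically execute the example; return (path condition, state) pairs."""
    state = {"x": "s1", "y": "s2", "z": "s3"}
    state["a"] = f"({state['x']} + {state['y']})"
    cond = f"{state['z']} > {state['a']}"
    then_state = dict(state, b=f"({state['x']} - {state['y']})")
    else_state = dict(state, b=f"(2 * {state['y']})")
    return [(cond, then_state), (f"not ({cond})", else_state)]

def solve(path_condition, domain=range(-3, 4)):
    """Brute-force stand-in for a constraint solver over a tiny integer domain."""
    for s1, s2, s3 in product(domain, repeat=3):
        if eval(path_condition, {}, {"s1": s1, "s2": s2, "s3": s3}):
            return {"x": s1, "y": s2, "z": s3}
    return None  # no solution inside the searched domain

# One concrete test input per execution path.
test_inputs = [solve(pc) for pc, _ in symbolic_paths()]
```

Each path yields exactly one input, which is why symbolic execution produces no redundant inputs for the same path.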

Symbolic Execution

    x = sym_input();
    y = sym_input();
    z = sym_input();
    a = x + y
    if (z > a)
        b = x - y
    else
        b = 2 * y
    ...

True branch, PC: s3 > s1 + s2
Var | Value
x   | s1
y   | s2
z   | s3
a   | s1 + s2
b   | s1 - s2

False branch, PC: s3 <= s1 + s2
Var | Value
x   | s1
y   | s2
z   | s3
a   | s1 + s2
b   | 2s2

Test Generation
Path Conditions → SMT solver → Test Inputs
π1 : PC1 → π1 : x = 1, y = 2, ...
π2 : PC2 → π2 : x = 1, y = 5, ...
π3 : PC3 → π3 : x = -5, y = 0, ...
...
πn : PCn → πn : x = …, y = …

Symbolic Execution-Based Testing Summary
•  Pros
–  No redundant inputs taking the same path
–  High Coverage
•  Cons
–  Low efficiency
–  Depends on constraint solving techniques
–  External library calls
–  State explosion
–  Imprecision

Limitations of SE

    void foo(int x, int y) {
        if (external(x) == y) {      // → no source code available
            // branch 1
        }
        else if (hash(x) > y) {      // → hash() is complex arithmetic
            // branch 2
        }
    }

Dynamic Symbolic Execution
•  Perform symbolic execution dynamically along
an execution path of a concrete input [DART
‘05, CUTE ’05, PEX ‘08]
•  Apply concretization
– External library calls
– Complex constraints
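The core DSE loop (run concretely, record the branch conditions along the path, negate one, solve, repeat) can be sketched as below. This is a toy under loud assumptions: the program under test is instrumented by hand, branch conditions are Python predicates instead of symbolic expressions, and a brute-force search over a small range stands in for the SMT solver. None of the names come from DART, CUTE or PEX.

```python
from itertools import product

def program(x, y, trace):
    """Toy program under test; each branch logs (predicate, outcome)."""
    def branch(pred):
        taken = pred(x, y)
        trace.append((pred, taken))
        return taken
    if branch(lambda x, y: x > 5):
        if branch(lambda x, y: x + y == 12):
            return "target"
        return "B"
    return "A"

def solve(constraints, domain=range(-20, 21)):
    """Brute-force stand-in for a solver: find (x, y) meeting every constraint."""
    for x, y in product(domain, repeat=2):
        if all(pred(x, y) == want for pred, want in constraints):
            return (x, y)
    return None

def dse(start):
    """Explore paths by negating one branch condition of each executed path."""
    worklist, seen, results = [start], set(), {}
    while worklist:
        x, y = worklist.pop()
        trace = []
        outcome = program(x, y, trace)
        path = tuple(taken for _, taken in trace)
        if path in seen:
            continue                      # this path was already explored
        seen.add(path)
        results[path] = ((x, y), outcome)
        for i in range(len(trace)):
            # keep the prefix pc1..pc(i-1), negate pc(i)
            flipped = trace[:i] + [(trace[i][0], not trace[i][1])]
            new_input = solve(flipped)
            if new_input is not None:
                worklist.append(new_input)
    return results
```

Starting from the input (0, 0), `dse((0, 0))` reaches all three outcomes of the toy program, including the branch that needs x + y == 12.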

DSE
[Figure: execution tree with branch conditions pc1, pc2, pc3, pc4 and paths π1, π2, π3]
PC = pc1∧pc2∧pc3 … ∧pcn
PC’ = pc1∧pc2∧!pc3
PC’’ = pc1∧!pc2

Benefit
•  Based on symbolic execution
–  No redundant inputs taking the same path
–  High coverage
•  Reach deep program states by starting from well-formed,
user-provided inputs
•  Use concrete values to overcome limitations
–  External library calls
–  Complicated constraints
•  Many tools
–  CREST, CUTE, JCUTE, PEX, SAGE, EXE, KLEE
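How concrete values help with external calls and complicated constraints can be illustrated with a small sketch. The two helper functions below are hypothetical stand-ins for a library call without source code and for a hash that a solver cannot reason about; the idea shown is only that both calls are replaced by the values observed in a concrete run, which leaves simple constraints on y.

```python
def external(x):
    """Stand-in for a library call whose source is unavailable (hypothetical)."""
    return (x * 31 + 7) % 1000

def opaque_hash(x):
    """Stand-in for a hash too complex for the solver (hypothetical)."""
    return (x * 2654435761) % 2**32

def foo(x, y):
    if external(x) == y:
        return "branch 1"
    elif opaque_hash(x) > y:
        return "branch 2"
    return "neither"

def input_for_branch2(x):
    """Keep x fixed, replace both calls by their observed concrete values e and h,
    and solve the remaining simple constraints y != e and h > y for y."""
    e, h = external(x), opaque_hash(x)
    y = h - 1
    if y == e:
        y -= 1
    return x, y
```

The price of concretization is completeness: x is no longer free, so inputs that would need a different x are not found this way.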

Comparison
Technique          | Efficiency | Coverage | Source Code Requirement | Etc.
Random             |            |          | No                      |
Combinatorial      |            |          | No                      | Combine with other techniques
Search-Based       |            |          | Yes/No                  | Non-functional testing
Symbolic Execution |            |          | Yes                     |
DSE                |            |          | Yes                     | Concretization

Imprecision
•  Occurs when symbolic execution cannot
represent the exact semantics of the program
[Elkarablieh ’09]
– e.g., modeling a 4-byte integer with a mathematical
integer
•  Imprecision may manifest as Divergence
[Godefroid ’08]

Divergence
[Figure: execution tree with branch conditions pc1 to pc5 and the target path condition pc1 ∧ pc2 ∧ ! pc3]

Proposed solutions
•  Integer size, Bit operations
–  BitVector [SAGE ’08]
•  Symbolic pointer dereferencing
–  Array Theory of SMT solvers [Elkarablieh ‘09]
•  Floating-point operations
–  Combined static and dynamic analysis [Godefroid ‘10]
•  Interaction with environment
–  Modeling [KLEE ‘08]
–  Reporting [Xiao ‘11]

BitVector
•  Use bitvector in SMT solvers
– Fixed-size integers
– Bit operation on integer variables
•  a & b
•  a << 4
•  Slower than integer arithmetic
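The difference between a mathematical integer and a fixed-size integer can be seen in a few lines. The sketch below assumes 32-bit two's-complement integers; `wrap32` and the two check functions are hypothetical helpers written for this example.

```python
def wrap32(value):
    """Interpret an unbounded integer as a signed 32-bit two's-complement value."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value

INT_MAX = 2**31 - 1

def check_math(x):
    """x + 1 > x always holds for mathematical integers."""
    return x + 1 > x

def check_int32(x):
    """The same comparison on 32-bit integers fails when x + 1 wraps around."""
    return wrap32(x + 1) > x
```

A solver that models `int` as a mathematical integer would call the false side of `x + 1 > x` infeasible, while an input of `INT_MAX` takes it in the real program. Bit operations such as `a << 4` show the same effect: `wrap32(0x0FFFFFFF << 4)` is -16.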

Symbolic Pointer Dereferencing
•  Symbolic values are used to calculate the
addresses of pointer values
– Array index
– a[S0]

Symbolic Pointer Dereferencing Example [Elkarablieh ‘09]

    void single_array(BYTE x, BYTE y) {
        BYTE *a = new BYTE[4];
        a[0] = x;
        a[1] = 0;
        a[2] = 1;
        a[3] = 2;

        if (a[x] == a[y] + 2)
            assert(false);

        delete[] a;
    }

Var  | Concrete | Symbolic | Concrete
x    | 0        | S0       | 2
y    | 1        | S1       | 1
a[0] | 0        | S0       | 2
a[1] | 0        | 0        | 0
a[2] | 1        | 1        | 1
a[3] | 2        | 2        | 2
a[x] | 0        | S0       | 1
a[y] | 0        | 0        | 0

a[x] == a[y] + 2 → 0 != 0 + 2   (first concrete run)
a[x] == a[y] + 2 → S0 != 0 + 2  (symbolic)
a[x] == a[y] + 2 → 1 != 0 + 2   (second concrete run)

Array Theory of SMT Solver [Elkarablieh ‘09]
For the same example (code and value table as on the previous slide), the array accesses are modeled as:
a[x] : 0 ≤ x ≤ 3 ∧ a[x] ∈ {0,1,2}
a[y] : 0 ≤ y ≤ 3 ∧ a[y] ∈ {0,1,2,x}
Floating Point Operation
•  [Godefroid ’10]
•  FP code should only perform memory safe
data-processing
– Payload of an image or video file
•  Non-FP code should deal with buffer
allocations and memory address computations
•  Lightweight local path-insensitive “may”
analysis + precise “must” dynamic analysis

Interaction With Environment
•  Modeling [KLEE ‘08]
– System Calls
– int fd = open(argv[1], O_RDONLY);

•  Precise Identification and Report
– [Xiao ’11]

Imprecision Summary
Reason                         | Proposed Solutions
Fixed-size Integer             | BitVector [SAGE ‘08]
Symbolic Pointer Dereferencing | Array Theory [Elkarablieh ’09]
Floating-point operations      | Combined Static and Dynamic analysis [Godefroid ‘10]
Interaction with Environment   | Modeling [KLEE ‘08]
                               | Precise identification and report [Xiao ’11]
Remaining Challenges: Precise reasoning about floating
points, Interaction with Environment, External Library
Calls, Concurrent programs

Constraint Solving
•  Need to solve path constraints to get the test
input
•  The major bottleneck
– Solving can take a long time
– Some constraints cannot be solved at all

Proposed Solutions
•  Optimization [KLEE ‘08]
– Expression rewriting
– Implied value concretization
– Irrelevant constraint elimination
– Constraint caching
•  Meta-heuristic-based constraint solving
[Borges ‘12, Souza ‘11, Lakhotia ‘10]
•  Hybrid approach [Garg ‘13]

Optimization
•  Irrelevant constraint elimination [KLEE ‘08]
•  Constraint Caching [KLEE ‘08]
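Both optimizations can be sketched in a few lines. The code below is an illustration only and not KLEE's implementation: constraints are represented as (text, variables) pairs, irrelevant constraints are those that share no variable, directly or transitively, with the query, and the cache is a plain dictionary keyed by the constraint set. All names are hypothetical.

```python
def relevant_constraints(constraints, query_vars):
    """Keep only constraints transitively sharing variables with the query.

    constraints -- list of (text, set_of_variable_names) pairs
    """
    relevant_vars = set(query_vars)
    remaining = list(constraints)
    selected = []
    changed = True
    while changed:
        changed = False
        for item in remaining[:]:
            if item[1] & relevant_vars:
                relevant_vars |= item[1]   # its variables become relevant too
                selected.append(item)
                remaining.remove(item)
                changed = True
    return selected

class SolverCache:
    """Memoise solver results, keyed by the order-independent constraint set."""
    def __init__(self, solver):
        self.solver = solver
        self.results = {}
        self.calls = 0

    def solve(self, constraint_texts):
        key = frozenset(constraint_texts)
        if key not in self.results:
            self.calls += 1                # only cache misses reach the solver
            self.results[key] = self.solver(key)
        return self.results[key]
```

For a query about x under the constraints x + y > 10, z < 5 and y < w, the constraint z < 5 is dropped before the solver is called, because z is unrelated to x.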

Meta-Heuristic Approach
•  SMT solvers may not support
– Non-linear constraints
– Floating-point expressions
– Very complex constraints
•  Use Meta-Heuristic Approaches
[Borges ‘12, Souza ‘11, Lakhotia ’10]
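One way to picture meta-heuristic constraint solving is a local search that minimises a distance-to-satisfaction fitness. The sketch below uses an alternating-variable style search over integers with accelerating steps; it is an assumption-laden illustration and not the algorithm of any of the cited tools. The non-linear example constraint (x² + y² = 25 and x - y = 1) is also made up for this example.

```python
def avm_solve(fitness, start, max_evals=10_000):
    """Search for a point with fitness 0 by changing one variable at a time."""
    point = list(start)
    best = fitness(point)
    evals = 0
    improved = True
    while best > 0 and improved:
        improved = False
        for i in range(len(point)):
            for direction in (-1, 1):
                step = 1
                while evals < max_evals:
                    evals += 1
                    candidate = point.copy()
                    candidate[i] += direction * step
                    score = fitness(candidate)
                    if score >= best:
                        break              # this move no longer improves
                    point, best = candidate, score
                    step *= 2              # accelerate while improving
                    improved = True
    return point if best == 0 else None

def fitness(p):
    """0 exactly when x*x + y*y == 25 and x - y == 1."""
    x, y = p
    return abs(x * x + y * y - 25) + abs(x - y - 1)
```

Starting from (0, 0), the search reaches (4, 3). A return value of `None` only means the search got stuck or ran out of budget, not that the constraint is unsatisfiable.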

Hybrid Approach [Garg ’13]
•  Apply concretization first and solve the simplified
constraints quickly with an off-the-shelf SMT solver
•  If divergence occurs, use ICP (Interval
Constraint Propagation) to solve the
constraints
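Interval constraint propagation narrows the range of each variable until no constraint can shrink it further. The sketch below shows the idea for the simplest possible case, a single constraint x + y = z over integer intervals; it is a hypothetical illustration and not the procedure of [Garg ’13], which targets non-linear constraints.

```python
def meet(a, b):
    """Intersect two intervals given as (lo, hi) pairs."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    if lo > hi:
        raise ValueError("empty interval: constraint is unsatisfiable")
    return (lo, hi)

def contract_sum(x, y, z):
    """Narrow intervals x, y, z under x + y == z until a fixpoint is reached."""
    while True:
        new_x = meet(x, (z[0] - y[1], z[1] - y[0]))          # x = z - y
        new_y = meet(y, (z[0] - new_x[1], z[1] - new_x[0]))  # y = z - x
        new_z = meet(z, (new_x[0] + new_y[0], new_x[1] + new_y[1]))
        if (new_x, new_y, new_z) == (x, y, z):
            return x, y, z
        x, y, z = new_x, new_y, new_z
```

With x in [0, 10], y in [7, 20] and z fixed to 10, propagation narrows x to [0, 3] and y to [7, 10] without enumerating any values.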

Constraint Solving Summary
Target                 | Proposed Solutions
Time overhead          | Irrelevant Constraint Elimination, Constraint Caching [KLEE ‘08]
Complex constraints    | Meta-heuristic Approach [Borges ‘12, Souza ‘11, Lakhotia ‘10]
Non-linear constraints | ICP [Garg ‘13]
Remaining Challenges: Floating points, Complex
constraints, Non-linear constraints

Path Explosion
•  The number of paths in a program increases
exponentially with the number of branches in
the program
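The exponential growth is easy to reproduce for the simplest case of sequential, independent if-statements, where every combination of branch outcomes is a distinct path. `count_paths` is a hypothetical helper written for this illustration.

```python
from itertools import product

def count_paths(num_branches):
    """Count paths through num_branches sequential, independent if-statements."""
    return sum(1 for _ in product((True, False), repeat=num_branches))
```

Ten such branches already give 1024 paths, and thirty give 2^30, more than a billion. Loops whose bound depends on the input make the number of paths even larger.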

Path Explosion
[Figure: execution tree with branch conditions pc1, pc2, pc3, pc4 and paths π1, π2, π3; alternative path conditions pc1∧pc2∧!pc3 and pc1∧!pc2]

Proposed Solutions
•  Pruning Redundant Path
–  RWset [Boonstoppel ‘08]
–  Interpolation [Jaffar ’13]
•  Function Summary
–  Compositional [Godefroid ‘07,‘10]
–  Demand-driven compositional [Anand ‘08]
•  Search Heuristics
–  CFG [Burnim ‘08]
–  Generational [Godefroid ‘08]
–  CarFast [Park ‘12]
–  Hybrid [Majumdar ‘07]

Pruning Redundant Paths
•  RWset [Boonstoppel ‘08]
– If an execution reaches a program point in the
same state as some previous execution, then it
will produce the same results
– If two states differ only in program values that
are not subsequently read, then the two states will
produce the same results
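The two observations above suggest a simple pruning rule: remember, per program point, the values of the variables that are still read later, and stop when the same combination shows up again. The sketch below is a simplification that uses concrete values and a precomputed live-variable map; both the representation and the name `prune_redundant` are assumptions of this example and not the RWset implementation.

```python
def prune_redundant(visits, live_vars):
    """Drop visits whose live-variable values were already seen at that point.

    visits    -- list of (program_point, state_dict) in exploration order
    live_vars -- maps a program point to the variables read afterwards
    """
    seen = set()
    kept = []
    for point, state in visits:
        key = (point, tuple((v, state[v]) for v in sorted(live_vars[point])))
        if key in seen:
            continue               # same live state: continuing adds nothing new
        seen.add(key)
        kept.append((point, state))
    return kept
```

Two visits of the same point that differ only in a variable that is never read again are treated as one.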

Pruning Redundant Paths
•  Interpolant [Jaffar ’13]
•  A succinct representation of the core reason
why a branch cannot be covered

Interpolant Example
[Figure: symbolic execution tree with an UNSAT branch annotated with its full interpolant (x < 3z + 2)] [Jaffar ’13]

Function Summary
•  A function summary [Godefroid ‘07,‘10]
•  pre_w is a conjunction of constraints on the
inputs to the function
•  post_w, the effect, is a conjunction of constraints on
the outputs of the function
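A summary can be pictured as a list of (pre, post) pairs, one per explored path of the function. The sketch below assumes the summary as a whole is the disjunction of pre ∧ post over those paths, and uses a hypothetical absolute-value function with Python predicates in place of symbolic constraints.

```python
# Summary of a hypothetical function abs_val(x): one (pre, post) pair per path.
ABS_SUMMARY = [
    (lambda x: x >= 0, lambda x, ret: ret == x),    # path taken when x >= 0
    (lambda x: x < 0,  lambda x, ret: ret == -x),   # path taken when x < 0
]

def satisfies_summary(summary, x, ret):
    """True if (x, ret) satisfies pre and post for at least one summarised path."""
    return any(pre(x) and post(x, ret) for pre, post in summary)
```

At a call site the caller's path condition is combined with the summary instead of re-exploring every path inside the function, which is where the saving comes from.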

Function Summary
Assume foo has 10 execution paths
[Figure: a call to foo(x, y) reached along N paths. Without Summary: N × 10 paths after the call. With Summary: N paths]

Search Heuristics
•  Prioritize branches and explore relevant
branches only

Search Heuristics
[Figure: exploration of the execution tree under (a) DFS, (b) BFS, (c) Heuristic Search]

Search Heuristics
•  Coverage-Optimized
– CFG-directed [Burnim ‘08]
– CarFast [Park ‘12]
– Generational [Godefroid ‘08]
– Hybrid [Majumdar ‘07]
•  Patch-Optimized
– KATCH [Cadar ‘13]

CFG-Directed Search
[Figure: execution tree with branch conditions pc1, pc2, pc3, pc4 along path π1] [Burnim ’08]
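A CFG-directed heuristic can be sketched as follows: for every branch on the current path, measure how far its other side is from uncovered code in the control-flow graph, and negate the branch with the smallest distance. The graph encoding, the node names and both function names are assumptions made for this illustration and are not taken from [Burnim ’08].

```python
from collections import deque

def distance_to_uncovered(cfg, start, uncovered):
    """Shortest number of CFG edges from start to any uncovered node (BFS)."""
    queue, visited = deque([(start, 0)]), {start}
    while queue:
        node, dist = queue.popleft()
        if node in uncovered:
            return dist
        for succ in cfg.get(node, ()):
            if succ not in visited:
                visited.add(succ)
                queue.append((succ, dist + 1))
    return float("inf")                # no uncovered node is reachable

def pick_branch_to_negate(cfg, alternatives, uncovered):
    """alternatives maps each branch on the current path to the node reached
    by taking its other side; choose the one closest to uncovered code."""
    return min(alternatives,
               key=lambda b: distance_to_uncovered(cfg, alternatives[b], uncovered))
```

The distance is purely static, so it says nothing about whether the negated path condition is satisfiable.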

Limitations of Search Heuristics
•  Do not consider how execution reached the
branch
•  Do not handle non-symbolic path
constraints
– pc = 3 > 0
– pc’ = !(3 > 0) = 3 ≤ 0 = UNSAT

Guiding Execution Toward a Branch
[Figure: execution tree with a branch marked UNSAT]

Path Explosion Summary
Approach                | Proposed Solutions
Pruning Redundant Paths | RWset [Boonstoppel ‘08]
                        | Interpolation [Jaffar ‘13]
Function Summary        | Compositional [Godefroid ’07,‘10]
                        | Demand-Driven Compositional [Anand ‘08]
Search Heuristics       | CFG-Directed [Burnim ‘08]
                        | Generational [Godefroid ‘08]
                        | CarFast [Park ‘12]
                        | Hybrid [Majumdar ‘07]
                        | KATCH [Cadar ’13]
Remaining Challenges: Better Search Strategies, Guiding
execution toward a specific branch

Conclusion
•  DSE is a promising automatic test generation
technique that achieves high coverage
•  DSE relies on symbolic execution and
constraint solving
•  Challenges
– Imprecision, Constraint solving, Path explosion
– GUI Application Testing, Concurrent programs,
Object Creation problem

Challenges and Proposed Solutions
Challenge          | Category                       | Proposed Solutions
Imprecision        | Integer Size                   | BitVector [SAGE ’08]
                   | Symbolic Pointer Dereferencing | Array Theory [Elkarablieh ’09]
                   | Floating-points                | Combined Static and Dynamic analysis [Godefroid ’10]
                   | Environments                   | Modeling [KLEE ‘08]
                   |                                | Precise identification and report [Xiao ’11]
Constraint Solving | Optimization                   | Irrelevant Constraint Elimination, Constraint Caching [KLEE ’08]
                   | Meta-Heuristics                | [Borges ‘12, Souza ‘11, Lakhotia ’10]
                   | Hybrid                         | ICP [Garg ‘13]
Path Explosion     | Pruning Redundant Paths        | RWset [Boonstoppel ‘08]
                   |                                | Interpolation [Jaffar ’13]
                   | Function Summary               | Compositional [Godefroid ’07,‘10]
                   |                                | Demand-Driven Compositional [Anand ’08]
                   | Search Heuristics              | CFG-Directed [Burnim ‘08]
                   |                                | Generational [Godefroid ‘08]
                   |                                | CarFast [Park ‘12]
                   |                                | KATCH [Cadar ’13]
                   |                                | Hybrid [Majumdar ‘07]
