Enabling and Supporting the Debugging of Software Failures (PhD Defense)

ENABLING AND SUPPORTING THE DEBUGGING OF SOFTWARE FAILURES
Thesis Defense
James Clause

DEFINITIONS
‣ mistake: a human action that produces an incorrect result
‣ fault: an incorrect step, process, or data definition in a computer program
‣ failure: the inability of a system or component to perform its required functions within specified requirements
Debugging

DEBUGGING IS EXPENSIVE
• “...departments tend to spend about half of their applications staff time on maintenance” – Lientz and Swanson, 1981
• “Boehm, Brooks, Myers, and Yourdon and Constantine indicate that testing and debugging alone represent approximately half the cost of new system development.” – Vessey, 1985
• “According to an informal industry poll, 85 to 90 percent of the IS [Information Services] budget goes to legacy system operation and maintenance.” – Erlikh, 2000
• “...the national annual costs of an inadequate infrastructure for software testing is estimated to range from $22.2 to $59.5 billion” – NIST, 2002
• “24,191 people … were involved in either opening, handling, commenting on, or resolving Windows Vista bugs. That is an order of magnitude greater than the ∼2,000 developers who wrote code for Vista” – Guo, 2010

THESIS STATEMENT
Program analysis techniques can enable and
support the debugging of failures in widely-used
applications by:
1) capturing, replaying, minimizing, and, as much
as possible, anonymizing failing executions
2) highlighting subsets of failure-inducing inputs
that are likely to be helpful for debugging
such failures

TECHNICAL CONTRIBUTIONS
Enable
• Recording and replaying executions
• Input minimization
• Input anonymization
Support
• Highlighting failure-relevant inputs

MOTIVATION
Failures can be difficult to reproduce.

ENVIRONMENT INTERACTIONS
Streams
Files

LIMITATIONS
Not applicable in every situation
• May not be enough space to store accessed data
  • databases
  • long running executions
• May have unacceptable runtime overhead
  • webservers, real-time applications
Evaluation demonstrates that it can be useful for some common application types.

EVALUATION
Acceptable runtime overhead
Failures reproduced successfully

EVALUATION
Prototype implementation:
• maps libc function calls to interaction events
Subjects:
• several CPU-intensive applications (e.g., bzip, gcc)
Results:
• negligible overheads
• data size is acceptable
• all failures successfully replayed

PRACTICALITY ISSUES
Large in size
Contain sensitive information
Minimize
Anonymize
Highlight

MINIMIZATION
Time minimization (24:15 → 2:55)
Data minimization
Oracle

TIME MINIMIZATION
Event log:
FILE foo.1
POLL KEYBOARD NOK
POLL KEYBOARD OK
PULL KEYBOARD 5
POLL NETWORK OK
PULL NETWORK 1024
FILE bar.1
POLL NETWORK NOK
POLL NETWORK OK
FILE foo.2
...
PULL NETWORK 1024
FILE foo.2
POLL KEYBOARD NOK
...
Environment data (streams):
KEYBOARD: {5680}hello ❙ {4056}c ❙ {300}...
NETWORK: {3405}<html><body>... ❙ {202}...
Remove idle time
Remove delays
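
The two time-minimization steps can be sketched as follows. This is only an illustration under stated assumptions: it reads `POLL ... NOK` events as idle time and the bracketed numbers in the stream data as delays, neither of which the slide states explicitly, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: event names come from the slide's event log, but the
// meaning of NOK polls and of the {n} prefixes is an assumption.
class TimeMinimizer {

    // Assumed reading of "remove idle time": drop polls that found no data.
    static List<String> removeIdleTime(List<String> eventLog) {
        List<String> kept = new ArrayList<>();
        for (String event : eventLog) {
            boolean idlePoll = event.startsWith("POLL ") && event.endsWith(" NOK");
            if (!idlePoll) {
                kept.add(event);
            }
        }
        return kept;
    }

    // Assumed reading of "remove delays": set every {n} delay prefix to zero.
    static String removeDelays(String streamData) {
        return streamData.replaceAll("\\{\\d+\\}", "{0}");
    }

    public static void main(String[] args) {
        List<String> log = List.of(
                "FILE foo.1", "POLL KEYBOARD NOK", "POLL KEYBOARD OK",
                "PULL KEYBOARD 5", "POLL NETWORK OK", "PULL NETWORK 1024");
        System.out.println(removeIdleTime(log));
        System.out.println(removeDelays("{5680}hello {4056}c"));
    }
}
```

Neither step changes the order or content of the data the application consumes, which is why a replay of the shortened log would be expected to reach the same failure.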

DATA MINIMIZATION
Environment data (files): foo.1, foo.2, bar.1
Minimization proceeds at three levels of granularity:
• Whole entities (entire files are removed)
• Chunks (blocks of a file's contents are removed)
• Atoms (individual words within the remaining chunks are removed)
[Animation: files are removed first; the placeholder text of a remaining file is then cut down by chunks and finally word by word]
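
The three granularity levels can be illustrated with a small greedy reducer. This is a simplified stand-in, not the technique's actual algorithm: it assumes chunks are lines and atoms are whitespace-separated words, and the oracle is a hypothetical predicate that reports whether the failure still occurs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch of coarse-to-fine data minimization.
class DataMinimizer {

    // Greedily drops every unit whose removal still reproduces the failure.
    static <T> List<T> dropUnneeded(List<T> units, Predicate<List<T>> stillFails) {
        List<T> current = new ArrayList<>(units);
        int i = 0;
        while (i < current.size()) {
            List<T> candidate = new ArrayList<>(current);
            candidate.remove(i);
            if (stillFails.test(candidate)) {
                current = candidate; // unit was not needed; index i now holds the next unit
            } else {
                i++;                 // unit is needed; keep it and move on
            }
        }
        return current;
    }

    // Chunk level (assumed: lines) followed by atom level (assumed: words).
    static String minimizeText(String text, Predicate<String> stillFails) {
        List<String> chunks = dropUnneeded(Arrays.asList(text.split("\n")),
                kept -> stillFails.test(String.join("\n", kept)));
        List<String> atoms = dropUnneeded(Arrays.asList(String.join(" ", chunks).split(" ")),
                kept -> stillFails.test(String.join(" ", kept)));
        return String.join(" ", atoms);
    }

    public static void main(String[] args) {
        // Entity level: which files are needed at all?
        List<String> files = dropUnneeded(List.of("foo.1", "foo.2", "bar.1"),
                kept -> kept.contains("bar.1"));
        String text = minimizeText("lorem ipsum\nsed diam erat, voluptua\nat vero",
                s -> s.contains("erat,"));
        System.out.println(files + " " + text);
    }
}
```

Working from coarse to fine keeps the number of oracle calls low: a whole file or line that is irrelevant is discarded in one test instead of one test per word.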

ANALYSIS
Correctness (assuming a correct oracle)
1. Original and minimized executions produce the same failure
2. Minimized execution is not larger than the original execution
Worst case performance (assuming delta debugging)
polynomial in the size of the captured data
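
Since the analysis assumes delta debugging, a compact version of that algorithm helps make the two correctness properties concrete. The sketch below is the common complement-only simplification of ddmin, not necessarily the variant used in the thesis; the class name and the oracle predicate are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch: complement-only delta debugging over a list of units.
class DeltaDebugging {

    // Returns a sublist that still fails; it is never larger than the input.
    static <T> List<T> ddmin(List<T> input, Predicate<List<T>> fails) {
        List<T> current = new ArrayList<>(input);
        int n = 2; // number of pieces the input is currently split into
        while (current.size() >= 2) {
            int chunk = (int) Math.ceil(current.size() / (double) n);
            boolean reduced = false;
            for (int start = 0; start < current.size(); start += chunk) {
                int end = Math.min(start + chunk, current.size());
                // Candidate: everything except the piece [start, end).
                List<T> complement = new ArrayList<>(current.subList(0, start));
                complement.addAll(current.subList(end, current.size()));
                if (fails.test(complement)) {
                    current = complement;      // the piece was not needed
                    n = Math.max(n - 1, 2);
                    reduced = true;
                    break;
                }
            }
            if (!reduced) {
                if (n >= current.size()) {
                    break;                     // finest granularity reached
                }
                n = Math.min(n * 2, current.size());
            }
        }
        return current;
    }

    public static void main(String[] args) {
        List<Integer> minimized = ddmin(List.of(1, 2, 3, 4, 5, 6, 7, 8),
                units -> units.contains(3) && units.contains(6));
        System.out.println(minimized);
    }
}
```

A candidate is accepted only when the oracle confirms the failure and only ever by removing units, so both properties hold by construction as long as the oracle is correct.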

EVALUATION
Can the technique produce, in a reasonable amount
of time, minimized executions that can be used to
debug the original failure?

EVALUATION
Pine email and news client
• two real field failures
• 20 failing executions, 10 per failure
Minimized executions generated by
• randomly generating interaction scripts
• manually performing the scripts (while recording)
• minimizing the captured executions

RESULTS
[Bar chart: average value after minimization (0% to 100%) of # entities, streams size, and files size, for the header-color fault and the address book fault]
Results are likely to be conservative; recorded executions only contain the minimal amount of data needed to perform an action.
Inputs can be minimized in a reasonable amount of time (less than 75 minutes)

ANONYMIZATION
[Diagram: nested regions of the input domain]
• Input domain
• Inputs that cause F
• Inputs that satisfy F’s path condition
• Sensitive input (I) that causes F
• Anonymized input (I’) that also causes F

PATH CONDITION GENERATION
Path condition: set of constraints on a program’s inputs that encode the conditions necessary for a specific path to be executed.

boolean foo(int x, int y, int z) {
  if (x <= 5) {
    int a = x * 2;
    if (y + a > 10) {
      if (z == 0) {
        return true;
      }
    }
  }
  return false;
}

Input: 5 3 0 (sensitive)
Symbolic State:
x→i1
y→i2
z→i3
a→i1*2
Path Condition:
i1 <= 5
∧ i2+i1*2 > 10
∧ i3 == 0
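
The derivation can be checked mechanically. The sketch below places the slide's `foo` next to the resulting path condition written as a predicate over i1, i2, and i3, and compares the two over a small range of inputs; the class name and the `agreeOnRange` helper are hypothetical additions for illustration.

```java
// Hypothetical sketch: the slide's method next to its derived path condition.
class PathConditionExample {

    // The method from the slide.
    static boolean foo(int x, int y, int z) {
        if (x <= 5) {
            int a = x * 2;
            if (y + a > 10) {
                if (z == 0) {
                    return true;
                }
            }
        }
        return false;
    }

    // Path condition of the "return true" path, with a replaced by i1*2.
    static boolean pathCondition(int i1, int i2, int i3) {
        return i1 <= 5 && i2 + i1 * 2 > 10 && i3 == 0;
    }

    // Checks that an input reaches "return true" exactly when it satisfies the condition.
    static boolean agreeOnRange(int lo, int hi) {
        for (int x = lo; x <= hi; x++) {
            for (int y = lo; y <= hi; y++) {
                for (int z = lo; z <= hi; z++) {
                    if (foo(x, y, z) != pathCondition(x, y, z)) {
                        return false;
                    }
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(foo(5, 3, 0));          // the sensitive input from the slide
        System.out.println(agreeOnRange(-20, 20));
    }
}
```

Any input that satisfies the predicate follows the same path as the sensitive input, which is what makes it a candidate replacement.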

CHOOSING ANONYMIZED INPUTS
Path Condition:
i1 <= 5
∧ i2+i1*2 > 10
∧ i3 == 0
Constraint Solver, given only the path condition, may return the original input:
i1 == 5
i2 == 3
i3 == 0
Input Constraints (breakable):
i1 != 5
∧ i2 != 3
∧ i3 != 0
Constraint Solver, given the path condition and the input constraints:
i1 == 4
i2 == 10
i3 == 0
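
The role of the breakable input constraints can be shown with a brute-force search standing in for the constraint solver. This is an illustration only: a real solver does not enumerate inputs, and the class name, the search range, and the scoring are assumptions. The path condition is treated as mandatory, while each "differs from the original" constraint is satisfied when possible and broken otherwise.

```java
// Hypothetical sketch: brute-force stand-in for a constraint solver.
class AnonymizedInputChooser {

    // Hard constraints: the path condition from the slide.
    static boolean pathCondition(int i1, int i2, int i3) {
        return i1 <= 5 && i2 + i1 * 2 > 10 && i3 == 0;
    }

    // Breakable constraints: how many values differ from the original input 5 3 0.
    static int satisfiedInputConstraints(int i1, int i2, int i3) {
        return (i1 != 5 ? 1 : 0) + (i2 != 3 ? 1 : 0) + (i3 != 0 ? 1 : 0);
    }

    // Returns {i1, i2, i3} satisfying the path condition while breaking as few
    // input constraints as possible, or null if the range holds no solution.
    static int[] choose(int lo, int hi) {
        int[] best = null;
        int bestScore = -1;
        for (int i1 = lo; i1 <= hi; i1++) {
            for (int i2 = lo; i2 <= hi; i2++) {
                for (int i3 = lo; i3 <= hi; i3++) {
                    if (!pathCondition(i1, i2, i3)) {
                        continue;
                    }
                    int score = satisfiedInputConstraints(i1, i2, i3);
                    if (score > bestScore) {
                        bestScore = score;
                        best = new int[] {i1, i2, i3};
                    }
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        int[] chosen = choose(-20, 20);
        System.out.println(chosen[0] + " " + chosen[1] + " " + chosen[2]);
    }
}
```

The constraint i3 != 0 can never hold together with the path condition, so it is always the one that is broken; the slide's solution (4, 10, 0) is one of many inputs that satisfy the other two.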

PATH CONDITION RELAXATION
[Diagram: sensitive input (I) that causes F, within the input domain]

EVALUATION
Feasibility: Can the approach generate, in a reasonable amount of time, anonymized inputs that reproduce the failure?
Strength: How much information about the original inputs is revealed?
Effectiveness: Are the anonymized inputs safe to send to developers?

SUBJECTS
• Columba: 1 fault
• htmlparser: 1 fault
• Printtokens: 2 faults
• NanoXML: 16 faults
(20 faults, total)

SUBJECTS
Select sensitive failure-inducing inputs
• manually generated or included with subject
• several hundred bytes to 5 MB in size
(Assume all of each input is potentially sensitive)
(20 faults, total)

RQ1: FEASIBILITY
[Bar chart: execution time (s), axis 0 to 600, and solver time (s), axis 0 to 20, for each of the 20 faults: columba, htmlparser, printtokens1 and printtokens2, nanoxml1 through nanoxml16]
Inputs can be anonymized in a reasonable amount of time (easily done overnight)

RQ2: STRENGTH
Average % Bits Revealed
Measures how many inputs satisfy the path condition
• Many satisfying inputs: little information revealed
• Few satisfying inputs: lots of information revealed
Average % Residue
Measures how much of the anonymized input is identical to the original input
I: AAAAAA secret AAAAAA ... AAAAAA
I’: BBBBBB secret BBBBBB ... BBBBBB
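
The residue measure can be sketched as a position-by-position comparison. The exact definition used in the evaluation is not given on the slide, so the formula below (matching positions divided by the length of the anonymized input) is an assumption, as are the class and method names.

```java
// Hypothetical sketch: % residue as the share of positions left unchanged.
class ResidueMetric {

    // Assumed definition: percentage of positions in the anonymized input
    // that hold the same character as the original input.
    static double percentResidue(String original, String anonymized) {
        if (anonymized.isEmpty()) {
            return 0.0;
        }
        int comparable = Math.min(original.length(), anonymized.length());
        int same = 0;
        for (int i = 0; i < comparable; i++) {
            if (original.charAt(i) == anonymized.charAt(i)) {
                same++;
            }
        }
        return 100.0 * same / anonymized.length();
    }

    public static void main(String[] args) {
        // The slide's example: only the middle part survives anonymization.
        System.out.println(percentResidue("AAAAAAsecretAAAAAA", "BBBBBBsecretBBBBBB"));
    }
}
```

In the slide's example the residue is small, yet the part that survives is exactly the secret, which is why residue is reported alongside a second measure and followed by a manual look at what remains.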

RQ2: STRENGTH
[Bar charts: average % bits revealed and average % residue (0 to 100) for each of the 20 faults: columba, htmlparser, printtokens1 and printtokens2, nanoxml1 through nanoxml16]
Anonymized inputs reveal, on average, between 60% (worst case) and 2% (best case) of the information in the original inputs

RQ3: EFFECTIVENESS
HTMLPARSER
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
<title>james clause @ gatech | home</title>
<style type="text/css" media="screen" title="">
<![CDATA[
</style>
</head>
<body>
...
</body>
The portions of the inputs that remain after
anonymization tend to be structural in nature and
therefore are safe to send to developers

OVERVIEW
Inputs: Foo (512B), Bar (1KB), Baz (1.5GB)
1 Taint inputs
2 Propagate taint marks
3 Identify relevant inputs
foo: 512 ... bar: 1024 ... baz: 150... total: 150...
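
The three steps can be sketched with values that carry their taint marks along. This is a toy model under stated assumptions: it uses one mark per input file and propagates only through addition, whereas a real implementation tracks marks for every operation in the program; all class and method names are hypothetical, and the byte count used for "1.5GB" is assumed.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of taint-based identification of relevant inputs.
class TaintSketch {

    // A value together with the taint marks (input names) it was derived from.
    static final class Tainted {
        final long value;
        final Set<String> marks;

        Tainted(long value, Set<String> marks) {
            this.value = value;
            this.marks = marks;
        }

        // Step 1: an input starts with a single mark naming its source.
        static Tainted input(String source, long value) {
            Set<String> marks = new TreeSet<>();
            marks.add(source);
            return new Tainted(value, marks);
        }

        // Step 2: data-flow propagation; the result carries both operands' marks.
        Tainted plus(Tainted other) {
            Set<String> union = new TreeSet<>(marks);
            union.addAll(other.marks);
            return new Tainted(value + other.value, union);
        }
    }

    // Step 3: the marks on the value observed at the failure name the relevant inputs.
    static Set<String> relevantInputs(Tainted valueAtFailure) {
        return valueAtFailure.marks;
    }

    public static void main(String[] args) {
        Tainted foo = Tainted.input("foo", 512L);
        Tainted bar = Tainted.input("bar", 1024L);
        Tainted baz = Tainted.input("baz", 1_500_000_000L); // assumed byte count for "1.5GB"
        Tainted total = foo.plus(bar).plus(baz);
        System.out.println(total.value + " " + relevantInputs(total));
    }
}
```

A value computed from only some of the inputs carries only their marks, which is what lets the technique narrow a large input set down to the few inputs that reached the point of failure.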

EVALUATION
Study 1: Effectiveness for debugging real failures
Study 2: Comparison with Delta Debugging

EVALUATION
Subjects:
Application      KLoC    Fault location
bc 1.06          10.5    more_arrays : 177
gzip 1.24        6.3     get_istat : 828
ncompress 4.24   1.4     comprexx : 896
pine 4.44        239.1   rfc822_cat : 260
squid 2.3        69.9    ftpBuildTitleUrl : 1024
We selected a failure-revealing input vector for each subject.

STUDY 1: EFFECTIVENESS
Is the information that
Penumbra provides helpful for
debugging real failures?

STUDY 1 RESULTS: GZIP & NCOMPRESS
Crash when a file name is longer than 1,024 characters.
[Diagram: ./gzip run on the files foo, bar, and a file with a long filename; each file has contents & attributes]
# Inputs: 10,000,056
# Relevant (DF): 1
# Relevant (DF + CF): 3

STUDY 1: CONCLUSIONS
1. Data-flow propagation is always effective; data- and control-flow propagation is sometimes effective.
➡ Use data-flow propagation first; then, if necessary, use control-flow propagation.
2. Highlighted inputs correspond to the failure conditions.
➡ Our technique is effective in assisting the debugging of real failures.
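
The difference between the two propagation modes can be sketched as follows. The model is deliberately simplified and the names are hypothetical: under data-flow propagation a value assigned inside a branch receives only the marks of the operands it is computed from, while adding control-flow propagation also gives it the marks of the condition that guards the branch.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: taint marks of a value assigned inside a branch.
class PropagationModes {

    static Set<String> marksOfAssignment(Set<String> operandMarks,
                                         Set<String> branchConditionMarks,
                                         boolean includeControlFlow) {
        // Data flow: marks of the operands used to compute the value.
        Set<String> marks = new TreeSet<>(operandMarks);
        if (includeControlFlow) {
            // Control flow: marks of the condition that decided the assignment ran.
            marks.addAll(branchConditionMarks);
        }
        return marks;
    }

    public static void main(String[] args) {
        Set<String> operands = Set.of("filename");
        Set<String> condition = Set.of("attributes");
        System.out.println(marksOfAssignment(operands, condition, false));
        System.out.println(marksOfAssignment(operands, condition, true));
    }
}
```

Control-flow propagation can only add marks, which fits the recommendation to start with data-flow propagation and widen the result only when it is not enough.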

STUDY 2: COMPARISON WITH DELTA DEBUGGING
RQ1: How much manual effort does each technique require?
RQ2: How long does it take to fix a considered failure given the information provided by each technique?

RQ1: MANUAL EFFORT
Use setup-time as a proxy for manual (developer) effort.

[Bar chart: setup time (s) for Penumbra and Delta Debugging on bc, gzip, ncompress, pine, and squid; bar labels: 5,400; 12,600; 1,800; 1,800; 125; 97; 314; 70; 163]
Penumbra requires considerably less setup time than Delta Debugging (although more time overall for gzip and ncompress).

RQ2: DEBUGGING EFFORT
Use number of relevant inputs as a proxy for debugging effort.

Subject      Penumbra (DF)   Penumbra (DF + CF)   Delta Debugging
bc           209             743                  285
gzip         1               3                    1
ncompress    1               3                    1
pine         26              15,100,344           90
squid        89              2,056                -
• Penumbra (DF) is comparable to (slightly better than) Delta Debugging.
• Penumbra (DF + CF) is likely less effective for bc, pine, and squid

CONCLUSIONS
Program analysis techniques can enable and
support the debugging of failures in widely-used
applications by:
1) capturing, replaying, minimizing, and, as much
as possible, anonymizing failing executions
2) highlighting subsets of failure-inducing inputs
that are likely to be helpful for debugging
such failures
