2. DEFINITIONS
‣ mistake: a human action that produces an incorrect result
‣ fault: an incorrect step, process, or data definition in a computer program
‣ failure: the inability of a system or component to perform its required functions within specified requirements
Debugging
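To make the three terms concrete, here is a small hypothetical Java example (not taken from the source). The programmer's mistake is starting the running maximum at 0. The resulting fault is the line `int best = 0;`. A failure is observed only when an input actually exposes the fault: the faulty code runs on every input, but only an all-negative array produces a wrong result.

```java
// Illustrative only: a hypothetical method showing how one mistake leads to a
// fault that causes a failure only for some inputs.
final class FaultDemo {

    // Intended behavior: return the largest element of a non-empty array.
    // Fault: 'best' starts at 0 instead of a[0] (the result of a human mistake).
    static int maxFaulty(int[] a) {
        int best = 0;
        for (int v : a) {
            if (v > best) {
                best = v;
            }
        }
        return best;
    }

    // Corrected version: start from the first element.
    static int maxFixed(int[] a) {
        int best = a[0];
        for (int i = 1; i < a.length; i++) {
            if (a[i] > best) {
                best = a[i];
            }
        }
        return best;
    }

    public static void main(String[] args) {
        int[] ok = {1, 5, 2};
        int[] bad = {-3, -1};
        // The fault is executed in both runs, but only the second run fails.
        System.out.println(maxFaulty(ok) + " " + maxFixed(ok));   // 5 5
        System.out.println(maxFaulty(bad) + " " + maxFixed(bad)); // 0 -1
    }
}
```

Debugging then means working backwards from the observed failure (0 instead of -1) to the fault that caused it.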
8. DEBUGGING IS EXPENSIVE
• “...departments tend to spend about half of their applications staff time on maintenance” – Lientz and Swanson, 1981
• “Boehm, Brooks, Myers, and Yourdon and Constantine indicate that testing and debugging alone represent approximately half the cost of new system development.” – Vessey, 1985
• “According to an informal industry poll, 85 to 90 percent of the IS [Information Services] budget goes to legacy system operation and maintenance.” – Erlikh, 2000
• “...the national annual costs of an inadequate infrastructure for software testing is estimated to range from $22.2 to $59.5 billion” – NIST, 2002
• “24,191 people … were involved in either opening, handling, commenting on, or resolving Windows Vista bugs. That is an order of magnitude greater than the ∼2,000 developers who wrote code for Vista” – Guo, 2010
9. THESIS STATEMENT
Program analysis techniques can enable and
support the debugging of failures in widely used
applications by:
1) capturing, replaying, minimizing, and, as much
as possible, anonymizing failing executions
2) highlighting subsets of failure-inducing inputs
that are likely to be helpful for debugging
such failures
29. LIMITATIONS
Not applicable in every situation:
• There may not be enough space to store the accessed data
  • databases
  • long-running executions
• The runtime overhead may be unacceptable
  • web servers, real-time applications
The evaluation demonstrates that it can be useful for some common application types.
31. EVALUATION
Prototype implementation:
• maps libc function calls to
interaction events
Subjects:
• several CPU-intensive applications (e.g., bzip, gcc)
Results:
• negligible overheads
• data size is acceptable
• all failures successfully replayed
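The prototype records an execution by mapping libc function calls to interaction events. The Java sketch below only illustrates the general capture/replay idea under stated assumptions: the program reaches its environment exclusively through a hypothetical `Environment` interface; capture logs every read as an `Event`, and replay serves the logged events back in order, failing loudly if the program diverges. All class names are hypothetical, and a plain Java interface stands in for the intercepted libc calls.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Function;

// Hypothetical capture/replay sketch. In the prototype, interaction events come from
// intercepted libc calls; here a plain Java interface stands in for that layer.
final class CaptureReplayDemo {

    // A toy program whose behavior depends only on what it reads from its environment.
    static String program(Environment env) {
        return env.read("config").trim() + ":" + env.read("stdin").length();
    }

    public static void main(String[] args) {
        // Stand-in for the real environment during the original (field) execution.
        RecordingEnvironment rec =
            new RecordingEnvironment(src -> src.equals("config") ? " mode=fast " : "hello");
        String original = program(rec);                            // capture
        String replayed = program(new ReplayEnvironment(rec.log)); // replay
        System.out.println(original + " " + replayed); // mode=fast:5 mode=fast:5
    }
}

// All environment access goes through this (hypothetical) interface.
interface Environment {
    String read(String source);
}

// One interaction event: the source that was read and the data it returned.
record Event(String source, String data) {}

// Capture: forward each read to the real environment and log it as an event.
final class RecordingEnvironment implements Environment {
    private final Function<String, String> real;
    final List<Event> log = new ArrayList<>();

    RecordingEnvironment(Function<String, String> real) {
        this.real = real;
    }

    @Override
    public String read(String source) {
        String data = real.apply(source);
        log.add(new Event(source, data));
        return data;
    }
}

// Replay: serve the logged events in order without touching the real environment.
final class ReplayEnvironment implements Environment {
    private final Deque<Event> events;

    ReplayEnvironment(List<Event> log) {
        this.events = new ArrayDeque<>(log);
    }

    @Override
    public String read(String source) {
        Event e = events.pollFirst();
        if (e == null || !e.source().equals(source)) {
            throw new IllegalStateException("replay diverged at read of " + source);
        }
        return e.data();
    }
}
```

Because replay never consults the real environment, the recorded events are all a developer needs to re-run the failing execution.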
49. DATA MINIMIZATION
Environment data (files): foo.2, bar.1
[Figure: the contents of the environment files (placeholder text), to be minimized at three levels of granularity]
• Whole entities
• Chunks
• Atoms
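The three levels can be illustrated with a short hierarchical minimizer. This is a sketch under assumptions not stated on the slides: whole entities are files, chunks are lines, atoms are characters, and each level uses a simple greedy one-element-at-a-time reduction (a simplification, not delta debugging). The oracle `fails` stands in for replaying the execution and checking that the same failure occurs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch: minimize environment files at three granularities.
// Assumptions: chunks are lines, atoms are characters, and each level uses a
// greedy one-at-a-time reduction instead of delta debugging.
final class HierarchicalMinimizer {

    // Try dropping each item in turn; keep the drop if the failure still occurs.
    static <T> List<T> reduce(List<T> items, Predicate<List<T>> fails) {
        List<T> current = new ArrayList<>(items);
        int i = 0;
        while (i < current.size()) {
            List<T> candidate = new ArrayList<>(current);
            candidate.remove(i);
            if (fails.test(candidate)) {
                current = candidate; // item i was unnecessary; test the new item i next
            } else {
                i++;                 // item i is needed to reproduce the failure
            }
        }
        return current;
    }

    static Map<String, String> minimize(Map<String, String> files,
                                        Predicate<Map<String, String>> fails) {
        // Level 1: whole entities (files).
        List<String> names = reduce(new ArrayList<>(files.keySet()), keep -> {
            Map<String, String> m = new LinkedHashMap<>();
            for (String n : keep) m.put(n, files.get(n));
            return fails.test(m);
        });
        Map<String, String> env = new LinkedHashMap<>();
        for (String n : names) env.put(n, files.get(n));

        for (String name : names) {
            // Level 2: chunks (lines) of this file.
            List<String> lines = reduce(Arrays.asList(env.get(name).split("\n", -1)), keep -> {
                Map<String, String> m = new LinkedHashMap<>(env);
                m.put(name, String.join("\n", keep));
                return fails.test(m);
            });
            env.put(name, String.join("\n", lines));

            // Level 3: atoms (characters) of what remains.
            List<Character> chars = new ArrayList<>();
            for (char c : env.get(name).toCharArray()) chars.add(c);
            List<Character> atoms = reduce(chars, keep -> {
                Map<String, String> m = new LinkedHashMap<>(env);
                m.put(name, join(keep));
                return fails.test(m);
            });
            env.put(name, join(atoms));
        }
        return env;
    }

    static String join(List<Character> cs) {
        StringBuilder sb = new StringBuilder();
        for (char c : cs) sb.append(c);
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> files = new LinkedHashMap<>();
        files.put("foo.2", "irrelevant\ndata");
        files.put("bar.1", "Lorem ipsum\nsed diam\nvero");
        // Hypothetical oracle: the failure occurs whenever some file contains "sed".
        Map<String, String> min =
            minimize(files, env -> env.values().stream().anyMatch(v -> v.contains("sed")));
        System.out.println(min); // {bar.1=sed}
    }
}
```

Working from coarse to fine means that cheap whole-file removals (such as dropping foo.2) happen first, so the expensive character-level pass only runs on the little data that remains.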
52. DATA MINIMIZATION
Environment data (files): bar.1 (foo.2 has been removed)
[Figure: the contents of bar.1 (placeholder text), to be minimized at three levels of granularity]
• Whole entities
• Chunks
• Atoms
63. ANALYSIS
Correctness (assuming a correct oracle):
1. Original and minimized executions produce the same failure
2. Minimized execution is not larger than the original execution
Worst-case performance (assuming delta debugging):
• polynomial in the size of the captured data
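For reference, the following is a minimal sketch of the classic ddmin algorithm (Zeller and Hildebrandt), the standard form of delta debugging. It treats the captured data as a flat list of elements, and the oracle `fails` (replay the candidate and check for the same failure) is an assumption of the sketch. In the worst case ddmin makes a number of oracle calls quadratic in the number of elements, which is consistent with the polynomial bound above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Sketch of ddmin; 'fails' reruns a candidate subset of the captured data and
// reports whether the original failure recurs (a hypothetical oracle).
final class DeltaDebugging {

    static <T> List<T> ddmin(List<T> input, Predicate<List<T>> fails) {
        List<T> c = new ArrayList<>(input);
        int n = 2; // current granularity: number of subsets
        while (c.size() >= 2) {
            List<List<T>> subsets = split(c, n);
            List<T> next = null;
            int nextN = n;
            // 1) Does some subset alone still fail?
            for (List<T> s : subsets) {
                if (fails.test(s)) { next = s; nextN = 2; break; }
            }
            // 2) Otherwise, does some complement still fail?
            for (int i = 0; next == null && i < subsets.size(); i++) {
                List<T> complement = new ArrayList<>();
                for (int j = 0; j < subsets.size(); j++) {
                    if (j != i) complement.addAll(subsets.get(j));
                }
                if (fails.test(complement)) { next = complement; nextN = Math.max(n - 1, 2); }
            }
            if (next != null) {
                c = next;
                n = nextN;
            } else if (n < c.size()) {
                n = Math.min(2 * n, c.size()); // 3) refine granularity
            } else {
                break; // no single element can be removed: c is 1-minimal
            }
        }
        return c;
    }

    // Split c into n contiguous, nearly equal, non-empty parts (requires n <= c.size()).
    static <T> List<List<T>> split(List<T> c, int n) {
        List<List<T>> parts = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < n; i++) {
            int end = start + (c.size() - start) / (n - i);
            parts.add(new ArrayList<>(c.subList(start, end)));
            start = end;
        }
        return parts;
    }

    public static void main(String[] args) {
        // Hypothetical oracle: the failure needs both elements 3 and 6.
        List<Integer> result = ddmin(List.of(1, 2, 3, 4, 5, 6, 7, 8), s -> s.contains(3) && s.contains(6));
        System.out.println(result); // [3, 6]
    }
}
```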
66. EVALUATION
Can the technique produce, in a reasonable amount
of time, minimized executions that can be used to
debug the original failure?
Pine email and news client
• two real field failures
• 20 failing executions, 10 per failure
Minimized executions generated by
• randomly generating interaction scripts
• manually performing the scripts (while recording)
• minimizing the captured executions
69. RESULTS
[Figure: average value after minimization (0% to 100%) of the number of entities, the streams size, and the files size, for the header-color fault and the address book fault]
Results are likely to be conservative; recorded executions only contain the minimal amount of data needed to perform an action.
Inputs can be minimized in a reasonable amount of time (less than 75 minutes).
74. ANONYMIZATION
[Figure: within the input domain, the inputs that cause F contain the inputs that satisfy F’s path condition; these include both the sensitive input (I) that causes F and an anonymized input (I’) that also causes F]
75. PATH CONDITION GENERATION
Path condition: set of constraints on a program’s
inputs that encode the conditions necessary for a
specific path to be executed.
89. PATH CONDITION GENERATION
Input (sensitive): x = 5, y = 3, z = 0
boolean foo(int x, int y, int z) {
    if (x <= 5) {
        int a = x * 2;
        if (y + a > 10) {
            if (z == 0) {
                return true;
            }
        }
    }
    return false;
}
Symbolic state: x→i1, y→i2, z→i3, a→i1*2
Path condition: i1 <= 5 ∧ i2+i1*2 > 10 ∧ i3 == 0
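The walkthrough above can be mirrored in a few lines of Java. This is a minimal sketch, not the actual technique: instead of a symbolic execution engine, `foo` is hand-instrumented to record the constraint of each branch outcome while it runs on the concrete input, and instead of a constraint solver, `anonymize` brute-forces a small range for a satisfying input that shares as few values as possible with the sensitive one. Treating `return true` as the failure F, as well as the names `Constraint`, `satisfies`, `anonymize`, and the search bounds, are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch: hand-instrumented path-condition recording plus brute-force
// anonymization. A real implementation would use symbolic execution and a solver.
final class PathConditionDemo {

    // One branch outcome, expressed over the symbolic inputs in[0]=i1, in[1]=i2, in[2]=i3.
    record Constraint(String text, Predicate<int[]> holds) {}

    // foo from the slides; 'return true' is treated as the failure F (an assumption).
    // While running on concrete values, it appends the constraint of each branch taken.
    static boolean foo(int x, int y, int z, List<Constraint> pc) {
        if (x <= 5) {
            pc.add(new Constraint("i1 <= 5", in -> in[0] <= 5));
            int a = x * 2; // symbolic state: a -> i1*2
            if (y + a > 10) {
                pc.add(new Constraint("i2+i1*2 > 10", in -> in[1] + in[0] * 2 > 10));
                if (z == 0) {
                    pc.add(new Constraint("i3 == 0", in -> in[2] == 0));
                    return true;
                }
                pc.add(new Constraint("i3 != 0", in -> in[2] != 0));
            } else {
                pc.add(new Constraint("i2+i1*2 <= 10", in -> in[1] + in[0] * 2 <= 10));
            }
        } else {
            pc.add(new Constraint("i1 > 5", in -> in[0] > 5));
        }
        return false;
    }

    static boolean satisfies(List<Constraint> pc, int[] in) {
        for (Constraint c : pc) {
            if (!c.holds().test(in)) return false;
        }
        return true;
    }

    // Search [lo, hi]^3 for an input that satisfies the path condition and shares
    // as few values as possible with the original (the first such input wins ties).
    static int[] anonymize(List<Constraint> pc, int[] original, int lo, int hi) {
        int[] best = null;
        int bestShared = Integer.MAX_VALUE;
        for (int a = lo; a <= hi; a++) {
            for (int b = lo; b <= hi; b++) {
                for (int c = lo; c <= hi; c++) {
                    int[] cand = {a, b, c};
                    if (!satisfies(pc, cand)) continue;
                    int shared = 0;
                    for (int k = 0; k < 3; k++) {
                        if (cand[k] == original[k]) shared++;
                    }
                    if (shared < bestShared) {
                        best = cand;
                        bestShared = shared;
                    }
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        int[] sensitive = {5, 3, 0};
        List<Constraint> pc = new ArrayList<>();
        boolean failed = foo(sensitive[0], sensitive[1], sensitive[2], pc);
        List<String> texts = new ArrayList<>();
        for (Constraint c : pc) texts.add(c.text());
        System.out.println(failed + " " + String.join(" && ", texts));
        // prints: true i1 <= 5 && i2+i1*2 > 10 && i3 == 0
        int[] anon = anonymize(pc, sensitive, -20, 20);
        System.out.println(Arrays.toString(anon) + " " + foo(anon[0], anon[1], anon[2], new ArrayList<>()));
        // prints: [-4, 19, 0] true
    }
}
```

Because the path condition forces i3 == 0, every input that follows this path shares z = 0 with the original; this is the kind of unavoidable residue that the strength metrics quantify.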
105. EVALUATION
Feasibility
Can the approach generate, in a
reasonable amount of time, anonymized
inputs that reproduce the failure?
Strength
How much information about the
original inputs is revealed?
Effectiveness
Are the anonymized inputs safe to send
to developers?
108. SUBJECTS
• Columba: 1 fault
• htmlparser: 1 fault
• Printtokens: 2 faults
• NanoXML: 16 faults
(20 faults in total)
Selected sensitive failure-inducing inputs:
• manually generated or included with the subject
• from several hundred bytes to 5 MB in size
(We assume that every part of each input is potentially sensitive.)
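115. RQ2: STRENGTH
Average % Bits Revealed: measures how many inputs satisfy the path condition (many satisfying inputs: little information revealed; few satisfying inputs: lots of information revealed)
Average % Residue: measures how much of the anonymized input is identical to the original input
[Figure: an original input I and an anonymized input I’, one made of AAAAAA runs and the other of BBBBBB runs, that both still contain “secret”: lots of information revealed]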
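Both measures can be made concrete with two small functions. The definitions below are assumptions for illustration, not the formulas used in the evaluation: % residue is taken as a position-by-position comparison of I and I’, and % bits revealed treats a path condition that narrows a domain of N possible inputs down to S satisfying ones as revealing log2(N / S) of the log2(N) bits in total.

```java
// Hypothetical metric definitions (assumptions, not taken from the slides).
final class StrengthMetrics {

    // % residue: share of the original input's positions whose character is
    // unchanged in the anonymized input (a simple position-by-position comparison).
    static double percentResidue(String original, String anonymized) {
        if (original.isEmpty()) return 0.0;
        int n = Math.min(original.length(), anonymized.length());
        int same = 0;
        for (int k = 0; k < n; k++) {
            if (original.charAt(k) == anonymized.charAt(k)) same++;
        }
        return 100.0 * same / original.length();
    }

    // % bits revealed: narrowing 'domainSize' inputs (> 1) to 'satisfying' ones reveals
    // log2(domainSize / satisfying) of log2(domainSize) bits; the log base cancels out.
    static double percentBitsRevealed(double domainSize, double satisfying) {
        double total = Math.log(domainSize);
        double revealed = Math.log(domainSize / satisfying);
        return 100.0 * (revealed / total);
    }

    public static void main(String[] args) {
        String i = "AAAAAAsecretAAAAAA";
        String iPrime = "BBBBBBsecretBBBBBB";
        System.out.println(percentResidue(i, iPrime)); // 6 of 18 positions match
        // A one-byte input (256 values) whose path condition admits 16 of them.
        System.out.println(percentBitsRevealed(256, 16)); // about 50
    }
}
```

Under these definitions, the example on the slide scores poorly on residue (the “secret” survives), while a path condition satisfied by every input in the domain would reveal 0% of the bits.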
120. RQ3: EFFECTIVENESS
HTMLPARSER
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
<title>james clause @ gatech | home</title>
<style type="text/css" media="screen" title="">
<!--/*--><![CDATA[<!--*/
body {
margin: 0px;
...
/*]]>*/-->
</style>
</head>
<body>
...
</body>
The portions of the inputs that remain after anonymization tend to be structural in nature and are therefore safe to send to developers.
131. EVALUATION
Study 1: Effectiveness for debugging real failures
Study 2: Comparison with Delta Debugging
Subjects:
Application     KLoC    Fault location
bc 1.06         10.5    more_arrays : 177
gzip 1.24       6.3     get_istat : 828
ncompress 4.24  1.4     comprexx : 896
pine 4.44       239.1   rfc822_cat : 260
squid 2.3       69.9    ftpBuildTitleUrl : 1024
We selected a failure-revealing input vector for each subject.
132. STUDY 1: EFFECTIVENESS
Is the information that
Penumbra provides helpful for
debugging real failures?
136. STUDY 1 RESULTS: GZIP & NCOMPRESS
Crash when a file name is longer than 1,024 characters.
[Figure: ./gzip and its inputs: the contents and attributes of the files (foo, bar, ...) and a long file name]
# Inputs: 10,000,056
# Relevant (DF): 1
# Relevant (DF + CF): 3
137. STUDY 1: CONCLUSIONS
1. Data-flow propagation is always effective; data- and control-flow propagation is sometimes effective.
➡ Use data-flow propagation first and then, if necessary, control-flow propagation.
2. Highlighted inputs correspond to the failure conditions.
➡ Our technique is effective in assisting the debugging of
real failures.
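The difference between the two propagation modes can be sketched with a toy dynamic-tainting example. This is a generic illustration, not Penumbra's implementation: each value carries the IDs of the inputs it depends on; data-flow (DF) propagation gives a result the marks of its operands, and control-flow (CF) propagation additionally adds the marks of the branch condition under which the result is computed. The toy program and input IDs are hypothetical.

```java
import java.util.Set;
import java.util.TreeSet;

// Generic dynamic-tainting sketch (not Penumbra's implementation). Each value carries
// the IDs of the inputs it depends on; the toy program and input IDs are hypothetical.
final class TaintDemo {

    record Tainted(int value, Set<Integer> marks) {}

    static Set<Integer> union(Set<Integer> a, Set<Integer> b) {
        Set<Integer> u = new TreeSet<>(a);
        u.addAll(b);
        return u;
    }

    // Toy program: if (flag > 0) r = len + 1; else r = 0;
    // DF propagation: r gets the marks of the operands it is computed from.
    // CF propagation: r also gets the marks of the branch condition (flag).
    static Tainted run(Tainted len, Tainted flag, boolean useControlFlow) {
        Tainted r;
        if (flag.value() > 0) {
            r = new Tainted(len.value() + 1, len.marks());
        } else {
            r = new Tainted(0, Set.of());
        }
        if (useControlFlow) {
            r = new Tainted(r.value(), union(r.marks(), flag.marks()));
        }
        return r;
    }

    public static void main(String[] args) {
        Tainted len = new Tainted(1025, Set.of(1)); // input 1: e.g., a file-name length
        Tainted flag = new Tainted(1, Set.of(2));   // input 2: e.g., a command-line flag
        System.out.println("DF: " + run(len, flag, false).marks());     // DF: [1]
        System.out.println("DF + CF: " + run(len, flag, true).marks()); // DF + CF: [1, 2]
    }
}
```

Control-flow propagation can only add marks, which is why DF + CF highlights at least as many inputs as DF alone and can sometimes highlight far too many.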
138. STUDY 2: COMPARISON WITH DELTA DEBUGGING
RQ1: How much manual effort does each technique require?
RQ2: How long does it take to fix the failure under consideration given the information provided by each technique?
143. RQ1: MANUAL EFFORT
Use setup time as a proxy for manual (developer) effort.
[Figure: bar chart of setup time (s) for Penumbra and Delta Debugging on gzip, ncompress, bc, pine, and squid]
Penumbra requires considerably less setup time than Delta Debugging (although more time overall for gzip and ncompress).
147. RQ2: DEBUGGING EFFORT
Use the number of relevant inputs as a proxy for debugging effort.
Subject     Penumbra (DF)   Penumbra (DF + CF)   Delta Debugging
bc          209             743                  285
gzip        1               3                    1
ncompress   1               3                    1
pine        26              15,100,344           90
squid       89              2,056                n/a
• Penumbra (DF) is comparable to (slightly better than) Delta Debugging.
• Penumbra (DF + CF) is likely less effective for bc, pine, and squid.
148. CONCLUSIONS
Program analysis techniques can enable and
support the debugging of failures in widely used
applications by:
1) capturing, replaying, minimizing, and, as much
as possible, anonymizing failing executions
2) highlighting subsets of failure-inducing inputs
that are likely to be helpful for debugging
such failures