The document discusses different techniques for instrumenting software systems to collect data for analysis purposes. It describes intrusive instrumentation, which involves modifying the original source code by inserting code to log or collect data. This can be done through logging frameworks. The document also covers guarded and unguarded implementations of intrusive instrumentation and discusses their pros and cons. Finally, it introduces the idea of a proxy implementation of instrumentation.
Instrument Software Systems with Dynamic Binary Instrumentation
1. Instrumentation of Software Systems
James H. Hill, M.S., Ph.D.
Department of Computer & Information Science
Indiana University-Purdue University Indianapolis
Email: hillj@cs.iupui.edu
Web: http://www.cs.iupui.edu/~hillj
2. Learning Objectives
• Understand the different flavors of software instrumentation
• Understand how dynamic binary instrumentation works
• Understand how to write a dynamic binary instrumentation tool in Pin and Pin++
4. Software Instrumentation
• Data collection is the process of preparing and collecting data. Its purpose is to obtain information to keep on record, to make decisions about important issues, and to pass information on to others. (Wikipedia)
• Software instrumentation is the primary method for collecting data in software systems
• Data collected while instrumenting a system can be used to analyze faults and performance issues, understand system behavior, etc.
5. On Data Storage…
• When you collect data, you must store it somewhere. These are the different methods for storing data, and the pros/cons associated with each method

| Name | Method | Pros | Cons |
| In-memory storage | Data is stored in local memory while the test executes & dumped at the end of test execution | Tries to minimize impact on behavior & performance | Must account for memory allocated to system |
| Local disk | Data is stored on local disk while the test is executing | Persistent storage while test is executing | Requires context switch from test execution to write data to disk |
| Remote storage | Data is collected periodically locally and transmitted to a central location | Persistent storage while test is executing; supports real-time analysis of data | Can impact network behavior & performance |
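As a rough illustration of the in-memory storage approach, the Java sketch below buffers samples in a list while a test runs and writes them out once at the end. The class name InMemoryStore and its methods are hypothetical, made up for this example; they do not come from any logging framework.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory data store: samples are buffered while the test
// executes and written out in a single pass when the test ends.
class InMemoryStore {
    private final List<String> samples = new ArrayList<>();

    // Cheap during the test: only a list append, no disk or network access.
    void record(String sample) {
        samples.add(sample);
    }

    // Number of samples currently held in memory.
    int size() {
        return samples.size();
    }

    // Called once at the end of test execution; empties the buffer afterwards.
    void dump(PrintWriter out) {
        for (String sample : samples) {
            out.print(sample);
            out.print('\n');
        }
        out.flush();
        samples.clear();
    }
}

class InMemoryStoreDemo {
    public static void main(String[] args) {
        InMemoryStore store = new InMemoryStore();
        for (int i = 0; i < 3; i++) {
            store.record("Counter: " + i);
        }
        StringWriter sink = new StringWriter();
        store.dump(new PrintWriter(sink));
        System.out.print(sink);
    }
}
```

The buffer grows with every recorded sample, which is the memory cost the table lists as the con of this approach.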
6. Typical Deployment
[Figure: three hosts (Host 1, Host 2, Host 3), each with a Logging Client, plus a Logging Server and a DB]
8. Intrusive Instrumentation
• Intrusive instrumentation is the process of modifying the original source code by inserting chunks of source code to collect data for analytical purposes
• Best supported by logging frameworks
13. Unguarded Implementation
    class SC {
      // The logger
      private static Logger log = Logger.getLogger (SC.class);

      private int counter_;

      public void doSomething () {
        // Instrumentation code
        log.i ("Counter: " + this.counter_);
      }
    }
• Unguarded implementation is when you insert instrumentation code directly into source code without any restrictions/constraints on when/how it is executed
• Pro. Directly collect data of interest
• Con. Overhead associated with collecting data each time, especially if call rate is high
• Con. Must construct the log message even when its severity is below the logger's current level, so the message is discarded
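The second con can be observed with a small experiment. The sketch below uses java.util.logging rather than the logger shown on the slide, and the messagesBuilt field is a hypothetical probe added only to count string constructions.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Sketch of an unguarded implementation using java.util.logging.
// messagesBuilt is a hypothetical probe that exists only for this example.
class UnguardedSC {
    static final Logger LOG = Logger.getLogger(UnguardedSC.class.getName());

    private int counter;
    int messagesBuilt;

    private String buildMessage() {
        messagesBuilt++;                  // count every string construction
        return "Counter: " + counter;
    }

    void doSomething() {
        counter++;
        // Unguarded: the argument is evaluated before fine() checks the level.
        LOG.fine(buildMessage());
    }
}

class UnguardedDemo {
    public static void main(String[] args) {
        UnguardedSC.LOG.setLevel(Level.OFF);   // logging is disabled
        UnguardedSC sc = new UnguardedSC();
        for (int i = 0; i < 1000; i++) {
            sc.doSomething();
        }
        // Every call still paid for constructing a message nobody will see.
        System.out.println("messages built: " + sc.messagesBuilt);
    }
}
```

Even with the level set to OFF, the message string is built on every call, because Java evaluates the argument before the logger can decide to drop it.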
18. Guarded Implementation
    class SC {
      // The logger
      private static Logger log = Logger.getLogger (SC.class);

      private int counter_;

      public void doSomething () {
        // Guard
        if (INSTRUMENT)
          // Instrumentation code
          log.i ("Counter: " + this.counter_);
      }
    }
• Guarded implementation is when you insert instrumentation code directly into source code, but place restrictions/constraints on when/how it is executed
• Pro. You can control when and how often you collect data
• Pro. Addresses problem of constructing message unnecessarily
• Con. Requires an extra check each time this part of the code is reached
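A guarded variant might look like the following sketch. It assumes java.util.logging, with Logger.isLoggable standing in for the slide's INSTRUMENT flag; the messagesBuilt field is again a hypothetical probe used only to show when messages are constructed.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Sketch of a guarded implementation using java.util.logging.
// messagesBuilt is a hypothetical probe that exists only for this example.
class GuardedSC {
    static final Logger LOG = Logger.getLogger(GuardedSC.class.getName());

    private int counter;
    int messagesBuilt;

    private String buildMessage() {
        messagesBuilt++;                  // count every string construction
        return "Counter: " + counter;
    }

    void doSomething() {
        counter++;
        // Guard: one level check per call; the message is built only if needed.
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine(buildMessage());
        }
    }
}

class GuardedDemo {
    public static void main(String[] args) {
        GuardedSC.LOG.setUseParentHandlers(false);  // keep console output clean
        GuardedSC.LOG.setLevel(Level.OFF);
        GuardedSC sc = new GuardedSC();
        for (int i = 0; i < 1000; i++) {
            sc.doSomething();
        }
        System.out.println("built while disabled: " + sc.messagesBuilt);

        GuardedSC.LOG.setLevel(Level.FINE);         // turn instrumentation on
        sc.doSomething();
        System.out.println("built after enabling: " + sc.messagesBuilt);
    }
}
```

While the level is OFF no message is constructed. The remaining cost is the isLoggable check on every call, which corresponds to the con listed for this approach.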
23. Proxy Implementation
    // Instrumentation object
    class SC_Proxy implements SC_Interface {
      // The logger
      private static Logger log = Logger.getLogger (SC_Proxy.class);

      // The real implementation
      private SC sc_impl_;

      public SC_Proxy (SC sc_impl) {
        this.sc_impl_ = sc_impl;
      }

      public void doSomething () {
        // Instrumentation code
        log.i ("Counter: " + this.sc_impl_.getCounter ());
        this.sc_impl_.doSomething ();
      }
    }
[Figure: class diagram with Client, Impl_Interface, Inst_Proxy, and Impl]
• Proxy implementation is when you place instrumentation code in a proxy object, and the proxy calls the real implementation
• Pro. No check is needed before the instrumentation code
• Pro. Instrumentation code is not mixed with implementation
• Pro. Instrumentation can be removed at runtime, and the proxy replaced with the real object
• Con. Requires more code to realize the concept.
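The proxy arrangement, including removing the instrumentation at runtime, can be sketched in self-contained Java as follows. The names Service, RealService, and InstrumentedService are hypothetical stand-ins for SC_Interface, SC, and SC_Proxy, and a plain list takes the place of a logger so the example has no dependencies.

```java
import java.util.ArrayList;
import java.util.List;

// Plays the role of SC_Interface.
interface Service {
    void doSomething();
    int getCounter();
}

// Plays the role of SC: contains no instrumentation code at all.
class RealService implements Service {
    private int counter;

    public void doSomething() { counter++; }
    public int getCounter() { return counter; }
}

// Plays the role of SC_Proxy: all instrumentation code lives here.
class InstrumentedService implements Service {
    private final Service impl;
    private final List<String> log = new ArrayList<>();  // stand-in for a logger

    InstrumentedService(Service impl) { this.impl = impl; }

    public void doSomething() {
        log.add("Counter: " + impl.getCounter());  // instrumentation code
        impl.doSomething();                         // delegate to the real object
    }

    public int getCounter() { return impl.getCounter(); }

    List<String> log() { return log; }
}

class ProxyDemo {
    public static void main(String[] args) {
        RealService real = new RealService();
        InstrumentedService proxy = new InstrumentedService(real);

        Service service = proxy;   // the client only sees the interface
        service.doSomething();
        service.doSomething();

        service = real;            // remove the instrumentation at runtime
        service.doSomething();

        System.out.println(proxy.log());        // [Counter: 0, Counter: 1]
        System.out.println(real.getCounter());  // 3
    }
}
```

Because the client holds only a Service reference, swapping the proxy for the real object needs no change to the client or to RealService. The extra interface and wrapper class are the additional code the con refers to.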
30. On Aspects…
• Aspect-oriented programming (AOP) is a programming paradigm that aims to increase modularity by allowing the separation of cross-cutting concerns. AOP forms a basis for aspect-oriented software development. (Wikipedia)
• Aspects can be used to weave instrumentation code and achieve results similar to the proxy implementation approach
• Best used to support generic instrumentation concerns (e.g., function entry/exit) instead of custom instrumentation concerns (e.g., capturing the value of a counter)
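The sketch below approximates generic entry/exit instrumentation in plain Java. It is not an AOP weaver: it uses the JDK's java.lang.reflect.Proxy as a stand-in to show how one handler can trace entry and exit for every method of an interface without touching the implementation. Worker, RealWorker, and TracingHandler are hypothetical names.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

interface Worker {
    int work(int x);
}

// The implementation contains no instrumentation code.
class RealWorker implements Worker {
    public int work(int x) { return x * 2; }
}

// Generic concern: record entry and exit for every method of any interface.
class TracingHandler implements InvocationHandler {
    private final Object target;
    final List<String> events = new ArrayList<>();

    TracingHandler(Object target) { this.target = target; }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        events.add("enter " + method.getName());
        try {
            return method.invoke(target, args);
        } catch (InvocationTargetException e) {
            throw e.getCause();            // rethrow the target's own exception
        } finally {
            events.add("exit " + method.getName());
        }
    }

    // Builds a proxy for the given interface that routes calls through handler.
    static <T> T wrap(Class<T> iface, TracingHandler handler) {
        Object proxy = Proxy.newProxyInstance(
            iface.getClassLoader(), new Class<?>[] { iface }, handler);
        return iface.cast(proxy);
    }
}

class TracingDemo {
    public static void main(String[] args) {
        TracingHandler handler = new TracingHandler(new RealWorker());
        Worker worker = TracingHandler.wrap(Worker.class, handler);
        System.out.println(worker.work(21));   // 42
        System.out.println(handler.events);    // [enter work, exit work]
    }
}
```

The handler knows nothing about Worker, which is why this style suits generic concerns such as entry/exit tracing. Capturing a value that is specific to one class, such as a private counter, would still need custom code.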
32. Problems w/ Intrusive Instrumentation
• Regardless of the intrusive instrumentation technique used, you need the source code in order to instrument the system
  • You MUST understand the source code
• Intrusive instrumentation requires a LOT of upfront planning to ensure that instrumentation code is implemented as part of the normal system's implementation
  • Requirements defining what data needs to be captured
  • Design that integrates instrumentation into the overall system
  • Part of the normal development process, not an afterthought
• Hard to remove/modify when you no longer need the instrumentation and/or requirements change
33. Non-Intrusive Instrumentation
• Non-intrusive instrumentation is the process of collecting data from software for analytical purposes without modifying the original source code
34. Dynamic Binary Instrumentation
• Dynamic Binary Instrumentation (DBI) is a form of non-intrusive software instrumentation where instrumentation code is injected into a binary executable at runtime
• Monitor both application- and system-level behavior
  • E.g., resource usage, system calls, multi-threading behavior, branching, etc.
[Figure: Original Binary; Binary with Injected Instrumentation Code]
35. DBI Pros & Cons
Pros
• Instrumentation needs do not have to be tightly integrated with development process
• Does not require original source code
• Add/remove instrumentation code from binary on-demand
• Dynamically change behavior as needed without modifying original source code
Cons
• Some DBI tools are virtual machines & can add unwanted overhead
• You are limited to the data points provided by the DBI tool
36. DBI Tools
Popular DBI tools:
• Pin: allows developers to write Pintools, which are user-defined libraries written in C++, for analyzing user-specific concerns. [www.pintool.org]
• Solaris Dynamic Tracing (DTrace): primarily for Solaris OS, but has ports to other Unix-like OSs, and uses a scripting language for analyzing user-specific concerns. [www.dtrace.org]
• DynamoRIO: similar to Pin, and has been used for security, debugging, and analysis tools. [dynamorio.org]
• DynInst: a multi-platform runtime code-patching library that is useful in the development of performance measurement tools, debuggers, and simulators. [www.dyninst.org]
37. Examples of Pintools
• Intel Parallel Studio: memory debugging, performance analysis, multithreading correctness analysis, and parallelization preparation
• Intel Software Development Emulator: enables the development of applications using instruction set extensions that are not currently implemented in hardware
• CMP$IM: cache profiler built using Pin
• PinPlay: capture and deterministic replay of the running of multithreaded programs under Pin. Capturing the running of a program helps developers overcome the non-determinism inherent in multithreading
45. On Callback Arguments
    // icount is a global counter declared elsewhere in the Pintool
    VOID docount() { icount++; }
    VOID Instruction(INS ins, VOID *v) {
      INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)docount, IARG_END);
    }
The insert call must specify the number of arguments & the type of each argument expected by the analysis callback
• The IARGLIST must always end with IARG_END
• Supports tool & runtime argument types
• Complete list of argument types can be found at http://software.intel.com/sites/landingpage/pintool/docs/61206/Pin/html/group__INST__ARGS.html#g7e2c955c99fa84246bb2bce1525b5681
46. Levels of Instrumentation
• The level of instrumentation determines when and how often an instrument routine is called
  • The instrument routine is called only once for each new object at the corresponding level it encounters
• Coarse-grained instrumentation levels have access to fine-grained instrumentation levels & vice versa
Levels of Instrumentation
• Image: Executable and shared library
• Routine: Function & method call
• Trace: Single entrance, multiple exit sequence of instructions
• Instruction: A mnemonic machine instruction
47. Execution Modes of Pin
Just-In-Time (JIT)
• Pin generates code based on the original executable, executes the generated code, & regains control after its execution for the next set of instructions
• Can add significant amounts of overhead
Probed
• The application runs natively
• Analysis probes are placed directly inside the original code
• Almost no overhead added from instrumentation
55. Problems w/ Traditional Pintools
Previously Stated Problems
• Global variables make it hard to reuse analysis routines
• Tool developer must make sure that the number and types of the parameters in the callback match the arguments supplied to the insert method
• Tool developer must remember bootstrapping process
Non-obvious Problems
• Pin is responsible for tracking tool-specific information, which impacts performance
• Hard to see structure of Pintool, which results in spaghetti code
• High cyclomatic complexity, which impacts reuse
57. Pin++
• An object-oriented framework for writing Pintools
• Uses design patterns to promote reuse and reduce complexity of Pintools
• Uses template metaprogramming to reduce potential development errors and optimize the performance of a Pintool at compile time
• Designed to promote reuse of different key components in a Pintool
• Codifies many requirements of a Pintool so developers do not have to re-implement them for each and every tool
  • e.g., bootstrapping, initialization, registration, etc.
Public Mirror: https://github.com/hilljh82/OASIS
58. The Structure of a Pintool
• Callbacks are objects called when the Pintool is to analyze data at an instrumentation point
• Instruments are objects called by Pin when the Pintool needs to instrument the element of interest
• Tool is the object in a Pintool that is responsible for connecting the Pin client with the instrument objects.
59. Implementing Example in Pin++
• The following set of slides shows how to use Pin++ to implement the first Pintool, which we previously implemented using the traditional approach
• See $OASIS_ROOT/examples/pintools for example Pintools implemented using Pin++
60. Pintool in Pin++: Callbacks
    class docount : public OASIS::Pin::Callback <docount (void)> {
    public:
      docount (void) : count_ (0) { }
      void handle_analyze (void) {
        ++ this->count_;
      }
      UINT64 count (void) const { return this->count_; }
    private:
      UINT64 count_;
    };
• Counter is contained in the callback object (i.e., not global)
• All callbacks implement the handle_analyze method, where parameter types are known at compile time
• Callback IARG_TYPEs are specified when defining the callback object
63. Examples of Pin++ Callbacks
• Callback that takes 1 parameter
    using namespace OASIS::Pin;
    class docount : public Callback <docount (ARG_INST_PTR)> {
    public:
      void handle_analyze (ADDRINT addr) {
        // do something...
      }
      // ...
    };
• Callback that takes 2 parameters
    class docount : public Callback <docount (ARG_INST_PTR, ARG_CONTEXT)> {
    public:
      void handle_analyze (ADDRINT addr, Context & ctx) {
        // do something...
      }
      // ...
    };
66. Pintool in Pin++: Instruments
    class Instruction : public OASIS::Pin::Instruction_Instrument <Instruction> {
    public:
      void handle_instrument (const OASIS::Pin::Ins & ins) {
        ins.insert_call (IPOINT_BEFORE, &this->callback_);
      }
      UINT64 count (void) const { return this->callback_.count (); }
    private:
      docount callback_;
    };
• Instrument must implement the handle_instrument () method
• The parameter type for this method depends on the instrumentation object type
  • Image, Routine, Trace, or Ins
• Insert call takes position, target callback object, & extra arguments for IARG_TYPEs
• Instrument declares one or more callback objects
69. Pintool in Pin++: Tool
    class inscount : public OASIS::Pin::Tool <inscount> {
    public:
      inscount (void) {
        this->register_fini_callback ();
      }
      void handle_fini (INT32 code) {
        std::ofstream fout ("inscount.out");
        fout.setf (std::ios::showbase);
        fout << "Count " << this->instruction_.count () << std::endl;
        fout.close ();
      }
    private:
      Instruction instruction_;
    };
    DECLARE_PINTOOL (inscount); // Can also use DECLARE_PINTOOL_PROBED
• Subclass from the tool object
• Register for one or more notifications
• Implement corresponding notification on the tool
• Declare one or more instruments for the tool; registration happens automatically
• Declare the Pintool; initialization happens automatically in the correct sequence
75. Benefits of Using Pin++
• No more need for global variables
  • Little to no need to pass data to callbacks via Pin; instead, data can be stored directly in callbacks
• Each component in Pin++ is self-contained & easily reusable in other Pintools
• Improved runtime performance
• Catch run-time errors at compile-time
• Reduced cyclomatic complexity
76. Hands-on Demonstrations
Work through the following tutorials:
• https://github.com/hilljh82/OASIS/wiki/Compiling-and-Installing
• https://github.com/hilljh82/OASIS/wiki/Creating-a-Pintool-using-Pin
77. Concluding Remarks
• There are two methods of instrumenting a software system: intrusive & non-intrusive
• Dynamic binary instrumentation is a powerful approach for instrumenting a system because it requires less upfront cost
• Pin++ is a framework designed to implement high quality Pintools from reusable components