4. Instrumentation in General
Edit – Compile – Run Cycle
[Diagram: Source Code → Compiler → Binary → Run → Results]
Edit – Compile – Run Cycle with VampirTrace
[Diagram: Source Code → VT Wrapper + Compiler → Binary → Run → Results + Traces]
5. Compiler Wrappers
• Easiest way of using VampirTrace
• No source code modifications
• In the build system of your application, substitute
calls to the regular compiler with calls to the
VampirTrace compiler wrappers
– For compiling and linking
– e.g. in the makefile change icc to vtcc
• Rebuild the application
• Run the application to produce trace data
6. Instrumentation & Measurement
• What do you need to do for it?
– VampirTrace and a supported compiler
• Instrumentation (automatic with compiler wrappers)
Regular compiler     VampirTrace wrapper
CC = icc             CC = vtcc
CXX = icpc           CXX = vtcxx
F90 = ifc            F90 = vtf90
MPICC = mpicc        MPICC = vtcc -vt:cc mpicc
• Re-compile & re-link
• Trace Run (run with appropriate test data set)
• More details later
7. Compiler Wrappers
Captured events:
• All user function entries and exits
– If supported by the compiler (Intel, GNU, PGI, NEC,
IBM)
• MPI calls and messages
– If the application is MPI parallel
• OMP regions
– If the application is OpenMP parallel
8. Manual Instrumentation
• Allows for detailed source code instrumentation
– e.g. regions of functions such as loops
• Can be combined with automatic
instrumentation
• Be sure to instrument all function exits!
– Otherwise post-mortem analysis will fail
• I personally consider this advanced usage of
VampirTrace!
9. Manual Instrumentation
• Add the following to your source code to instrument
a region, e.g. in C
(also available for C++ and Fortran):
#include "vt_user.h"
...
VT_USER_START("Region_1");
...
VT_USER_END("Region_1");
...
• Compile with “-DVTRACE”
– Otherwise, VampirTrace macros will expand to empty
blocks, producing zero overhead
vtcc -vt:inst manual prog.c -DVTRACE -o prog
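The advice to instrument all function exits can be illustrated with a function that has an early return. This is a minimal sketch, not taken from the VampirTrace sources: the helper `find_first_negative` is hypothetical, and the no-op fallback macros are an assumption added so the file also builds on a machine where `vt_user.h` is not installed.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: vt_user.h is only available when building with the
   VampirTrace wrappers, so no-op fallbacks are supplied otherwise. */
#ifdef VTRACE
#include "vt_user.h"
#else
#define VT_USER_START(name) ((void)0)
#define VT_USER_END(name)   ((void)0)
#endif

/* Hypothetical helper: returns the index of the first negative value,
   or -1 if there is none. */
int find_first_negative(const double *values, size_t n)
{
    assert(values != NULL || n == 0);
    VT_USER_START("find_first_negative");
    for (size_t i = 0; i < n; i++) {
        if (values[i] < 0.0) {
            /* Early exit: the region must be closed here as well. */
            VT_USER_END("find_first_negative");
            return (int)i;
        }
    }
    /* Normal exit: close the region before returning. */
    VT_USER_END("find_first_negative");
    return -1;
}
```

Every `return` is preceded by a matching `VT_USER_END`, so the region is closed on each path through the function.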
10. Binary Instrumentation
• Using DYNINST
– http://www.dyninst.org
• Source should be compiled with “-g” switch
• “vtunify” has to be run manually afterwards
vtcc -vt:inst dyninst prog.c -o prog
11. Behind the Scenes
Unifying - Post-Processing
OTF Open Trace Format
Tracing Overhead
RUN-TIME MEASUREMENT
12. Workflow
1) Instrumentation
– Hide instrumentation in compiler wrappers
– Use underlying compiler and add appropriate
options
Before: CC = mpicc
After:  CC = vtcc -vt:cc mpicc
2) Test Run
– Use representative test input
– Set parameters, environment variables, etc.
– Selective tracing
3) Get Trace
13. Automatic Function Tracing
• Uses compiler support to add tracing calls at
every function entry and exit
• Compilers supported:
– GNU, Intel, PGI, PathScale, IBM, Sun Fortran, NEC
• Binary instrumentation via Dyninst
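For GNU compilers, compiler support of this kind is exposed through the `-finstrument-functions` option, which makes GCC call two hook functions at every function entry and exit. The sketch below only counts those calls; it is an illustration of the mechanism, not VampirTrace's own hook code, and `square` is a hypothetical user function.

```c
#include <assert.h>

/* Counters updated by the hooks; a real tracer would record
   timestamped enter/leave events instead. */
static unsigned long enter_count = 0;
static unsigned long exit_count = 0;

/* GCC calls these two hooks at every function entry and exit when the
   code is compiled with -finstrument-functions. The attribute keeps the
   hooks themselves from being instrumented. */
__attribute__((no_instrument_function))
void __cyg_profile_func_enter(void *this_fn, void *call_site)
{
    (void)this_fn;
    (void)call_site;
    enter_count++;
}

__attribute__((no_instrument_function))
void __cyg_profile_func_exit(void *this_fn, void *call_site)
{
    (void)this_fn;
    (void)call_site;
    exit_count++;
}

/* Hypothetical user function: when built with -finstrument-functions,
   each call triggers one enter hook and one exit hook. */
int square(int x)
{
    return x * x;
}
```

Built without `-finstrument-functions`, the hooks are never called and the counters stay at zero, which mirrors the difference between the regular compiler and the wrapper.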
14. MPI and OpenMP Tracing
• Tracing of MPI-1 and MPI-IO events via the PMPI interface
• Tracing of OpenMP directives via OPARI source-to-source
instrumentation
15. Hardware Performance Counter
• Recording PAPI counter(s) at every function entry /
exit
• PAPI allows access to hardware (mostly CPU)
counters, e.g. floating point operations, cache
misses, exceptions
• Can derive rates, e.g. GFlop/s of each function
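Deriving a rate from counter samples is a simple difference quotient: the counter delta between entry and exit divided by the elapsed time. The sketch below assumes a hypothetical `call_sample` record holding a floating point operation count and a timestamp in seconds at both points; the names are illustrative and not part of VampirTrace or PAPI.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record of one function invocation: counter value and
   timestamp sampled at entry and at exit. */
typedef struct {
    long long fp_ops_enter;
    long long fp_ops_exit;
    double    t_enter_s;
    double    t_exit_s;
} call_sample;

/* GFlop/s for one invocation: (ops at exit - ops at entry) / elapsed time. */
double gflops(const call_sample *s)
{
    assert(s != NULL);
    double dt = s->t_exit_s - s->t_enter_s;
    if (dt <= 0.0) {
        return 0.0;   /* no measurable duration: report no rate */
    }
    double ops = (double)(s->fp_ops_exit - s->fp_ops_enter);
    return ops / dt / 1e9;
}
```

For example, 2·10^9 floating point operations counted over one second give a rate of 2 GFlop/s.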
16. Memory and I/O Tracing
• Tracing of memory allocation calls via libc built-in hooks
– malloc, realloc, free, …
• Tracing of I/O calls, accessed files, transferred data volume
via wrappers for I/O calls
– open, read, write, …
17. Instrumentation & Measurement
What does VampirTrace do in the background?
• Trace Run:
– Event data collection
– Precise time measurement
– Parallel timer synchronization
– Collecting parallel process/thread traces
– Collecting performance counters
• from PAPI,
• memory usage,
• POSIX I/O calls and
• fork/system/exec calls, and more …
– Filtering and grouping of function calls
18. Behind the Scenes
• Trace data is written to a buffer in memory first
• When this buffer is full, data is flushed to storage
• After the application has run to completion,
these trace files are unified to produce the final
OTF trace
• Most aspects of this behavior can be customized
with environment variables
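The buffer-then-flush behavior described above can be sketched as follows. This is a simplified model under stated assumptions, not VampirTrace's implementation: the types, the tiny capacity, and the function names are hypothetical, and the "flush" only counts events instead of writing a file.

```c
#include <assert.h>
#include <stddef.h>

#define BUF_CAPACITY 4   /* tiny on purpose; a real buffer is far larger */

typedef struct {
    double timestamp;
    int    region_id;
} event;

typedef struct {
    event  events[BUF_CAPACITY];
    size_t used;      /* events currently held in memory */
    size_t flushes;   /* how often the buffer was written out */
    size_t written;   /* total events handed to storage */
} trace_buffer;

/* Stand-in for writing the buffer to an intermediate trace file. */
static void flush_buffer(trace_buffer *b)
{
    b->written += b->used;
    b->used = 0;
    b->flushes++;
}

/* Store one event; flush first if the in-memory buffer is full. */
void record_event(trace_buffer *b, double timestamp, int region_id)
{
    assert(b != NULL);
    if (b->used == BUF_CAPACITY) {
        flush_buffer(b);
    }
    b->events[b->used].timestamp = timestamp;
    b->events[b->used].region_id = region_id;
    b->used++;
}

/* At the end of the run, write out whatever is still buffered. */
void finish_trace(trace_buffer *b)
{
    assert(b != NULL);
    if (b->used > 0) {
        flush_buffer(b);
    }
}
```

In this model, a larger buffer means fewer flushes during the run, which is the trade-off that settings such as the buffer size and the maximum number of flushes control.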
20. Unifying - Post-Processing
• Normally, trace data is unified automatically after
the application has run to completion
• This takes time, depending on the amount of trace data
• Can be switched off by an environment variable
• vtunify <number-of-trace-files> <trace-file-prefix>
vtunify 16 my_trace
21. How to Store Trace Data - Trace File
Various trace file formats (for HPC):
– VTF3 (TU Dresden)
– Tau Trace Format (Univ. of Oregon, LANL and JSC/Jülich)
– EPILOG (JSC/Jülich/Germany)
– STF (Pallas GmbH, now Intel)
– OTF (TU Dresden)
• ASCII or binary file formats
• single/multiple file(s) per trace
• merge process traces to single file
• multiple streams for parallel/selective I/O
22. OTF – Open Trace Format
• Open source trace file format
– Available from the homepage of TU Dresden, ZIH
http://www.tu-dresden.de/zih/otf/
• Includes powerful libotf for use in custom
applications
• API / Interfaces
– High level interface for analysis tools
– Low level interface for trace libraries
• Actively developed
– In cooperation with the University of Oregon and
Lawrence Livermore National Laboratory
23. Tracing Overhead
• Measured on SGI Altix 4700, Itanium 2 1.6 GHz
• Tracing overhead per function call (from test
program with one million function calls, multiple
repetitions)
• Suppressed inlining: icc -O2 -ip-no-inlining
Configuration        VampirTrace    Intel Trace Collector
3 PAPI counters      4.61 µs        9.64 µs
1 PAPI counter       4.47 µs        9.25 µs
Without PAPI         0.92 µs        1.10 µs
Filtered function    0.82 µs        1.04 µs
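To get a feeling for these numbers, the per-call overhead can be scaled by the number of traced calls. The helper below is a hypothetical back-of-the-envelope calculation, assuming the overhead per call stays constant across the run.

```c
#include <assert.h>

/* Estimated extra run time in seconds for a given number of traced
   function calls and a per-call overhead in microseconds. */
double tracing_overhead_s(unsigned long long calls, double overhead_us_per_call)
{
    assert(overhead_us_per_call >= 0.0);
    return (double)calls * overhead_us_per_call * 1e-6;
}
```

With the VampirTrace figures from the table, one million traced calls cost roughly 0.92 s without PAPI and roughly 4.61 s with three PAPI counters.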
25. Environment Variables
• By default, trace data is written to the current working
directory
• Almost all of this behavior can be customized with
environment variables
• Environment variables must be set prior to
running the application, not prior to building the
application
26. Environment Variables
Variable          Description
VT_PFORM_GDIR     Directory where final trace file is stored
VT_PFORM_LDIR     Directory for intermediate trace files
VT_FILE_PREFIX    Trace file name
VT_BUFFER_SIZE    Internal trace buffer size
VT_MAX_FLUSHES    Max number of buffer flushes
VT_MEMTRACE       Enable memory allocation tracing
VT_IOTRACE        Enable I/O tracing
VT_MPITRACE       Enable MPI tracing
VT_FILTER_SPEC    Name of filter file
VT_GROUPS_SPEC    Name of function groups file
VT_COMPRESSION    Compress trace files
VT_METRICS        List of PAPI counters
27. PAPI Counter
• PAPI counters can be included in traces
– If PAPI is available on the platform
– If VampirTrace was built with PAPI support
• VT_METRICS can be used to specify a colon-separated
list of PAPI counters
export VT_METRICS=PAPI_FP_OPS:PAPI_L2_TCM
• VampirTrace versions after 5.8.1 will have a customizable
separator, because Component-PAPI counters use colons
in their counter names
28. Memory Counter
• Memory allocation counters can be included in
traces
– If VampirTrace was built with memory allocation
tracing support
– If GNU glibc is used on the platform
• Memory functions in glibc such as “malloc” and “free”
are traced
• Environment variable VT_MEMTRACE
export VT_MEMTRACE=yes
29. I/O Counter
• I/O counters can be included in traces
– If VampirTrace was built with I/O tracing support
• Standard I/O calls like “open” and “read” are
recorded
• Environment variable VT_IOTRACE
export VT_IOTRACE=yes
30. User defined Counter
• Records program variables or any other
numerical quantity
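#include "vt_user.h"

int main(void) {
    unsigned int i, cid, cgid;
    /* Define a counter group, then a counter "i" with unit "#" in it */
    cgid = VT_COUNT_GROUP_DEF("loopindex");
    cid = VT_COUNT_DEF("i", "#", VT_COUNT_TYPE_UNSIGNED, cgid);
    for (i = 1; i <= 100; i++) {
        /* Record the current loop index as the counter value */
        VT_COUNT_UNSIGNED_VAL(cid, i);
    }
    return 0;
}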
• Helps to find "that one loop iteration" that
causes trouble
32. Function Filtering
• Filtering is one of the ways to reduce trace file size
• Environment variable VT_FILTER_SPEC
%> export VT_FILTER_SPEC=filter.spec
• Filter definition file contains a list of filters
my_*;test_* -- 1000
debug_* -- 0
calculate -- -1
* -- 1000000
• Filter rules can be global to all processes or only be
assigned to specific ranks (see the manual for more details
of rank specific filtering)
• See also the vtfilter tool
– Can generate a customized filter file
– Can reduce the size of existing trace files
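How a filter file such as the one above resolves to a call limit for a given function can be modeled with POSIX `fnmatch`. This is a sketch under assumptions, not VampirTrace's matching code: it assumes that the first matching rule wins, that a limit of 0 means "never record" and -1 means "unlimited", and that an unmatched function is unlimited. The rule table and `call_limit_for` are hypothetical names.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>

typedef struct {
    const char *pattern;   /* one wildcard pattern from a filter line */
    long        limit;     /* call limit: 0 = never record, -1 = unlimited */
} filter_rule;

/* The filter file from the slide, with "my_*;test_*" split in two. */
static const filter_rule rules[] = {
    { "my_*",      1000 },
    { "test_*",    1000 },
    { "debug_*",   0 },
    { "calculate", -1 },
    { "*",         1000000 },
};

/* Assumption: rules are checked top to bottom and the first match wins. */
long call_limit_for(const char *function_name)
{
    assert(function_name != NULL);
    for (size_t i = 0; i < sizeof rules / sizeof rules[0]; i++) {
        if (fnmatch(rules[i].pattern, function_name, 0) == 0) {
            return rules[i].limit;
        }
    }
    return -1;   /* assumption: no matching rule means unlimited */
}
```

Under these assumptions, `debug_print` is never recorded, `calculate` is always recorded, and any function not named explicitly falls through to the catch-all `*` rule.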
33. Switch Tracing On/Off
• Starting and stopping of tracing should be performed with
care
• Tracing has to be activated on the same level as it was
switched off to ensure the consistency of the trace file
• Useful if your program behaves in an iterative manner or if
you are only interested in some parts of your application
#include "vt_user.h"
…
VT_OFF();
for (i = 1; i < 100; i++) { /* do something (not traced) */ }
VT_ON();
…
• Recompile your source code with the user macro
“-DVTRACE”
%> vtcc … -DVTRACE source_code.c …
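The "same level" rule can be read as: switch tracing off and back on inside the same function, so that the call stack depth is identical at both points. The sketch below shows that pattern; `warm_up` and `run_solver` are hypothetical functions, and the no-op fallback macros are an assumption so the file also builds where `vt_user.h` is not installed.

```c
#include <assert.h>

/* Assumption: vt_user.h is only available when building with the
   VampirTrace wrappers, so no-op fallbacks are supplied otherwise. */
#ifdef VTRACE
#include "vt_user.h"
#else
#define VT_OFF() ((void)0)
#define VT_ON()  ((void)0)
#endif

/* Hypothetical warm-up phase that is not of interest in the trace. */
static double warm_up(int iterations)
{
    double sum = 0.0;
    for (int i = 0; i < iterations; i++) {
        sum += (double)i;
    }
    return sum;
}

/* Tracing is switched off and on again within the same function, so
   both calls happen at the same call stack level. */
double run_solver(int warmup_iterations)
{
    assert(warmup_iterations >= 0);
    VT_OFF();
    double state = warm_up(warmup_iterations);
    VT_ON();
    return state;
}
```

Calling `VT_OFF()` in `run_solver` and `VT_ON()` inside `warm_up` would re-enable tracing one level deeper than it was disabled, which is the situation the slide warns against.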
34. Selective Instrumentation
• Selective instrumentation can help you reduce
the size of your trace file so that only the parts
of interest are recorded
• One option for selective instrumentation is to
use manual instrumentation instead of
automatic instrumentation
%> vtcc -vt:inst manual … source_code.c
• Another option is to modify your Makefile so that
automatic instrumentation (the default)
is only applied to the source files (functions)
of interest
35. Function Grouping
• Groups can be defined by the user to group
related functions
– Groups can be assigned different colors in Vampir,
highlighting application behavior
• Environment variable VT_GROUPS_SPEC
export VT_GROUPS_SPEC=/path/to/groups.spec
• Group file contains a list of groups with
associated functions
CALC=calculate
MISC=my*;test
UNKNOWN=*
36. Advanced Performance Monitoring
• CUDA wrapper library
– Based on LD_PRELOAD
– Usable with dynamically linked libraries
– Little overhead (indirection)
– No re-compilation (neither application nor library)
[Diagram: Application Function → enter/leave → Preload-Library Wrapper-Function → enter/leave → CUDA Function]
37. Advanced Performance Monitoring
• vtlibwrapgen
– Abstraction layer for process monitoring
– Dynamic and static libraries
– Requires library’s header file only
– Portable
[Diagram: foo.h → monitor-gen → callback.inc.*, vt_user.h → make → libmonitor/src → libmonitor.so]
vtlibwrapgen -g SDL -o SDLwrap.c /usr/include/SDL/*.h
vtlibwrapgen --build --shared -o libSDLwrap SDLwrap.c
LD_PRELOAD=$PWD/libSDLwrap.so <executable>