Instrumentation and Run-Time Measurement

     VampirTrace
Overview
• Instrumentation
  – Automatic, manual and binary instrumentation
• Run-time measurement
  – Behind the scenes, post-processing
  – Trace file format, overhead
• Options, settings, parameters
  – Environment Variables
  – PAPI hardware performance counters
  – Memory allocation counters, application I/O calls
  – Filtering, grouping
• FAQ and Issues
INSTRUMENTATION
Instrumentation in General

  Edit – Compile – Run Cycle
    Source Code → Compiler → Binary → Run → Results

  Edit – Compile – Run Cycle with VampirTrace
    Source Code → VT Wrapper + Compiler → Binary → Run → Results + Traces
Compiler Wrappers
• Easiest way of using VampirTrace
• No source code modifications
• In the build system of your application, substitute
  calls to the regular compiler with calls to the
  VampirTrace compiler wrappers
  – For compiling and linking
  – e.g. in the makefile change icc to vtcc
• Rebuild the application
• Run the application to produce trace data
Instrumentation & Measurement
• What do you need to do for it?
   – VampirTrace and a supported compiler
• Instrumentation (automatic with compiler wrappers)
   CC      =  icc      →   CC      =  vtcc
   CXX     =  icpc     →   CXX     =  vtcxx
   F90     =  ifc      →   F90     =  vtf90
   MPICC   =  mpicc    →   MPICC   =  vtcc -vt:cc mpicc

• Re-compile & re-link
• Trace Run (run with appropriate test data set)

• More details later
Compiler Wrappers
Captured events:
• All user function entries and exits
  – If supported by the compiler (Intel, GNU, PGI, NEC,
    IBM)
• MPI calls and messages
  – If the application is MPI parallel
• OMP regions
  – If the application is OpenMP parallel
Manual Instrumentation
• Allows for detailed source code instrumentation
  – e.g. regions of functions such as loops
• Can be combined with automatic
  instrumentation
• Be sure to instrument all function exits!
  – Otherwise post-mortem analysis will fail
• I personally consider this advanced usage of
  VampirTrace!
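Why every exit matters can be seen in a small stand-alone model. Assuming that a post-mortem analyzer pairs enter and leave events on a per-thread stack, a function with an early return that skips its END event leaves that stack unbalanced. The names `region_enter`, `region_leave`, and `clamp` are hypothetical and not part of the VampirTrace API.

```c
#include <string.h>

/* Hypothetical model of an analyzer's region stack (not VampirTrace code). */
#define MAX_DEPTH 64

static const char *region_stack[MAX_DEPTH];
static int region_depth = 0;
static int region_errors = 0;   /* overflows, unmatched or mismatched leave events */

void region_enter(const char *name) {
    if (region_depth == MAX_DEPTH) { region_errors++; return; }
    region_stack[region_depth++] = name;
}

void region_leave(const char *name) {
    /* A leave must close the innermost open region with the same name. */
    if (region_depth == 0 || strcmp(region_stack[region_depth - 1], name) != 0) {
        region_errors++;
        return;
    }
    region_depth--;
}

/* Every return path is paired with a leave event. */
int clamp(int x, int hi) {
    region_enter("clamp");
    if (x > hi) {
        region_leave("clamp");   /* omitting this leaves "clamp" open forever */
        return hi;
    }
    region_leave("clamp");
    return x;
}
```

After any sequence of calls to `clamp`, `region_depth` is back to 0; deleting the first `region_leave` makes it grow by one for every clamped value.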
Manual Instrumentation
• Add the following to your source code to instrument
  a region, e.g. in C:
  (available for C++ and Fortran as well)
   #include "vt_user.h"
   ...
   VT_USER_START("Region_1");
   ...
   VT_USER_END("Region_1");
   ...

• Compile with “-DVTRACE”
  – Otherwise, VampirTrace macros will expand to empty
    blocks, producing zero overhead
  vtcc -vt:inst manual prog.c -DVTRACE -o prog
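The zero-overhead behavior relies on ordinary conditional compilation. The sketch below imitates the pattern with hypothetical `MY_REGION_START`/`MY_REGION_END` macros; they are not the definitions in `vt_user.h`, they only show how a macro can disappear completely when `VTRACE` is not defined.

```c
#include <stdio.h>

#ifdef VTRACE
/* Tracing build: the macros emit an event (here simply printed to stderr). */
#define MY_REGION_START(name) fprintf(stderr, "enter %s\n", (name))
#define MY_REGION_END(name)   fprintf(stderr, "leave %s\n", (name))
#else
/* Default build: the macros expand to a no-op, so no tracing code is generated. */
#define MY_REGION_START(name) ((void)0)
#define MY_REGION_END(name)   ((void)0)
#endif

/* Sum 1..n inside an instrumented region. */
long sum_to(long n) {
    long s = 0;
    MY_REGION_START("Region_1");
    for (long i = 1; i <= n; i++)
        s += i;
    MY_REGION_END("Region_1");
    return s;
}
```

Compiled with `-DVTRACE`, a call prints one enter and one leave line; compiled without it, the result is identical and the binary contains no tracing calls.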
Binary Instrumentation
• Using DYNINST
  – http://www.dyninst.org
• Source should be compiled with the “-g” switch
• “vtunify” has to be run manually afterwards
  vtcc -vt:inst dyninst -g prog.c -o prog
Behind the Scenes
Unifying - Post-Processing
OTF Open Trace Format
Tracing Overhead

RUN-TIME MEASUREMENT
Workflow
1) Instrumentation
  – Hide instrumentation in compiler wrappers
  – Use underlying compiler and add appropriate
    options
           CC = mpicc   →   CC = vtcc -vt:cc mpicc

2) Test Run
  – Use representative test input
  – Set parameters, environment variables, etc.
  – Selective tracing
3) Get Trace
Automatic Function Tracing
• Uses compiler support to add tracing calls at
  every function entry and exit
• Compilers supported:
  – GNU, Intel, PGI, PathScale, IBM, Sun Fortran, NEC
• Binary instrumentation via Dyninst
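For the GNU compilers, one such mechanism is `-finstrument-functions`: GCC then calls `__cyg_profile_func_enter` and `__cyg_profile_func_exit` at every function entry and exit. The slide does not say which option each VampirTrace build passes, so this is only a sketch of the hook mechanism, with hypothetical counters in place of real event recording.

```c
/* Hypothetical counters standing in for a tracing library's event buffer. */
static unsigned long enter_events = 0;
static int call_depth = 0;

/* The hooks themselves must not be instrumented, or they would recurse. */
void __cyg_profile_func_enter(void *this_fn, void *call_site)
    __attribute__((no_instrument_function));
void __cyg_profile_func_exit(void *this_fn, void *call_site)
    __attribute__((no_instrument_function));

void __cyg_profile_func_enter(void *this_fn, void *call_site) {
    (void)this_fn;   /* a real tracer would map this address to a function name */
    (void)call_site;
    enter_events++;
    call_depth++;
}

void __cyg_profile_func_exit(void *this_fn, void *call_site) {
    (void)this_fn;
    (void)call_site;
    call_depth--;
}
```

Linking these hooks into a program built with `gcc -finstrument-functions` makes every instrumented call increment `enter_events`; without the flag they are ordinary, never-called functions.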
MPI and OpenMP Tracing
• Tracing of MPI-1 and MPI-IO events via
  PMPI interface
• Tracing of OpenMP directives via OPARI
  source-to-source instrumentation
Hardware Performance Counter
• Recording PAPI counter(s) at every function entry /
  exit
• PAPI allows access to hardware (mostly CPU)
  counters, e.g. floating point operations, cache
  misses, exceptions
• Can derive rates, e.g. GFlop/s of each function
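Deriving a rate from two counter samples is plain arithmetic: the counter difference between function entry and exit, divided by the elapsed time. A minimal sketch; the struct and function names are hypothetical, and `PAPI_FP_OPS` is only the example counter from this section.

```c
#include <stdint.h>

/* Hypothetical sample: a counter value (e.g. PAPI_FP_OPS) and a timestamp in seconds. */
typedef struct {
    uint64_t count;
    double   time_s;
} counter_sample;

/* Rate in GFlop/s between the enter and leave samples of one function call. */
double gflops(counter_sample enter, counter_sample leave) {
    double dt = leave.time_s - enter.time_s;
    if (dt <= 0.0)
        return 0.0;   /* no meaningful rate for a zero-length interval */
    return (double)(leave.count - enter.count) / dt / 1e9;
}
```

For example, 3·10^9 floating point operations between samples taken 1.5 s apart give 3·10^9 / 1.5 / 10^9 = 2 GFlop/s.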
Memory and I/O Tracing
• Tracing of memory allocation calls via libc
  built-in hooks
  – malloc, realloc, free, …
• Tracing of I/O calls, accessed files and
  transferred data volume via wrappers for I/O calls
  – open, read, write, …
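What a memory allocation counter tracks can be sketched with explicit wrapper functions that keep a running total of bytes in use. VampirTrace intercepts the glibc calls through built-in hooks instead; `counted_malloc`, `counted_free`, and the size header are hypothetical and only illustrate the bookkeeping.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical counter: bytes currently allocated through the wrappers below. */
static size_t bytes_in_use = 0;

/* Header stored in front of each block so the free wrapper knows its size;
   max_align_t (C11) keeps the user pointer suitably aligned. */
typedef union { size_t size; max_align_t align; } alloc_header;

void *counted_malloc(size_t size) {
    alloc_header *h = malloc(sizeof(alloc_header) + size);
    if (h == NULL)
        return NULL;
    h->size = size;
    bytes_in_use += size;   /* a tracer would record a counter sample here */
    return h + 1;
}

void counted_free(void *p) {
    if (p == NULL)
        return;
    alloc_header *h = (alloc_header *)p - 1;
    bytes_in_use -= h->size;
    free(h);
}
```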
Instrumentation & Measurement
      What does VampirTrace do in the background?
• Trace Run:
  –   Event data collection
  –   Precise time measurement
  –   Parallel timer synchronization
  –   Collecting parallel process/thread traces
  –   Collecting performance counters
       •   from PAPI,
       •   memory usage,
       •   POSIX I/O calls and
       •   fork/system/exec calls, and more …
  – Filtering and grouping of function calls

Behind the Scenes
• Trace data is written to a buffer in memory first
• When this buffer is full, data is flushed to storage
• After the application has run to completion,
  these trace files are unified to produce the final
  OTF trace
• Most aspects of this behavior can be customized
  with environment variables
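The buffering scheme can be modeled in a few lines: events go into a fixed-size array in memory, and the array is written to storage only when it is full or at the end of the run. The event layout, the tiny buffer size, and the flush limit (loosely modeled on `VT_MAX_FLUSHES`, with the assumption that recording stops once the limit is reached) are illustrative, not VampirTrace's actual format or semantics.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical event record: timestamp, region id, enter (1) or leave (0). */
typedef struct {
    uint64_t time;
    uint32_t region;
    uint32_t kind;
} event;

#define BUF_EVENTS 4   /* deliberately tiny so flushes are easy to observe */

typedef struct {
    event  buf[BUF_EVENTS];
    size_t used;
    int    flushes;
    int    max_flushes;   /* 0 = unlimited in this sketch */
    int    enabled;
    FILE  *out;
} trace_buffer;

/* Write all buffered events to storage and empty the buffer. */
static void flush_buffer(trace_buffer *tb) {
    fwrite(tb->buf, sizeof(event), tb->used, tb->out);
    tb->used = 0;
    tb->flushes++;
    if (tb->max_flushes > 0 && tb->flushes >= tb->max_flushes)
        tb->enabled = 0;   /* assumption: stop recording after the last allowed flush */
}

void record_event(trace_buffer *tb, event e) {
    if (!tb->enabled)
        return;
    tb->buf[tb->used++] = e;
    if (tb->used == BUF_EVENTS)
        flush_buffer(tb);
}

/* Called once when the application finishes: write out the partial buffer. */
void finalize_buffer(trace_buffer *tb) {
    if (tb->used > 0)
        flush_buffer(tb);
}
```

A larger buffer means fewer, bigger flushes; each flush is a pause in the application that shows up in the trace.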
File-Based Workflow
Unifying - Post-Processing
• Normally, trace data is unified automatically after
  the application has run to completion
• This can take a while, depending on the amount of trace data
• Can be switched off with an environment variable
• vtunify <number-of-trace-files> <trace-file-prefix>
   vtunify 16 my_trace
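One way to picture unification: while running, each process assigns its own local IDs to function names, and post-processing builds one global definition table plus a local-to-global mapping per process. This is a conceptual sketch under that assumption; the fixed-size table and all names are hypothetical and do not describe `vtunify`'s internals.

```c
#include <string.h>

#define MAX_NAMES 32

/* Hypothetical global definition table shared by all processes after unification. */
static const char *global_names[MAX_NAMES];
static int global_count = 0;

/* Return the global ID for `name`, adding it on first sight (-1 if the table is full). */
int global_id(const char *name) {
    for (int g = 0; g < global_count; g++)
        if (strcmp(global_names[g], name) == 0)
            return g;
    if (global_count == MAX_NAMES)
        return -1;
    global_names[global_count] = name;
    return global_count++;
}

/* Map one process's local IDs (indices into local_names) to global IDs. */
void unify_process(const char *const local_names[], int n, int local_to_global[]) {
    for (int l = 0; l < n; l++)
        local_to_global[l] = global_id(local_names[l]);
}
```

Two processes that both call `main` and `MPI_Send`, but registered them in a different order, end up with identical global IDs for those names.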
How to Store Trace Data - Trace File
Various trace file formats (for HPC):
    –   VTF3 (TU Dresden)
    –   Tau Trace Format (Univ. of Oregon, LANL and JSC/Jülich)
    –   EPILOG (JSC/Jülich/Germany)
    –   STF (Pallas GmbH, now Intel)
    –   OTF (TU Dresden)
•   ASCII or binary file formats
•   single/multiple file(s) per trace
•   merge process traces to single file
•   multiple streams for parallel/selective I/O
OTF – Open Trace Format
• Open source trace file format
  – Available from the homepage of TU Dresden, ZIH
     http://www.tu-dresden.de/zih/otf/
• Includes powerful libotf for use in custom
  applications
• API / Interfaces
  – High level interface for analysis tools
  – Low level interface for trace libraries
• Actively developed
  – In cooperation with the University of Oregon and
    Lawrence Livermore National Laboratory
Tracing Overhead
• Measured on SGI Altix 4700, Itanium 2 1.6 GHz
• Tracing overhead per function call (from test
  program with one million function calls, multiple
  repetitions)
• Suppressed inlining: icc -O2 -ip-no-inlining
                          VampirTrace    Intel Trace Collector
       3 PAPI counters      4.61 µs          9.64 µs
       1 PAPI counter       4.47 µs          9.25 µs
       Without PAPI         0.92 µs          1.10 µs
       Filtered function    0.82 µs          1.04 µs
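Per-call numbers like these come from timing many calls of a non-inlined function and dividing by the call count; comparing an uninstrumented and an instrumented build gives the overhead per call. As a worked example from the table, 10^6 calls at 0.92 µs each add about 0.92 s to the run. The harness below is a sketch of that recipe, assuming the C11 `timespec_get` clock and the GCC/Clang `noinline` attribute; the function names are hypothetical.

```c
#include <time.h>

/* Trivial function the compiler must not inline, so every call is a real call. */
__attribute__((noinline)) void trivial_function(volatile int *sink) {
    *sink += 1;
}

/* Seconds between two timestamps, computed from the integer fields to keep precision. */
static double elapsed_s(struct timespec a, struct timespec b) {
    return (double)(b.tv_sec - a.tv_sec) + (double)(b.tv_nsec - a.tv_nsec) * 1e-9;
}

/* Average time per call in microseconds, best of `reps` repetitions to reduce noise.
   Returns -1.0 for invalid arguments. */
double time_per_call_us(long calls, int reps) {
    if (calls <= 0 || reps < 1)
        return -1.0;
    volatile int sink = 0;
    double best = -1.0;
    for (int r = 0; r < reps; r++) {
        struct timespec t0, t1;
        timespec_get(&t0, TIME_UTC);
        for (long i = 0; i < calls; i++)
            trivial_function(&sink);
        timespec_get(&t1, TIME_UTC);
        double per_call = elapsed_s(t0, t1) / (double)calls * 1e6;
        if (best < 0.0 || per_call < best)
            best = per_call;
    }
    return best;
}
```

Running `time_per_call_us(1000000, 5)` once in a plain build and once in a `vtcc` build of the same file, the difference approximates the tracing overhead per call.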
Environment Variables
PAPI hardware performance counters
Memory allocation counters
Application I/O calls
Filtering
Grouping

OPTIONS, SETTINGS, PARAMETERS
Environment Variables
• By default, trace data is written to the current working
  directory (‘pwd’)
• Almost all of this can be customized with
  environment variables
• Environment variables must be set prior to
  running the application, not prior to building the
  application
Environment Variables
VT_PFORM_GDIR    Directory where final trace file is stored
VT_PFORM_LDIR    Directory for intermediate trace files
VT_FILE_PREFIX   Trace file name
VT_BUFFER_SIZE   Internal trace buffer size
VT_MAX_FLUSHES   Max number of buffer flushes
VT_MEMTRACE      Enable memory allocation tracing
VT_IOTRACE       Enable I/O tracing
VT_MPITRACE      Enable MPI tracing
VT_FILTER_SPEC   Name of filter file
VT_GROUPS_SPEC   Name of function groups file
VT_COMPRESSION   Compress trace files
VT_METRICS       List of PAPI counters
PAPI Counter
             Environment Variables
• PAPI counters can be included in traces
  – If PAPI is available on the platform
  – If VampirTrace was built with PAPI support
• VT_METRICS can be used to specify a colon-
  separated list of PAPI counters
  export VT_METRICS=PAPI_FP_OPS:PAPI_L2_TCM


• VampirTrace versions after 5.8.1 will have a customizable
  separator, because Component-PAPI counter names contain
  colons
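Splitting a colon-separated `VT_METRICS` value is straightforward, and the same sketch shows why a fixed separator breaks once counter names contain colons. The function name, the 64-character name limit, the separator parameter, and the placeholder name `A::B` are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define NAME_LEN 64

/* Split `list` at each `sep` into at most `max` names; returns the number stored.
   Empty fields are skipped, over-long names are truncated to NAME_LEN-1 characters. */
size_t split_metrics(const char *list, char sep, char out[][NAME_LEN], size_t max) {
    size_t n = 0;
    while (*list != '\0' && n < max) {
        const char *end = sep ? strchr(list, sep) : NULL;   /* '\0' means "no separator" */
        size_t len = end ? (size_t)(end - list) : strlen(list);
        if (len > 0) {
            size_t copy = len < NAME_LEN - 1 ? len : NAME_LEN - 1;
            memcpy(out[n], list, copy);
            out[n][copy] = '\0';
            n++;
        }
        if (end == NULL)
            break;
        list = end + 1;
    }
    return n;
}
```

With `:` as separator, a name like `A::B` is torn into `A` and `B`; a different separator such as `|` keeps it intact.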
Memory Counter
• Memory allocation counters can be included in
  traces
  – If VampirTrace was built with memory allocation
    support
  – If GNU glibc is used on the platform
• Memory functions in glibc such as “malloc” and “free”
  are traced
• Environment variable VT_MEMTRACE

  export VT_MEMTRACE=yes
I/O Counter
• I/O counters can be included in traces
  – If VampirTrace was built with I/O tracing support
• Standard I/O calls like “open” and “read” are
  recorded
• Environment variable VT_IOTRACE
  export VT_IOTRACE=yes
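An I/O wrapper forwards each call and records how much data it moved. A minimal sketch, assuming POSIX `write(2)`; `traced_write` and its counters are hypothetical names, whereas VampirTrace places its wrappers around the real library calls so the application source stays unchanged.

```c
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical per-process I/O statistics a tracer might attach to the trace. */
static unsigned long io_calls = 0;
static unsigned long long io_bytes = 0;

/* Forward the call to write(), then account for the bytes actually written. */
ssize_t traced_write(int fd, const void *buf, size_t count) {
    ssize_t n = write(fd, buf, count);
    io_calls++;
    if (n > 0)
        io_bytes += (unsigned long long)n;
    return n;
}
```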
User defined Counter
• Records program variables or any other
  numerical quantity
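  #include "vt_user.h"
  int main() {
    unsigned int i, cid, cgid;

    /* define a counter group and an unsigned counter "i" (unit "#") in it */
    cgid = VT_COUNT_GROUP_DEF("loopindex");
    cid = VT_COUNT_DEF("i", "#", VT_COUNT_TYPE_UNSIGNED, cgid);

    for( i = 1; i <= 100; i++ ) {
      /* record the current loop index as a counter sample */
      VT_COUNT_UNSIGNED_VAL(cid, i);
    }
    return 0;
  }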

• Helps find “that one loop iteration” that
  causes trouble
Function Filtering
• Filtering is one of the ways to reduce trace file size
• Environment variable VT_FILTER_SPEC
    %> export VT_FILTER_SPEC=filter.spec

• Filter definition file contains a list of filters
    my_*;test_* -- 1000
    debug_* -- 0
    calculate -- -1
    * -- 1000000

• Filter rules can be global to all processes or only be
  assigned to specific ranks (see the manual for more details
  of rank specific filtering)
• See also the vtfilter tool
   – Can generate a customized filter file
   – Can reduce the size of existing trace files
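The filter rules can be modeled with POSIX `fnmatch`. This sketch assumes that the first rule with a matching pattern applies, that the number after `--` is how many calls of a function are recorded, that 0 excludes a function entirely, that -1 means no limit, and that unmatched functions are unlimited; check the VampirTrace manual for the exact semantics. The struct and function names are hypothetical.

```c
#include <fnmatch.h>
#include <stddef.h>

/* Hypothetical in-memory form of one filter line: up to 4 patterns and a call limit. */
typedef struct {
    const char *patterns[4];   /* NULL-terminated, e.g. {"my_*", "test_*", NULL} */
    long        limit;         /* -1: unlimited, 0: never record, n: first n calls */
} filter_rule;

/* Limit for `fn`: the first rule with a matching pattern wins; no match means unlimited. */
long filter_limit(const filter_rule *rules, int nrules, const char *fn) {
    for (int r = 0; r < nrules; r++)
        for (int p = 0; p < 4 && rules[r].patterns[p] != NULL; p++)
            if (fnmatch(rules[r].patterns[p], fn, 0) == 0)
                return rules[r].limit;
    return -1;
}

/* Should the next call of `fn` be recorded, given how many were recorded so far? */
int should_record(const filter_rule *rules, int nrules, const char *fn, long calls_so_far) {
    long limit = filter_limit(rules, nrules, fn);
    return limit < 0 || calls_so_far < limit;
}
```

The example filter file from this slide corresponds to four rules: `{{"my_*", "test_*", NULL}, 1000}`, `{{"debug_*", NULL}, 0}`, `{{"calculate", NULL}, -1}`, `{{"*", NULL}, 1000000}`.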
Switch Tracing On/Off
• Starting and stopping of tracing should be performed with
  care
• Tracing has to be activated on the same level as it was
  switched off to ensure the consistency of the trace file
• Useful if your program behaves in an iterative manner or if
  you are only interested in some parts of your application
    #include "vt_user.h"
    …
    VT_OFF();
    for( i=1; i < 100; i++ ) { /* do something */ }
    VT_ON();
    …

• Recompile your source code with the VTRACE macro
  defined (“-DVTRACE”)
    %> vtcc … -DVTRACE source_code.c …
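The "same level" rule can be modeled as a check on call depth: tracing switched off at one nesting depth may only be switched on again at that depth, otherwise the enter and leave events recorded on either side would not pair up. `trace_off`, `trace_on`, and the depth counter are hypothetical stand-ins for `VT_OFF`/`VT_ON`, not their implementation.

```c
/* Hypothetical tracer state: current call depth and the depth where tracing was disabled. */
static int depth = 0;
static int tracing = 1;
static int off_depth = -1;

void enter_fn(void) { depth++; }
void leave_fn(void) { depth--; }

void trace_off(void) {
    if (tracing) {
        tracing = 0;
        off_depth = depth;
    }
}

/* Returns 0 on success, -1 if called at a different nesting level than trace_off(). */
int trace_on(void) {
    if (tracing)
        return 0;
    if (depth != off_depth)
        return -1;   /* would leave unmatched enter/leave events in the trace */
    tracing = 1;
    off_depth = -1;
    return 0;
}
```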
Selective Instrumentation
• Selective instrumentation can help you reduce
  the size of your trace file so that only the parts
  of interest are recorded
• One option is to use manual instead of
  automatic instrumentation
   %> vtcc -vt:inst manual … source_code.c

• Another option is to modify your Makefile so
  that automatic instrumentation (the default)
  is applied only to the source files of interest
  (those containing the functions of interest)
Function Grouping
• Groups can be defined by the user to group
  related functions
  – Groups can be assigned different colors in Vampir,
    highlighting application behavior
• Environment variable VT_GROUPS_SPEC
   export VT_GROUPS_SPEC=/path/to/groups.spec

• Group file contains a list of groups with
  associated functions
   CALC=calculate
   MISC=my*;test
   UNKNOWN=*
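A group file line such as `MISC=my*;test` can be read as a group name followed by `;`-separated name patterns. The sketch parses such lines and matches a function name with POSIX `fnmatch`, assuming that the first line with a matching pattern determines the group; all names and the 64-character field limit are hypothetical.

```c
#include <fnmatch.h>
#include <string.h>

#define FIELD_LEN 64

/* If `fn` matches a pattern in `line` ("GROUP=pat1;pat2"), copy GROUP into `group`
   and return 1; return 0 if nothing matches or the line is malformed. */
int match_group_line(const char *line, const char *fn, char group[FIELD_LEN]) {
    const char *eq = strchr(line, '=');
    if (eq == NULL || eq == line || (size_t)(eq - line) >= FIELD_LEN)
        return 0;
    for (const char *pat = eq + 1; *pat != '\0'; ) {
        const char *end = strchr(pat, ';');
        size_t len = end ? (size_t)(end - pat) : strlen(pat);
        if (len > 0 && len < FIELD_LEN) {
            char buf[FIELD_LEN];
            memcpy(buf, pat, len);
            buf[len] = '\0';
            if (fnmatch(buf, fn, 0) == 0) {
                memcpy(group, line, (size_t)(eq - line));
                group[eq - line] = '\0';
                return 1;
            }
        }
        if (end == NULL)
            break;
        pat = end + 1;
    }
    return 0;
}

/* Index of the first matching line (group name in `group`), or -1 if none matches. */
int find_group(const char *const lines[], int n, const char *fn, char group[FIELD_LEN]) {
    for (int i = 0; i < n; i++)
        if (match_group_line(lines[i], fn, group))
            return i;
    return -1;
}
```

With the three lines from this slide, `calculate` lands in `CALC`, `my_func` and `test` in `MISC`, and everything else in `UNKNOWN`.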
Advanced Performance Monitoring
• CUDA wrapper library
  – Based on LD_PRELOAD
  – Usable with dynamically linked libraries
  – Little overhead (indirection)
  – No re-compilation (neither application nor library)
  (Figure: a function call from the application enters the wrapper
  function in the preload library, which records enter and leave
  events around the call to the actual CUDA function)
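The wrapper scheme on this slide records an enter event, calls the original function, and records a leave event. A real preload wrapper resolves the original symbol at run time; to stay self-contained, this sketch calls the wrapped function through a plain function pointer instead. `real_scale`, `wrapped_scale`, and the counters are hypothetical.

```c
/* Hypothetical "library" function that the wrapper intercepts. */
static double real_scale(double x) { return 2.0 * x; }

/* Pointer to the original implementation; a preload library would look it up at run time. */
static double (*orig_scale)(double) = real_scale;

static unsigned long enter_count = 0, leave_count = 0;

/* Wrapper with the same signature: enter event, forward the call, leave event. */
double wrapped_scale(double x) {
    enter_count++;                    /* record "enter" */
    double result = orig_scale(x);
    leave_count++;                    /* record "leave" */
    return result;
}
```

The only extra cost per call is the indirection plus the two event records, which is why the slide lists the overhead as small.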
Advanced Performance Monitoring
• vtlibwrapgen
  – Abstraction layer for process monitoring
  – Dynamic and static libraries
  – Requires only the library’s header files
  – Portable
  (Figure: foo.h → monitor-gen → callback.inc.* (with vt_user.h)
  → make (with libmonitor/src) → libmonitor.so)

   vtlibwrapgen -g SDL -o SDLwrap.c /usr/include/SDL/*.h
   vtlibwrapgen --build --shared -o libSDLwrap SDLwrap.c
   export LD_PRELOAD=$PWD/libSDLwrap.so
   <executable>
QUESTIONS?

Implementing 3D SPHARM Surfaces Registration on Cell B.E. ProcessorPTIHPA
 
Big Iron and Parallel Processing, USArray Data Processing Workshop
Big Iron and Parallel Processing, USArray Data Processing WorkshopBig Iron and Parallel Processing, USArray Data Processing Workshop
Big Iron and Parallel Processing, USArray Data Processing WorkshopPTIHPA
 

Más de PTIHPA (14)

Github:fi Presentation
Github:fi PresentationGithub:fi Presentation
Github:fi Presentation
 
Trace Visualization
Trace VisualizationTrace Visualization
Trace Visualization
 
2010 vampir workshop_iu_configuration
2010 vampir workshop_iu_configuration2010 vampir workshop_iu_configuration
2010 vampir workshop_iu_configuration
 
2010 03 papi_indiana
2010 03 papi_indiana2010 03 papi_indiana
2010 03 papi_indiana
 
Overview: Event Based Program Analysis
Overview: Event Based Program AnalysisOverview: Event Based Program Analysis
Overview: Event Based Program Analysis
 
Switc Hpa
Switc HpaSwitc Hpa
Switc Hpa
 
Statewide It Robert Henschel
Statewide It Robert HenschelStatewide It Robert Henschel
Statewide It Robert Henschel
 
3 Vampir Trace In Detail
3 Vampir Trace In Detail3 Vampir Trace In Detail
3 Vampir Trace In Detail
 
5 Vampir Configuration At IU
5 Vampir Configuration At IU5 Vampir Configuration At IU
5 Vampir Configuration At IU
 
2 Vampir Trace Visualization
2 Vampir Trace Visualization2 Vampir Trace Visualization
2 Vampir Trace Visualization
 
4 HPA Examples Of Vampir Usage
4 HPA Examples Of Vampir Usage4 HPA Examples Of Vampir Usage
4 HPA Examples Of Vampir Usage
 
GeneIndex: an open source parallel program for enumerating and locating words...
GeneIndex: an open source parallel program for enumerating and locating words...GeneIndex: an open source parallel program for enumerating and locating words...
GeneIndex: an open source parallel program for enumerating and locating words...
 
Implementing 3D SPHARM Surfaces Registration on Cell B.E. Processor
Implementing 3D SPHARM Surfaces Registration on Cell B.E. ProcessorImplementing 3D SPHARM Surfaces Registration on Cell B.E. Processor
Implementing 3D SPHARM Surfaces Registration on Cell B.E. Processor
 
Big Iron and Parallel Processing, USArray Data Processing Workshop
Big Iron and Parallel Processing, USArray Data Processing WorkshopBig Iron and Parallel Processing, USArray Data Processing Workshop
Big Iron and Parallel Processing, USArray Data Processing Workshop
 

2010 02 instrumentation_and_runtime_measurement

2. Overview
• Instrumentation
  – Automatic, manual and binary instrumentation
• Run-time measurement
  – Behind the scenes, post-processing
  – Trace file format, overhead
• Options, settings, parameters
  – Environment variables
  – PAPI hardware performance counters
  – Memory allocation counters, application I/O calls
  – Filtering, grouping
• FAQ and issues

4. Instrumentation in General
Edit – Compile – Run cycle:
  Source Code → Compiler → Binary → Run → Results
Edit – Compile – Run cycle with VampirTrace:
  Source Code → Compiler + VT Wrapper → Binary → Run → Results + Traces

5. Compiler Wrappers
• Easiest way of using VampirTrace
• No source code modifications
• In the build system of your application, substitute calls to the regular compiler with calls to the VampirTrace compiler wrappers
  – For compiling and linking
  – e.g. in the makefile, change icc to vtcc
• Rebuild the application
• Run the application to produce trace data

6. Instrumentation & Measurement
• What do you need to do for it?
  – VampirTrace and a supported compiler
• Instrumentation (automatic with compiler wrappers):
    Original              With VampirTrace
    CC    = icc           CC    = vtcc
    CXX   = icpc          CXX   = vtcxx
    F90   = ifc           F90   = vtf90
    MPICC = mpicc         MPICC = vtcc -vt:cc mpicc
• Re-compile & re-link
• Trace run (run with an appropriate test data set)
• More details later

7. Compiler Wrappers
Captured events:
• All user function entries and exits
  – If supported by the compiler (Intel, GNU, PGI, NEC, IBM)
• MPI calls and messages
  – If the application is MPI parallel
• OpenMP regions
  – If the application is OpenMP parallel

8. Manual Instrumentation
• Allows for detailed source code instrumentation
  – e.g. regions within functions, such as loops
• Can be combined with automatic instrumentation
• Be sure to instrument all function exits!
  – Otherwise post-mortem analysis will fail
• I personally consider this advanced usage of VampirTrace!

9. Manual Instrumentation
• Add the following to your source code to instrument a region, e.g. in C (also available for C++ and Fortran):
    #include "vt_user.h"
    ...
    VT_USER_START("Region_1");
    ...
    VT_USER_END("Region_1");
    ...
• Compile with "-DVTRACE"
  – Otherwise, the VampirTrace macros expand to empty blocks, producing zero overhead
    vtcc -vt:inst manual prog.c -DVTRACE -o prog

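The warning on slide 8 about instrumenting every exit can be illustrated with a small model. The C sketch below is not VampirTrace code: `region_start`, `region_end`, `region_trace_ok` and `region_reset` are hypothetical names that mimic `VT_USER_START`/`VT_USER_END` by keeping a stack of open regions and flagging any end that does not match the innermost open region, as well as any region that is left open.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical model of a region event stream; not part of VampirTrace. */
#define MAX_DEPTH 64

static const char *region_stack[MAX_DEPTH];
static int region_depth = 0;
static bool region_error = false;

/* Record entry into a named region (compare VT_USER_START). */
void region_start(const char *name)
{
    assert(name != NULL);
    if (region_depth >= MAX_DEPTH) {
        region_error = true;
        return;
    }
    region_stack[region_depth++] = name;
}

/* Record exit from a named region (compare VT_USER_END).
   The name must match the most recently entered, still open region. */
void region_end(const char *name)
{
    assert(name != NULL);
    if (region_depth == 0 || strcmp(region_stack[region_depth - 1], name) != 0) {
        region_error = true;
        return;
    }
    region_depth--;
}

/* The event stream is consistent only if no mismatch occurred
   and every started region has been ended. */
bool region_trace_ok(void)
{
    return !region_error && region_depth == 0;
}

/* Reset the model between experiments. */
void region_reset(void)
{
    region_depth = 0;
    region_error = false;
}
```

A common way to break this pairing in real code is an early `return` between the start and end macros, which leaves the region open in the trace.
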
10. Binary Instrumentation
• Uses Dyninst
  – http://www.dyninst.org
• Source should be compiled with the "-g" switch
• "vtunify" has to be run manually afterwards
    vtcc -vt:inst dyninst -g prog.c -o prog

11. RUN-TIME MEASUREMENT
• Behind the Scenes
• Unifying - Post-Processing
• OTF - Open Trace Format
• Tracing Overhead

12. Workflow
1) Instrumentation
   – Hide instrumentation in compiler wrappers
   – Use the underlying compiler and add appropriate options
       CC = mpicc   →   CC = vtcc -vt:cc mpicc
2) Test run
   – Use representative test input
   – Set parameters, environment variables, etc.
   – Selective tracing
3) Get trace

13. Automatic Function Tracing
• Uses compiler support to add tracing calls at every function entry and exit
• Supported compilers:
  – GNU, Intel, PGI, PathScale, IBM, Sun Fortran, NEC
• Binary instrumentation via Dyninst

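For GNU compilers, one widely used form of this compiler support is GCC's `-finstrument-functions` option, which makes the compiler insert calls to `__cyg_profile_func_enter` and `__cyg_profile_func_exit` at every function entry and exit. The sketch below supplies those two hooks and only counts events; it does not show how VampirTrace itself handles these calls, and the counters are purely illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative counters; a real tool would write timestamped
   enter/leave records into a trace buffer instead. */
static unsigned long enter_events = 0;
static unsigned long exit_events = 0;
static long call_depth = 0;

/* Called by GCC-instrumented code on every function entry. The attribute
   keeps the hook itself from being instrumented (avoids infinite recursion). */
__attribute__((no_instrument_function))
void __cyg_profile_func_enter(void *this_fn, void *call_site)
{
    (void)this_fn;   /* address of the function being entered */
    (void)call_site; /* address of the call site */
    enter_events++;
    call_depth++;
}

/* Called by GCC-instrumented code on every function exit. */
__attribute__((no_instrument_function))
void __cyg_profile_func_exit(void *this_fn, void *call_site)
{
    (void)this_fn;
    (void)call_site;
    exit_events++;
    call_depth--;
    assert(call_depth >= 0); /* every exit must follow a matching entry */
}
```

Linked into an application built with `gcc -finstrument-functions`, these hooks fire automatically on every call and return of the instrumented code.
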
14. MPI and OpenMP Tracing
• Tracing of MPI-1 and MPI-IO events via the PMPI profiling interface
• Tracing of OpenMP directives via OPARI source-to-source instrumentation

15. Hardware Performance Counters
• Records PAPI counter(s) at every function entry/exit
• PAPI gives access to hardware (mostly CPU) counters, e.g. floating-point operations, cache misses, exceptions
• Rates can be derived, e.g. the GFlop/s of each function

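Deriving a per-function rate from counter samples is a short calculation: the counter difference between entry and exit, divided by the elapsed time. The sketch assumes a FLOP counter such as PAPI_FP_OPS sampled at both events and timestamps in seconds; `gflops_rate` is a hypothetical helper, not part of PAPI or VampirTrace.

```c
#include <assert.h>

/* Derive a floating-point rate for one function call from two samples of a
   FLOP counter (e.g. PAPI_FP_OPS) taken at function entry and exit.
   Timestamps are in seconds. Returns GFlop/s, or 0.0 for an empty interval. */
double gflops_rate(long long flops_enter, long long flops_exit,
                   double t_enter, double t_exit)
{
    assert(flops_exit >= flops_enter); /* counters only increase */
    double seconds = t_exit - t_enter;
    if (seconds <= 0.0)
        return 0.0;
    return (double)(flops_exit - flops_enter) / seconds / 1e9;
}
```

For example, 2·10^9 floating-point operations between an entry at t = 1.0 s and an exit at t = 2.0 s give 2 GFlop/s. Because the interval also covers any functions called in between, this is an inclusive rate.
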
16. Memory and I/O Tracing
• Tracing of memory allocation calls via libc built-in hooks
  – malloc, realloc, free, …
• Tracing of I/O calls, accessed files and transferred data volume via wrappers for I/O calls
  – open, read, write, …

17. Instrumentation & Measurement
What does VampirTrace do in the background?
• Trace run:
  – Event data collection
  – Precise time measurement
  – Parallel timer synchronization
  – Collecting parallel process/thread traces
  – Collecting performance counters
    • from PAPI,
    • memory usage,
    • POSIX I/O calls and
    • fork/system/exec calls, and more …
  – Filtering and grouping of function calls

18. Behind the Scenes
• Trace data is first written to a buffer in memory
• When this buffer is full, data is flushed to storage
• After the application has run to completion, these trace files are unified to produce the final OTF trace
• Most aspects of this behavior can be customized with environment variables

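The buffering scheme described above can be modeled in a few lines. In this sketch the buffer holds events rather than bytes, "flushing" only counts the events that would be written, and a flush limit of 0 means unlimited. The idea that recording stops once the flush limit (compare VT_MAX_FLUSHES) is reached is an assumption of the model, and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-process trace buffer; capacity counted in events. */
#define BUF_CAPACITY 4

typedef struct {
    long events[BUF_CAPACITY]; /* buffered event records (here: plain numbers) */
    size_t used;               /* number of buffered events */
    unsigned flushes;          /* flushes performed so far */
    unsigned max_flushes;      /* 0 = unlimited (assumed semantics) */
    bool enabled;              /* is recording still active? */
    size_t written;            /* events written to "storage" so far */
} trace_buffer;

void tb_init(trace_buffer *tb, unsigned max_flushes)
{
    tb->used = 0;
    tb->flushes = 0;
    tb->max_flushes = max_flushes;
    tb->enabled = true;
    tb->written = 0;
}

/* Write the buffered events out (here: only count them) and empty the buffer. */
void tb_flush(trace_buffer *tb)
{
    tb->written += tb->used;
    tb->used = 0;
    tb->flushes++;
    if (tb->max_flushes != 0 && tb->flushes >= tb->max_flushes)
        tb->enabled = false; /* assumption: stop recording after the last allowed flush */
}

/* Append one event, flushing first if the buffer is full. */
void tb_record(trace_buffer *tb, long event)
{
    if (!tb->enabled)
        return;
    if (tb->used == BUF_CAPACITY) {
        tb_flush(tb);
        if (!tb->enabled)
            return;
    }
    assert(tb->used < BUF_CAPACITY);
    tb->events[tb->used++] = event;
}
```

A larger buffer means fewer flushes during the run, and since each flush writes to storage while the application is executing, it can perturb the measured timing; this is one reason to tune VT_BUFFER_SIZE.
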
20. Unifying - Post-Processing
• Normally, trace data is unified automatically after the application has run to completion
• This can take some time, depending on the amount of trace data
• Can be switched off with an environment variable
• To unify manually:
    vtunify <number-of-trace-files> <trace-file-prefix>
    vtunify 16 my_trace

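Part of what unification has to do is reconcile identifiers: each process assigns its own local IDs to the functions it encounters, and the final trace needs one global set of definitions. The sketch below shows only that mapping step, under simplifying assumptions (definitions identified by name alone, a fixed-size table). It is not how vtunify is implemented, and `unify_name` and `unify_process` are hypothetical names.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical global definition table built during unification. */
#define MAX_DEFS 32

static const char *global_names[MAX_DEFS];
static int global_count = 0;

/* Return the global ID for a function name, assigning a new one on first sight. */
int unify_name(const char *name)
{
    for (int i = 0; i < global_count; i++)
        if (strcmp(global_names[i], name) == 0)
            return i;
    assert(global_count < MAX_DEFS);
    global_names[global_count] = name;
    return global_count++;
}

/* Translate one process's local definitions (array index = local ID) into a
   local-to-global mapping table that could then be applied to its events. */
void unify_process(const char *local_names[], int n, int map[])
{
    for (int local = 0; local < n; local++)
        map[local] = unify_name(local_names[local]);
}
```

Two processes that both know "main" and "MPI_Send", but under different local IDs, end up sharing the same global IDs for them.
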
21. How to Store Trace Data - Trace File Formats
• Various trace file formats (for HPC):
  – VTF3 (TU Dresden)
  – TAU Trace Format (Univ. of Oregon, LANL and JSC/Jülich)
  – EPILOG (JSC/Jülich, Germany)
  – STF (Pallas GmbH, now Intel)
  – OTF (TU Dresden)
• ASCII or binary file formats
• Single or multiple file(s) per trace
• Merging process traces into a single file
• Multiple streams for parallel/selective I/O

22. OTF – Open Trace Format
• Open source trace file format
  – Available from the homepage of TU Dresden, ZIH: http://www.tu-dresden.de/zih/otf/
• Includes the powerful libotf library for use in custom applications
• API / interfaces
  – High-level interface for analysis tools
  – Low-level interface for trace libraries
• Actively developed
  – In cooperation with the University of Oregon and Lawrence Livermore National Laboratory

23. Tracing Overhead
• Measured on SGI Altix 4700, Itanium 2, 1.6 GHz
• Tracing overhead per function call (from a test program with one million function calls, multiple repetitions)
• Inlining suppressed: icc -O2 -ip-no-inlining

                         VampirTrace   Intel Trace Collector
    3 PAPI counters      4.61 µs       9.64 µs
    1 PAPI counter       4.47 µs       9.25 µs
    Without PAPI         0.92 µs       1.10 µs
    Filtered function    0.82 µs       1.04 µs

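The per-call numbers in the table translate directly into total run-time cost: overhead ≈ number of instrumented calls × per-call overhead. The hypothetical helper below does this conversion from microseconds per call to seconds.

```c
#include <assert.h>

/* Estimate the extra run time (in seconds) caused by tracing, from the number
   of instrumented calls and a measured per-call overhead in microseconds. */
double tracing_overhead_seconds(double calls, double per_call_us)
{
    assert(calls >= 0.0 && per_call_us >= 0.0);
    return calls * per_call_us * 1e-6;
}
```

With the table's one million calls, 0.92 µs per call adds roughly 0.92 s, and 4.61 µs per call (3 PAPI counters) roughly 4.6 s. A small function called 10^8 times would add about 92 s even without PAPI. The "Filtered function" row also shows that a filtered call still costs 0.82 µs, because it is intercepted even though it is not recorded.
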
24. OPTIONS, SETTINGS, PARAMETERS
• Environment variables
• PAPI hardware performance counters
• Memory allocation counters
• Application I/O calls
• Filtering
• Grouping

25. Environment Variables
• By default, trace data is written to the current working directory
• Almost all of this behavior can be customized with environment variables
• Environment variables are read when the application runs, so they must be set before running the application, not before building it

26. Environment Variables
    VT_PFORM_GDIR    Directory where the final trace file is stored
    VT_PFORM_LDIR    Directory for intermediate trace files
    VT_FILE_PREFIX   Trace file name
    VT_BUFFER_SIZE   Internal trace buffer size
    VT_MAX_FLUSHES   Maximum number of buffer flushes
    VT_MEMTRACE      Enable memory allocation tracing
    VT_IOTRACE       Enable I/O tracing
    VT_MPITRACE      Enable MPI tracing
    VT_FILTER_SPEC   Name of the filter file
    VT_GROUPS_SPEC   Name of the function groups file
    VT_COMPRESSION   Compress trace files
    VT_METRICS       List of PAPI counters

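Because these variables are read when the traced program starts, not when it is compiled, the underlying pattern is a getenv lookup at run time with a fallback. The sketch shows that pattern with a hypothetical helper `env_or_default`; the fallback value used with it is made up and does not reproduce VampirTrace's actual defaults.

```c
/* Expose POSIX setenv/unsetenv in <stdlib.h> for callers that change the environment. */
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdlib.h>

/* Return the value of an environment variable, or a fallback if it is unset
   or empty. This is evaluated at run time, so only the environment of the
   run matters, not the environment in which the program was built. */
const char *env_or_default(const char *name, const char *fallback)
{
    assert(name != NULL);
    const char *value = getenv(name);
    return (value != NULL && value[0] != '\0') ? value : fallback;
}
```

Running the same binary once with and once without `export VT_FILE_PREFIX=my_trace` changes the result of the lookup, with no rebuild in between.
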
27. PAPI Counter Environment Variables
• PAPI counters can be included in traces
  – If PAPI is available on the platform
  – If VampirTrace was built with PAPI support
• VT_METRICS specifies a colon-separated list of PAPI counters
    export VT_METRICS=PAPI_FP_OPS:PAPI_L2_TCM
• VampirTrace versions newer than 5.8.1 will have a customizable separator, because Component-PAPI counter names contain colons

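Splitting VT_METRICS at colons works for plain PAPI preset names, but breaks as soon as a counter name itself contains colons, which is the motivation for a configurable separator. The sketch below splits a list at an arbitrary separator character; `split_metrics` is hypothetical, and `comp:::EVENT` is a made-up name standing in for a component counter.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Split a counter list such as "PAPI_FP_OPS:PAPI_L2_TCM" at a separator.
   Stores up to max_names pointers into names (they point into buf, which is
   modified in place) and returns the number of names found.
   Empty fields are skipped. */
size_t split_metrics(char *buf, char sep, char *names[], size_t max_names)
{
    assert(buf != NULL);
    size_t count = 0;
    char *p = buf;
    while (*p != '\0' && count < max_names) {
        char *end = strchr(p, sep);
        if (end != NULL)
            *end = '\0';      /* terminate the current field */
        if (*p != '\0')
            names[count++] = p;
        if (end == NULL)
            break;            /* last field */
        p = end + 1;
    }
    return count;
}
```

Splitting "comp:::EVENT" at ':' yields two bogus names, "comp" and "EVENT", while splitting "PAPI_FP_OPS,comp:::EVENT" at ',' keeps both counter names intact.
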
28. Memory Counter
• Memory allocation counters can be included in traces
  – If VampirTrace was built with memory allocation tracing support
  – If GNU glibc is used on the platform
• Memory functions in glibc such as "malloc" and "free" are traced
• Environment variable VT_MEMTRACE
    export VT_MEMTRACE=yes

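A memory-allocation counter essentially tracks how many bytes are currently allocated, increasing on malloc and decreasing on free. VampirTrace obtains these calls through glibc hooks; the sketch below uses explicit, hypothetical wrapper functions instead, storing each block's size in a small header in front of the user data so that free can subtract it. realloc is omitted for brevity.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Header stored in front of each block. The extra members only force a
   conservative alignment for the user data that follows the header. */
typedef union {
    size_t size;     /* requested size of the block */
    long double ld;
    void *ptr;
    long long ll;
} alloc_header;

static size_t bytes_in_use = 0; /* the counter value a trace would record */

void *counted_malloc(size_t size)
{
    if (size > SIZE_MAX - sizeof(alloc_header))
        return NULL; /* header + payload would overflow size_t */
    alloc_header *h = malloc(sizeof(alloc_header) + size);
    if (h == NULL)
        return NULL;
    h->size = size;
    bytes_in_use += size;
    return h + 1; /* user data starts right after the header */
}

void counted_free(void *ptr)
{
    if (ptr == NULL)
        return;
    alloc_header *h = (alloc_header *)ptr - 1;
    assert(bytes_in_use >= h->size);
    bytes_in_use -= h->size;
    free(h);
}

size_t counted_bytes_in_use(void)
{
    return bytes_in_use;
}
```

Sampling `counted_bytes_in_use()` at each call would give the kind of "memory in use over time" curve that a memory counter shows in a trace.
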
29. I/O Counter
• I/O counters can be included in traces
  – If VampirTrace was built with I/O tracing support
• Standard I/O calls such as "open" and "read" are recorded
• Environment variable VT_IOTRACE
    export VT_IOTRACE=yes

30. User-Defined Counters
• Record program variables or any other numerical quantity
    #include "vt_user.h"

    int main()
    {
        unsigned int i, cid, cgid;

        /* define a counter group and an unsigned counter "i" in that group */
        cgid = VT_COUNT_GROUP_DEF("loopindex");
        cid = VT_COUNT_DEF("i", "#", VT_COUNT_TYPE_UNSIGNED, cgid);

        for (i = 1; i <= 100; i++) {
            VT_COUNT_UNSIGNED_VAL(cid, i); /* record the current value of i */
        }
        return 0;
    }
• Helps find "that one loop iteration" that causes trouble

32. Function Filtering
• Filtering is one way to reduce trace file size
• Environment variable VT_FILTER_SPEC
    %> export VT_FILTER_SPEC=filter.spec
• The filter definition file contains a list of filters:
    my_*;test_* -- 1000
    debug_* -- 0
    calculate -- -1
    * -- 1000000
• Filter rules can be global to all processes or assigned only to specific ranks (see the manual for details of rank-specific filtering)
• See also the vtfilter tool
  – Can generate a customized filter file
  – Can reduce the size of existing trace files

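One reading of the example filter file: functions matching my_* or test_* are recorded for at most 1000 calls, debug_* functions are never recorded, calculate is recorded without limit (-1), and every other function is limited to 1,000,000 calls. The sketch below implements that reading with the POSIX `fnmatch` wildcard matcher, assuming the first matching rule wins; `filter_rule`, `filter_limit` and `filter_should_record` are hypothetical, and the VampirTrace manual remains the authority on the exact semantics.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stdbool.h>
#include <stddef.h>

/* One filter rule: a shell-style pattern and a call limit
   (-1 = unlimited, 0 = never record). A line like "my_*;test_* -- 1000"
   becomes one rule per pattern. */
typedef struct {
    const char *pattern;
    long limit;
} filter_rule;

/* Limit of the first rule whose pattern matches func (assumed first-match
   semantics), or -1 (unlimited) if no rule matches. */
long filter_limit(const filter_rule rules[], size_t n, const char *func)
{
    for (size_t i = 0; i < n; i++)
        if (fnmatch(rules[i].pattern, func, 0) == 0)
            return rules[i].limit;
    return -1;
}

/* Should the next call of func be recorded, given how many of its
   calls have already been recorded? */
bool filter_should_record(const filter_rule rules[], size_t n,
                          const char *func, long recorded_so_far)
{
    assert(func != NULL);
    long limit = filter_limit(rules, n, func);
    return limit < 0 || recorded_so_far < limit;
}
```

Under this reading, the catch-all `* -- 1000000` must come last, since any rule placed after it would never be reached.
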
33. Switch Tracing On/Off
• Starting and stopping tracing should be done with care
• Tracing has to be switched back on at the same call level at which it was switched off, to keep the trace file consistent
• Useful if your program behaves iteratively or if you are only interested in some parts of your application
    #include "vt_user.h"
    …
    VT_OFF();
    for (i = 1; i < 100; i++) {
        /* do something */
    }
    VT_ON();
    …
• Recompile your source code with the user macro "-DVTRACE"
    %> vtcc … -DVTRACE source_code.c …

34. Selective Instrumentation
• Selective instrumentation can reduce the size of your trace file, so that only the parts of interest are recorded
• One option is to use manual instead of automatic instrumentation
    %> vtcc -vt:inst manual … source_code.c
• Another option is to modify your Makefile so that automatic instrumentation (the default) is applied only to the source files of interest (those containing the functions of interest)

35. Function Grouping
• Groups can be defined by the user to group related functions
  – Groups can be assigned different colors in Vampir, highlighting application behavior
• Environment variable VT_GROUPS_SPEC
    export VT_GROUPS_SPEC=/path/to/groups.spec
• The group file contains a list of groups with their associated functions:
    CALC=calculate
    MISC=my*;test
    UNKNOWN=*

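Assigning a function to a group amounts to matching its name against each group's ';'-separated patterns. The sketch below assumes the first matching group wins, so the catch-all UNKNOWN=* is listed last; `group_def` and `group_of` are hypothetical names, and `fnmatch` is the POSIX wildcard matcher.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>

#define MAX_PATTERNS 4

/* One group definition: the group name and its function-name patterns
   (the ';'-separated list after '=' in the groups file), NULL-terminated. */
typedef struct {
    const char *group;
    const char *patterns[MAX_PATTERNS];
} group_def;

/* Return the name of the first group with a pattern matching func, or NULL. */
const char *group_of(const group_def defs[], size_t n, const char *func)
{
    assert(func != NULL);
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < MAX_PATTERNS && defs[i].patterns[j] != NULL; j++)
            if (fnmatch(defs[i].patterns[j], func, 0) == 0)
                return defs[i].group;
    return NULL;
}
```

With the example file, "calculate" lands in CALC, "my_solver" and "test" in MISC, and "testing" falls through to UNKNOWN, because the pattern "test" has no wildcard.
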
36. Advanced Performance Monitoring
• CUDA wrapper library
  – Based on LD_PRELOAD
  – Usable with dynamically linked libraries
  – Little overhead
  – No re-compilation (neither application nor library)
• [Diagram: application function → wrapper function in the preload library (indirection, records enter/leave) → CUDA function]

37. Advanced Performance Monitoring
• vtlibwrapgen
  – Abstraction layer for process monitoring
  – Dynamic and static libraries
  – Requires only the library's header file
  – Portable
• [Diagram: foo.h → vtlibwrapgen (monitor-gen) → callback.inc.*, vt_user.h → make (libmonitor/src) → libmonitor.so]
    vtlibwrapgen -g SDL -o SDLwrap.c /usr/include/SDL/*.h
    vtlibwrapgen --build --shared -o libSDLwrap SDLwrap.c
    export LD_PRELOAD=$PWD/libSDLwrap.so
    <executable>