Vampir and VampirServer provide interactive visualization and analysis of trace data from parallel applications to help identify performance bottlenecks. Intuitive displays let users browse timeline, communication, and computation data at different levels of a parallel application, from global to process to function. The toolset includes the VampirTrace instrumentation library and supports large trace files and high parallelism through a client-server architecture.
2. Motivation & Mission
• Motivation
– Parallel programming is about performance!
– Scaling to thousands of cores is required
– You need a decent MPI implementation, e.g. Open MPI
– You also need a ready-to-use performance monitoring and
analysis tool
• Mission
– Visualization of dynamics of complex parallel processes
– Requires two components
• Monitor/Collector (VampirTrace)
• Charts/Browser (Vampir)
– Available for major platforms
– Open Source (partially)
3. Event Trace Visualization
• Trace Visualization
– Alternative and supplement to automatic analysis
– Show dynamic run-time behavior graphically
– Provide statistics and performance metrics
• Global timeline for parallel processes/threads
• Process timeline plus performance counters
• Statistics summary display
• Message statistics
• More
– Interactive browsing, zooming, selecting
• Adapt statistics to zoom level (time interval)
• Also for very large and highly parallel traces
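To illustrate what "adapt statistics to zoom level" means, the sketch below recomputes per-function time for a selected time interval by clipping each function interval to the visible window. The tuple layout and the function name `function_time_in_window` are assumptions made for this example; this is not the OTF record format or Vampir's implementation, and it assumes non-nested (exclusive) intervals.

```python
from collections import defaultdict


def function_time_in_window(intervals, t0, t1):
    """Sum the time each function spends inside the window [t0, t1].

    intervals: iterable of (function_name, start, end) tuples.
    This layout is hypothetical and chosen only for illustration.
    """
    totals = defaultdict(float)
    for name, start, end in intervals:
        # Clip the interval to the visible window; skip it if nothing remains.
        overlap = min(end, t1) - max(start, t0)
        if overlap > 0:
            totals[name] += overlap
    return dict(totals)


# One process, four consecutive function intervals (times in seconds).
intervals = [
    ("compute", 0.0, 4.0),
    ("MPI_Send", 4.0, 5.0),
    ("compute", 5.0, 9.0),
    ("MPI_Barrier", 9.0, 10.0),
]

full = function_time_in_window(intervals, 0.0, 10.0)   # whole trace
zoomed = function_time_in_window(intervals, 3.0, 6.0)  # zoomed interval
print(full)
print(zoomed)
```

Zooming changes the picture: over the whole trace MPI takes a fifth of the time, while in the zoomed interval MPI_Send accounts for a third and the barrier disappears.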
4. Vampir History
• PARvis at Research Center Jülich
• 1995: Vampir at Research Center Jülich
http://www.top500.org/reports/1995/vampir/vampir.html
• 1997: Vampir at TU Dresden
• 2006: new version VampirServer (or Vampir NG)
– Distributed storage, enhanced scalability
– Client/server architecture
• 2009: Vampir 7, redesign of GUI using Qt
5. Vampir Toolset Architecture
[Figure: toolset architecture diagram. Labels: Multi-Core Program (CPUs), VampirTrace, Trace File (OTF), Vampir 7; Many-Core Program (CPUs), VampirTrace, Trace Bundle, VampirServer, Vampir]
6. Vampir for Windows
• Vampir for UNIX
– VampirClassic (single threaded)
– VampirServer (MPI parallel)
• Vampir for Windows
– Based on parallel service engine
– All new browser
• A beta of the new browser for Linux is available at www.vampir.eu
[Figure: architecture comparison. Labels: Vampir Classic (all in one, single threaded, Motif); Vampir Server (parallelized service engine, sockets, visualization engine); Vampir 7 for Windows (threaded service DLL, API, Windows GUI)]
7. Usage order of the Vampir Performance Analysis Toolset
1. Instrument your application with VampirTrace
2. Run your application with an appropriate test set
3. Analyze your trace file with Vampir
   1. Small trace files with a low number of processes can be analyzed on your local workstation
      1. Start your local Vampir
      2. Load trace file from your local disk
   2. Large trace files should be stored on the cluster file system
      1. Start VampirServer on your analysis cluster
      2. Start your local Vampir
      3. Connect local Vampir with the VampirServer on the analysis cluster
      4. Load trace file from the cluster file system
8. Vampir Displays
The main displays of Vampir:
• Master Timeline (Global Timeline)
• Process and Counter Timeline
• Function Summary
• Message Summary
• Process Summary
• Communication Matrix
• Call Tree
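As a rough illustration of the data behind a Communication Matrix display, the sketch below sums message sizes per sender/receiver pair. The message tuple layout and the name `communication_matrix` are assumptions for this example only; Vampir derives this information from the trace itself.

```python
def communication_matrix(messages, num_processes):
    """Accumulate bytes sent from each sender (row) to each receiver (column).

    messages: iterable of (sender_rank, receiver_rank, size_in_bytes) tuples,
    a hypothetical layout chosen for illustration.
    """
    matrix = [[0] * num_processes for _ in range(num_processes)]
    for sender, receiver, size in messages:
        matrix[sender][receiver] += size
    return matrix


messages = [
    (0, 1, 1024),
    (0, 1, 2048),
    (1, 2, 512),
    (2, 0, 256),
]

matrix = communication_matrix(messages, num_processes=3)
for row in matrix:
    print(row)
```

Rows with a few large entries hint at bandwidth-bound exchanges between specific process pairs, while a dense matrix suggests that a collective operation may be a better fit.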
17. Customizable Chart Layout
• No cluttering
• Time based alignment
• View impact at a glance
• Simple controls (hidden)
• User defined
– Combination
– Rows and columns
– Arrangement
– Size
[Figure: chart layout. Labels: Toolbars, Master Timeline, Function Summary, Secondary Timeline, Call Tree, Process Timeline, Function Legend, Func. Group Summary, Context View]
Comprehensive Performance Tracking with Vampir 7.0
Dresden, September 15th
18. Sessions
• What is a session?
– Trace file
– Chart selection
– Layout
– Preferences (i.e. colors)
– Chart options
• Scope of session properties
– Identical for all traces
– Trace specific
– Matter of taste
– Therefore: scope is customizable
• Can be attached to trace data
[Figure: Trace File (OTF) and Config File, with chart layout. Labels: Toolbars, Master Timeline, Secondary Timeline, Process Timeline, Function Summary, Function Group Summary, Call Tree, Function Legend, Context View]
20. Finding Bottlenecks
• Trace Visualization
– Vampir provides a number of display types
– Each allows many different options
• Advice
– Identify essential parts of an application (initialization, main iteration, I/O, finalization)
– Identify important components of the code (serial computation, MPI P2P, collective MPI, OpenMP)
– Make a hypothesis about performance problems
– Consider application’s internal workings if known
– Select the appropriate displays
– Use statistic displays in conjunction with timelines
22. Bottlenecks in Communication
• Communication as such (dominating over computation)
• Late sender, late receiver
• Point-to-point messages instead of collective communication
• Unmatched messages
• Overloading of MPI’s buffers
• Bursts of large messages (bandwidth)
• Frequent short messages (latency)
• Unnecessary synchronization (barrier)
All of the above usually result in a high MPI time share.
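The late sender and late receiver patterns can be sketched as a comparison of when each side of a matched point-to-point message entered its MPI call. The function `classify_wait` and its arguments are hypothetical names for this example. The sketch assumes blocking calls, and the late receiver case only causes waiting when the send is synchronous (rendezvous).

```python
def classify_wait(send_enter, recv_enter):
    """Return (pattern, wait_time) for one matched point-to-point message.

    send_enter: time the sender entered its send call.
    recv_enter: time the receiver entered its receive call.
    Both are hypothetical inputs; a real analysis reads them from the trace.
    """
    if send_enter > recv_enter:
        # The receiver was already blocked while the sender had not started.
        return ("late sender", send_enter - recv_enter)
    if recv_enter > send_enter:
        # The sender may block until the receive is posted (synchronous send).
        return ("late receiver", recv_enter - send_enter)
    return ("balanced", 0.0)


print(classify_wait(send_enter=5.0, recv_enter=2.0))
print(classify_wait(send_enter=1.0, recv_enter=4.0))
```

In a timeline display the same situation shows up as a long MPI region on one process that ends only when the message line from its partner arrives.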
26. Further Bottlenecks
• Unbalanced computation
– Single latecomer
• Strictly serial parts of program
– Idle processes/threads
• Very frequent tiny function calls
• Sparse loops
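Unbalanced computation with a single latecomer can be quantified from per-process compute times, for example as the ratio of the slowest process to the mean. The metric and the name `load_imbalance` are choices made for this sketch, not a definition taken from Vampir.

```python
def load_imbalance(compute_times):
    """Return (slowest_rank, imbalance, total_idle) for one iteration.

    compute_times: per-process computation time in seconds.
    imbalance is max/mean - 1, so 0.0 means perfectly balanced.
    total_idle is the time the other processes wait for the slowest one.
    """
    mean = sum(compute_times) / len(compute_times)
    slowest = max(range(len(compute_times)), key=compute_times.__getitem__)
    longest = compute_times[slowest]
    total_idle = sum(longest - t for t in compute_times)
    return slowest, longest / mean - 1.0, total_idle


# Three balanced processes and one latecomer (rank 3).
slowest, imbalance, total_idle = load_imbalance([10.0, 10.0, 10.0, 14.0])
print(slowest, round(imbalance, 3), total_idle)
```

Here rank 3 is the latecomer, and the other three processes together idle for 12 seconds at the next synchronization point.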
32. Effects due to Tracing
• Measurement overhead
– Especially grave for tiny function calls
– Solve with selective instrumentation
• Long/frequent/asynchronous trace buffer flushes
• Too many concurrent counters
• Heisenbugs
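Why tiny function calls are the worst case for measurement overhead can be seen with a back-of-the-envelope estimate: each call produces an enter and a leave event, so the overhead grows with the call count, not with the time spent. The sketch below flags functions worth excluding via selective instrumentation. The per-event cost of 100 ns, the 10% threshold, and the name `filter_candidates` are illustrative assumptions, not measured VampirTrace figures.

```python
def filter_candidates(profile, event_cost, max_overhead=0.1):
    """List functions whose estimated tracing overhead exceeds max_overhead.

    profile: {function_name: (call_count, total_seconds)}, hypothetical input.
    event_cost: assumed cost in seconds of recording one trace event.
    """
    candidates = []
    for name, (calls, total) in profile.items():
        # Two events are recorded per call: enter and leave.
        overhead = 2 * calls * event_cost
        if total > 0 and overhead / total > max_overhead:
            candidates.append(name)
    return sorted(candidates)


profile = {
    "solve": (100, 50.0),             # few long calls
    "get_index": (10_000_000, 2.0),   # very frequent tiny calls
}

to_filter = filter_candidates(profile, event_cost=1e-7)
print(to_filter)
```

With these assumed numbers, tracing `get_index` would roughly double its run time, while `solve` is essentially unaffected.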
33. Effects due to Tracing
Trace buffer flushes are explicitly marked in the trace.
A flush is rather harmless at the end of a trace, as shown here.
34. Conclusion
– Performance analysis is very important in HPC
– Use performance analysis tools for profiling and tracing
– Do not spend effort on DIY solutions, e.g. printf debugging
– Use tracing tools with some precautions
• Overhead
• Data volume
– Let us know about problems and about feature wishes
– vampirsupport@zih.tu-dresden.de
35. Summary
• Vampir & VampirServer
– Interactive trace visualization and analysis
– Intuitive browsing and zooming
– Scalable to large trace data sizes (100 GByte)
– Scalable to high parallelism (20000 processes)
• Vampir for Linux in progress, beta available
• VampirTrace
– Convenient instrumentation and measurement
– Hides away complicated details
– Provides many options and switches for experts
• VampirTrace is part of Open MPI > 1.3