Trace Visualization
Visualization and Analysis of MPI Resources
Motivation & Mission
• Motivation
  –   Parallel programming is about performance!
  –   Scaling to thousands of cores is required
  –   You need a decent MPI implementation, e.g. Open MPI
  –   You also need a ready-to-use performance monitoring and
      analysis tool

• Mission
  – Visualization of dynamics of complex parallel processes
  – Requires two components
       • Monitor/Collector (VampirTrace)
       • Charts/Browser (Vampir)
  – Available for major platforms
  – Open Source (partially)
Event Trace Visualization
• Trace Visualization
  – Alternative and supplement to automatic analysis
  – Show dynamic run-time behavior graphically
  – Provide statistics and performance metrics
     •   Global timeline for parallel processes/threads
     •   Process timeline plus performance counters
     •   Statistics summary display
     •   Message statistics
     •   More
  – Interactive browsing, zooming, selecting
     • Adapt statistics to zoom level (time interval)
     • Also for very large and highly parallel traces
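As a rough illustration of adapting statistics to the zoom level, the sketch below recomputes a per-function time summary for a chosen time interval. The flat interval records and the name `function_summary` are assumptions made for this example; they are not the OTF format or a Vampir API.

```python
from collections import defaultdict

# Hypothetical, simplified trace records: (process, function, start_s, end_s).
# Real traces store enter/leave events; this flat form is an assumption.
INTERVALS = [
    (0, "compute", 0.0, 4.0),
    (0, "MPI_Allreduce", 4.0, 5.0),
    (1, "compute", 0.0, 2.0),
    (1, "MPI_Allreduce", 2.0, 5.0),
]


def function_summary(intervals, t0, t1):
    """Sum the time each function spends inside the window [t0, t1]."""
    totals = defaultdict(float)
    for _proc, func, start, end in intervals:
        # Clip every interval to the visible window before accumulating.
        overlap = min(end, t1) - max(start, t0)
        if overlap > 0:
            totals[func] += overlap
    return dict(totals)


full = function_summary(INTERVALS, 0.0, 5.0)    # whole run
zoomed = function_summary(INTERVALS, 3.0, 5.0)  # zoomed into the last 2 s
print("full run:", full)
print("zoomed:  ", zoomed)
```

In this toy data the MPI share grows when zooming into the end of the run, which is the kind of change a zoom-dependent summary makes visible.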
Vampir History
• PARvis at Research Center Jülich
• 1995: Vampir at Research Center Jülich
  http://www.top500.org/reports/1995/vampir/vampir.html
  – 1997: Vampir at TU Dresden
  – 2006: new version VampirServer (or Vampir NG)
      • Distributed storage, enhanced scalability
      • Client/server architecture
  – 2009: Vampir 7, a redesign of the GUI using Qt
Vampir Toolset Architecture
[Diagram: a Multi-Core Program and a Many-Core Program (grids of CPUs), each monitored by VampirTrace. Labels on the multi-core path: Trace File (OTF), Vampir 7. Labels on the many-core path: Trace Bundle, VampirServer.]
Vampir for Windows
• Vampir for UNIX
  – VampirClassic (single threaded)
  – VampirServer (MPI parallel)
• Vampir for Windows
  – Based on parallel service engine
  – All new browser
• A beta of the new Browser for Linux available at www.vampir.eu
[Diagram: three variants. Vampir Classic: all in one, single threaded. Vampir Server: parallelized service engine, connected via sockets to the visualization (Motif). Vampir 7 for Windows: threaded service DLL, connected via API to the Windows GUI.]
Usage order of the Vampir Performance Analysis Toolset
1. Instrument your application with VampirTrace
2. Run your application with an appropriate test set
3. Analyze your trace file with Vampir
     1. Small trace files with a low number of processes can be
        analyzed on your local workstation
         1.   Start your local Vampir
         2.   Load trace file from your local disk
     2. Large trace files should be stored on the cluster file system
         1.   Start VampirServer on your analysis cluster
         2.   Start your local Vampir
         3.   Connect local Vampir with the VampirServer on the analysis cluster
         4.   Load trace file from the cluster file system
Vampir Displays
The main displays of Vampir:
• Master Timeline (Global Timeline)
• Process and Counter Timeline
• Function Summary
• Message Summary
• Process Summary
• Communication Matrix
• Call Tree
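To make the Communication Matrix display concrete, here is a minimal sketch of the underlying aggregation: bytes summed per sender/receiver pair. The message records and the name `communication_matrix` are hypothetical and chosen only for illustration.

```python
# Hypothetical message records: (sender_rank, receiver_rank, bytes).
MESSAGES = [
    (0, 1, 1024),
    (0, 1, 2048),
    (1, 2, 512),
    (2, 0, 4096),
]


def communication_matrix(messages, num_procs):
    """Return matrix[sender][receiver] = total bytes sent."""
    matrix = [[0] * num_procs for _ in range(num_procs)]
    for sender, receiver, size in messages:
        matrix[sender][receiver] += size
    return matrix


matrix = communication_matrix(MESSAGES, 3)
for rank, row in enumerate(matrix):
    print(f"rank {rank} sends: {row}")
```

The same loop could count messages or track the maximum message size instead of summing bytes.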
Vampir 7: Displays for a WRF trace
Master Timeline (Global Timeline)
[Screenshot: Master Timeline]
Process and Counter Timeline
[Screenshot: Process Timeline and Counter Timeline]
Function Summary
[Screenshot: Function Summary]
Message Summary
Process Summary
[Screenshot: Process Summary]
Communication Matrix
[Screenshot: Communication Matrix]
Call Tree
Customizable Chart Layout
• No cluttering
• Time based alignment
• View impact at a glance
• Simple controls (hidden)
• User defined
  – Combination
  – Rows and columns
  – Arrangement
  – Size
[Diagram: example layout with Toolbars, Master Timeline, Secondary Timeline, Process Timeline, Function Summary, Call Tree, Function Legend, Func. Group Summary, Context View]

Comprehensive Performance Tracking with Vampir 7.0, Dresden, September 15th
Sessions
• What is a session?
   –    Trace file
   –    Chart selection
   –    Layout
   –    Preferences (e.g. colors)
   –    Chart options
• Scope of session properties
   –    Identical for all traces
   –    Trace specific
   –    Matter of taste
   –    Therefore: scope is customizable
• Can be attached to trace data
[Diagram: Trace File (OTF) and Config File, shown with a chart layout of Toolbars, Master Timeline, Secondary Timeline, Process Timeline, Function Summary, Function Group Summary, Call Tree, Function Legend, Context View]
Typical Performance Problems
Finding Bottlenecks
• Trace Visualization
   – Vampir provides a number of display types
   – Each allows many different options
• Advice
   – Identify essential parts of an application (initialization,
     main iteration, I/O, finalization)
   – Identify important components of the code (serial
     computation, MPI P2P, collective MPI, OpenMP)
   – Make a hypothesis about performance problems
   – Consider application’s internal workings if known
   – Select the appropriate displays
   – Use statistic displays in conjunction with timelines
FINDING BOTTLENECKS
• Communication
• Computation
• Memory, I/O, etc.
• Tracing itself
Bottlenecks in Communication
•   Communications as such (dominating over computation)
•   Late sender, late receiver
•   Point-to-point messages instead of collective communication
•   Unmatched messages
•   Overloading of MPI’s buffers
•   Bursts of large messages (bandwidth)
•   Frequent short messages (latency)
•   Unnecessary synchronization (barrier)

      All of the above usually result in high MPI time share
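The late sender and late receiver patterns from the list above can be sketched as a comparison of when the two sides of a matched message enter their calls. The `Message` record and `classify` function are hypothetical; a real trace tool has to match sends and receives first, and a late receiver only costs time when the send blocks until the receive is posted (for example a synchronous send), which this sketch does not check.

```python
from dataclasses import dataclass


@dataclass
class Message:
    sender: int
    receiver: int
    send_start: float  # time the send call was entered (seconds)
    recv_start: float  # time the matching receive call was entered (seconds)


def classify(msg, tolerance=0.0):
    """Return (pattern, waiting_seconds) for one matched message."""
    gap = msg.send_start - msg.recv_start
    if gap > tolerance:
        # Receive was posted first, so the receiver waits for the sender.
        return "late sender", gap
    if -gap > tolerance:
        # Send was entered first; a blocking synchronous send would wait.
        return "late receiver", -gap
    return "on time", 0.0


# Hypothetical matched messages, for illustration only.
messages = [
    Message(0, 1, send_start=2.0, recv_start=0.5),
    Message(1, 0, send_start=1.0, recv_start=3.0),
    Message(2, 3, send_start=1.0, recv_start=1.0),
]
results = [classify(m) for m in messages]
for m, (pattern, wait) in zip(messages, results):
    print(f"{m.sender} -> {m.receiver}: {pattern}, waiting {wait:.1f} s")
```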
Bottlenecks in Communication
Example: Unnecessary MPI_Barrier calls
Bottlenecks in Communication
Example: Patterns of successive MPI_Allreduce calls
Bottlenecks in Communication
Example: Inefficient implementation of MPI_Allgatherv
Further Bottlenecks

• Unbalanced computation
  – Single latecomer


• Strictly serial parts of program
  – Idle processes/threads


• Very frequent tiny function calls
• Sparse loops
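Unbalanced computation with a single latecomer can be quantified with a simple ratio of the slowest process to the average. The sketch below assumes per-process compute times have already been extracted from a trace; the numbers and the name `load_imbalance` are made up for illustration.

```python
from statistics import mean

# Hypothetical compute time per MPI rank between two synchronization points.
COMPUTE_SECONDS = {0: 10.0, 1: 10.0, 2: 10.0, 3: 14.0}


def load_imbalance(times):
    """Report the slowest rank, relative imbalance, and total idle time."""
    avg = mean(times.values())
    slowest = max(times, key=times.get)
    longest = times[slowest]
    return {
        "slowest_rank": slowest,
        # 0.0 means perfectly balanced; 0.25 means the slowest is 25% above average.
        "imbalance": longest / avg - 1.0,
        # Time the other ranks spend waiting for the slowest one.
        "idle_seconds": sum(longest - t for t in times.values()),
    }


report = load_imbalance(COMPUTE_SECONDS)
print(report)
```

Here rank 3 is the latecomer: the other three ranks each wait 4 s, so 12 process-seconds are lost at the synchronization point.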
Further Bottlenecks




 Example: Idle OpenMP threads
Bottlenecks in Computation
• Memory bound computation
  – Inefficient L1/L2/L3 cache usage
  – TLB misses
  – Detectable via HW performance counters
• I/O bound computation
  – Slow input/output
  – Sequential I/O on single process
  – I/O load imbalance
• Exception handling
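Memory-bound phases show up when hardware counter values are turned into rates per time interval. The following sketch assumes cumulative counter samples for floating-point operations and L2 cache accesses and misses; the sample values, field order, and the name `counter_rates` are assumptions for illustration, not output from a real counter library.

```python
# Hypothetical cumulative counter samples:
# (time_s, fp_ops, l2_misses, l2_accesses)
SAMPLES = [
    (0.0, 0, 0, 0),
    (1.0, 2_000_000_000, 1_000_000, 100_000_000),
    (2.0, 2_200_000_000, 41_000_000, 200_000_000),
]


def counter_rates(samples):
    """Derive GFLOP/s and L2 miss ratio for each pair of consecutive samples."""
    rates = []
    for (t0, f0, m0, a0), (t1, f1, m1, a1) in zip(samples, samples[1:]):
        dt = t1 - t0
        rates.append({
            "t_start": t0,
            "gflops": (f1 - f0) / dt / 1e9,
            "l2_miss_ratio": (m1 - m0) / (a1 - a0),
        })
    return rates


rates = counter_rates(SAMPLES)
for r in rates:
    print(r)
```

In this toy data the second interval combines a much lower FP rate with a much higher miss ratio, which is the signature to look for in a counter timeline.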
Bottlenecks in Computation




   Low FP rate due to heavy cache misses
Bottlenecks in Computation




   Low FP rate due to heavy FP exceptions
Bottlenecks in Computation




      Irregular slow I/O operations
Effects due to Tracing
• Measurement overhead
  – Especially grave for tiny function calls
  – Solve with selective instrumentation


• Long/frequent/asynchronous trace buffer flushes
• Too many concurrent counters

• Heisenbugs
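Why tiny function calls suffer most from measurement overhead can be seen with a back-of-the-envelope estimate. The sketch below assumes a fixed cost per recorded call and flags functions whose estimated overhead exceeds a chosen fraction of their own run time, making them candidates for selective instrumentation. The per-call cost, the profile, and the threshold are all illustrative assumptions, not measured values.

```python
# Assumed cost of recording one enter/leave pair. Illustrative, not measured.
OVERHEAD_PER_CALL_S = 1e-6

# Hypothetical profile: function -> (number of calls, total exclusive time in s).
PROFILE = {
    "solve": (1_000, 50.0),
    "get_index": (40_000_000, 8.0),
    "exchange_halo": (10_000, 5.0),
}


def filter_candidates(profile, per_call=OVERHEAD_PER_CALL_S, threshold=0.5):
    """Return (function, overhead_ratio) pairs above the threshold, worst first."""
    candidates = []
    for name, (calls, seconds) in profile.items():
        ratio = calls * per_call / seconds  # estimated overhead / own run time
        if ratio > threshold:
            candidates.append((name, ratio))
    return sorted(candidates, key=lambda c: c[1], reverse=True)


candidates = filter_candidates(PROFILE)
for name, ratio in candidates:
    print(f"{name}: estimated overhead is {ratio:.1f}x its own run time")
```

Under these assumptions only the very frequently called `get_index` is flagged; excluding such a function from instrumentation removes most of the distortion while keeping the coarse structure of the trace.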
Effects due to Tracing
Trace buffer flushes are explicitly marked in the trace.
A flush at the end of a trace, as shown here, is rather harmless.
Conclusion
– Performance analysis is very important in HPC

– Use performance analysis tools for profiling and tracing
– Do not spend effort on DIY solutions
  such as printf debugging

– Use tracing tools with some precautions
   • Overhead
   • Data volume


– Let us know about problems and feature wishes:
  vampirsupport@zih.tu-dresden.de
Summary
• Vampir & VampirServer
   –   Interactive trace visualization and analysis
   –   Intuitive browsing and zooming
   –   Scalable to large trace data sizes (100 GByte)
   –   Scalable to high parallelism (20,000 processes)
• Vampir for Linux in progress, beta available

• VampirTrace
   – Convenient instrumentation and measurement
   – Hides away complicated details
   – Provides many options and switches for experts
• VampirTrace is part of Open MPI > 1.3
