Methods and practices to
analyze the performance of your
application with Intel® VTune™
Amplifier XE
Leo Borges
Intel Software Conference 2014 Brazil
May 2014
Copyright © 2013, Intel Corporation. All rights reserved.
*Other brands and names are the property of their respective owners.
INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS”. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE,
TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER
AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR
WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY
PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.
Software and workloads used in performance tests may have been optimized for performance only on Intel
microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems,
components, software, operations and functions. Any change to any of those factors may cause the results to vary. You
should consult other information and performance tests to assist you in fully evaluating your contemplated purchases,
including the performance of that product when combined with other products.
Copyright © , Intel Corporation. All rights reserved. Intel, the Intel logo, Xeon, Xeon Phi, Core, VTune, and Cilk are
trademarks of Intel Corporation in the U.S. and other countries.
Optimization Notice
Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that
are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and
other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on
microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended
for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for
Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information
regarding the specific instruction sets covered by this notice.
Notice revision #20110804
Legal Disclaimer & Optimization Notice
Copyright © 2012, Intel Corporation. All rights reserved.
Agenda
• Intel® VTune™ Amplifier XE Intro
• Microarchitecture Review
• The Top-Down Characterization details
• Intel® VTune™ Amplifier XE Implementation
• Demo
Sources for this presentation:
http://software.intel.com/en-us/articles/advanced-profiling-with-intel-vtune-amplifier-xe-part-1-find-the-bottleneck
Two Ways to Collect Data - Intel® VTune™ Amplifier XE
Software Collector | Hardware Collector
Hotspots, Concurrency, Locks & Waits | Lightweight Hotspots, Advanced Analysis
Uses OS interrupts | Uses the on-chip Performance Monitoring Unit (PMU)
Collects from a single process tree | Collects system-wide or from a single process tree
~10 ms default resolution | ~1 ms default resolution (finer granularity, finds small functions)
Collects on both Intel® and compatible processors | Requires a genuine Intel® processor for collection
Call stacks show calling sequence | New! Optionally collect call stacks
Works in virtual environments | Works in virtual environments only when supported by the VM (e.g., vSphere* 5.1)
No driver required | Requires a driver
Both collectors: no special recompiles - C, C++, C#, Fortran, Java, Assembly
Microarchitecture basics
Fetch -> Decode -> Execute -> Retire
• Classic 4-stage pipeline depicted here.
• Memory not shown.
• Pipelines on current processors are capable of speculative
and out-of-order execution.
Intuitive approach to EBS (event-based sampling)
• Use a small list of metrics to monitor the level of
optimization
• Example 1: Cycles per instruction (CPI)
• Example 2: Instruction retirement ratio
With m instructions issued and n retired:
Retirement ratio = n/m
% executed but not retired = (1 – n/m) * 100
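The two example metrics can be sketched in a few lines of Python. This is only an illustration: the function names and the counter values are hypothetical, and the formulas are the ones stated on the slide (CPI as cycles divided by retired instructions, retirement ratio as n/m).

```python
def cpi(cycles: int, instructions_retired: int) -> float:
    """Cycles per instruction: lower is generally better."""
    return cycles / instructions_retired


def retirement_ratio(issued: int, retired: int) -> float:
    """Fraction n/m of the m issued instructions that were retired."""
    return retired / issued


def percent_not_retired(issued: int, retired: int) -> float:
    """Percentage of issued work that was executed but never retired."""
    return (1 - retirement_ratio(issued, retired)) * 100


# Hypothetical counter readings, for illustration only.
cycles, issued, retired = 2_000_000, 1_250_000, 1_000_000
print(f"CPI = {cpi(cycles, retired):.2f}")
print(f"Retirement ratio = {retirement_ratio(issued, retired):.2f}")
print(f"Executed but not retired = {percent_not_retired(issued, retired):.1f}%")
```

A retirement ratio well below 1.0 suggests that a noticeable share of the work was speculative and later discarded.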
Microarchitecture Review
Fetch -> Decode -> Execute -> Memory -> Commit
The traditional 5-stage pipeline. Pipelines on current
processors are capable of out-of-order execution.
The front-end fetches instructions IN ORDER, decodes them into
u-ops (micro-operations), and sends the u-ops to the back-end.
The back-end receives u-ops, executes them OUT OF ORDER,
accesses memory as needed, and commits results to memory
IN ORDER.
Allocation is the point where u-ops transfer from the
front-end to the back-end. The front-end can allocate 4
u-ops per cycle.
Retirement is the point where u-ops leave the back-end. The
back-end can retire 4 u-ops per cycle.
And a New Term: the Pipeline Slot
[Diagram: Fetch -> Decode -> Execute -> Memory -> Commit, with 4 potential allocations per cycle entering the back-end and 4 potential retirements per cycle leaving it]
In reality, there are many queues, buffers, and pieces of logic
throughout the pipeline to allow up to 4 allocations and 4
retirements per cycle.
The “Pipeline Slot” is an abstraction representing all the
resources needed to move one u-op through the pipeline.
There are 4 Pipeline Slots available every cycle.
Pipeline slots are filled with u-ops that travel from allocation
to retirement over multiple cycles.
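As a rough illustration of the pipeline-slot abstraction, the total number of slots in a measurement interval is 4 times the number of cycles, and the share of slots that held a retired u-op can be expressed as a simple ratio. The function names and sample numbers below are hypothetical; only the figure of 4 slots per cycle comes from the slides.

```python
SLOTS_PER_CYCLE = 4  # from the slides: up to 4 allocations/retirements per cycle


def total_slots(cycles: int) -> int:
    """Number of pipeline slots available over the given number of cycles."""
    return SLOTS_PER_CYCLE * cycles


def slot_utilization(uops_retired: int, cycles: int) -> float:
    """Fraction of all pipeline slots that ended up holding a retired u-op."""
    return uops_retired / total_slots(cycles)


# Hypothetical interval: 1000 cycles in which 2600 u-ops retired.
print(total_slots(1000))
print(f"{slot_utilization(2600, 1000):.2f}")
```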
Cycles Per Instruction (CPI), a standard
measure, has some special kinks
On current Intel multi-core processors, CPI can get as low as
0.25 cycles per instruction.
Normally, a CPI below ~1.0 is targeted for better
performance.
Some suggest targeting a CPI of around 0.75 to
0.50.
But does this hold for every architecture?
• Threads on each Intel® Xeon Phi™ core share a clock
  - If all 4 HW threads are active, each gets ¼ of the total cycles
• Multi-stage instruction decode requires two threads to utilize the
whole core; one thread only gets half
• With two ops per cycle (U-V-pipe dual issue), the best CPI is as tabulated here
• To get thread CPI, multiply the core CPI by the number of active threads

Threads per Core | Best CPI per Core | Best CPI per Thread
1 | 1.0 | 1 x 1.0 = 1.0
2 | 0.5 | 2 x 0.5 = 1.0
3 | 0.5 | 3 x 0.5 = 1.5
4 | 0.5 | 4 x 0.5 = 2.0
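The per-thread figures follow directly from the rule "thread CPI = core CPI x active threads". A minimal sketch that reproduces the table, with hypothetical function names and the slide's assumptions (dual issue, a single thread reaching only half of the core's issue rate):

```python
def best_cpi_per_core(active_threads: int) -> float:
    """Best achievable CPI for the whole core, per the slide's assumptions."""
    if not 1 <= active_threads <= 4:
        raise ValueError("a core runs 1 to 4 hardware threads")
    # One thread can use only half of the 2 ops/cycle dual-issue capacity,
    # so it reaches 1 op/cycle at best; two or more threads can reach 2.
    return 1.0 if active_threads == 1 else 0.5


def best_cpi_per_thread(active_threads: int) -> float:
    """Thread CPI is the core CPI multiplied by the number of active threads."""
    return active_threads * best_cpi_per_core(active_threads)


for n in range(1, 5):
    print(n, best_cpi_per_core(n), best_cpi_per_thread(n))
```

This is why a per-thread CPI of 2.0 can still be the best case on a fully loaded core, while the same value would look poor on a 4-wide out-of-order core.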
The Top-Down Characterization
What is it?
The Top-Down Characterization is:
• A new way to organize and use processor events to
identify the real hardware bottlenecks in
systems/applications
• Based on PMU events specifically designed for this task
• Integrated into Intel® VTune™ Amplifier XE for Core processors
• Available on Intel® Microarchitecture code named Sandy
Bridge and newer
Each pipeline slot on each cycle is classified into 1 of 4 categories:
Front-End Bound, Bad Speculation, Retiring, or Back-End Bound.
[Figure: decision tree applied to each slot on each cycle]
• The four category fractions sum to 1.0
• Unit is “Percentage of total Pipeline Slots”
• This is the core of the new Top-Down
characterization
• Each category is further broken down depending on
available events
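The classification can be sketched as a small decision function. The decision order used here (was a u-op allocated in the slot, did it eventually retire, was the back-end stalled) follows Intel's published description of the top-down method rather than anything recoverable from this transcript, and the per-slot records are hypothetical; real hardware derives the fractions from PMU event counts, not from per-slot traces.

```python
from collections import Counter

CATEGORIES = ("Front-End Bound", "Bad Speculation", "Retiring", "Back-End Bound")


def classify_slot(uop_allocated: bool, uop_retires: bool, backend_stalled: bool) -> str:
    """Assign one pipeline slot to exactly one top-level category."""
    if uop_allocated:
        # The slot was used: useful work if the u-op retires, wasted otherwise.
        return "Retiring" if uop_retires else "Bad Speculation"
    # The slot was empty: blame the back-end if it could not accept a u-op,
    # otherwise the front-end failed to deliver one.
    return "Back-End Bound" if backend_stalled else "Front-End Bound"


def breakdown(slots):
    """Return the fraction of slots in each category; fractions sum to 1.0."""
    counts = Counter(classify_slot(*s) for s in slots)
    return {c: counts[c] / len(slots) for c in CATEGORIES}


# Hypothetical slot records: (uop_allocated, uop_retires, backend_stalled).
sample = [
    (True, True, False), (True, True, False), (True, True, False), (True, False, False),
    (False, False, True), (False, False, True), (False, False, False), (True, True, False),
]
for category, fraction in breakdown(sample).items():
    print(f"{category}: {fraction:.3f}")
```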
Issues breakdown
• Front-End
  - Latency: ITLB Overhead, ICache Misses, DSB Switches, Branch Resteers
  - Bandwidth: DSB, MITE
• Bad Speculation: Branch Mispredict, Machine Clears
• Retiring: General, Microcode Sequencer
• Back-End
  - Memory Bound: L1, L2, L3, DRAM (local or remote), Store Bound
  - Core Bound: DIV Active, Port Utilization (0 .. 3 ports)
Examples of Metrics (Intel® Xeon Phi™)
Problem Area: L1 Cache Usage
• Significantly affects data access latency and therefore application performance
• Tuning Suggestions:
  - Software prefetching
  - Tile/block data access for cache size
  - Use streaming stores
  - If using 4K access stride, may be experiencing conflict misses
  - Examine Compiler prefetching (Compiler-generated L1 prefetches should not miss)

Metric | Formula | Investigate if
L1 Misses | DATA_READ_MISS_OR_WRITE_MISS + L1_DATA_HIT_INFLIGHT_PF1 | -
L1 Hit Rate | (DATA_READ_OR_WRITE – L1 Misses) / DATA_READ_OR_WRITE | < 95%
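The L1 formulas can be evaluated from raw event counts as in the sketch below. The event names are the ones in the table; the function names and the sample counts are hypothetical.

```python
def l1_misses(data_read_miss_or_write_miss: int, l1_data_hit_inflight_pf1: int) -> int:
    """L1 Misses = DATA_READ_MISS_OR_WRITE_MISS + L1_DATA_HIT_INFLIGHT_PF1."""
    return data_read_miss_or_write_miss + l1_data_hit_inflight_pf1


def l1_hit_rate(data_read_or_write: int, misses: int) -> float:
    """L1 Hit Rate = (DATA_READ_OR_WRITE - L1 Misses) / DATA_READ_OR_WRITE."""
    return (data_read_or_write - misses) / data_read_or_write


def l1_needs_investigation(hit_rate: float, threshold: float = 0.95) -> bool:
    """The slide suggests investigating when the hit rate is below 95%."""
    return hit_rate < threshold


# Hypothetical event counts, for illustration only.
misses = l1_misses(60_000, 20_000)
rate = l1_hit_rate(1_000_000, misses)
print(f"L1 misses = {misses}, hit rate = {rate:.1%}, "
      f"investigate = {l1_needs_investigation(rate)}")
```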
Problem Area: Data Access Latency
• Significantly affects application performance
• Tuning Suggestions:
  - Software prefetching
  - Tile/block data access for cache size
  - Use streaming stores
  - Check cache locality: turn off prefetching and use CACHE_FILL events; reduce sharing if needed/possible
  - If using 64K access stride, may be experiencing conflict misses

Metric | Formula | Investigate if
Estimated Latency Impact | (CPU_CLK_UNHALTED – EXEC_STAGE_CYCLES – DATA_READ_OR_WRITE) / DATA_READ_OR_WRITE_MISS | > 145
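A minimal sketch of the Estimated Latency Impact calculation, using the event names from the table. The function names and sample counts are hypothetical; the result approximates the average number of cycles spent per L1 miss.

```python
def estimated_latency_impact(
    cpu_clk_unhalted: int,
    exec_stage_cycles: int,
    data_read_or_write: int,
    data_read_or_write_miss: int,
) -> float:
    """(CPU_CLK_UNHALTED - EXEC_STAGE_CYCLES - DATA_READ_OR_WRITE)
    / DATA_READ_OR_WRITE_MISS, as given on the slide."""
    stalled = cpu_clk_unhalted - exec_stage_cycles - data_read_or_write
    return stalled / data_read_or_write_miss


def latency_needs_investigation(impact: float, threshold: float = 145.0) -> bool:
    """The slide suggests investigating when the value exceeds 145."""
    return impact > threshold


# Hypothetical event counts, for illustration only.
impact = estimated_latency_impact(10_000_000, 4_000_000, 1_000_000, 25_000)
print(f"Estimated latency impact = {impact:.0f}, "
      f"investigate = {latency_needs_investigation(impact)}")
```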
Problem Area: TLB Usage
• Also affects data access latency and therefore application performance
• Tuning Suggestions:
  - Improve cache usage & data access latency
  - If L1 TLB miss/L2 TLB miss is high, try using large pages
  - For loops with multiple streams, try splitting into multiple loops
  - If data access stride is a large power of 2, consider padding between arrays by one 4 KB page

Metric | Formula | Investigate if
L1 TLB miss ratio | DATA_PAGE_WALK / DATA_READ_OR_WRITE | > 1%
L2 TLB miss ratio | LONG_DATA_PAGE_WALK / DATA_READ_OR_WRITE | > 0.1%
L1 TLB misses per L2 TLB miss | DATA_PAGE_WALK / LONG_DATA_PAGE_WALK | > 100x
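The three TLB metrics and their thresholds can be checked together, as in this sketch. Event names come from the table; the function name, the return layout, and the sample counts are hypothetical.

```python
def tlb_metrics(data_read_or_write: int, data_page_walk: int, long_data_page_walk: int):
    """Return {metric name: (value, exceeds the slide's threshold)}."""
    l1_ratio = data_page_walk / data_read_or_write
    l2_ratio = long_data_page_walk / data_read_or_write
    l1_per_l2 = data_page_walk / long_data_page_walk
    return {
        "L1 TLB miss ratio": (l1_ratio, l1_ratio > 0.01),             # > 1%
        "L2 TLB miss ratio": (l2_ratio, l2_ratio > 0.001),            # > 0.1%
        "L1 TLB misses per L2 TLB miss": (l1_per_l2, l1_per_l2 > 100),  # > 100x
    }


# Hypothetical event counts, for illustration only.
for name, (value, flagged) in tlb_metrics(1_000_000, 20_000, 100).items():
    print(f"{name}: {value:g} (investigate: {flagged})")
```

In this made-up case the L1 TLB misses far more often than the L2 TLB, which is the situation where the slide suggests trying large pages.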
Problem Area: VPU Usage
• Indicates whether an application is vectorized successfully and efficiently
• Tuning Suggestions:
  - Use the Compiler vectorization report!
  - For data dependencies preventing vectorization, try using Intel® Cilk™ Plus #pragma simd (if safe!)
  - Align data and tell the Compiler!
  - Re-structure code if possible: Array notations, AOS->SOA

Metric | Formula | Investigate if
Vectorization Intensity | VPU_ELEMENTS_ACTIVE / VPU_INSTRUCTIONS_EXECUTED | < 8 (DP), < 16 (SP)
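Vectorization intensity is the average number of active vector elements per VPU instruction. A minimal sketch with the event names from the table; the function names and the sample counts are hypothetical, and the thresholds are the ones on the slide (8 for double precision, 16 for single precision).

```python
THRESHOLDS = {"DP": 8, "SP": 16}  # "investigate if" limits from the slide


def vectorization_intensity(vpu_elements_active: int, vpu_instructions_executed: int) -> float:
    """VPU_ELEMENTS_ACTIVE / VPU_INSTRUCTIONS_EXECUTED."""
    return vpu_elements_active / vpu_instructions_executed


def vpu_needs_investigation(intensity: float, precision: str) -> bool:
    """True when the intensity is below the slide's limit for that precision."""
    return intensity < THRESHOLDS[precision]


# Hypothetical event counts for a double-precision kernel.
intensity = vectorization_intensity(6_000_000, 1_000_000)
print(f"Vectorization intensity = {intensity:.1f}, "
      f"investigate = {vpu_needs_investigation(intensity, 'DP')}")
```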
Problem Area: Memory Bandwidth
• Can increase data latency in the system or become a performance bottleneck
• Tuning Suggestions:
  - Improve locality in caches
  - Use streaming stores
  - Improve software prefetching

Metric | Formula | Investigate if
Memory Bandwidth | (UNC_F_CH0_NORMAL_READ + UNC_F_CH0_NORMAL_WRITE + UNC_F_CH1_NORMAL_READ + UNC_F_CH1_NORMAL_WRITE) * 64 / time | < 80 GB/sec (practical peak 140 GB/sec, with 8 memory controllers)
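The bandwidth formula multiplies the number of memory reads and writes by 64 bytes per transfer and divides by elapsed time. The sketch below assumes the counts are already summed over all memory controllers, that time is in seconds, and that 1 GB = 10^9 bytes; the function name and sample counts are hypothetical.

```python
BYTES_PER_TRANSFER = 64  # multiplier from the slide's formula


def memory_bandwidth_gb_per_s(
    ch0_normal_read: int,
    ch0_normal_write: int,
    ch1_normal_read: int,
    ch1_normal_write: int,
    seconds: float,
) -> float:
    """(CH0 reads + CH0 writes + CH1 reads + CH1 writes) * 64 / time, in GB/s."""
    transfers = ch0_normal_read + ch0_normal_write + ch1_normal_read + ch1_normal_write
    return transfers * BYTES_PER_TRANSFER / seconds / 1e9


# Hypothetical event counts collected over one second.
bw = memory_bandwidth_gb_per_s(400_000_000, 100_000_000, 400_000_000, 100_000_000, 1.0)
print(f"Memory bandwidth = {bw:.1f} GB/s (slide threshold: 80, practical peak: 140)")
```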
VTune™ Amplifier XE
DEMO
Running the General Exploration Collector
1. Click the “New Analysis” button
2. Select “General Exploration” for your CPU architecture
3. Click “Start” to begin profiling
General Exploration Summary
VTune™ Amplifier XE visualizes performance
Toolbar: Instructions, Navigator; Project (New, Open, Properties); Result (New, Open, Compare)
Project Navigator
Result Display Tabs
Result Analysis Type
Result Viewpoint
Viewpoint Alternates
Result Components
Grid Pane
Grid Pane: Grouping pull-down
Stack Pane
Timeline
Filter/Options Bar
Source View: Per-line localization
Source View: View / Hot spot navigation controls
Assembly View: View / Hot spot navigation controls
Assembly View: Assembly groupings
For event collection, the coprocessor is treated as a special hardware architecture
Project properties provide the means to invoke data collection by target type
Launch Application serves many uses, from host/offload to native execution
Search directories have been reorganized to speed symbol resolution during finalization
Notable coprocessor library paths:
/opt/mpss/3.2/sysroots/k1om-mpss-Linux/boot
/opt/mpss/3.2/sysroots/k1om-mpss-Linux/lib64
/opt/intel/composerxe/lib/mic
/opt/intel/composerxe/tbb/lib/mic
/opt/intel/composerxe/mkl/lib/mic
/opt/intel/mpi-rt/4.1.3/mic
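For command-line runs, the same directories can be passed as search directories. The sketch below keeps only the paths that exist on the current host and turns each into one argument. The "-search-dir=<path>" spelling is an assumption about the amplxe-cl option syntax and should be checked against the installed version; the version numbers in the paths (MPSS 3.2, MPI 4.1.3) are the ones shown on the slide and will differ on other installations.

```python
import os

# Paths exactly as listed on the slide; adjust versions to the local install.
COPROCESSOR_LIB_DIRS = [
    "/opt/mpss/3.2/sysroots/k1om-mpss-Linux/boot",
    "/opt/mpss/3.2/sysroots/k1om-mpss-Linux/lib64",
    "/opt/intel/composerxe/lib/mic",
    "/opt/intel/composerxe/tbb/lib/mic",
    "/opt/intel/composerxe/mkl/lib/mic",
    "/opt/intel/mpi-rt/4.1.3/mic",
]


def search_dir_args(candidates, exists=os.path.isdir):
    """Return one search-directory argument per directory that exists.

    The option spelling is assumed, not confirmed by the slides. The
    `exists` parameter is injectable so the function can be tested
    without touching the file system.
    """
    return ["-search-dir=" + path for path in candidates if exists(path)]


# On a host without these directories this prints an empty list.
print(search_dir_args(COPROCESSOR_LIB_DIRS))
```

Skipping missing directories keeps the list short, which fits the slide's point about faster symbol resolution during finalization.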
General Exploration runs a set of events to drive top-down analysis
For more information on Intel® Xeon Phi™ and VTune™ Amplifier XE
Optimization on the coprocessor:
http://software.intel.com/en-us/articles/optimization-and-performance-tuning-for-intel-xeon-phi-coprocessors-part-1-optimization
http://software.intel.com/en-us/articles/optimization-and-performance-tuning-for-intel-xeon-phi-coprocessors-part-2-understanding
Coprocessor Performance Monitoring Unit:
http://software.intel.com/sites/default/files/forum/278102/intelr-xeon-phitm-pmu-rev1.01.pdf
For general information: http://software.intel.com/mic-developer
Grid is Based on Top-Down
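As a rough illustration of what the top-down columns represent, the sketch below computes the four top-level categories from raw event counts. It assumes the commonly published formulas for 4-wide out-of-order Intel cores (Sandy Bridge generation and later) and does not apply to the Xeon Phi coprocessor, which uses different events. The function name and the sample counts are hypothetical, and VTune's own metric definitions may differ in detail.

```python
def top_down_level1(clocks, uops_not_delivered, uops_issued,
                    uops_retired_slots, recovery_cycles, width=4):
    """Split pipeline slots into the four top-level top-down categories.

    Assumed event mapping (4-wide out-of-order core):
      clocks              -> CPU_CLK_UNHALTED.THREAD
      uops_not_delivered  -> IDQ_UOPS_NOT_DELIVERED.CORE
      uops_issued         -> UOPS_ISSUED.ANY
      uops_retired_slots  -> UOPS_RETIRED.RETIRE_SLOTS
      recovery_cycles     -> INT_MISC.RECOVERY_CYCLES
    """
    slots = width * clocks
    if slots <= 0:
        raise ValueError("clocks must be positive")
    frontend = uops_not_delivered / slots
    bad_speculation = (uops_issued - uops_retired_slots
                       + width * recovery_cycles) / slots
    retiring = uops_retired_slots / slots
    # Whatever is left over is attributed to the back end.
    backend = 1.0 - (frontend + bad_speculation + retiring)
    return {
        "frontend_bound": frontend,
        "bad_speculation": bad_speculation,
        "retiring": retiring,
        "backend_bound": backend,
    }


# Made-up counts, for illustration only.
breakdown = top_down_level1(clocks=1000, uops_not_delivered=800,
                            uops_issued=2200, uops_retired_slots=2000,
                            recovery_cycles=50)
for name, fraction in breakdown.items():
    print(f"{name}: {fraction:.1%}")
```

Each fraction is a share of all issue slots, so the four values sum to 1 by construction.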
Use the Hover Text to Understand Metrics*
*Suggestions welcome: Submit issues if the text isn’t helpful
Event collections on the coprocessor can generate large volumes of data
Example: dgemm on 60+ cores
Tip: Use cpu-mask to reduce the data set while maintaining the same accuracy.
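A mask string for a subset of cores can be generated rather than typed by hand. The sketch below picks every N-th core and compresses the resulting logical CPU ids into a "1-4,17-20" style list. The range syntax, the 4 hardware threads per core, and the consecutive numbering starting at CPU 1 are all assumptions to verify on the target card (for example in /proc/cpuinfo); the helper names are hypothetical.

```python
def cpu_mask(cpu_ids):
    """Compress logical CPU ids into a comma-separated list of ranges."""
    ids = sorted(set(cpu_ids))
    if not ids:
        return ""
    ranges = []
    start = prev = ids[0]
    for cpu in ids[1:]:
        if cpu == prev + 1:
            prev = cpu
            continue
        ranges.append((start, prev))
        start = prev = cpu
    ranges.append((start, prev))
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)


def sample_cores(num_cores, threads_per_core=4, stride=4, first_cpu=1):
    """Logical CPU ids for every stride-th core.

    Assumes each core's hardware threads are numbered consecutively,
    starting at first_cpu; this layout is an assumption, not a given.
    """
    cpus = []
    for core in range(0, num_cores, stride):
        base = first_cpu + core * threads_per_core
        cpus.extend(range(base, base + threads_per_core))
    return cpus


# Sample one core in four on a hypothetical 60-core card.
print(cpu_mask(sample_cores(60)))
```

Sampling a subset is only representative when all cores run similar work, as in the dgemm case above.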
Resources
Top-Down Characterization White Paper
http://software.intel.com/en-us/articles/how-to-tune-applications-using-a-top-down-characterization-of-microarchitectural-issues
Tuning Guides
http://software.intel.com/en-us/articles/processor-specific-performance-analysis-papers
60
Methods and practices to analyze the performance of your application with Intel® VTune™ Amplifier XE

Más contenido relacionado

La actualidad más candente

Embree Ray Tracing Kernels | Overview and New Features | SIGGRAPH 2018 Tech S...
Embree Ray Tracing Kernels | Overview and New Features | SIGGRAPH 2018 Tech S...Embree Ray Tracing Kernels | Overview and New Features | SIGGRAPH 2018 Tech S...
Embree Ray Tracing Kernels | Overview and New Features | SIGGRAPH 2018 Tech S...Intel® Software
 
Performance out of the box developers
Performance   out of the box developersPerformance   out of the box developers
Performance out of the box developersMichelle Holley
 
5 pipeline arch_rationale
5 pipeline arch_rationale5 pipeline arch_rationale
5 pipeline arch_rationalevideos
 
Intel NFVi Enabling Kit Demo/Lab
Intel NFVi Enabling Kit Demo/LabIntel NFVi Enabling Kit Demo/Lab
Intel NFVi Enabling Kit Demo/LabMichelle Holley
 
Ultra HD Video Scaling: Low-Power HW FF vs. CNN-based Super-Resolution
Ultra HD Video Scaling: Low-Power HW FF vs. CNN-based Super-ResolutionUltra HD Video Scaling: Low-Power HW FF vs. CNN-based Super-Resolution
Ultra HD Video Scaling: Low-Power HW FF vs. CNN-based Super-ResolutionIntel® Software
 
Open Source Interactive CPU Preview Rendering with Pixar's Universal Scene De...
Open Source Interactive CPU Preview Rendering with Pixar's Universal Scene De...Open Source Interactive CPU Preview Rendering with Pixar's Universal Scene De...
Open Source Interactive CPU Preview Rendering with Pixar's Universal Scene De...Intel® Software
 
Introduction to nfv movilforum
Introduction to nfv   movilforumIntroduction to nfv   movilforum
Introduction to nfv movilforumvideos
 
A Path to NFV/SDN - Intel. Michael Brennan, INTEL
A Path to NFV/SDN - Intel. Michael Brennan, INTELA Path to NFV/SDN - Intel. Michael Brennan, INTEL
A Path to NFV/SDN - Intel. Michael Brennan, INTELWalton Institute
 
DPDK: Multi Architecture High Performance Packet Processing
DPDK: Multi Architecture High Performance Packet ProcessingDPDK: Multi Architecture High Performance Packet Processing
DPDK: Multi Architecture High Performance Packet ProcessingMichelle Holley
 
3 additional dpdk_theory(1)
3 additional dpdk_theory(1)3 additional dpdk_theory(1)
3 additional dpdk_theory(1)videos
 
DPDK & Layer 4 Packet Processing
DPDK & Layer 4 Packet ProcessingDPDK & Layer 4 Packet Processing
DPDK & Layer 4 Packet ProcessingMichelle Holley
 
More explosions, more chaos, and definitely more blowing stuff up
More explosions, more chaos, and definitely more blowing stuff upMore explosions, more chaos, and definitely more blowing stuff up
More explosions, more chaos, and definitely more blowing stuff upIntel® Software
 
Unleashing End-to_end TLS Security Leveraging NGINX with Intel(r) QuickAssist...
Unleashing End-to_end TLS Security Leveraging NGINX with Intel(r) QuickAssist...Unleashing End-to_end TLS Security Leveraging NGINX with Intel(r) QuickAssist...
Unleashing End-to_end TLS Security Leveraging NGINX with Intel(r) QuickAssist...Michelle Holley
 
Accelerating Virtual Machine Access with the Storage Performance Development ...
Accelerating Virtual Machine Access with the Storage Performance Development ...Accelerating Virtual Machine Access with the Storage Performance Development ...
Accelerating Virtual Machine Access with the Storage Performance Development ...Michelle Holley
 
Real-Time Game Optimization with Intel® GPA
Real-Time Game Optimization with Intel® GPAReal-Time Game Optimization with Intel® GPA
Real-Time Game Optimization with Intel® GPAIntel® Software
 
1 intro to_dpdk_and_hw
1 intro to_dpdk_and_hw1 intro to_dpdk_and_hw
1 intro to_dpdk_and_hwvideos
 
Multiple Cores, Multiple Pipes, Multiple Threads – Do we have more Parallelis...
Multiple Cores, Multiple Pipes, Multiple Threads – Do we have more Parallelis...Multiple Cores, Multiple Pipes, Multiple Threads – Do we have more Parallelis...
Multiple Cores, Multiple Pipes, Multiple Threads – Do we have more Parallelis...Slide_N
 
Real-Time Game Optimization with Intel® GPA
Real-Time Game Optimization with Intel® GPAReal-Time Game Optimization with Intel® GPA
Real-Time Game Optimization with Intel® GPAIntel® Software
 
6 profiling tools
6 profiling tools6 profiling tools
6 profiling toolsvideos
 

La actualidad más candente (20)

Embree Ray Tracing Kernels | Overview and New Features | SIGGRAPH 2018 Tech S...
Embree Ray Tracing Kernels | Overview and New Features | SIGGRAPH 2018 Tech S...Embree Ray Tracing Kernels | Overview and New Features | SIGGRAPH 2018 Tech S...
Embree Ray Tracing Kernels | Overview and New Features | SIGGRAPH 2018 Tech S...
 
Performance out of the box developers
Performance   out of the box developersPerformance   out of the box developers
Performance out of the box developers
 
5 pipeline arch_rationale
5 pipeline arch_rationale5 pipeline arch_rationale
5 pipeline arch_rationale
 
Intel NFVi Enabling Kit Demo/Lab
Intel NFVi Enabling Kit Demo/LabIntel NFVi Enabling Kit Demo/Lab
Intel NFVi Enabling Kit Demo/Lab
 
Ultra HD Video Scaling: Low-Power HW FF vs. CNN-based Super-Resolution
Ultra HD Video Scaling: Low-Power HW FF vs. CNN-based Super-ResolutionUltra HD Video Scaling: Low-Power HW FF vs. CNN-based Super-Resolution
Ultra HD Video Scaling: Low-Power HW FF vs. CNN-based Super-Resolution
 
Open Source Interactive CPU Preview Rendering with Pixar's Universal Scene De...
Open Source Interactive CPU Preview Rendering with Pixar's Universal Scene De...Open Source Interactive CPU Preview Rendering with Pixar's Universal Scene De...
Open Source Interactive CPU Preview Rendering with Pixar's Universal Scene De...
 
Introduction to nfv movilforum
Introduction to nfv   movilforumIntroduction to nfv   movilforum
Introduction to nfv movilforum
 
A Path to NFV/SDN - Intel. Michael Brennan, INTEL
A Path to NFV/SDN - Intel. Michael Brennan, INTELA Path to NFV/SDN - Intel. Michael Brennan, INTEL
A Path to NFV/SDN - Intel. Michael Brennan, INTEL
 
DPDK: Multi Architecture High Performance Packet Processing
DPDK: Multi Architecture High Performance Packet ProcessingDPDK: Multi Architecture High Performance Packet Processing
DPDK: Multi Architecture High Performance Packet Processing
 
3 additional dpdk_theory(1)
3 additional dpdk_theory(1)3 additional dpdk_theory(1)
3 additional dpdk_theory(1)
 
DPDK & Layer 4 Packet Processing
DPDK & Layer 4 Packet ProcessingDPDK & Layer 4 Packet Processing
DPDK & Layer 4 Packet Processing
 
More explosions, more chaos, and definitely more blowing stuff up
More explosions, more chaos, and definitely more blowing stuff upMore explosions, more chaos, and definitely more blowing stuff up
More explosions, more chaos, and definitely more blowing stuff up
 
Unleashing End-to_end TLS Security Leveraging NGINX with Intel(r) QuickAssist...
Unleashing End-to_end TLS Security Leveraging NGINX with Intel(r) QuickAssist...Unleashing End-to_end TLS Security Leveraging NGINX with Intel(r) QuickAssist...
Unleashing End-to_end TLS Security Leveraging NGINX with Intel(r) QuickAssist...
 
Accelerating Virtual Machine Access with the Storage Performance Development ...
Accelerating Virtual Machine Access with the Storage Performance Development ...Accelerating Virtual Machine Access with the Storage Performance Development ...
Accelerating Virtual Machine Access with the Storage Performance Development ...
 
Real-Time Game Optimization with Intel® GPA
Real-Time Game Optimization with Intel® GPAReal-Time Game Optimization with Intel® GPA
Real-Time Game Optimization with Intel® GPA
 
No[1][1]
No[1][1]No[1][1]
No[1][1]
 
1 intro to_dpdk_and_hw
1 intro to_dpdk_and_hw1 intro to_dpdk_and_hw
1 intro to_dpdk_and_hw
 
Multiple Cores, Multiple Pipes, Multiple Threads – Do we have more Parallelis...
Multiple Cores, Multiple Pipes, Multiple Threads – Do we have more Parallelis...Multiple Cores, Multiple Pipes, Multiple Threads – Do we have more Parallelis...
Multiple Cores, Multiple Pipes, Multiple Threads – Do we have more Parallelis...
 
Real-Time Game Optimization with Intel® GPA
Real-Time Game Optimization with Intel® GPAReal-Time Game Optimization with Intel® GPA
Real-Time Game Optimization with Intel® GPA
 
6 profiling tools
6 profiling tools6 profiling tools
6 profiling tools
 

Destacado

Desenvolvimento e análise de performance de jogos Android com Coco2d-HTML5
Desenvolvimento e análise de performance de jogos Android com Coco2d-HTML5Desenvolvimento e análise de performance de jogos Android com Coco2d-HTML5
Desenvolvimento e análise de performance de jogos Android com Coco2d-HTML5Intel Software Brasil
 
Desafios do Desenvolvimento Multi-plataforma
Desafios do Desenvolvimento Multi-plataformaDesafios do Desenvolvimento Multi-plataforma
Desafios do Desenvolvimento Multi-plataformaIntel Software Brasil
 
Escreva sua App sem gastar energia, agora no KitKat
Escreva sua App sem gastar energia, agora no KitKatEscreva sua App sem gastar energia, agora no KitKat
Escreva sua App sem gastar energia, agora no KitKatIntel Software Brasil
 
Principais conceitos técnicas e modelos de programação paralela
Principais conceitos técnicas e modelos de programação paralelaPrincipais conceitos técnicas e modelos de programação paralela
Principais conceitos técnicas e modelos de programação paralelaIntel Software Brasil
 
Modernização de código em Xeon® e Xeon Phi™
Modernização de código em Xeon® e Xeon Phi™  Modernização de código em Xeon® e Xeon Phi™
Modernização de código em Xeon® e Xeon Phi™ Intel Software Brasil
 
Benchmarking para sistemas de alto desempenho
Benchmarking para sistemas de alto desempenhoBenchmarking para sistemas de alto desempenho
Benchmarking para sistemas de alto desempenhoIntel Software Brasil
 
Principais conceitos e técnicas em vetorização
Principais conceitos e técnicas em vetorizaçãoPrincipais conceitos e técnicas em vetorização
Principais conceitos e técnicas em vetorizaçãoIntel Software Brasil
 
Identificando Hotspots e Intel® VTune™ Amplifier - Intel Software Conference
Identificando Hotspots e Intel® VTune™ Amplifier - Intel Software ConferenceIdentificando Hotspots e Intel® VTune™ Amplifier - Intel Software Conference
Identificando Hotspots e Intel® VTune™ Amplifier - Intel Software ConferenceIntel Software Brasil
 
Vetorização e Otimização de Código - Intel Software Conference 2013
Vetorização e Otimização de Código - Intel Software Conference 2013Vetorização e Otimização de Código - Intel Software Conference 2013
Vetorização e Otimização de Código - Intel Software Conference 2013Intel Software Brasil
 
Intel Technologies for High Performance Computing
Intel Technologies for High Performance ComputingIntel Technologies for High Performance Computing
Intel Technologies for High Performance ComputingIntel Software Brasil
 
Computação Paralela: Benefícios e Desafios - Intel Software Conference 2013
Computação Paralela: Benefícios e Desafios - Intel Software Conference 2013Computação Paralela: Benefícios e Desafios - Intel Software Conference 2013
Computação Paralela: Benefícios e Desafios - Intel Software Conference 2013Intel Software Brasil
 
Intel® MPI Library e OpenMP* - Intel Software Conference 2013
Intel® MPI Library e OpenMP* - Intel Software Conference 2013Intel® MPI Library e OpenMP* - Intel Software Conference 2013
Intel® MPI Library e OpenMP* - Intel Software Conference 2013Intel Software Brasil
 
Team Performance and Leadership
Team Performance and LeadershipTeam Performance and Leadership
Team Performance and LeadershipLeda Karabela
 

Destacado (20)

Desenvolvimento e análise de performance de jogos Android com Coco2d-HTML5
Desenvolvimento e análise de performance de jogos Android com Coco2d-HTML5Desenvolvimento e análise de performance de jogos Android com Coco2d-HTML5
Desenvolvimento e análise de performance de jogos Android com Coco2d-HTML5
 
Desafios do Desenvolvimento Multi-plataforma
Desafios do Desenvolvimento Multi-plataformaDesafios do Desenvolvimento Multi-plataforma
Desafios do Desenvolvimento Multi-plataforma
 
Html5 fisl15
Html5 fisl15Html5 fisl15
Html5 fisl15
 
Escreva sua App sem gastar energia, agora no KitKat
Escreva sua App sem gastar energia, agora no KitKatEscreva sua App sem gastar energia, agora no KitKat
Escreva sua App sem gastar energia, agora no KitKat
 
IoT FISL15
IoT FISL15IoT FISL15
IoT FISL15
 
Principais conceitos técnicas e modelos de programação paralela
Principais conceitos técnicas e modelos de programação paralelaPrincipais conceitos técnicas e modelos de programação paralela
Principais conceitos técnicas e modelos de programação paralela
 
Modernização de código em Xeon® e Xeon Phi™
Modernização de código em Xeon® e Xeon Phi™  Modernização de código em Xeon® e Xeon Phi™
Modernização de código em Xeon® e Xeon Phi™
 
Benchmarking para sistemas de alto desempenho
Benchmarking para sistemas de alto desempenhoBenchmarking para sistemas de alto desempenho
Benchmarking para sistemas de alto desempenho
 
Principais conceitos e técnicas em vetorização
Principais conceitos e técnicas em vetorizaçãoPrincipais conceitos e técnicas em vetorização
Principais conceitos e técnicas em vetorização
 
Identificando Hotspots e Intel® VTune™ Amplifier - Intel Software Conference
Identificando Hotspots e Intel® VTune™ Amplifier - Intel Software ConferenceIdentificando Hotspots e Intel® VTune™ Amplifier - Intel Software Conference
Identificando Hotspots e Intel® VTune™ Amplifier - Intel Software Conference
 
Vetorização e Otimização de Código - Intel Software Conference 2013
Vetorização e Otimização de Código - Intel Software Conference 2013Vetorização e Otimização de Código - Intel Software Conference 2013
Vetorização e Otimização de Código - Intel Software Conference 2013
 
Intel Technologies for High Performance Computing
Intel Technologies for High Performance ComputingIntel Technologies for High Performance Computing
Intel Technologies for High Performance Computing
 
Computação Paralela: Benefícios e Desafios - Intel Software Conference 2013
Computação Paralela: Benefícios e Desafios - Intel Software Conference 2013Computação Paralela: Benefícios e Desafios - Intel Software Conference 2013
Computação Paralela: Benefícios e Desafios - Intel Software Conference 2013
 
Intel® MPI Library e OpenMP* - Intel Software Conference 2013
Intel® MPI Library e OpenMP* - Intel Software Conference 2013Intel® MPI Library e OpenMP* - Intel Software Conference 2013
Intel® MPI Library e OpenMP* - Intel Software Conference 2013
 
Notes on NUMA architecture
Notes on NUMA architectureNotes on NUMA architecture
Notes on NUMA architecture
 
CV-LucianoPalma
CV-LucianoPalmaCV-LucianoPalma
CV-LucianoPalma
 
Kpi example
Kpi exampleKpi example
Kpi example
 
Team Performance and Leadership
Team Performance and LeadershipTeam Performance and Leadership
Team Performance and Leadership
 
CRNM - Health Study
CRNM - Health StudyCRNM - Health Study
CRNM - Health Study
 
Future continuous
Future continuousFuture continuous
Future continuous
 

Methods and practices to analyze the performance of your application with Intel® VTune™ Amplifier XE

  • 1. Methods and practices to analyze the performance of your application with Intel® VTune™ Amplifier XE
      Leo Borges, Intel Software Conference 2014 Brazil, May 2014
  • 2. Legal Disclaimer & Optimization Notice
      Copyright © 2013, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
      INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS”. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.
      Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.
      Copyright © , Intel Corporation. All rights reserved. Intel, the Intel logo, Xeon, Xeon Phi, Core, VTune, and Cilk are trademarks of Intel Corporation in the U.S. and other countries.
      Optimization Notice: Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #20110804
      Copyright © 2012, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
  • 3. Agenda
      • Intel® VTune™ Amplifier XE Intro
      • Microarchitecture Review
      • The Top-Down Characterization details
      • Intel® VTune™ Amplifier XE Implementation
      • Demo
      Sources for current presentation: http://software.intel.com/en-us/articles/advanced-profiling-with-intel-vtune-amplifier-xe-part-1-find-the-bottleneck
  • 4. Two Ways to Collect Data - Intel® VTune™ Amplifier XE

| Software Collector | Hardware Collector |
| --- | --- |
| Hotspots, Concurrency, Locks & Waits | Lightweight Hotspots, Advanced Analysis |
| Uses OS interrupts | Uses the on-chip Performance Monitoring Unit (PMU) |
| Collects from a single process tree | Collects system-wide or from a single process tree |
| ~10 ms default resolution | ~1 ms default resolution (finer granularity, finds small functions) |
| Collects on both Intel® and compatible processors | Requires a genuine Intel® processor for collection |
| Call stacks show calling sequence | New! Optionally collects call stacks |
| Works in virtual environments | Works in virtual environments only when supported by the VM (e.g., vSphere* 5.1) |
| No driver required | Requires a driver |

      No special recompiles: C, C++, C#, Fortran, Java, Assembly
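To make the difference between the ~10 ms and ~1 ms default resolutions concrete, here is a minimal back-of-the-envelope sketch. It assumes samples are taken at a fixed interval and that a function's expected sample count is simply its CPU time divided by that interval; the function name `expected_samples` and the 50 ms figure are illustrative assumptions, not part of VTune.

```python
# Rough estimate of how many samples land in a function, assuming a fixed
# sampling interval (hypothetical helper, not a VTune API).

SOFTWARE_INTERVAL_S = 0.010  # ~10 ms default resolution (value from the slide)
HARDWARE_INTERVAL_S = 0.001  # ~1 ms default resolution (value from the slide)


def expected_samples(cpu_time_s: float, interval_s: float) -> float:
    """Approximate number of samples attributed to code using cpu_time_s of CPU time."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return cpu_time_s / interval_s


if __name__ == "__main__":
    function_cpu_time_s = 0.050  # hypothetical function using 50 ms of CPU time
    for name, interval in (("software", SOFTWARE_INTERVAL_S),
                           ("hardware", HARDWARE_INTERVAL_S)):
        n = expected_samples(function_cpu_time_s, interval)
        print(f"{name} collector: about {n:.0f} samples")
```

With roughly ten times more samples per unit of CPU time, the hardware collector is more likely to attribute time to small functions, which matches the "finer granularity" remark on the slide.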
  • 6. Microarchitecture basics
      [Pipeline diagram: Fetch, Decode, Execute, Retire]
      • Classic 4-stage pipeline depicted here.
      • Memory not shown.
      • Pipeline on current processors is capable of speculative and out-of-order execution.
  • 7. Intuitive approach to EBS (event-based sampling)
      • Use a small list of metrics to monitor the level of optimization
      • Example 1: Cycles per instruction (CPI)
      • Example 2: Instruction retirement ratio
          With m instructions issued and n retired:
          Retirement ratio = n/m
          % executed but not retired = (1 - n/m) * 100
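The two example metrics above can be sketched in a few lines. This is only an illustration of the formulas on the slide; the function names and the counter values are made up for the example and do not correspond to any VTune API or real measurement.

```python
# Minimal sketch of the two example metrics: CPI and the retirement ratio.
# All counts below are hypothetical.


def cpi(cycles: int, instructions_retired: int) -> float:
    """Cycles per instruction."""
    return cycles / instructions_retired


def retirement_ratio(issued: int, retired: int) -> float:
    """n/m, where m instructions were issued and n were retired."""
    return retired / issued


def percent_not_retired(issued: int, retired: int) -> float:
    """(1 - n/m) * 100: share of issued work that never retired."""
    return (1 - retired / issued) * 100


if __name__ == "__main__":
    print(f"CPI: {cpi(2_000_000, 1_600_000):.2f}")
    print(f"Retirement ratio: {retirement_ratio(1000, 800):.2f}")
    print(f"Not retired: {percent_not_retired(1000, 800):.1f}%")
```

A low retirement ratio points at work that was issued but discarded (for example after a mispredicted branch), which CPI alone does not reveal.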
  • 8. Microarchitecture Review
      [Pipeline diagram: Fetch, Decode, Execute, Memory, Commit]
      The traditional 5-stage pipeline. Pipeline on current processors is capable of out-of-order execution.
  • 10. Microarchitecture Review
      [Pipeline diagram: Fetch, Decode, Execute, Memory, Commit; label: Front-End]
      The front-end fetches instructions IN ORDER, decodes them into u-ops (micro-operations), and sends the u-ops to the back-end.
  • 11. Microarchitecture Review
      [Pipeline diagram: Fetch, Decode, Execute, Memory, Commit; labels: Front-End, Back-End]
      The back-end receives u-ops, executes them OUT OF ORDER, accesses memory as needed, and commits results to memory IN ORDER.
  • 12. Microarchitecture Review
      [Pipeline diagram: Fetch, Decode, Execute, Memory, Commit; labels: Front-End, Back-End, Allocation]
      Allocation is the point where u-ops transfer from the front-end to the back-end. The front-end can allocate 4 u-ops per cycle.
  • 13. Microarchitecture Review
      [Pipeline diagram: Fetch, Decode, Execute, Memory, Commit; labels: Front-End, Back-End, Allocation, Retirement]
      Retirement is the point where u-ops leave the back-end. The back-end can retire 4 u-ops per cycle.
  • 14. And a New Term: the Pipeline Slot
      [Pipeline diagram: Fetch, Decode, Execute, Memory, Commit; labels: Front-End, Back-End, 4 Potential Allocations per Cycle, 4 Potential Retirements per Cycle]
      In reality, there are many queues, buffers, and pieces of logic throughout the pipeline to allow up to 4 allocations and 4 retirements per cycle.
  • 15. And a New Term: the Pipeline Slot
      [Pipeline diagram: Fetch, Decode, Execute, Memory, Commit; labels: Front-End, Back-End, 4 Potential Allocations per Cycle, 4 Potential Retirements per Cycle]
      The “Pipeline Slot” is an abstraction representing all the resources needed to move one u-op through the pipeline.
  • 16. And a New Term: the Pipeline Slot
      [Pipeline diagram: Fetch, Decode, Execute, Memory, Commit; labels: Front-End, Back-End; slots S1, S2, S3, S4]
      There are 4 Pipeline Slots available every cycle.
  • 17. And a New Term: the Pipeline Slot
      [Pipeline diagram: Fetch, Decode, Execute, Memory, Commit; labels: Front-End, Back-End; slots S1, S2, S3, S4 repeated five times]
      Pipeline slots are filled with u-ops that travel from allocation to retirement over multiple cycles.
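As a rough illustration of the pipeline-slot abstraction, the sketch below counts the slots available over a run (4 per cycle, as stated on the slides) and the fraction of them that ended up retiring a u-op. The helper names and the counter values are hypothetical, and summarizing a run by this single fraction is an assumption made here for illustration.

```python
# Sketch: counting pipeline slots and how many of them retired a u-op.
# SLOTS_PER_CYCLE comes from the slides; everything else is illustrative.

SLOTS_PER_CYCLE = 4


def total_pipeline_slots(cycles: int) -> int:
    """Number of pipeline slots available over the given number of cycles."""
    return SLOTS_PER_CYCLE * cycles


def retiring_fraction(uops_retired: int, cycles: int) -> float:
    """Fraction of all available slots that were used by a retired u-op."""
    return uops_retired / total_pipeline_slots(cycles)


if __name__ == "__main__":
    cycles = 1_000_000          # hypothetical cycle count
    uops_retired = 1_800_000    # hypothetical retired u-op count
    print(f"Slots available: {total_pipeline_slots(cycles)}")
    print(f"Slots that retired a u-op: {retiring_fraction(uops_retired, cycles):.0%}")
```

The remaining slots were either left empty or filled with u-ops that did not retire, which is the kind of loss a slot-based analysis tries to attribute.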
  • 18. Cycles Per Instruction (CPI), a standard measure, has some special kinks. For multi-core processors, CPI can get as low as 0.25 cycles per instruction (four instructions per cycle) with current Intel processors. Normally, a CPI below ~1.0 is targeted for better performance. Some would suggest targeting a CPI of around 0.75 to 0.50. But is this correct for every architecture?
  • 19. Cycles Per Instruction (CPI), a standard measure, has some special kinks
      • Threads on each Intel® Xeon Phi™ core share a clock. If all 4 HW threads are active, each gets ¼ of the total cycles.
      • Multi-stage instruction decode requires two threads to utilize the whole core; one thread only gets half.
      • With two ops per cycle (U-V-pipe dual issue):
          Threads per Core | Best CPI per Core
          1 | 1.0
          2 | 0.5
          3 | 0.5
          4 | 0.5
      • To get thread CPI, multiply by the active threads:
          Threads per Core | Best CPI per Core | Best CPI per Thread
          1 | 1.0 | 1 x 1.0 = 1.0
          2 | 0.5 | 2 x 0.5 = 1.0
          3 | 0.5 | 3 x 0.5 = 1.5
          4 | 0.5 | 4 x 0.5 = 2.0
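The per-thread CPI arithmetic on this slide can be sketched in a few lines of Python. This is only an illustration of the two tables; the function names `best_core_cpi` and `best_thread_cpi` are hypothetical and not part of any Intel tool.

```python
def best_core_cpi(active_threads: int) -> float:
    """Best-case CPI per core with U-V-pipe dual issue, per the slide's table."""
    if active_threads not in (1, 2, 3, 4):
        raise ValueError("expected 1 to 4 active hardware threads per core")
    # A single thread can use only half of the core's issue capacity.
    return 1.0 if active_threads == 1 else 0.5


def best_thread_cpi(active_threads: int) -> float:
    """Per-thread CPI is the per-core CPI multiplied by the active threads."""
    return active_threads * best_core_cpi(active_threads)


if __name__ == "__main__":
    for n in range(1, 5):
        print(n, best_core_cpi(n), best_thread_cpi(n))
```

A measured per-thread CPI of 2.0 with four active threads is therefore already the best case, while the same value with one active thread is twice the best case.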
  • 20. The Top-Down Characterization: What is it? The Top-Down Characterization is:
      • A new way to organize and use processor events to identify the real hardware bottlenecks in systems/applications
      • Based on PMU events specifically designed for this task
      • Integrated into Intel® VTune™ Amplifier XE for Core
      • Available on Intel® microarchitecture code named Sandy Bridge and newer
  • 21. The Top-Down Characterization. Each pipeline slot on each cycle is classified into 1 of 4 categories. For each slot on each cycle: [figure]
  • 22. The Top-Down Characterization
      • The four categories sum to 1.0
      • Unit is “Percentage of total Pipeline Slots”
      • This is the core of the new Top-Down characterization
      • Each category is further broken down depending on available events
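As a rough sketch of how the four top-level fractions can be derived, the Python below uses the level-1 formulas from Intel's published top-down material for Sandy Bridge-class cores. Those formulas and event names are an assumption brought in from outside these slides, and the counter values in the example are invented.

```python
SLOTS_PER_CYCLE = 4  # the slides state 4 pipeline slots per cycle


def top_down_level1(clk_unhalted, uops_not_delivered, uops_issued,
                    uops_retired_slots, recovery_cycles):
    """Return the four top-level categories as fractions of all pipeline slots.

    Assumed event mapping (not given in the slides):
      clk_unhalted        CPU_CLK_UNHALTED.THREAD
      uops_not_delivered  IDQ_UOPS_NOT_DELIVERED.CORE
      uops_issued         UOPS_ISSUED.ANY
      uops_retired_slots  UOPS_RETIRED.RETIRE_SLOTS
      recovery_cycles     INT_MISC.RECOVERY_CYCLES
    """
    slots = SLOTS_PER_CYCLE * clk_unhalted
    if slots <= 0:
        raise ValueError("no cycles were counted")
    front_end = uops_not_delivered / slots
    bad_speculation = (uops_issued - uops_retired_slots
                       + SLOTS_PER_CYCLE * recovery_cycles) / slots
    retiring = uops_retired_slots / slots
    # Whatever is left over is attributed to the back end.
    back_end = 1.0 - (front_end + bad_speculation + retiring)
    return {
        "Front-End Bound": front_end,
        "Bad Speculation": bad_speculation,
        "Retiring": retiring,
        "Back-End Bound": back_end,
    }


if __name__ == "__main__":
    # Invented counter values, for illustration only.
    for name, share in top_down_level1(1000, 800, 2200, 2000, 50).items():
        print(f"{name}: {share:.1%}")
```

Because the back-end share is computed as the remainder, the four values sum to 1.0 by construction, matching the first bullet on the slide.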
  • 23. Issues breakdown
      • Front-End
          Latency: ITLB Overhead, ICache Misses, DSB Switches, Branch Resteers
          Bandwidth: DSB, MITE
      • Bad Speculation: Branch Mispredict, Machine Clears
      • Retiring: General, Microcode Sequencer
      • Back-End
          Memory Bound: L1, L2, L3, DRAM (local or remote), Store Bound
          Core Bound: DIV Active, Port Utilization (0 .. 3 ports)
  • 24. Examples of Metrics (Xeon Phi™)
  • 25. Problem Area: L1 Cache Usage
      • Significantly affects data access latency and therefore application performance
      • Tuning Suggestions:
          Software prefetching
          Tile/block data access for cache size
          Use streaming stores
          If using 4K access stride, may be experiencing conflict misses
          Examine Compiler prefetching (Compiler-generated L1 prefetches should not miss)
      Metric | Formula | Investigate if
      L1 Misses | DATA_READ_MISS_OR_WRITE_MISS + L1_DATA_HIT_INFLIGHT_PF1 |
      L1 Hit Rate | (DATA_READ_OR_WRITE - L1 Misses) / DATA_READ_OR_WRITE | < 95%
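A minimal Python sketch of the two L1 formulas above, assuming the raw event counts have already been exported from a collection. The function names and the sample counts are hypothetical.

```python
L1_HIT_RATE_THRESHOLD = 0.95  # "Investigate if < 95%"


def l1_misses(data_read_miss_or_write_miss: int,
              l1_data_hit_inflight_pf1: int) -> int:
    """L1 Misses = DATA_READ_MISS_OR_WRITE_MISS + L1_DATA_HIT_INFLIGHT_PF1."""
    return data_read_miss_or_write_miss + l1_data_hit_inflight_pf1


def l1_hit_rate(data_read_or_write: int, misses: int) -> float:
    """L1 Hit Rate = (DATA_READ_OR_WRITE - L1 Misses) / DATA_READ_OR_WRITE."""
    if data_read_or_write <= 0:
        raise ValueError("no data accesses were counted")
    return (data_read_or_write - misses) / data_read_or_write


if __name__ == "__main__":
    # Invented counts, for illustration only.
    misses = l1_misses(40_000, 20_000)
    rate = l1_hit_rate(1_000_000, misses)
    print(f"L1 hit rate: {rate:.1%}",
          "(investigate)" if rate < L1_HIT_RATE_THRESHOLD else "(ok)")
```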
  • 26. Problem Area: Data Access Latency
      • Significantly affects application performance
      • Tuning Suggestions:
          Software prefetching
          Tile/block data access for cache size
          Use streaming stores
          Check cache locality: turn off prefetching and use CACHE_FILL events; reduce sharing if needed/possible
          If using 64K access stride, may be experiencing conflict misses
      Metric | Formula | Investigate if
      Estimated Latency Impact | (CPU_CLK_UNHALTED - EXEC_STAGE_CYCLES - DATA_READ_OR_WRITE) / DATA_READ_OR_WRITE_MISS | > 145
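The Estimated Latency Impact formula can be sketched as follows. The function name and the counter values are invented for illustration; only the formula and the 145 threshold come from the slide.

```python
LATENCY_IMPACT_THRESHOLD = 145  # "Investigate if > 145"


def estimated_latency_impact(cpu_clk_unhalted: int, exec_stage_cycles: int,
                             data_read_or_write: int,
                             data_read_or_write_miss: int) -> float:
    """Approximate cycles spent per L1 miss, using the slide's formula."""
    if data_read_or_write_miss <= 0:
        raise ValueError("no L1 misses were counted")
    stalled = cpu_clk_unhalted - exec_stage_cycles - data_read_or_write
    return stalled / data_read_or_write_miss


if __name__ == "__main__":
    # Invented counts, for illustration only.
    impact = estimated_latency_impact(10_000_000, 3_000_000, 1_000_000, 30_000)
    print(f"Estimated latency impact: {impact:.0f} cycles per miss",
          "(investigate)" if impact > LATENCY_IMPACT_THRESHOLD else "(ok)")
```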
  • 27. Problem Area: TLB Usage
      • Also affects data access latency and therefore application performance
      • Tuning Suggestions:
          Improve cache usage & data access latency
          If L1 TLB miss/L2 TLB miss is high, try using large pages
          For loops with multiple streams, try splitting into multiple loops
          If data access stride is a large power of 2, consider padding between arrays by one 4 KB page
      Metric | Formula | Investigate if
      L1 TLB miss ratio | DATA_PAGE_WALK / DATA_READ_OR_WRITE | > 1%
      L2 TLB miss ratio | LONG_DATA_PAGE_WALK / DATA_READ_OR_WRITE | > 0.1%
      L1 TLB misses per L2 TLB miss | DATA_PAGE_WALK / LONG_DATA_PAGE_WALK | > 100x
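The three TLB metrics can be computed together, as in this hedged Python sketch. The function name, the dictionary keys used as labels, and the sample counts are hypothetical; the formulas and thresholds are the ones in the table above.

```python
def tlb_metrics(data_page_walk: int, long_data_page_walk: int,
                data_read_or_write: int) -> dict:
    """Return each TLB metric with a flag saying whether to investigate it."""
    if data_read_or_write <= 0 or long_data_page_walk <= 0:
        raise ValueError("event counts must be positive")
    l1_ratio = data_page_walk / data_read_or_write
    l2_ratio = long_data_page_walk / data_read_or_write
    l1_per_l2 = data_page_walk / long_data_page_walk
    return {
        "L1 TLB miss ratio": (l1_ratio, l1_ratio > 0.01),              # > 1%
        "L2 TLB miss ratio": (l2_ratio, l2_ratio > 0.001),             # > 0.1%
        "L1 TLB misses per L2 TLB miss": (l1_per_l2, l1_per_l2 > 100),  # > 100x
    }


if __name__ == "__main__":
    # Invented counts, for illustration only.
    for name, (value, investigate) in tlb_metrics(20_000, 100, 1_000_000).items():
        print(f"{name}: {value:g}", "(investigate)" if investigate else "(ok)")
```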
  • 28. Problem Area: VPU Usage
      • Indicates whether an application is vectorized successfully and efficiently
      • Tuning Suggestions:
          Use the Compiler vectorization report!
          For data dependencies preventing vectorization, try using Intel® Cilk™ Plus #pragma SIMD (if safe!)
          Align data and tell the Compiler!
          Re-structure code if possible: Array notations, AOS->SOA
      Metric | Formula | Investigate if
      Vectorization Intensity | VPU_ELEMENTS_ACTIVE / VPU_INSTRUCTIONS_EXECUTED | < 8 (DP), < 16 (SP)
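A short Python sketch of the Vectorization Intensity check. The thresholds are the ones in the table above (8 for double precision, 16 for single precision); the function names and sample counts are hypothetical.

```python
# Thresholds from the slide: investigate if below 8 (DP) or 16 (SP).
THRESHOLDS = {"DP": 8, "SP": 16}


def vectorization_intensity(vpu_elements_active: int,
                            vpu_instructions_executed: int) -> float:
    """Average number of active vector elements per VPU instruction."""
    if vpu_instructions_executed <= 0:
        raise ValueError("no VPU instructions were counted")
    return vpu_elements_active / vpu_instructions_executed


def needs_investigation(intensity: float, precision: str) -> bool:
    """True when the intensity falls below the slide's threshold."""
    return intensity < THRESHOLDS[precision]


if __name__ == "__main__":
    # Invented counts for a single-precision kernel, for illustration only.
    intensity = vectorization_intensity(1_200_000, 100_000)
    print(f"Vectorization intensity: {intensity:.1f}",
          "(investigate)" if needs_investigation(intensity, "SP") else "(ok)")
```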
  • 29. Problem Area: Memory Bandwidth
      • Can increase data latency in the system or become a performance bottleneck
      • Tuning Suggestions:
          Improve locality in caches
          Use streaming stores
          Improve software prefetching
      Metric | Formula | Investigate if
      Memory Bandwidth | (UNC_F_CH0_NORMAL_READ + UNC_F_CH0_NORMAL_WRITE + UNC_F_CH1_NORMAL_READ + UNC_F_CH1_NORMAL_WRITE) * 64 / time | < 80 GB/sec (practical peak 140 GB/sec with 8 memory controllers)
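The bandwidth formula can be sketched as below. Two assumptions are made here that the slide does not state: the factor 64 is treated as bytes transferred per counted access (one cache line), and 1 GB is taken as 10^9 bytes. The function name and sample counts are invented.

```python
BYTES_PER_ACCESS = 64        # assumed: one 64-byte cache line per counted access
INVESTIGATE_BELOW_GBS = 80   # "Investigate if < 80 GB/sec"


def memory_bandwidth_gb_per_s(ch0_read: int, ch0_write: int,
                              ch1_read: int, ch1_write: int,
                              elapsed_seconds: float) -> float:
    """Sum the UNC_F_CH{0,1}_NORMAL_{READ,WRITE} counts and convert to GB/s."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed time must be positive")
    total_bytes = (ch0_read + ch0_write + ch1_read + ch1_write) * BYTES_PER_ACCESS
    return total_bytes / elapsed_seconds / 1e9  # assumed: 1 GB = 1e9 bytes


if __name__ == "__main__":
    # Invented counts over a 4-second run, for illustration only.
    bw = memory_bandwidth_gb_per_s(10**9, 10**9, 10**9, 10**9, 4.0)
    print(f"Memory bandwidth: {bw:.1f} GB/s",
          "(investigate)" if bw < INVESTIGATE_BELOW_GBS else "(ok)")
```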
  • 30. VTune™ Amplifier XE
  • 31. DEMO
  • 32. Running the General Exploration Collector
      1. Click the “New Analysis” button
      2. Select “General Exploration” for your CPU architecture
      3. Click “Start” to begin profiling
  • 33. General Exploration Summary
  • 34. VTune™ Amplifier XE visualizes performance
  • 35. VTune™ Amplifier XE visualizes performance: Toolbar (Instructions; Project: Navigator, New, Open, Properties; Result: New, Open, Compare)
  • 36. VTune™ Amplifier XE visualizes performance: Project Navigator
  • 37. VTune™ Amplifier XE visualizes performance: Result Display Tabs
  • 38. VTune™ Amplifier XE visualizes performance: Result Analysis Type
  • 39. VTune™ Amplifier XE visualizes performance: Result Viewpoint
  • 40. VTune™ Amplifier XE visualizes performance: Viewpoint Alternates
  • 41. VTune™ Amplifier XE visualizes performance: Result Components
  • 42. VTune™ Amplifier XE visualizes performance: Grid Pane
  • 43. VTune™ Amplifier XE visualizes performance: Grid Pane, Grouping pull-down
  • 44. VTune™ Amplifier XE visualizes performance: Stack Pane
  • 45. VTune™ Amplifier XE visualizes performance: Timeline
  • 46. VTune™ Amplifier XE visualizes performance: Filter/Options Bar
  • 47. VTune™ Amplifier XE visualizes performance: Source View / Per line localization
  • 48. VTune™ Amplifier XE visualizes performance: Source View / Hot spot navigation controls
  • 49. VTune™ Amplifier XE visualizes performance: Assembly View / Hot spot navigation controls
  • 50. VTune™ Amplifier XE visualizes performance: Assembly View / Assembly groupings
  • 51. For event collection the coprocessor is treated as a special HW architecture
  • 52. Project properties provide the means to invoke data collection by target type
  • 53. Launch Application serves many uses, from host/offload to native execution
  • 54. Search directories have been reorganized to speed symbol resolution during finalization. Notable coprocessor library paths:
      /opt/mpss/3.2/sysroots/k1om-mpss-Linux/boot
      /opt/mpss/3.2/sysroots/k1om-mpss-Linux/lib64
      /opt/intel/composerxe/lib/mic
      /opt/intel/composerxe/tbb/lib/mic
      /opt/intel/composerxe/mkl/lib/mic
      /opt/intel/mpi-rt/4.1.3/mic
  • 55. General Exploration runs a set of events to drive top-down analysis
  • 56. For more information on Intel® Xeon Phi™ and VTune™ Amplifier XE
      Optimization on the coprocessor:
          http://software.intel.com/en-us/articles/optimization-and-performance-tuning-for-intel-xeon-phi-coprocessors-part-1-optimization
          http://software.intel.com/en-us/articles/optimization-and-performance-tuning-for-intel-xeon-phi-coprocessors-part-2-understanding
      Coprocessor Performance Monitoring Unit:
          http://software.intel.com/sites/default/files/forum/278102/intelr-xeon-phitm-pmu-rev1.01.pdf
      For general information:
          http://software.intel.com/mic-developer
  • 57. Grid is Based on Top-Down
  • 58. Use the Hover Text to Understand Metrics* (*Suggestions welcome: Submit issues if the text isn’t helpful)
  • 59. Event collections on the coprocessor can generate volumes of data (example: dgemm on 60+ cores). Tip: Use cpu-mask to reduce the data set while maintaining the same accuracy.
  • 60. Resources
      Top-Down Characterization White Paper:
          http://software.intel.com/en-us/articles/how-to-tune-applications-using-a-top-down-characterization-of-microarchitectural-issues
      Tuning Guides:
          http://software.intel.com/en-us/articles/processor-specific-performance-analysis-papers