SlideShare una empresa de Scribd logo
1 de 30
Descargar para leer sin conexión
GameDevelopers
CONFERENCE
Antoine Cohade
Sr. Dev. Rel. Engineer
@acohade1
Michael Apodaca
Principal Engineer, 3D SW architect
@gatorfax
Special thanks to contributors & reviewers: A. Lake, S. Junkins, S. McCalla, T. Schluessler
@IntelSoftware @IntelGraphics
Legal Notices and Disclaimers
No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.
Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as
well as any warranty arising from course of performance, course of dealing, or usage in trade.
You may not use or facilitate the use of this document in connection with any infringement or other legal analysis concerning Intel products described herein. You agree to grant Intel
a non-exclusive, royalty-free license to any patent claim thereafter drafted which includes subject matter disclosed herein.
The products and services described may contain defects or errors known as errata which may cause deviations from published specifications. Current characterized errata are
available on request.
Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system
configuration. No computer system can be absolutely secure. Check with your system manufacturer or retailer or learn more at [intel.com].
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark,
are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult
other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other
products. For more complete information visit www.intel.com/benchmarks.
Optimization Notice: Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors.
These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any
optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain
optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information
regarding the specific instruction sets covered by this notice.
Results have been estimated or simulated using internal Intel analysis or architecture simulation or modeling, and provided to you for informational purposes. Any differences in your
system hardware, software or configuration may affect your actual performance.
Intel, Core and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries.
*Other names and brands may be claimed as the property of others
© Intel Corporation.
2
@IntelSoftware @IntelGraphics
Agenda
• Graphics Pipeline Basics
• for_each( hardwareBlock in Gen11 {
describe(hardwareBlock);
give(performanceTip**);
});
• Tiled-Based Rendering
• Coarse Pixel Shading
3
**Most Performance tips also apply to Gen9
@IntelSoftware @IntelGraphics
@IntelSoftware @IntelGraphics 4
Graphics Pipeline 101
3D Pipeline
Compute Pipeline
Command
Streamer
Geometry Raster/ZDraw() Compute
Memory
Pixel
Backend
Dispatch() Command
Streamer
Compute
Memory
@IntelSoftware @IntelGraphics 5
How does that match to Gen?
55
L3/Tile$/URB
Bank
Bank
Bank
Bank
Bank
Bank
Bank
Bank
Slice Common
Pixel Backend / Color$
Rasterizer
Z / Depth$
Unslice
GAGEOM/FF POSH
GTI CS GuC BLIT
Media FF
VD
VD VE
@IntelSoftware @IntelGraphics 6
How does that match to Gen?
Un-Slice (1,2)
1. Global Assets: Command Streamer, Ring
interface, pwr mgmt, Blitter, Thread Dispatch
2. Geometry Fixed Functions: 3D HW pipeline
(geometry)
And also:
Media FF: Low power HW encode, decode,
& video processing
Slice (3,4)
3. Slice-Common/L3: Setup, Rasterizer, Z,
L3 Cache, and Pixel processing
4. Sub-Slices: Array of Execution Units, Inst
Caches (IC$), Texture Samplers w/ Caches,
Load/Store Units
66
L3/Tile$/URB
Bank
Bank
Bank
Bank
Bank
Bank
Bank
Bank
Slice Common
Pixel Backend / Color$
Rasterizer
Z / Depth$
Unslice
GAGEOM/FF POSH
GTI CS GuC BLIT
Media FF
VD
VD VE
Pixel
backend
Raster/Z
2. Geometry 1. Command streamer
Memory
3.
Compute
4.
Compute
4.
@IntelSoftware @IntelGraphics 7
Command Streamer
§ Reads and dispatches command buffer
• 3D commands to Geometry/Fixed Functions
• Compute commands to Thread Dispatcher
§ Builds indirect command buffers if needed
Performance tips:
§ Avoid small empty drawIndirects – use ExecuteIndirect
or MultiDrawIndirect Extensions!
§ Separate and regroup 3D and Compute workloads
together to get the most of the hardware
Unslice
GAGEOM/FF
CS
Geometry/Fixed Functions
§ Vertex fetch/transform/write in order into Memory
§ 1.5x over Gen9 (from 4 to 6 attributes (float4) /clk)
Performance tips:
§ Pack attributes into float4 when possible
§ Hold your Prim. count on a leash
• CPU occlusion and size culling
• Level Of Details (LODs)
• Frustum culling
• Backface culling
Unslice architecture
7
@IntelSoftware @IntelGraphics 8
Slice Common Architecture: Rasterizer
§ Handles Clipping, Viewport Scaling, orientation/thin
triangles culling, and of course rasterization!
§ Hierarchical Rasterization up to 256 pix/clk (16 * 4x4)
• Determines if each 4x4(span) is fully covered,
partially, or void
• If partially, fine grain raster at 4x4 px/clk
Performance tip: Beware of small triangles - use LODs
and impostors:
Slice Common
Rasterizer
88
15 px in 1 clk J 3 px in 3 clk L
@IntelSoftware @IntelGraphics 9
Slice Common Architecture: 2 Z/Depth$ Pipes
Handles Hierarchical-Z …
§ 2 modes: PlaneZ or Min/Max
§ up to 2 x 64 px/clk
§ 2 x 12kB cache
Early-Z test can happen under 1 major condition:
• We don’t need the Pixel Shader to know the final depth
Performance tip: Avoid when possible:
§ Discard in Pixel Shader (PS) -> test can happen before but late-Z write
§ Writing depth manually in the PS -> both test and write happen late
9
Slice Common
Z / Depth$
… and Intermediate-Z (per px)
§ up to 2 x 16 px/clk
§ 2 x 32kB cache
@IntelSoftware @IntelGraphics
Gen Basic Building Block: Execution Unit (EU)
10
Execution Unit (EU)
ALU 0
ALU 1
Branch
grf. …
... ...
... ...
... ...
... ...
… ...
General register file(grf) arf.
Send
Thread Control
Execution Unit (EU)
ALU 0
ALU 1
Branch
GRFGRF
......
......
......
......
......
General Register File (GRF)
Send
Thread Control
@IntelSoftware @IntelGraphics 11
Execution Unit (EU)
ALU 0
ALU 1
Natively 2 x SIMD4
§ Each ALU supports fp. & int math
§ 8 x 32-bit or 16 x 16-bit ops/cycle
§ ALU1 capable of high throughput
transcendental Math
§ Min FPU instruction latency is
2 clocks
§ FPUs are fully pipelined across
threads
§ instructions complete every cycle
Gen Basic Building Block: Execution Unit (EU)
Full rate: MUL, ADD,
MAD, MIN/MAX, CMP
1/4 rate: SQRT/RSQRT,
LOG/EXP, TRIG
1/8 rate: POW, DIV
@IntelSoftware @IntelGraphics 12
Execution Unit (EU)
GRFGRF
......
......
......
......
......
General Register File (GRF)General Register File (GRF)
12
Gen Basic Building Block: Execution Unit (EU)
7 Threads
§ Cover latency for Load, LDS, etc.
§ Each can run a diff. shader
28kB register space
§ 7 x 4 kB
§ 128 registers (grf.)/ thread
§ grf. are 8 x 32-bit (32B)
@IntelSoftware @IntelGraphics 13
Execution Unit (EU)
Thread Control
......
......
......
......
......
13
Gen Basic Building Block: Execution Unit (EU)
Thread Control
§ Thread Arbiter picks
instructions to run from
runnable thread(s)
§ Dispatches instructions to
functional units
§ co-issues up to 4 instructions
per cycle
@IntelSoftware @IntelGraphics
EU SIMD Explained
Physically SIMD4x2 but logically support 1/2/4/8/16
or 32-wide instructions
§ > SIMD8 instructions pair adjacent GRF.
• SIMD16 would pair 2 physical GRF. to a single
logical 64B grf.
§ Compiler makes the decision: for 3D workload using
either 8 or 16-wide
1414
Intel GPA has metrics to help and
we now have a standalone Shader
Compiler !!
Performance tip: Reduce register pressure:
§ High SIMD/reduced spills
§ Better codegen / scheduling
GRF Usage: <64 64-128 >128
Instruction used
SIMD16
J
SIMD8
K
SIMD8 with Spills
L
@IntelSoftware @IntelGraphics
How To Reduce Register Pressure
Don’t:
§ Branch on constant buffer conditions
§ Non uniform access to buffer data
§ Excessive variable decl. (esp. arrays)
Do’s:
§ Use Partial precision (min16float)
§ Move common code outside branches
1515
A lot more details in the Gen Graphics performance guide:
https://software.intel.com/en-us/documentation/graphics-api-performance-guide-for-intel-processor-graphics-gen9
GRF Usage: <64 64-128 >128
Instruction used
SIMD16
J
SIMD8
K
SIMD8 with Spills
L
@IntelSoftware @IntelGraphics
Sampler
Subslice: An Array Of 8 EUs With a Sampler
Sampler takes request from the EUs, fetches texels from
memory, and returns them to the EUs:
§ Handles Texture Decompression
§ 4tx/clk bilinear and point < 64b and aniso. 2x 32b
• >64 formats and aniso. 4x runs at ½ speed
§ Multiple levels of cache
Sampler performance tips:
§ Increase spatial locality
§ Reduce texture state changes
§ Use simple filtering modes (aniso. 16x is 1/16th bilinear
rate!!)
16
@IntelSoftware @IntelGraphics
Subslice: An Array Of 8 EUs
CS thread groups are guaranteed running on a single subslice:
§ Gen hardware is able to run 56 hardware threads/subslice
§ thread group size == 1024 are an issue
If we run in SIMD16, we’d have 1024/16 = 64 hardware threads,
which is bigger than the 56 available. To compensate, the shader
compiler has to use SIMD32.
§ This causes 2 issues:
• Occupancy is low (32 HW threads/56 available = 57%)
• Potential spill fills, as this doubles the register count required.
Performance tip: 8x8 thread groups is great on all hardware!
17
@IntelSoftware @IntelGraphics
DataPort
SLM
2 Subslices grouped together with…
Shared Local Memory (SLM)
§ 128kB
§ 128B/clk
§ ~1/4 latency vs. Gen9
Performance tips:
§ Using too much SLM can hurt occupancy
§ Regroup memory access to ensure full cache line access (64B)
18
Data Port (LD/ST)
§ Handles communication with L3: 64B/clk
§ UAV read/writes
Performance tip: Only create UAVs when necessary
@IntelSoftware @IntelGraphics
Slice Common Architecture: Pixel Backend
Slice Common
Pixel Backend / Color$
Handles color writes and blending to render target
§ Writes and blend at 16 px/clk
§ Has its own 2 x 32k color cache
2:1 Lossless compression
§ Saves bandwidth when writing RT data
§ Saves bandwidth when display engine reads displayable RT data
§ Saves bandwidth when texture sampler reads intermediate RT data
§ MSAA textures are not compressed
19
@IntelSoftware @IntelGraphics 20
Summary – 8 Subslices = 64 EUs
20
@IntelSoftware @IntelGraphics 21
Tile-Based Deferred Rendering (TBDR)
Problem statement: in many cases, bandwidth becomes the limiting factor for low-thermal
design point (TDP) platforms.
Solution(s):
§ Tile-based rendering is a well-known industry solution for low-power platforms; especially when discarding
intermediate data
§ Significant amount of bandwidth and execution cycles can be saved for discarded triangles; e.g. frustum or backface
culled
§ Enable opportunistically for scenarios the benefit
the most
@IntelSoftware @IntelGraphics
Position-Only Shading
Tiled-Based Rendering
(PTBR)
Position-Only Shading pipeline generates per-tile visibility
stream; using (driver-generated) position-only vertex shader
Render pipeline replays full vertex processing and per-tile
pixel processing only for visible primitives
Object Visibility Recorder (OVR) ensures consistency between
pipelines
Dynamic enabling of PTBR pipeline allows driver to choose
based on optimal conditions; e.g., bandwidth-limited render
passes
22
Vertex
Processing
Clip/Cull
OVR
Primitive
Assembly
Visibility
Buffer(s)
Visibility
Buffer(s)
Visibility
Buffer(s)
Visibility
Buffer(s)
Visibility
Buffer(s)
Visibility
Buffer(s)
Vertex
Processing
Tessellation
Geometry
Processing
Clip/Cull
Raster
Primitive
Assembly
Pixel
Processing
PerTile
Visibility
Stream
Per-Tile
@IntelSoftware @IntelGraphics
PTBR Performance Guidelines
Use explicit API render passes and discard:
§ VkRenderPass/VkSubpass with
VK_ATTACHMENT_STORE_OP_DONT_CARE
§ (RS5) ID3D12GraphicsCommandList4::EndRenderPass with
D3D12_RENDER_PASS_ENDING_ACCESS_TYPE_DISCARD
During high-bandwidth render passes (e.g., MSAA, blending) avoid:
§ Tessellation, Geometry Shading and/or Stream Output,
and Compute Shading
§ Use per-attribute vertex buffers - position only
GPA: per-Draw performance analysis will disable PTBR
23
@IntelSoftware @IntelGraphics
Coarse Pixel Shading (CPS): Motivation
§ Pixel density increasing
§ Performance not scaling at same density as pixel resolution
§ Need mechanisms to hit high frame rate and not waste pixel throughput
§ Software solutions:
• Dynamic Resolution Rendering
• Checkerboard rendering
• …
§ Can also explore hardware solutions…
Intel has highly optimized
open source implementations
of these available
Topic of these slides
24
@IntelSoftware @IntelGraphics
Coarse Pixel Shading Overview
Set Pixel Shading Rate to be:
§ Floating point value between 1 and 4 in X and Y dimension
§ Rounded to integer value to determine actual shading rate
HW support for 2 modes of operation:
§ Foveated:
• Specify a CPS rate for inner and outer elliptical region
• Specify an inner and outer ellipse, smooth transition between them
§ Draw call:
• Set a state for all future Draw() calls
• Set multiple times per frame
Will be exposed in D3D12 (aka Variable Rate Shading) as well as future
Vulkan extensions.
Per pixel visibility
Shading once per
X * Y pixel group
Inner elliptical region
Outer elliptical regionSmooth transition
25
@IntelSoftware @IntelGraphics 26
Coarse Pixel Shading (CPS)
• Proof of Concept integration into Unreal
Engine SunTemple demo
• CPS rate applied to all geometry when under
motion at 2x2 or 4x4 rates
• Early perf gains between ~20-40%
• Microsoft Variable Rate Shading Talk
happening now !
• Come see the demo at the Intel Booth!
@IntelSoftware @IntelGraphics
Conclusion
§ 64EUs * SIMD4x2 * 2ops/clk = 1024 ops/clk
• Up to 1 Tflops !!
• And you now have the keys to make the most of it
§ New performance features allowing to get even more from the hardware
• CPS
• PTBR
§ Biggest gaming announcement in 2019 – Raising the mainstream gaming bar
§ A lot more details in the Gen11 whitepaper : https://software.intel.com/en-
us/download/graphics-api-architecture-guide-for-intel-processor-graphics-gen11
27
Questions?
28
@IntelSoftware @IntelGraphics
Gen11 Memory Hierarchy
292929
@IntelSoftware @IntelGraphics 30
Backup
Key Peak Metrics Gen9 GT2 Gen11 GT2
Slice Attribute
# of Slices 1 1
# of Sub-Slices 3 8
# of Cores (Eus) 24 (3x8) 64 (8x8)
Single Precision FLOPs per ClkClock (MAD) 384 1024
Half Precision FLOPs per ClkClock (MAD) 768 2048
Int8 IOPs per ClkClock 768 1024
Register File Per Sub-Slice 224KB 224KB
# of Samplers 3 8
Point/Bilinear Texels/ClkClock (32bpt) 12 32
Point/Bilinear Texels/ClkClock (64bpt) 12 32
Trilinear Texels/ClkClock (32bpt) 6 16
Shared Local Memory Total 192KB(=3 x 64KB)* 512KB(=8 x 64KB)
Slice-Common Attributes
Pixels/ClkClock (RGBA8) wo. Alpha Blend 8 16
Pixels/ClkClock (RGBA8) w. Alpha Blend 8 16
HiZ Zixels/ClkClock 64 128
L3$ Cache 768 KB 3072 KB
Geometry Attributes
Vertex Attributes Per ClkClock 4 6
Primitive/ClkClock (backface Cull – strips) 1 1
Primitive/ClkClock (backface Cull – lists) 0.67 0.67
Global Attributes
GTI Bandwidth (Bytes/ClkClock) R: 64
W: 32
R: 64
W: 64
LLC Configuration 2-8MB TBD
DRAM Configuration 2x64 LPDDR3/DDR4 4x32 LPDDR4/DDR4
Peak DRAM Data Rate (Mbps) Up to 2Ch 2400 Up to 2Ch 3733
Max Peak DRAM BW (GBps) Up to 38.4 GBps Up to 59.7 GBps
*Note - Gen9 L3$ includes SLM, Data and Fixed Function Data

Más contenido relacionado

La actualidad más candente

3 additional dpdk_theory(1)
3 additional dpdk_theory(1)3 additional dpdk_theory(1)
3 additional dpdk_theory(1)videos
 
Fpga Device Selection
Fpga Device SelectionFpga Device Selection
Fpga Device SelectionVikram Singh
 
FPGA Camp - Intellitech Presentation
FPGA Camp - Intellitech PresentationFPGA Camp - Intellitech Presentation
FPGA Camp - Intellitech PresentationFPGA Central
 
4th gen intelcoreprocessor family
4th gen intelcoreprocessor family4th gen intelcoreprocessor family
4th gen intelcoreprocessor familyAhnku Toh
 
6 profiling tools
6 profiling tools6 profiling tools
6 profiling toolsvideos
 
Placas base evolucion[1]
Placas base evolucion[1]Placas base evolucion[1]
Placas base evolucion[1]zuzanitah
 
DUG'20: 11 - Platform Performance Evolution from bring-up to reaching link sa...
DUG'20: 11 - Platform Performance Evolution from bring-up to reaching link sa...DUG'20: 11 - Platform Performance Evolution from bring-up to reaching link sa...
DUG'20: 11 - Platform Performance Evolution from bring-up to reaching link sa...Andrey Kudryavtsev
 
Hp 4520s-calpella-sclassintel10282pvrtllan
Hp 4520s-calpella-sclassintel10282pvrtllanHp 4520s-calpella-sclassintel10282pvrtllan
Hp 4520s-calpella-sclassintel10282pvrtllanKhobaich Hicham
 
8 intel network builders overview
8 intel network builders overview8 intel network builders overview
8 intel network builders overviewvideos
 
Expanding The Micro Blaze System
Expanding  The Micro Blaze  SystemExpanding  The Micro Blaze  System
Expanding The Micro Blaze Systemiuui
 
Kernel Memory Protection by an Insertable Hypervisor which has VM Introspec...
Kernel Memory Protection by an Insertable Hypervisor which has VM Introspec...Kernel Memory Protection by an Insertable Hypervisor which has VM Introspec...
Kernel Memory Protection by an Insertable Hypervisor which has VM Introspec...Kuniyasu Suzaki
 
Management configuration
Management configurationManagement configuration
Management configurationmckan1974
 
53-rand-test-gen-validation-1530392
53-rand-test-gen-validation-153039253-rand-test-gen-validation-1530392
53-rand-test-gen-validation-1530392Joel Storm
 
”Bare-Metal Container" presented at HPCC2016
”Bare-Metal Container" presented at HPCC2016”Bare-Metal Container" presented at HPCC2016
”Bare-Metal Container" presented at HPCC2016Kuniyasu Suzaki
 
Using Xeon + FPGA for Accelerating HPC Workloads
Using Xeon + FPGA for Accelerating HPC WorkloadsUsing Xeon + FPGA for Accelerating HPC Workloads
Using Xeon + FPGA for Accelerating HPC Workloadsinside-BigData.com
 
Denis Nagorny - Pumping Python Performance
Denis Nagorny - Pumping Python PerformanceDenis Nagorny - Pumping Python Performance
Denis Nagorny - Pumping Python PerformanceSergey Arkhipov
 

La actualidad más candente (20)

3 additional dpdk_theory(1)
3 additional dpdk_theory(1)3 additional dpdk_theory(1)
3 additional dpdk_theory(1)
 
P55 g 10
P55 g 10P55 g 10
P55 g 10
 
Fpga Device Selection
Fpga Device SelectionFpga Device Selection
Fpga Device Selection
 
FPGA Camp - Intellitech Presentation
FPGA Camp - Intellitech PresentationFPGA Camp - Intellitech Presentation
FPGA Camp - Intellitech Presentation
 
4th gen intelcoreprocessor family
4th gen intelcoreprocessor family4th gen intelcoreprocessor family
4th gen intelcoreprocessor family
 
6 profiling tools
6 profiling tools6 profiling tools
6 profiling tools
 
Placas base evolucion[1]
Placas base evolucion[1]Placas base evolucion[1]
Placas base evolucion[1]
 
DUG'20: 11 - Platform Performance Evolution from bring-up to reaching link sa...
DUG'20: 11 - Platform Performance Evolution from bring-up to reaching link sa...DUG'20: 11 - Platform Performance Evolution from bring-up to reaching link sa...
DUG'20: 11 - Platform Performance Evolution from bring-up to reaching link sa...
 
Hp 4520s-calpella-sclassintel10282pvrtllan
Hp 4520s-calpella-sclassintel10282pvrtllanHp 4520s-calpella-sclassintel10282pvrtllan
Hp 4520s-calpella-sclassintel10282pvrtllan
 
8 intel network builders overview
8 intel network builders overview8 intel network builders overview
8 intel network builders overview
 
Asar resume
Asar resumeAsar resume
Asar resume
 
Expanding The Micro Blaze System
Expanding  The Micro Blaze  SystemExpanding  The Micro Blaze  System
Expanding The Micro Blaze System
 
Dx diag 21102014
Dx diag 21102014Dx diag 21102014
Dx diag 21102014
 
Kernel Memory Protection by an Insertable Hypervisor which has VM Introspec...
Kernel Memory Protection by an Insertable Hypervisor which has VM Introspec...Kernel Memory Protection by an Insertable Hypervisor which has VM Introspec...
Kernel Memory Protection by an Insertable Hypervisor which has VM Introspec...
 
Management configuration
Management configurationManagement configuration
Management configuration
 
53-rand-test-gen-validation-1530392
53-rand-test-gen-validation-153039253-rand-test-gen-validation-1530392
53-rand-test-gen-validation-1530392
 
”Bare-Metal Container" presented at HPCC2016
”Bare-Metal Container" presented at HPCC2016”Bare-Metal Container" presented at HPCC2016
”Bare-Metal Container" presented at HPCC2016
 
Using Xeon + FPGA for Accelerating HPC Workloads
Using Xeon + FPGA for Accelerating HPC WorkloadsUsing Xeon + FPGA for Accelerating HPC Workloads
Using Xeon + FPGA for Accelerating HPC Workloads
 
Denis Nagorny - Pumping Python Performance
Denis Nagorny - Pumping Python PerformanceDenis Nagorny - Pumping Python Performance
Denis Nagorny - Pumping Python Performance
 
report gusp
report guspreport gusp
report gusp
 

Similar a The Architecture of Intel Processor Graphics: Gen 11

Best Practice of Compression/Decompression Codes in Apache Spark with Sophia...
 Best Practice of Compression/Decompression Codes in Apache Spark with Sophia... Best Practice of Compression/Decompression Codes in Apache Spark with Sophia...
Best Practice of Compression/Decompression Codes in Apache Spark with Sophia...Databricks
 
Performance out of the box developers
Performance   out of the box developersPerformance   out of the box developers
Performance out of the box developersMichelle Holley
 
Forts and Fights Scaling Performance on Unreal Engine*
Forts and Fights Scaling Performance on Unreal Engine*Forts and Fights Scaling Performance on Unreal Engine*
Forts and Fights Scaling Performance on Unreal Engine*Intel® Software
 
Optimizing Apache Spark Throughput Using Intel Optane and Intel Memory Drive...
 Optimizing Apache Spark Throughput Using Intel Optane and Intel Memory Drive... Optimizing Apache Spark Throughput Using Intel Optane and Intel Memory Drive...
Optimizing Apache Spark Throughput Using Intel Optane and Intel Memory Drive...Databricks
 
Optimizing Direct X On Multi Core Architectures
Optimizing Direct X On Multi Core ArchitecturesOptimizing Direct X On Multi Core Architectures
Optimizing Direct X On Multi Core Architecturespsteinb
 
Deep Learning Training at Scale: Spring Crest Deep Learning Accelerator
Deep Learning Training at Scale: Spring Crest Deep Learning AcceleratorDeep Learning Training at Scale: Spring Crest Deep Learning Accelerator
Deep Learning Training at Scale: Spring Crest Deep Learning Acceleratorinside-BigData.com
 
Accelerating SparkML Workloads on the Intel Xeon+FPGA Platform with Srivatsan...
Accelerating SparkML Workloads on the Intel Xeon+FPGA Platform with Srivatsan...Accelerating SparkML Workloads on the Intel Xeon+FPGA Platform with Srivatsan...
Accelerating SparkML Workloads on the Intel Xeon+FPGA Platform with Srivatsan...Databricks
 
Intel Core X-seires processors
Intel Core X-seires processorsIntel Core X-seires processors
Intel Core X-seires processorsLow Hong Chuan
 
Getting Space Pirate Trainer* to Perform on Intel® Graphics
Getting Space Pirate Trainer* to Perform on Intel® GraphicsGetting Space Pirate Trainer* to Perform on Intel® Graphics
Getting Space Pirate Trainer* to Perform on Intel® GraphicsIntel® Software
 
Simple Single Instruction Multiple Data (SIMD) with the Intel® Implicit SPMD ...
Simple Single Instruction Multiple Data (SIMD) with the Intel® Implicit SPMD ...Simple Single Instruction Multiple Data (SIMD) with the Intel® Implicit SPMD ...
Simple Single Instruction Multiple Data (SIMD) with the Intel® Implicit SPMD ...Intel® Software
 
In The Trenches Optimizing UE4 for Intel
In The Trenches Optimizing UE4 for IntelIn The Trenches Optimizing UE4 for Intel
In The Trenches Optimizing UE4 for IntelIntel® Software
 
More explosions, more chaos, and definitely more blowing stuff up
More explosions, more chaos, and definitely more blowing stuff upMore explosions, more chaos, and definitely more blowing stuff up
More explosions, more chaos, and definitely more blowing stuff upIntel® Software
 
Intel® Open Image Denoise: Optimized CPU Denoising | SIGGRAPH 2019 Technical ...
Intel® Open Image Denoise: Optimized CPU Denoising | SIGGRAPH 2019 Technical ...Intel® Open Image Denoise: Optimized CPU Denoising | SIGGRAPH 2019 Technical ...
Intel® Open Image Denoise: Optimized CPU Denoising | SIGGRAPH 2019 Technical ...Intel® Software
 
Open Source Interactive CPU Preview Rendering with Pixar's Universal Scene De...
Open Source Interactive CPU Preview Rendering with Pixar's Universal Scene De...Open Source Interactive CPU Preview Rendering with Pixar's Universal Scene De...
Open Source Interactive CPU Preview Rendering with Pixar's Universal Scene De...Intel® Software
 
Unleashing Intel® Advanced Vector Extensions 512 (Intel® AVX-512) Inside the ...
Unleashing Intel® Advanced Vector Extensions 512 (Intel® AVX-512) Inside the ...Unleashing Intel® Advanced Vector Extensions 512 (Intel® AVX-512) Inside the ...
Unleashing Intel® Advanced Vector Extensions 512 (Intel® AVX-512) Inside the ...Intel® Software
 
P4/FPGA, Packet Acceleration
P4/FPGA, Packet AccelerationP4/FPGA, Packet Acceleration
P4/FPGA, Packet AccelerationLiz Warner
 
High Memory Bandwidth Demo @ One Intel Station
High Memory Bandwidth Demo @ One Intel StationHigh Memory Bandwidth Demo @ One Intel Station
High Memory Bandwidth Demo @ One Intel StationIntel IT Center
 
QATCodec: past, present and future
QATCodec: past, present and futureQATCodec: past, present and future
QATCodec: past, present and futureboxu42
 
DUG'20: 03 - Online compression with QAT in DAOS
DUG'20: 03 - Online compression with QAT in DAOSDUG'20: 03 - Online compression with QAT in DAOS
DUG'20: 03 - Online compression with QAT in DAOSAndrey Kudryavtsev
 
Spring Hill (NNP-I 1000): Intel's Data Center Inference Chip
Spring Hill (NNP-I 1000): Intel's Data Center Inference ChipSpring Hill (NNP-I 1000): Intel's Data Center Inference Chip
Spring Hill (NNP-I 1000): Intel's Data Center Inference Chipinside-BigData.com
 

Similar a The Architecture of Intel Processor Graphics: Gen 11 (20)

Best Practice of Compression/Decompression Codes in Apache Spark with Sophia...
 Best Practice of Compression/Decompression Codes in Apache Spark with Sophia... Best Practice of Compression/Decompression Codes in Apache Spark with Sophia...
Best Practice of Compression/Decompression Codes in Apache Spark with Sophia...
 
Performance out of the box developers
Performance   out of the box developersPerformance   out of the box developers
Performance out of the box developers
 
Forts and Fights Scaling Performance on Unreal Engine*
Forts and Fights Scaling Performance on Unreal Engine*Forts and Fights Scaling Performance on Unreal Engine*
Forts and Fights Scaling Performance on Unreal Engine*
 
Optimizing Apache Spark Throughput Using Intel Optane and Intel Memory Drive...
 Optimizing Apache Spark Throughput Using Intel Optane and Intel Memory Drive... Optimizing Apache Spark Throughput Using Intel Optane and Intel Memory Drive...
Optimizing Apache Spark Throughput Using Intel Optane and Intel Memory Drive...
 
Optimizing Direct X On Multi Core Architectures
Optimizing Direct X On Multi Core ArchitecturesOptimizing Direct X On Multi Core Architectures
Optimizing Direct X On Multi Core Architectures
 
Deep Learning Training at Scale: Spring Crest Deep Learning Accelerator
Deep Learning Training at Scale: Spring Crest Deep Learning AcceleratorDeep Learning Training at Scale: Spring Crest Deep Learning Accelerator
Deep Learning Training at Scale: Spring Crest Deep Learning Accelerator
 
Accelerating SparkML Workloads on the Intel Xeon+FPGA Platform with Srivatsan...
Accelerating SparkML Workloads on the Intel Xeon+FPGA Platform with Srivatsan...Accelerating SparkML Workloads on the Intel Xeon+FPGA Platform with Srivatsan...
Accelerating SparkML Workloads on the Intel Xeon+FPGA Platform with Srivatsan...
 
Intel Core X-seires processors
Intel Core X-seires processorsIntel Core X-seires processors
Intel Core X-seires processors
 
Getting Space Pirate Trainer* to Perform on Intel® Graphics
Getting Space Pirate Trainer* to Perform on Intel® GraphicsGetting Space Pirate Trainer* to Perform on Intel® Graphics
Getting Space Pirate Trainer* to Perform on Intel® Graphics
 
Simple Single Instruction Multiple Data (SIMD) with the Intel® Implicit SPMD ...
Simple Single Instruction Multiple Data (SIMD) with the Intel® Implicit SPMD ...Simple Single Instruction Multiple Data (SIMD) with the Intel® Implicit SPMD ...
Simple Single Instruction Multiple Data (SIMD) with the Intel® Implicit SPMD ...
 
In The Trenches Optimizing UE4 for Intel
In The Trenches Optimizing UE4 for IntelIn The Trenches Optimizing UE4 for Intel
In The Trenches Optimizing UE4 for Intel
 
More explosions, more chaos, and definitely more blowing stuff up
More explosions, more chaos, and definitely more blowing stuff upMore explosions, more chaos, and definitely more blowing stuff up
More explosions, more chaos, and definitely more blowing stuff up
 
Intel® Open Image Denoise: Optimized CPU Denoising | SIGGRAPH 2019 Technical ...
Intel® Open Image Denoise: Optimized CPU Denoising | SIGGRAPH 2019 Technical ...Intel® Open Image Denoise: Optimized CPU Denoising | SIGGRAPH 2019 Technical ...
Intel® Open Image Denoise: Optimized CPU Denoising | SIGGRAPH 2019 Technical ...
 
Open Source Interactive CPU Preview Rendering with Pixar's Universal Scene De...
Open Source Interactive CPU Preview Rendering with Pixar's Universal Scene De...Open Source Interactive CPU Preview Rendering with Pixar's Universal Scene De...
Open Source Interactive CPU Preview Rendering with Pixar's Universal Scene De...
 
Unleashing Intel® Advanced Vector Extensions 512 (Intel® AVX-512) Inside the ...
Unleashing Intel® Advanced Vector Extensions 512 (Intel® AVX-512) Inside the ...Unleashing Intel® Advanced Vector Extensions 512 (Intel® AVX-512) Inside the ...
Unleashing Intel® Advanced Vector Extensions 512 (Intel® AVX-512) Inside the ...
 
P4/FPGA, Packet Acceleration
P4/FPGA, Packet AccelerationP4/FPGA, Packet Acceleration
P4/FPGA, Packet Acceleration
 
High Memory Bandwidth Demo @ One Intel Station
High Memory Bandwidth Demo @ One Intel StationHigh Memory Bandwidth Demo @ One Intel Station
High Memory Bandwidth Demo @ One Intel Station
 
QATCodec: past, present and future
QATCodec: past, present and futureQATCodec: past, present and future
QATCodec: past, present and future
 
DUG'20: 03 - Online compression with QAT in DAOS
DUG'20: 03 - Online compression with QAT in DAOSDUG'20: 03 - Online compression with QAT in DAOS
DUG'20: 03 - Online compression with QAT in DAOS
 
Spring Hill (NNP-I 1000): Intel's Data Center Inference Chip
Spring Hill (NNP-I 1000): Intel's Data Center Inference ChipSpring Hill (NNP-I 1000): Intel's Data Center Inference Chip
Spring Hill (NNP-I 1000): Intel's Data Center Inference Chip
 

Más de DESMOND YUEN

2022-AI-Index-Report_Master.pdf
2022-AI-Index-Report_Master.pdf2022-AI-Index-Report_Master.pdf
2022-AI-Index-Report_Master.pdfDESMOND YUEN
 
Small Is the New Big
Small Is the New BigSmall Is the New Big
Small Is the New BigDESMOND YUEN
 
Intel® Blockscale™ ASIC Product Brief
Intel® Blockscale™ ASIC Product BriefIntel® Blockscale™ ASIC Product Brief
Intel® Blockscale™ ASIC Product BriefDESMOND YUEN
 
Cryptography Processing with 3rd Gen Intel Xeon Scalable Processors
Cryptography Processing with 3rd Gen Intel Xeon Scalable ProcessorsCryptography Processing with 3rd Gen Intel Xeon Scalable Processors
Cryptography Processing with 3rd Gen Intel Xeon Scalable ProcessorsDESMOND YUEN
 
Intel 2021 Product Security Report
Intel 2021 Product Security ReportIntel 2021 Product Security Report
Intel 2021 Product Security ReportDESMOND YUEN
 
How can regulation keep up as transformation races ahead? 2022 Global regulat...
How can regulation keep up as transformation races ahead? 2022 Global regulat...How can regulation keep up as transformation races ahead? 2022 Global regulat...
How can regulation keep up as transformation races ahead? 2022 Global regulat...DESMOND YUEN
 
NASA Spinoffs Help Fight Coronavirus, Clean Pollution, Grow Food, More
NASA Spinoffs Help Fight Coronavirus, Clean Pollution, Grow Food, MoreNASA Spinoffs Help Fight Coronavirus, Clean Pollution, Grow Food, More
NASA Spinoffs Help Fight Coronavirus, Clean Pollution, Grow Food, MoreDESMOND YUEN
 
A Survey on Security and Privacy Issues in Edge Computing-Assisted Internet o...
A Survey on Security and Privacy Issues in Edge Computing-Assisted Internet o...A Survey on Security and Privacy Issues in Edge Computing-Assisted Internet o...
A Survey on Security and Privacy Issues in Edge Computing-Assisted Internet o...DESMOND YUEN
 
PUTTING PEOPLE FIRST: ITS IS SMART COMMUNITIES AND CITIES
PUTTING PEOPLE FIRST:  ITS IS SMART COMMUNITIES AND  CITIESPUTTING PEOPLE FIRST:  ITS IS SMART COMMUNITIES AND  CITIES
PUTTING PEOPLE FIRST: ITS IS SMART COMMUNITIES AND CITIESDESMOND YUEN
 
BUILDING AN OPEN RAN ECOSYSTEM FOR EUROPE
BUILDING AN OPEN RAN ECOSYSTEM FOR EUROPEBUILDING AN OPEN RAN ECOSYSTEM FOR EUROPE
BUILDING AN OPEN RAN ECOSYSTEM FOR EUROPEDESMOND YUEN
 
An Introduction to Semiconductors and Intel
An Introduction to Semiconductors and IntelAn Introduction to Semiconductors and Intel
An Introduction to Semiconductors and IntelDESMOND YUEN
 
Changing demographics and economic growth bloom
Changing demographics and economic growth bloomChanging demographics and economic growth bloom
Changing demographics and economic growth bloomDESMOND YUEN
 
Intel’s Impacts on the US Economy
Intel’s Impacts on the US EconomyIntel’s Impacts on the US Economy
Intel’s Impacts on the US EconomyDESMOND YUEN
 
2021 private networks infographics
2021 private networks infographics2021 private networks infographics
2021 private networks infographicsDESMOND YUEN
 
Transforming the Modern City with the Intel-based 5G Smart City Road Side Uni...
Transforming the Modern City with the Intel-based 5G Smart City Road Side Uni...Transforming the Modern City with the Intel-based 5G Smart City Road Side Uni...
Transforming the Modern City with the Intel-based 5G Smart City Road Side Uni...DESMOND YUEN
 
Accelerate Your AI Today
Accelerate Your AI TodayAccelerate Your AI Today
Accelerate Your AI TodayDESMOND YUEN
 
Increasing Throughput per Node for Content Delivery Networks
Increasing Throughput per Node for Content Delivery NetworksIncreasing Throughput per Node for Content Delivery Networks
Increasing Throughput per Node for Content Delivery NetworksDESMOND YUEN
 
3rd Generation Intel® Xeon® Scalable Processor - Achieving 1 Tbps IPsec with ...
3rd Generation Intel® Xeon® Scalable Processor - Achieving 1 Tbps IPsec with ...3rd Generation Intel® Xeon® Scalable Processor - Achieving 1 Tbps IPsec with ...
3rd Generation Intel® Xeon® Scalable Processor - Achieving 1 Tbps IPsec with ...DESMOND YUEN
 
"Life and Learning After One-Hundred Years: Trust Is The Coin Of The Realm."
"Life and Learning After One-Hundred Years: Trust Is The Coin Of The Realm.""Life and Learning After One-Hundred Years: Trust Is The Coin Of The Realm."
"Life and Learning After One-Hundred Years: Trust Is The Coin Of The Realm."DESMOND YUEN
 
Telefónica views on the design, architecture, and technology of 4G/5G Open RA...
Telefónica views on the design, architecture, and technology of 4G/5G Open RA...Telefónica views on the design, architecture, and technology of 4G/5G Open RA...
Telefónica views on the design, architecture, and technology of 4G/5G Open RA...DESMOND YUEN
 

Más de DESMOND YUEN (20)

2022-AI-Index-Report_Master.pdf
2022-AI-Index-Report_Master.pdf2022-AI-Index-Report_Master.pdf
2022-AI-Index-Report_Master.pdf
 
Small Is the New Big
Small Is the New BigSmall Is the New Big
Small Is the New Big
 
Intel® Blockscale™ ASIC Product Brief
Intel® Blockscale™ ASIC Product BriefIntel® Blockscale™ ASIC Product Brief
Intel® Blockscale™ ASIC Product Brief
 
Cryptography Processing with 3rd Gen Intel Xeon Scalable Processors
Cryptography Processing with 3rd Gen Intel Xeon Scalable ProcessorsCryptography Processing with 3rd Gen Intel Xeon Scalable Processors
Cryptography Processing with 3rd Gen Intel Xeon Scalable Processors
 
Intel 2021 Product Security Report
Intel 2021 Product Security ReportIntel 2021 Product Security Report
Intel 2021 Product Security Report
 
How can regulation keep up as transformation races ahead? 2022 Global regulat...
How can regulation keep up as transformation races ahead? 2022 Global regulat...How can regulation keep up as transformation races ahead? 2022 Global regulat...
How can regulation keep up as transformation races ahead? 2022 Global regulat...
 
NASA Spinoffs Help Fight Coronavirus, Clean Pollution, Grow Food, More
NASA Spinoffs Help Fight Coronavirus, Clean Pollution, Grow Food, MoreNASA Spinoffs Help Fight Coronavirus, Clean Pollution, Grow Food, More
NASA Spinoffs Help Fight Coronavirus, Clean Pollution, Grow Food, More
 
A Survey on Security and Privacy Issues in Edge Computing-Assisted Internet o...
A Survey on Security and Privacy Issues in Edge Computing-Assisted Internet o...A Survey on Security and Privacy Issues in Edge Computing-Assisted Internet o...
A Survey on Security and Privacy Issues in Edge Computing-Assisted Internet o...
 
PUTTING PEOPLE FIRST: ITS IS SMART COMMUNITIES AND CITIES
PUTTING PEOPLE FIRST:  ITS IS SMART COMMUNITIES AND  CITIESPUTTING PEOPLE FIRST:  ITS IS SMART COMMUNITIES AND  CITIES
PUTTING PEOPLE FIRST: ITS IS SMART COMMUNITIES AND CITIES
 
BUILDING AN OPEN RAN ECOSYSTEM FOR EUROPE
BUILDING AN OPEN RAN ECOSYSTEM FOR EUROPEBUILDING AN OPEN RAN ECOSYSTEM FOR EUROPE
BUILDING AN OPEN RAN ECOSYSTEM FOR EUROPE
 
An Introduction to Semiconductors and Intel
An Introduction to Semiconductors and IntelAn Introduction to Semiconductors and Intel
An Introduction to Semiconductors and Intel
 
Changing demographics and economic growth bloom
Changing demographics and economic growth bloomChanging demographics and economic growth bloom
Changing demographics and economic growth bloom
 
Intel’s Impacts on the US Economy
Intel’s Impacts on the US EconomyIntel’s Impacts on the US Economy
Intel’s Impacts on the US Economy
 
2021 private networks infographics
2021 private networks infographics2021 private networks infographics
2021 private networks infographics
 
Transforming the Modern City with the Intel-based 5G Smart City Road Side Uni...
Transforming the Modern City with the Intel-based 5G Smart City Road Side Uni...Transforming the Modern City with the Intel-based 5G Smart City Road Side Uni...
Transforming the Modern City with the Intel-based 5G Smart City Road Side Uni...
 
Accelerate Your AI Today
Accelerate Your AI TodayAccelerate Your AI Today
Accelerate Your AI Today
 
Increasing Throughput per Node for Content Delivery Networks
Increasing Throughput per Node for Content Delivery NetworksIncreasing Throughput per Node for Content Delivery Networks
Increasing Throughput per Node for Content Delivery Networks
 
3rd Generation Intel® Xeon® Scalable Processor - Achieving 1 Tbps IPsec with ...
3rd Generation Intel® Xeon® Scalable Processor - Achieving 1 Tbps IPsec with ...3rd Generation Intel® Xeon® Scalable Processor - Achieving 1 Tbps IPsec with ...
3rd Generation Intel® Xeon® Scalable Processor - Achieving 1 Tbps IPsec with ...
 
"Life and Learning After One-Hundred Years: Trust Is The Coin Of The Realm."
"Life and Learning After One-Hundred Years: Trust Is The Coin Of The Realm.""Life and Learning After One-Hundred Years: Trust Is The Coin Of The Realm."
"Life and Learning After One-Hundred Years: Trust Is The Coin Of The Realm."
 
Telefónica views on the design, architecture, and technology of 4G/5G Open RA...
Telefónica views on the design, architecture, and technology of 4G/5G Open RA...Telefónica views on the design, architecture, and technology of 4G/5G Open RA...
Telefónica views on the design, architecture, and technology of 4G/5G Open RA...
 

Último

Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Zilliz
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxRemote DBA Services
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Angeliki Cooney
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfOrbitshub
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Zilliz
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdfSandro Moreira
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...apidays
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 

Último (20)

Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 

The Architecture of Intel Processor Graphics: Gen 11

  • 1. GameDevelopers CONFERENCE Antoine Cohade Sr. Dev. Rel. Engineer @acohade1 Michael Apodaca Principal Engineer, 3D SW architect @gatorfax Special thanks to contributors & reviewers: A. Lake, S. Junkins, S. McCalla, T. Schluessler
  • 2. @IntelSoftware @IntelGraphics Legal Notices and Disclaimers No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document. Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade. You may not use or facilitate the use of this document in connection with any infringement or other legal analysis concerning Intel products described herein. You agree to grant Intel a non-exclusive, royalty-free license to any patent claim thereafter drafted which includes subject matter disclosed herein. The products and services described may contain defects or errors known as errata which may cause deviations from published specifications. Current characterized errata are available on request. Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. No computer system can be absolutely secure. Check with your system manufacturer or retailer or learn more at [intel.com]. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit www.intel.com/benchmarks. Optimization Notice: Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Results have been estimated or simulated using internal Intel analysis or architecture simulation or modeling, and provided to you for informational purposes. Any differences in your system hardware, software or configuration may affect your actual performance. Intel, Core and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others © Intel Corporation. 2
  • 3. @IntelSoftware @IntelGraphics Agenda • Graphics Pipeline Basics • for_each( hardwareBlock in Gen11 { describe(hardwareBlock); give(performanceTip**); }); • Tiled-Based Rendering • Coarse Pixel Shading 3 **Most Performance tips also apply to Gen9 @IntelSoftware @IntelGraphics
  • 4. @IntelSoftware @IntelGraphics 4 Graphics Pipeline 101 3D Pipeline Compute Pipeline Command Streamer Geometry Raster/ZDraw() Compute Memory Pixel Backend Dispatch() Command Streamer Compute Memory
  • 5. @IntelSoftware @IntelGraphics 5 How does that match to Gen? 55 L3/Tile$/URB Bank Bank Bank Bank Bank Bank Bank Bank Slice Common Pixel Backend / Color$ Rasterizer Z / Depth$ Unslice GAGEOM/FF POSH GTI CS GuC BLIT Media FF VD VD VE
  • 6. @IntelSoftware @IntelGraphics 6 How does that match to Gen? Un-Slice (1,2) 1. Global Assets: Command Streamer, Ring interface, pwr mgmt, Blitter, Thread Dispatch 2. Geometry Fixed Functions: 3D HW pipeline (geometry) And also: Media FF: Low power HW encode, decode, & video processing Slice (3,4) 3. Slice-Common/L3: Setup, Rasterizer, Z, L3 Cache, and Pixel processing 4. Sub-Slices: Array of Execution Units, Inst Caches (IC$), Texture Samplers w/ Caches, Load/Store Units 66 L3/Tile$/URB Bank Bank Bank Bank Bank Bank Bank Bank Slice Common Pixel Backend / Color$ Rasterizer Z / Depth$ Unslice GAGEOM/FF POSH GTI CS GuC BLIT Media FF VD VD VE Pixel backend Raster/Z 2. Geometry 1. Command streamer Memory 3. Compute 4. Compute 4.
  • 7. @IntelSoftware @IntelGraphics 7 Command Streamer § Reads and dispatches command buffer • 3D commands to Geometry/Fixed Functions • Compute commands to Thread Dispatcher § Builds indirect command buffers if needed Performance tips: § Avoid small empty drawIndirects – use ExecuteIndirect or MultiDrawIndirect Extensions! § Separate and regroup 3D and Compute workloads together to get the most of the hardware Unslice GAGEOM/FF CS Geometry/Fixed Functions § Vertex fetch/transform/write in order into Memory § 1.5x over Gen9 (from 4 to 6 attributes (float4) /clk) Performance tips: § Pack attributes into float4 when possible § Hold your Prim. count on a leash • CPU occlusion and size culling • Level Of Details (LODs) • Frustum culling • Backface culling Unslice architecture 7
  • 8. @IntelSoftware @IntelGraphics 8 Slice Common Architecture: Rasterizer § Handles Clipping, Viewport Scaling, orientation/thin triangles culling, and of course rasterization! § Hierarchical Rasterization up to 256 pix/clk (16 * 4x4) • Determines if each 4x4(span) is fully covered, partially, or void • If partially, fine grain raster at 4x4 px/clk Performance tip: Beware of small triangles - use LODs and impostors: Slice Common Rasterizer 88 15 px in 1 clk J 3 px in 3 clk L
  • 9. @IntelSoftware @IntelGraphics 9 Slice Common Architecture: 2 Z/Depth$ Pipes Handles Hierarchical-Z … § 2 modes: PlaneZ or Min/Max § up to 2 x 64 px/clk § 2 x 12kB cache Early-Z test can happen under 1 major condition: • We don’t need the Pixel Shader to know the final depth Performance tip: Avoid when possible: § Discard in Pixel Shader (PS) -> test can happen before but late-Z write § Writing depth manually in the PS -> both test and write happen late 9 Slice Common Z / Depth$ … and Intermediate-Z (per px) § up to 2 x 16 px/clk § 2 x 32kB cache
  • 10. @IntelSoftware @IntelGraphics Gen Basic Building Block: Execution Unit (EU) 10 Execution Unit (EU) ALU 0 ALU 1 Branch grf. … ... ... ... ... ... ... ... ... … ... General register file(grf) arf. Send Thread Control Execution Unit (EU) ALU 0 ALU 1 Branch GRFGRF ...... ...... ...... ...... ...... General Register File (GRF) Send Thread Control
  • 11. @IntelSoftware @IntelGraphics 11 Execution Unit (EU) ALU 0 ALU 1 Natively 2 x SIMD4 § Each ALU supports fp. & int math § 8 x 32-bit or 16 x 16-bit ops/cycle § ALU1 capable of high throughput transcendental Math § Min FPU instruction latency is 2 clocks § FPUs are fully pipelined across threads § instructions complete every cycle Gen Basic Building Block: Execution Unit (EU) Full rate: MUL, ADD, MAD, MIN/MAX, CMP 1/4 rate: SQRT/RSQRT, LOG/EXP, TRIG 1/8 rate: POW, DIV
  • 12. @IntelSoftware @IntelGraphics 12 Execution Unit (EU) GRFGRF ...... ...... ...... ...... ...... General Register File (GRF)General Register File (GRF) 12 Gen Basic Building Block: Execution Unit (EU) 7 Threads § Cover latency for Load, LDS, etc. § Each can run a diff. shader 28kB register space § 7 x 4 kB § 128 registers (grf.)/ thread § grf. are 8 x 32-bit (32B)
  • 13. @IntelSoftware @IntelGraphics 13 Execution Unit (EU) Thread Control ...... ...... ...... ...... ...... 13 Gen Basic Building Block: Execution Unit (EU) Thread Control § Thread Arbiter picks instructions to run from runnable thread(s) § Dispatches instructions to functional units § co-issues up to 4 instructions per cycle
  • 14. @IntelSoftware @IntelGraphics EU SIMD Explained Physically SIMD4x2 but logically support 1/2/4/8/16 or 32-wide instructions § > SIMD8 instructions pair adjacent GRF. • SIMD16 would pair 2 physical GRF. to a single logical 64B grf. § Compiler makes the decision: for 3D workload using either 8 or 16-wide 1414 Intel GPA has metrics to help and we now have a standalone Shader Compiler !! Performance tip: Reduce register pressure: § High SIMD/reduced spills § Better codegen / scheduling GRF Usage: <64 64-128 >128 Instruction used SIMD16 J SIMD8 K SIMD8 with Spills L
  • 15. @IntelSoftware @IntelGraphics How To Reduce Register Pressure Don’t: § Branch on constant buffer conditions § Non uniform access to buffer data § Excessive variable decl. (esp. arrays) Do’s: § Use Partial precision (min16float) § Move common code outside branches 1515 A lot more details in the Gen Graphics performance guide: https://software.intel.com/en-us/documentation/graphics-api-performance-guide-for-intel-processor-graphics-gen9 GRF Usage: <64 64-128 >128 Instruction used SIMD16 J SIMD8 K SIMD8 with Spills L
  • 16. @IntelSoftware @IntelGraphics Sampler Subslice: An Array Of 8 EUs With a Sampler Sampler takes request from the EUs, fetches texels from memory, and returns them to the EUs: § Handles Texture Decompression § 4tx/clk bilinear and point < 64b and aniso. 2x 32b • >64 formats and aniso. 4x runs at ½ speed § Multiple levels of cache Sampler performance tips: § Increase spatial locality § Reduce texture state changes § Use simple filtering modes (aniso. 16x is 1/16th bilinear rate!!) 16
  • 17. @IntelSoftware @IntelGraphics Subslice: An Array Of 8 EUs CS thread groups are guaranteed running on a single subslice: § Gen hardware is able to run 56 hardware threads/subslice § thread group size == 1024 are an issue If we run in SIMD16, we’d have 1024/16 = 64 hardware threads, which is bigger than the 56 available. To compensate, the shader compiler has to use SIMD32. § This causes 2 issues: • Occupancy is low (32 HW threads/56 available = 57%) • Potential spill fills, as this doubles the register count required. Performance tip: 8x8 thread groups is great on all hardware! 17
  • 18. @IntelSoftware @IntelGraphics DataPort SLM 2 Subslices grouped together with… Shared Local Memory (SLM) § 128kB § 128B/clk § ~1/4 latency vs. Gen9 Performance tips: § Using too much SLM can hurt occupancy § Regroup memory access to ensure full cache line access (64B) 18 Data Port (LD/ST) § Handles communication with L3: 64B/clk § UAV read/writes Performance tip: Only create UAVs when necessary
  • 19. @IntelSoftware @IntelGraphics Slice Common Architecture: Pixel Backend Slice Common Pixel Backend / Color$ Handles color writes and blending to render target § Writes and blend at 16 px/clk § Has its own 2 x 32k color cache 2:1 Lossless compression § Saves bandwidth when writing RT data § Saves bandwidth when display engine reads displayable RT data § Saves bandwidth when texture sampler reads intermediate RT data § MSAA textures are not compressed 19
  • 20. @IntelSoftware @IntelGraphics 20 Summary – 8 Subslices = 64 EUs 20
  • 21. @IntelSoftware @IntelGraphics 21 Tile-Based Deferred Rendering (TBDR) Problem statement: in many cases, bandwidth becomes the limiting factor for low-thermal design point (TDP) platforms. Solution(s): § Tile-based rendering is a well-known industry solution for low-power platforms; especially when discarding intermediate data § Significant amount of bandwidth and execution cycles can be saved for discarded triangles; e.g. frustum or backface culled § Enable opportunistically for scenarios the benefit the most
  • 22. @IntelSoftware @IntelGraphics Position-Only Shading Tiled-Based Rendering (PTBR) Position-Only Shading pipeline generates per-tile visibility stream; using (driver-generated) position-only vertex shader Render pipeline replays full vertex processing and per-tile pixel processing only for visible primitives Object Visibility Recorder (OVR) ensures consistency between pipelines Dynamic enabling of PTBR pipeline allows driver to choose based on optimal conditions; e.g., bandwidth-limited render passes 22 Vertex Processing Clip/Cull OVR Primitive Assembly Visibility Buffer(s) Visibility Buffer(s) Visibility Buffer(s) Visibility Buffer(s) Visibility Buffer(s) Visibility Buffer(s) Vertex Processing Tessellation Geometry Processing Clip/Cull Raster Primitive Assembly Pixel Processing PerTile Visibility Stream Per-Tile
  • 23. @IntelSoftware @IntelGraphics PTBR Performance Guidelines Use explicit API render passes and discard: § VkRenderPass/VkSubpass with VK_ATTACHMENT_STORE_OP_DONT_CARE § (RS5) ID3D12GraphicsCommandList4::EndRenderPass with D3D12_RENDER_PASS_ENDING_ACCESS_TYPE_DISCARD During high-bandwidth render passes (e.g., MSAA, blending) avoid: § Tessellation, Geometry Shading and/or Stream Output, and Compute Shading § Use per-attribute vertex buffers - position only GPA: per-Draw performance analysis will disable PTBR 23
  • 24. @IntelSoftware @IntelGraphics Coarse Pixel Shading (CPS): Motivation § Pixel density increasing § Performance not scaling at same density as pixel resolution § Need mechanisms to hit high frame rate and not waste pixel throughput § Software solutions: • Dynamic Resolution Rendering • Checkerboard rendering • … § Can also explore hardware solutions… Intel has highly optimized open source implementations of these available Topic of these slides 24
  • 25. @IntelSoftware @IntelGraphics Coarse Pixel Shading Overview Set Pixel Shading Rate to be: § Floating point value between 1 and 4 in X and Y dimension § Rounded to integer value to determine actual shading rate HW support for 2 modes of operation: § Foveated: • Specify a CPS rate for inner and outer elliptical region • Specify an inner and outer ellipse, smooth transition between them § Draw call: • Set a state for all future Draw() calls • Set multiple times per frame Will be exposed in D3D12 (aka Variable Rate Shading) as well as future Vulkan extensions. Per pixel visibility Shading once per X * Y pixel group Inner elliptical region Outer elliptical regionSmooth transition 25
  • 26. @IntelSoftware @IntelGraphics 26 Coarse Pixel Shading (CPS) • Proof of Concept integration into Unreal Engine SunTemple demo • CPS rate applied to all geometry when under motion at 2x2 or 4x4 rates • Early perf gains between ~20-40% • Microsoft Variable Rate Shading Talk happening now ! • Come see the demo at the Intel Booth!
  • 27. @IntelSoftware @IntelGraphics Conclusion § 64EUs * SIMD4x2 * 2ops/clk = 1024 ops/clk • Up to 1 Tflops !! • And you now have the keys to make the most of it § New performance features allowing to get even more from the hardware • CPS • PTBR § Biggest gaming announcement in 2019 – Raising the mainstream gaming bar § A lot more details in the Gen11 whitepaper : https://software.intel.com/en- us/download/graphics-api-architecture-guide-for-intel-processor-graphics-gen11 27
  • 30. @IntelSoftware @IntelGraphics 30 Backup Key Peak Metrics Gen9 GT2 Gen11 GT2 Slice Attribute # of Slices 1 1 # of Sub-Slices 3 8 # of Cores (Eus) 24 (3x8) 64 (8x8) Single Precision FLOPs per ClkClock (MAD) 384 1024 Half Precision FLOPs per ClkClock (MAD) 768 2048 Int8 IOPs per ClkClock 768 1024 Register File Per Sub-Slice 224KB 224KB # of Samplers 3 8 Point/Bilinear Texels/ClkClock (32bpt) 12 32 Point/Bilinear Texels/ClkClock (64bpt) 12 32 Trilinear Texels/ClkClock (32bpt) 6 16 Shared Local Memory Total 192KB(=3 x 64KB)* 512KB(=8 x 64KB) Slice-Common Attributes Pixels/ClkClock (RGBA8) wo. Alpha Blend 8 16 Pixels/ClkClock (RGBA8) w. Alpha Blend 8 16 HiZ Zixels/ClkClock 64 128 L3$ Cache 768 KB 3072 KB Geometry Attributes Vertex Attributes Per ClkClock 4 6 Primitive/ClkClock (backface Cull – strips) 1 1 Primitive/ClkClock (backface Cull – lists) 0.67 0.67 Global Attributes GTI Bandwidth (Bytes/ClkClock) R: 64 W: 32 R: 64 W: 64 LLC Configuration 2-8MB TBD DRAM Configuration 2x64 LPDDR3/DDR4 4x32 LPDDR4/DDR4 Peak DRAM Data Rate (Mbps) Up to 2Ch 2400 Up to 2Ch 3733 Max Peak DRAM BW (GBps) Up to 38.4 GBps Up to 59.7 GBps *Note - Gen9 L3$ includes SLM, Data and Fixed Function Data