SlideShare una empresa de Scribd logo
1 de 28
Descargar para leer sin conexión
A 2.5D CULLING FOR FORWARD+
AMD
Takahiro Harada
2 | A 2.5D culling for Forward+ | Takahiro Harada
AGENDA
 Forward+
–  Forward, Deferred, Forward+
–  Problem description
 2.5D culling
 Results
3 | A 2.5D culling for Forward+ | Takahiro Harada
FORWARD+
4 | A 2.5D culling for Forward+ | Takahiro Harada
REAL-TIME SOLUTION COMPARISON
 Rendering equation
 Forward
 Deferred
 Forward+
5 | A 2.5D culling for Forward+ | Takahiro Harada
FORWARD RENDERING PIPELINE
 Depth prepass
–  Fills z buffer
 Prevent overdraw for shading
 Shading
–  Geometry is rendered
–  Pixel shader
 Iterate through light list set for each object
 Evaluates materials for the lights
6 | A 2.5D culling for Forward+ | Takahiro Harada
FORWARD+ RENDERING PIPELINE
 Depth prepass
–  Fills z buffer
 Prevent overdraw for shading
 Used for pixel position reconstruction for light culling
 Light culling
–  Culls light per tile basis
–  Input: z buffer, light buffer
–  Output: light list per tile
 Shading
–  Geometry is rendered
–  Pixel shader
 Iterate through light list calculated in light culling
 Evaluates materials for the lights
1
2
3
[1,2,3][1] [2,3]
7 | A 2.5D culling for Forward+ | Takahiro Harada
CREATING A FRUSTUM FOR A TILE
 An edge @SS == A plane @VS
 A tile (4 edges) @SS == 4 planes @VS
–  Open frustum (no bound in Z direction)
 Max and min Z is used to cap
8 | A 2.5D culling for Forward+ | Takahiro Harada
LONG FRUSTUM
 Screen space culling is not always sufficient
–  Create a frustum from max and min depth values
–  Edge of objects
–  Captures a lot of unnecessary lights
9 | A 2.5D culling for Forward+ | Takahiro Harada
LONG FRUSTUM
 Screen space culling is not always sufficient
–  Create a frustum from max and min depth values
–  Edge of objects
–  Captures a lot of unnecessary lights
0 lights
25 lights
50 lights
10 | A 2.5D culling for Forward+ | Takahiro Harada
GET WORSE IN A COMPLEX SCENE
0 lights
100 lights
200 lights
11 | A 2.5D culling for Forward+ | Takahiro Harada
QUESTION
 Want to reduce false positives
 Can we improve the culling without adding much overhead?
–  Computation time, memory
–  Culling itself is an optimization
–  Spending a lot of resources for it does not make sense
 Using a 3D grid is a natural extension
–  Uses too much memory
12 | A 2.5D culling for Forward+ | Takahiro Harada
2.5D CULLING
13 | A 2.5D culling for Forward+ | Takahiro Harada
2.5D CULLING
 Additional memory usage
–  0B global memory
–  4B local memory per WG (can compress more if you want)
 Additional computation complexity
–  A few bit and arithmetic instructions
–  A few lines of codes for light culling
–  No changes for other stages
 Additional runtime overhead
–  < 10% compared to the original light culling
14 | A 2.5D culling for Forward+ | Takahiro Harada
IDEA
 Split frustum in z direction
–  Uniform split for a frustum
–  Varying split among frustums
(a) (b)
15 | A 2.5D culling for Forward+ | Takahiro Harada
FRUSTUM CONSTRUCTION
 Calculate depth bound
–  max and min values of depth
 Split depth direction into 32 cells
–  Min value and cell size
 Flag occupied cell
 A 32bit depth mask per work group
A tile
16 | A 2.5D culling for Forward+ | Takahiro Harada
FRUSTUM CONSTRUCTION
 Calculate depth bound
–  max and min values of depth
 Split depth direction into 32 cells
–  Min value and cell size
 Flag occupied cell
 A 32bit depth mask per work group
7 7 7 7
7 7 7 2
7 7 2 1
7 2 1 0
Depth mask = 11100001
A tile
0 1 2 3 4 5 6 7
17 | A 2.5D culling for Forward+ | Takahiro Harada
LIGHT CULLING
 If a light overlaps to the frustum
–  Calculate depth mask for the light
–  Check overlap using the depth mask of the frustum
 Depth mask & Depth mask
–  11100001 & 00011000 = 00000000
Depth mask = 11100001
Depth mask = 00011000
18 | A 2.5D culling for Forward+ | Takahiro Harada
LIGHT CULLING
 If a light overlaps to the frustum
–  Calculate depth mask for the light
–  Check overlap using the depth mask of the frustum
 Depth mask & Depth mask
–  11100001 & 00110000 = 00100000
Depth mask = 11100001
Depth mask = 00110000
19 | A 2.5D culling for Forward+ | Takahiro Harada
CODE
Original With 2.5D culling
20 | A 2.5D culling for Forward+ | Takahiro Harada
RESULTS
21 | A 2.5D culling for Forward+ | Takahiro Harada
LIGHT CULLING
22 | A 2.5D culling for Forward+ | Takahiro Harada
LIGHT CULLING + 2.5D CULLING
23 | A 2.5D culling for Forward+ | Takahiro Harada
COMPARISON
1"
10"
100"
1000"
10000"
1" 2" 3" 4" 5" 6" 7" 8" 9" 10" 11" 12" 13" 14" 15" 16" 17" 18" 19" 20" 21" 22" 23"
Number'of'*les'
Number'of'lights'(x10)'
With"2.5D"culling"
Without"2.5D"culling"
220 lights/frustum -> 120 lights/frustum
24 | A 2.5D culling for Forward+ | Takahiro Harada
LIGHT CULLING
25 | A 2.5D culling for Forward+ | Takahiro Harada
LIGHT CULLING + 2.5D CULLING
26 | A 2.5D culling for Forward+ | Takahiro Harada
COMPARISON
1"
10"
100"
1000"
10000"
1" 2" 3" 4" 5" 6" 7" 8" 9" 10" 11" 12" 13" 14" 15" 16" 17"
Number'of'*les'
Number'of'lights'(x10)'
With"2.5D"culling"
Without"2.5D"culling"
27 | A 2.5D culling for Forward+ | Takahiro Harada
PERFORMANCE
0"
1"
2"
3"
4"
5"
6"
1024" 2048" 3072" 4096"
!me$(ms)$
Number$of$lights$
Forward+"w."frustum"culling"
Forward+"w."2.5D"
Deferred"
28 | A 2.5D culling for Forward+ | Takahiro Harada
CONCLUSION
 Proposed 2.5D culling which
–  Additional memory usage
  0B global memory
  4B local memory per WG (can compress more if you want)
–  Additional compute complexity
  3 lines of pseudo codes for light culling
  No changes for other stages
–  Additional runtime overhead
  < 10% compared to the original light culling
 Showed that 2.5D culling reduces false positives

Más contenido relacionado

La actualidad más candente

The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...
The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...
The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...
Johan Andersson
 
Frostbite Rendering Architecture and Real-time Procedural Shading & Texturing...
Frostbite Rendering Architecture and Real-time Procedural Shading & Texturing...Frostbite Rendering Architecture and Real-time Procedural Shading & Texturing...
Frostbite Rendering Architecture and Real-time Procedural Shading & Texturing...
Johan Andersson
 

La actualidad más candente (20)

A Certain Slant of Light - Past, Present and Future Challenges of Global Illu...
A Certain Slant of Light - Past, Present and Future Challenges of Global Illu...A Certain Slant of Light - Past, Present and Future Challenges of Global Illu...
A Certain Slant of Light - Past, Present and Future Challenges of Global Illu...
 
Rendering Technologies from Crysis 3 (GDC 2013)
Rendering Technologies from Crysis 3 (GDC 2013)Rendering Technologies from Crysis 3 (GDC 2013)
Rendering Technologies from Crysis 3 (GDC 2013)
 
Rendering Techniques in Deus Ex: Mankind Divided
Rendering Techniques in Deus Ex: Mankind DividedRendering Techniques in Deus Ex: Mankind Divided
Rendering Techniques in Deus Ex: Mankind Divided
 
Optimizing the Graphics Pipeline with Compute, GDC 2016
Optimizing the Graphics Pipeline with Compute, GDC 2016Optimizing the Graphics Pipeline with Compute, GDC 2016
Optimizing the Graphics Pipeline with Compute, GDC 2016
 
Bindless Deferred Decals in The Surge 2
Bindless Deferred Decals in The Surge 2Bindless Deferred Decals in The Surge 2
Bindless Deferred Decals in The Surge 2
 
Destruction Masking in Frostbite 2 using Volume Distance Fields
Destruction Masking in Frostbite 2 using Volume Distance FieldsDestruction Masking in Frostbite 2 using Volume Distance Fields
Destruction Masking in Frostbite 2 using Volume Distance Fields
 
Graphics Gems from CryENGINE 3 (Siggraph 2013)
Graphics Gems from CryENGINE 3 (Siggraph 2013)Graphics Gems from CryENGINE 3 (Siggraph 2013)
Graphics Gems from CryENGINE 3 (Siggraph 2013)
 
Calibrating Lighting and Materials in Far Cry 3
Calibrating Lighting and Materials in Far Cry 3Calibrating Lighting and Materials in Far Cry 3
Calibrating Lighting and Materials in Far Cry 3
 
Secrets of CryENGINE 3 Graphics Technology
Secrets of CryENGINE 3 Graphics TechnologySecrets of CryENGINE 3 Graphics Technology
Secrets of CryENGINE 3 Graphics Technology
 
Frostbite on Mobile
Frostbite on MobileFrostbite on Mobile
Frostbite on Mobile
 
Holy smoke! Faster Particle Rendering using Direct Compute by Gareth Thomas
Holy smoke! Faster Particle Rendering using Direct Compute by Gareth ThomasHoly smoke! Faster Particle Rendering using Direct Compute by Gareth Thomas
Holy smoke! Faster Particle Rendering using Direct Compute by Gareth Thomas
 
Low-level Shader Optimization for Next-Gen and DX11 by Emil Persson
Low-level Shader Optimization for Next-Gen and DX11 by Emil PerssonLow-level Shader Optimization for Next-Gen and DX11 by Emil Persson
Low-level Shader Optimization for Next-Gen and DX11 by Emil Persson
 
The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...
The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...
The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...
 
Five Rendering Ideas from Battlefield 3 & Need For Speed: The Run
Five Rendering Ideas from Battlefield 3 & Need For Speed: The RunFive Rendering Ideas from Battlefield 3 & Need For Speed: The Run
Five Rendering Ideas from Battlefield 3 & Need For Speed: The Run
 
Stochastic Screen-Space Reflections
Stochastic Screen-Space ReflectionsStochastic Screen-Space Reflections
Stochastic Screen-Space Reflections
 
Advancements in-tiled-rendering
Advancements in-tiled-renderingAdvancements in-tiled-rendering
Advancements in-tiled-rendering
 
Frostbite Rendering Architecture and Real-time Procedural Shading & Texturing...
Frostbite Rendering Architecture and Real-time Procedural Shading & Texturing...Frostbite Rendering Architecture and Real-time Procedural Shading & Texturing...
Frostbite Rendering Architecture and Real-time Procedural Shading & Texturing...
 
Beyond porting
Beyond portingBeyond porting
Beyond porting
 
Physically Based and Unified Volumetric Rendering in Frostbite
Physically Based and Unified Volumetric Rendering in FrostbitePhysically Based and Unified Volumetric Rendering in Frostbite
Physically Based and Unified Volumetric Rendering in Frostbite
 
Physically Based Sky, Atmosphere and Cloud Rendering in Frostbite
Physically Based Sky, Atmosphere and Cloud Rendering in FrostbitePhysically Based Sky, Atmosphere and Cloud Rendering in Frostbite
Physically Based Sky, Atmosphere and Cloud Rendering in Frostbite
 

Similar a A 2.5D Culling for Forward+ (SIGGRAPH ASIA 2012) (6)

new_age_graphics_android_x86
new_age_graphics_android_x86new_age_graphics_android_x86
new_age_graphics_android_x86
 
SIGGRAPH 2018 - PICA PICA and NVIDIA Turing
SIGGRAPH 2018 - PICA PICA and NVIDIA TuringSIGGRAPH 2018 - PICA PICA and NVIDIA Turing
SIGGRAPH 2018 - PICA PICA and NVIDIA Turing
 
Linear Programming
Linear ProgrammingLinear Programming
Linear Programming
 
Linear Programming
Linear ProgrammingLinear Programming
Linear Programming
 
Linear Programming
Linear ProgrammingLinear Programming
Linear Programming
 
Efficient LDI Representation (TPCG 2008)
Efficient LDI Representation (TPCG 2008)Efficient LDI Representation (TPCG 2008)
Efficient LDI Representation (TPCG 2008)
 

Más de Takahiro Harada

201907 Radeon ProRender2.0@Siggraph2019
201907 Radeon ProRender2.0@Siggraph2019201907 Radeon ProRender2.0@Siggraph2019
201907 Radeon ProRender2.0@Siggraph2019
Takahiro Harada
 
Physics Tutorial, GPU Physics (GDC2010)
Physics Tutorial, GPU Physics (GDC2010)Physics Tutorial, GPU Physics (GDC2010)
Physics Tutorial, GPU Physics (GDC2010)
Takahiro Harada
 
Using GPUs for Collision detection, Recent Advances in Real-Time Collision an...
Using GPUs for Collision detection, Recent Advances in Real-Time Collision an...Using GPUs for Collision detection, Recent Advances in Real-Time Collision an...
Using GPUs for Collision detection, Recent Advances in Real-Time Collision an...
Takahiro Harada
 
Heterogeneous Particle based Simulation (SIGGRAPH ASIA 2011)
Heterogeneous Particle based Simulation (SIGGRAPH ASIA 2011)Heterogeneous Particle based Simulation (SIGGRAPH ASIA 2011)
Heterogeneous Particle based Simulation (SIGGRAPH ASIA 2011)
Takahiro Harada
 
A Parallel Constraint Solver for a Rigid Body Simulation (SIGGRAPH ASIA 2011)
A Parallel Constraint Solver for a Rigid Body Simulation (SIGGRAPH ASIA 2011)A Parallel Constraint Solver for a Rigid Body Simulation (SIGGRAPH ASIA 2011)
A Parallel Constraint Solver for a Rigid Body Simulation (SIGGRAPH ASIA 2011)
Takahiro Harada
 
Introduction to Monte Carlo Ray Tracing (CEDEC 2013)
Introduction to Monte Carlo Ray Tracing (CEDEC 2013)Introduction to Monte Carlo Ray Tracing (CEDEC 2013)
Introduction to Monte Carlo Ray Tracing (CEDEC 2013)
Takahiro Harada
 

Más de Takahiro Harada (16)

201907 Radeon ProRender2.0@Siggraph2019
201907 Radeon ProRender2.0@Siggraph2019201907 Radeon ProRender2.0@Siggraph2019
201907 Radeon ProRender2.0@Siggraph2019
 
[2018 GDC] Real-Time Ray-Tracing Techniques for Integration into Existing Ren...
[2018 GDC] Real-Time Ray-Tracing Techniques for Integration into Existing Ren...[2018 GDC] Real-Time Ray-Tracing Techniques for Integration into Existing Ren...
[2018 GDC] Real-Time Ray-Tracing Techniques for Integration into Existing Ren...
 
Introduction to OpenCL (Japanese, OpenCLの基礎)
Introduction to OpenCL (Japanese, OpenCLの基礎)Introduction to OpenCL (Japanese, OpenCLの基礎)
Introduction to OpenCL (Japanese, OpenCLの基礎)
 
[2017 GDC] Radeon ProRender and Radeon Rays in a Gaming Rendering Workflow
[2017 GDC] Radeon ProRender and Radeon Rays in a Gaming Rendering Workflow[2017 GDC] Radeon ProRender and Radeon Rays in a Gaming Rendering Workflow
[2017 GDC] Radeon ProRender and Radeon Rays in a Gaming Rendering Workflow
 
確率的ライトカリング 理論と実装 (CEDEC2016)
確率的ライトカリング 理論と実装 (CEDEC2016)確率的ライトカリング 理論と実装 (CEDEC2016)
確率的ライトカリング 理論と実装 (CEDEC2016)
 
Introducing Firerender for 3DS Max
Introducing Firerender for 3DS MaxIntroducing Firerender for 3DS Max
Introducing Firerender for 3DS Max
 
[2016 GDC] Multiplatform GPU Ray-Tracing Solutions With FireRender and FireRays
[2016 GDC] Multiplatform GPU Ray-Tracing Solutions With FireRender and FireRays[2016 GDC] Multiplatform GPU Ray-Tracing Solutions With FireRender and FireRays
[2016 GDC] Multiplatform GPU Ray-Tracing Solutions With FireRender and FireRays
 
Introduction to Bidirectional Path Tracing (BDPT) & Implementation using Open...
Introduction to Bidirectional Path Tracing (BDPT) & Implementation using Open...Introduction to Bidirectional Path Tracing (BDPT) & Implementation using Open...
Introduction to Bidirectional Path Tracing (BDPT) & Implementation using Open...
 
Introduction to Bidirectional Path Tracing (BDPT) & Implementation using Open...
Introduction to Bidirectional Path Tracing (BDPT) & Implementation using Open...Introduction to Bidirectional Path Tracing (BDPT) & Implementation using Open...
Introduction to Bidirectional Path Tracing (BDPT) & Implementation using Open...
 
Foveated Ray Tracing for VR on Multiple GPUs
Foveated Ray Tracing for VR on Multiple GPUsFoveated Ray Tracing for VR on Multiple GPUs
Foveated Ray Tracing for VR on Multiple GPUs
 
Introduction to Monte Carlo Ray Tracing, OpenCL Implementation (CEDEC 2014)
Introduction to Monte Carlo Ray Tracing, OpenCL Implementation (CEDEC 2014)Introduction to Monte Carlo Ray Tracing, OpenCL Implementation (CEDEC 2014)
Introduction to Monte Carlo Ray Tracing, OpenCL Implementation (CEDEC 2014)
 
Physics Tutorial, GPU Physics (GDC2010)
Physics Tutorial, GPU Physics (GDC2010)Physics Tutorial, GPU Physics (GDC2010)
Physics Tutorial, GPU Physics (GDC2010)
 
Using GPUs for Collision detection, Recent Advances in Real-Time Collision an...
Using GPUs for Collision detection, Recent Advances in Real-Time Collision an...Using GPUs for Collision detection, Recent Advances in Real-Time Collision an...
Using GPUs for Collision detection, Recent Advances in Real-Time Collision an...
 
Heterogeneous Particle based Simulation (SIGGRAPH ASIA 2011)
Heterogeneous Particle based Simulation (SIGGRAPH ASIA 2011)Heterogeneous Particle based Simulation (SIGGRAPH ASIA 2011)
Heterogeneous Particle based Simulation (SIGGRAPH ASIA 2011)
 
A Parallel Constraint Solver for a Rigid Body Simulation (SIGGRAPH ASIA 2011)
A Parallel Constraint Solver for a Rigid Body Simulation (SIGGRAPH ASIA 2011)A Parallel Constraint Solver for a Rigid Body Simulation (SIGGRAPH ASIA 2011)
A Parallel Constraint Solver for a Rigid Body Simulation (SIGGRAPH ASIA 2011)
 
Introduction to Monte Carlo Ray Tracing (CEDEC 2013)
Introduction to Monte Carlo Ray Tracing (CEDEC 2013)Introduction to Monte Carlo Ray Tracing (CEDEC 2013)
Introduction to Monte Carlo Ray Tracing (CEDEC 2013)
 

Último

EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
Earley Information Science
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
giselly40
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
Enterprise Knowledge
 

Último (20)

What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 

A 2.5D Culling for Forward+ (SIGGRAPH ASIA 2012)

  • 1. A 2.5D CULLING FOR FORWARD+ AMD Takahiro Harada
  • 2. 2 | A 2.5D culling for Forward+ | Takahiro Harada AGENDA  Forward+ –  Forward, Deferred, Forward+ –  Problem description  2.5D culling  Results
  • 3. 3 | A 2.5D culling for Forward+ | Takahiro Harada FORWARD+
  • 4. 4 | A 2.5D culling for Forward+ | Takahiro Harada REAL-TIME SOLUTION COMPARISON  Rendering equation  Forward  Deferred  Forward+
  • 5. 5 | A 2.5D culling for Forward+ | Takahiro Harada FORWARD RENDERING PIPELINE  Depth prepass –  Fills z buffer  Prevent overdraw for shading  Shading –  Geometry is rendered –  Pixel shader  Iterate through light list set for each object  Evaluates materials for the lights
  • 6. 6 | A 2.5D culling for Forward+ | Takahiro Harada FORWARD+ RENDERING PIPELINE  Depth prepass –  Fills z buffer  Prevent overdraw for shading  Used for pixel position reconstruction for light culling  Light culling –  Culls light per tile basis –  Input: z buffer, light buffer –  Output: light list per tile  Shading –  Geometry is rendered –  Pixel shader  Iterate through light list calculated in light culling  Evaluates materials for the lights 1 2 3 [1,2,3][1] [2,3]
  • 7. 7 | A 2.5D culling for Forward+ | Takahiro Harada CREATING A FRUSTUM FOR A TILE  An edge @SS == A plane @VS  A tile (4 edges) @SS == 4 planes @VS –  Open frustum (no bound in Z direction)  Max and min Z is used to cap
  • 8. 8 | A 2.5D culling for Forward+ | Takahiro Harada LONG FRUSTUM  Screen space culling is not always sufficient –  Create a frustum from max and min depth values –  Edge of objects –  Captures a lot of unnecessary lights
  • 9. 9 | A 2.5D culling for Forward+ | Takahiro Harada LONG FRUSTUM  Screen space culling is not always sufficient –  Create a frustum from max and min depth values –  Edge of objects –  Captures a lot of unnecessary lights 0 lights 25 lights 50 lights
  • 10. 10 | A 2.5D culling for Forward+ | Takahiro Harada GET WORSE IN A COMPLEX SCENE 0 lights 100 lights 200 lights
  • 11. 11 | A 2.5D culling for Forward+ | Takahiro Harada QUESTION  Want to reduce false positives  Can we improve the culling without adding much overhead? –  Computation time, memory –  Culling itself is an optimization –  Spending a lot of resources for it does not make sense  Using a 3D grid is a natural extension –  Uses too much memory
  • 12. 12 | A 2.5D culling for Forward+ | Takahiro Harada 2.5D CULLING
  • 13. 13 | A 2.5D culling for Forward+ | Takahiro Harada 2.5D CULLING  Additional memory usage –  0B global memory –  4B local memory per WG (can compress more if you want)  Additional computation complexity –  A few bit and arithmetic instructions –  A few lines of codes for light culling –  No changes for other stages  Additional runtime overhead –  < 10% compared to the original light culling
  • 14. 14 | A 2.5D culling for Forward+ | Takahiro Harada IDEA  Split frustum in z direction –  Uniform split for a frustum –  Varying split among frustums (a) (b)
  • 15. 15 | A 2.5D culling for Forward+ | Takahiro Harada FRUSTUM CONSTRUCTION  Calculate depth bound –  max and min values of depth  Split depth direction into 32 cells –  Min value and cell size  Flag occupied cell  A 32bit depth mask per work group A tile
  • 16. 16 | A 2.5D culling for Forward+ | Takahiro Harada FRUSTUM CONSTRUCTION  Calculate depth bound –  max and min values of depth  Split depth direction into 32 cells –  Min value and cell size  Flag occupied cell  A 32bit depth mask per work group 7 7 7 7 7 7 7 2 7 7 2 1 7 2 1 0 Depth mask = 11100001 A tile 0 1 2 3 4 5 6 7
  • 17. 17 | A 2.5D culling for Forward+ | Takahiro Harada LIGHT CULLING  If a light overlaps to the frustum –  Calculate depth mask for the light –  Check overlap using the depth mask of the frustum  Depth mask & Depth mask –  11100001 & 00011000 = 00000000 Depth mask = 11100001 Depth mask = 00011000
  • 18. 18 | A 2.5D culling for Forward+ | Takahiro Harada LIGHT CULLING  If a light overlaps to the frustum –  Calculate depth mask for the light –  Check overlap using the depth mask of the frustum  Depth mask & Depth mask –  11100001 & 00110000 = 00100000 Depth mask = 11100001 Depth mask = 00110000
  • 19. 19 | A 2.5D culling for Forward+ | Takahiro Harada CODE Original With 2.5D culling
  • 20. 20 | A 2.5D culling for Forward+ | Takahiro Harada RESULTS
  • 21. 21 | A 2.5D culling for Forward+ | Takahiro Harada LIGHT CULLING
  • 22. 22 | A 2.5D culling for Forward+ | Takahiro Harada LIGHT CULLING + 2.5D CULLING
  • 23. 23 | A 2.5D culling for Forward+ | Takahiro Harada COMPARISON 1" 10" 100" 1000" 10000" 1" 2" 3" 4" 5" 6" 7" 8" 9" 10" 11" 12" 13" 14" 15" 16" 17" 18" 19" 20" 21" 22" 23" Number'of'*les' Number'of'lights'(x10)' With"2.5D"culling" Without"2.5D"culling" 220 lights/frustum -> 120 lights/frustum
  • 24. 24 | A 2.5D culling for Forward+ | Takahiro Harada LIGHT CULLING
  • 25. 25 | A 2.5D culling for Forward+ | Takahiro Harada LIGHT CULLING + 2.5D CULLING
  • 26. 26 | A 2.5D culling for Forward+ | Takahiro Harada COMPARISON 1" 10" 100" 1000" 10000" 1" 2" 3" 4" 5" 6" 7" 8" 9" 10" 11" 12" 13" 14" 15" 16" 17" Number'of'*les' Number'of'lights'(x10)' With"2.5D"culling" Without"2.5D"culling"
  • 27. 27 | A 2.5D culling for Forward+ | Takahiro Harada PERFORMANCE 0" 1" 2" 3" 4" 5" 6" 1024" 2048" 3072" 4096" !me$(ms)$ Number$of$lights$ Forward+"w."frustum"culling" Forward+"w."2.5D" Deferred"
  • 28. 28 | A 2.5D culling for Forward+ | Takahiro Harada CONCLUSION  Proposed 2.5D culling which –  Additional memory usage   0B global memory   4B local memory per WG (can compress more if you want) –  Additional compute complexity   3 lines of pseudo codes for light culling   No changes for other stages –  Additional runtime overhead   < 10% compared to the original light culling  Showed that 2.5D culling reduces false positives