Modern Graphics Abstractions & Real-Time Ray Tracing discusses Halcyon, a graphics rendering system built from scratch using modern graphics APIs. Halcyon uses render handles, commands, backends, devices, and graphs to provide an efficient and flexible rendering system that works across APIs. It also details virtual multi-GPU capabilities that allow developers to test multi-GPU code even on single-GPU machines.
3. Halcyon Rendering
▪ “PICA PICA” and Halcyon built from scratch
▪ Implemented lots of bespoke technology
▪ Only modern abstractions (DX12, VK, Metal 2, …)
▪ Minimal effort to add a new API or platform
▪ Efficient and flexible rendering was a major focus
S E E D // Modern Graphics Abstractions & Real-Time Ray Tracing
4. Halcyon Rendering
[Diagram: Halcyon architecture components: Application, Render Graph, Render Proxy, Render Backend(s), Render Device(s), Render Commands, Render Handles]
6. Render Backend
▪ Live-reloadable DLLs
▪ Enumerates adapters and capabilities
▪ Swap chain support
▪ Extensions (e.g. ray tracing, sub-groups, …)
▪ Determine adapter(s) to use
S E E D // Halcyon + Vulkan
7. Render Backend
▪ Provides debugging and profiling
▪ RenderDoc integration, validation layers, …
▪ Create and destroy render devices
8. Render Backend
▪ Direct3D 12
▪ Vulkan 1.1
▪ Metal 2
▪ Proxy
▪ Mock
9. Render Backend
▪ Direct3D 12
▪ Shader Model 6.X
▪ DirectX Ray Tracing
▪ Bindless Resources
▪ Explicit Multi-GPU
▪ DirectML (soon..)
▪ …
10. Render Backend
▪ Vulkan 1.1
▪ Sub-groups
▪ Descriptor indexing
▪ External memory
▪ Multi-draw indirect
▪ Ray tracing (soon..)
▪ …
11. Render Backend
▪ Metal 2
▪ Early development
▪ Primarily desktop
▪ Argument buffers
▪ Machine learning
▪ …
12. Render Backend
▪ Proxy
▪ Discussed later in the presentation
13. Render Backend
▪ Mock
▪ Performs resource tracking and validation
▪ Command stream is parsed and evaluated
▪ No submission to an API
▪ Useful for unit tests and debugging
15. Render Device
▪ Abstraction of a logical GPU adapter
▪ e.g. VkDevice, ID3D12Device, …
▪ Provides interface to GPU queues
▪ Command list submission
16. Render Device
▪ Ownership of GPU resources
▪ Create & Destroy
▪ Lifetime tracking of resources
▪ Mapping render handles → device resources
18. Render Handles
▪ Resources associated by handle
▪ Lightweight (64 bits)
▪ Constant-time lookup
▪ Type safety (e.g. buffer vs texture)
▪ Can be serialized or transmitted
▪ Generational for safety
▪ e.g. double-delete, usage after delete
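The handle properties listed above can be sketched as follows. This is a minimal illustration, not Halcyon's actual layout: the bit split (8-bit type, 24-bit generation, 32-bit slot index) and the names `RenderHandle`, `HandleAllocator`, and `ResourceType` are assumptions made for the example.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical 64-bit layout: [8-bit type | 24-bit generation | 32-bit slot index].
enum class ResourceType : std::uint8_t { Buffer = 1, Texture = 2 };

struct RenderHandle {
    std::uint64_t bits;

    static RenderHandle make(std::uint32_t index, std::uint32_t generation, ResourceType type) {
        RenderHandle h;
        h.bits = (std::uint64_t(static_cast<std::uint8_t>(type)) << 56) |
                 (std::uint64_t(generation & 0xFFFFFFu) << 32) | index;
        return h;
    }
    std::uint32_t index() const { return std::uint32_t(bits & 0xFFFFFFFFu); }
    std::uint32_t generation() const { return std::uint32_t((bits >> 32) & 0xFFFFFFu); }
    ResourceType type() const { return static_cast<ResourceType>(bits >> 56); }
};
static_assert(sizeof(RenderHandle) == 8, "handle stays 64 bits wide");

class HandleAllocator {
public:
    RenderHandle allocate(ResourceType type) {
        std::uint32_t index;
        if (!freeList_.empty()) {
            index = freeList_.back();          // reuse a released slot
            freeList_.pop_back();
        } else {
            index = std::uint32_t(slots_.size());
            slots_.push_back(Slot{0, false});
        }
        slots_[index].alive = true;
        return RenderHandle::make(index, slots_[index].generation, type);
    }

    // Constant-time check: slot must be alive and generations must match.
    bool isValid(RenderHandle h) const {
        return h.index() < slots_.size() && slots_[h.index()].alive &&
               slots_[h.index()].generation == h.generation();
    }

    // Returns false on double-delete or on a stale (use-after-delete) handle.
    bool release(RenderHandle h) {
        if (!isValid(h)) return false;
        Slot& slot = slots_[h.index()];
        slot.alive = false;
        slot.generation = (slot.generation + 1) & 0xFFFFFFu;  // invalidate old handles
        freeList_.push_back(h.index());
        return true;
    }

private:
    struct Slot { std::uint32_t generation; bool alive; };
    std::vector<Slot> slots_;
    std::vector<std::uint32_t> freeList_;
};
```

Because the handle is a plain integer, it can be copied into a command stream or sent over a network unchanged; the generation counter is what turns a reused slot index into a detectable error.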
19. Render Handles
[Diagram: a single Render Handle referencing an ID3D12Resource on each of DX12 Adapters 0 to 3]
▪ Handles allow one-to-many cardinality [handle->devices]
▪ Each device can have a unique representation of the handle
20. Render Handles
▪ Can query if a device has a handle loaded
▪ Safely add and remove devices
▪ Handle owned by application, representation can reload on device
21. Render Handles
▪ Shared resources are supported
▪ Primary device owner, secondaries alias primary
22. Render Handles
▪ Can also mix and match backends in the same process!
▪ Made debugging VK implementation much easier
▪ DX12 on left half of screen, VK on right half of screen
[Diagram: Render Handles mapped across mixed backends: ID3D12Resource on DX12 Adapters 0 and 1, VkImage on VK Adapter 0, and Proxy Adapter 0]
24. Render Commands
▪ Draw
▪ DrawIndirect
▪ Dispatch
▪ DispatchIndirect
▪ UpdateBuffer
▪ UpdateTexture
▪ CopyBuffer
▪ CopyTexture
▪ Barriers
▪ Transitions
▪ BeginTiming
▪ EndTiming
▪ ResolveTimings
▪ BeginEvent
▪ EndEvent
▪ BeginRenderPass
▪ EndRenderPass
▪ RayTrace
▪ UpdateTopLevel
▪ UpdateBottomLevel
▪ UpdateShaderTable
25. Render Commands
▪ Queue type specified
▪ Spec validation
▪ Allowed to run?
▪ e.g. draws on compute
▪ Automatic scheduling
▪ Where can it run?
▪ Async compute
26. Render Commands
27. Render Command List
▪ Encodes high level commands
▪ Tracks queue types encountered
▪ Queue mask indicating scheduling rules
▪ Commands are stateless - parallel recording
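One way to picture the queue mask is a bitfield that narrows as commands are recorded. The sketch below is an assumption-laden illustration: the enum names, the `allowedQueues` table, and the three-command subset are hypothetical and not taken from Halcyon.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical queue bits; a command list starts out runnable on every queue.
enum QueueType : std::uint8_t {
    QueueGraphics = 1 << 0,
    QueueCompute  = 1 << 1,
    QueueTransfer = 1 << 2,
};

enum class CommandType { Draw, Dispatch, CopyBuffer };

// Assumed table of queues on which each command may legally run.
inline std::uint8_t allowedQueues(CommandType type) {
    switch (type) {
        case CommandType::Draw:
            return QueueGraphics;
        case CommandType::Dispatch:
            return static_cast<std::uint8_t>(QueueGraphics | QueueCompute);
        case CommandType::CopyBuffer:
            return static_cast<std::uint8_t>(QueueGraphics | QueueCompute | QueueTransfer);
    }
    return 0;
}

class RenderCommandList {
public:
    // Recording only appends and narrows the mask; no API state is touched,
    // so separate lists can be recorded on separate threads.
    void record(CommandType type) {
        commands_.push_back(type);
        queueMask_ = static_cast<std::uint8_t>(queueMask_ & allowedQueues(type));
    }

    bool canRunOn(QueueType queue) const { return (queueMask_ & queue) != 0; }
    std::uint8_t queueMask() const { return queueMask_; }

private:
    std::vector<CommandType> commands_;
    std::uint8_t queueMask_ = QueueGraphics | QueueCompute | QueueTransfer;
};
```

A list containing only dispatches and copies stays eligible for async compute; recording a single draw restricts it to the graphics queue, which is the kind of rule a scheduler or validator can then check.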
28. Render Compilation
▪ Render command lists are “compiled”
▪ Translation to low level API
▪ Can compile once, submit multiple times
▪ Serial operation (memcpy speed)
▪ Perfect redundant state filtering
30. Render Graph
▪ Inspired by FrameGraph [O’Donnell 2017]
▪ Automatically handle transient resources
▪ Import explicitly managed resources
▪ Automatic resource transitions
▪ Render target batching
▪ DiscardResource
▪ Memory aliasing barriers
▪ …
31. Render Graph
▪ Basic memory management
▪ Not targeting current consoles
▪ Fine grained memory reuse sub-optimal with current PC drivers
▪ Lose ~5% on aliasing barriers and discards
▪ Automatic queue scheduling
▪ Ongoing research
▪ Need heuristics on task duration and bottlenecks
▪ e.g. Memory vs ALU
▪ Not enough to specify dependencies
32. Render Graph
▪ Frame Graph → Render Graph: No concept of a “frame”
▪ Fully automatic transitions and split barriers
▪ Single implementation, regardless of backend
▪ Translation from high level render command stream
▪ API differences hidden from render graph
▪ Support for mGPU
▪ Mostly implicit and automatic
▪ Can specify a scheduling policy
33. Render Graph
▪ Composition of multiple graphs at varying frequencies
▪ Same GPU: async compute
▪ mGPU: graphs per GPU
▪ Out-of-core: server cluster, remote streaming
34. Render Graph
▪ Composition of multiple graphs at varying frequencies
▪ e.g. translucency, refraction, global illumination
35. Render Graph
▪ Two phases
▪ Graph construction
▪ Specify inputs and outputs
▪ Serial operation (by design)
▪ Graph evaluation
▪ Highly parallelized
▪ Record high level render commands
▪ Automatic barriers and transitions
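The "automatic barriers and transitions" step can be illustrated with a small planner. This is a simplified sketch: passes declare the state they need each resource in during construction, and evaluation walks them in order, emitting a transition whenever the tracked state differs. The types `Pass`, `ResourceUse`, `Transition`, and the reduced `ResourceState` set are hypothetical; split barriers, aliasing, and parallel recording are left out.

```cpp
#include <cstddef>
#include <string>
#include <vector>

enum class ResourceState { Undefined, RenderTarget, ShaderRead, UnorderedAccess };

// Declared during graph construction: which resource a pass touches, and how.
struct ResourceUse { std::size_t resource; ResourceState state; };
struct Pass { std::string name; std::vector<ResourceUse> uses; };

// Produced during evaluation: a transition to record before pass `passIndex`.
struct Transition {
    std::size_t passIndex;
    std::size_t resource;
    ResourceState before;
    ResourceState after;
};

// Walk passes in construction order and emit a transition whenever a pass
// needs a resource in a state other than the one it is currently in.
inline std::vector<Transition> planTransitions(const std::vector<Pass>& passes,
                                               std::size_t resourceCount) {
    std::vector<ResourceState> current(resourceCount, ResourceState::Undefined);
    std::vector<Transition> transitions;
    for (std::size_t p = 0; p < passes.size(); ++p) {
        for (const ResourceUse& use : passes[p].uses) {
            if (current[use.resource] != use.state) {
                transitions.push_back({p, use.resource, current[use.resource], use.state});
                current[use.resource] = use.state;
            }
        }
    }
    return transitions;
}
```

Because passes only state inputs and outputs, the pass code never issues a barrier itself; that is what lets a single graph implementation sit above every backend.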
37. Render Graph
▪ Explicit heterogeneous mGPU
▪ Parallel fork-join approach
▪ Resources copied through system memory using copy queue
▪ ~1 ms for every 15 MB transferred
▪ Minimize PCI-E transfers
▪ Immutable data replicated
▪ Tightly pack data
[Diagram: GPU1 to GPU4, one partition each: Partition 1 (primary), Partitions 2 to 4 (secondary)]
38. Render Graph
▪ Workloads are divided into partitions
▪ Based on GPU device count
▪ Single primary device
▪ Other devices are secondaries
▪ Variety of scheduling and transfer patterns are necessary
▪ Simple rules engine
39. Render Graph
▪ Run ray generation on primary GPU
▪ Copy results in sub-regions to other GPUs
▪ Run tracing phases on separate GPUs
▪ Copy tracing results back to primary GPU
▪ Run filtering on primary GPU
40. Render Graph
▪ Only width is divided
▪ Simplifies textures vs. buffers
▪ Passes are unaware of GPU count
41. ▪ Lots of fun coordinate snapping bugs
▪ e.g. 3 GPUs partitioned to 0.33333…
42. ▪ Lots of fun coordinate snapping bugs
▪ 16 GPUs! (because, why not?)
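A common way to avoid this class of snapping bug is to derive every partition edge from the same integer formula instead of multiplying by a fractional width such as 0.33333. The sketch below shows that idea for a width-only split; `Partition` and `partitionWidth` are hypothetical names, and this is not claimed to be how Halcyon resolved its bugs.

```cpp
#include <cstdint>
#include <vector>

struct Partition {
    std::uint32_t begin;  // first pixel column owned by this GPU
    std::uint32_t end;    // one past the last column, i.e. [begin, end)
};

// Split `width` columns across `gpuCount` devices. Each edge is computed as
// width * i / gpuCount in integer arithmetic, so the end of partition i is
// exactly the begin of partition i + 1 and the last edge is exactly `width`.
inline std::vector<Partition> partitionWidth(std::uint32_t width, std::uint32_t gpuCount) {
    std::vector<Partition> parts;
    for (std::uint32_t i = 0; i < gpuCount; ++i) {
        const auto begin = std::uint32_t(std::uint64_t(width) * i / gpuCount);
        const auto end   = std::uint32_t(std::uint64_t(width) * (i + 1) / gpuCount);
        parts.push_back({begin, end});
    }
    return parts;
}
```

For a 1000-pixel-wide target on 3 GPUs this yields columns [0, 333), [333, 666), and [666, 1000): the widths differ by one pixel, but no column is dropped or rendered twice.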
43. Render Graph
▪ RenderGraphSchedule
▪ NoDevices → Pass is disabled
▪ AllDevices → Pass runs on all devices
▪ PrimaryDevice → Pass only runs on primary device
▪ SecondaryDevices → Pass runs on secondaries if count > 1, otherwise primary
▪ OnlySecondaryDevices → Pass only runs on secondary devices, disabled unless mGPU
Requested Per Pass →
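The schedule values above map naturally onto a function that returns the devices a pass should run on. The enum names follow the slide; `resolveDevices` and the convention that device 0 is the primary are assumptions made for this sketch.

```cpp
#include <cstdint>
#include <vector>

enum class RenderGraphSchedule {
    NoDevices,
    AllDevices,
    PrimaryDevice,
    SecondaryDevices,
    OnlySecondaryDevices,
};

// Returns the device indices a pass runs on. Assumes device 0 is the primary.
inline std::vector<std::uint32_t> resolveDevices(RenderGraphSchedule schedule,
                                                 std::uint32_t deviceCount) {
    std::vector<std::uint32_t> devices;
    if (deviceCount == 0) return devices;

    const bool multiGpu = deviceCount > 1;
    const auto addRange = [&devices](std::uint32_t first, std::uint32_t last) {
        for (std::uint32_t i = first; i < last; ++i) devices.push_back(i);
    };

    switch (schedule) {
        case RenderGraphSchedule::NoDevices:
            break;                                   // pass is disabled
        case RenderGraphSchedule::AllDevices:
            addRange(0, deviceCount);
            break;
        case RenderGraphSchedule::PrimaryDevice:
            devices.push_back(0);
            break;
        case RenderGraphSchedule::SecondaryDevices:
            if (multiGpu) addRange(1, deviceCount);  // secondaries when available
            else devices.push_back(0);               // otherwise fall back to primary
            break;
        case RenderGraphSchedule::OnlySecondaryDevices:
            if (multiGpu) addRange(1, deviceCount);  // disabled unless mGPU
            break;
    }
    return devices;
}
```

Keeping the fallback inside the policy is what allows passes to stay unaware of the GPU count: the same pass description works on one device or sixteen.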
44. Render Graph
▪ RenderTransferPartition
▪ PartitionAll → Select all partitions from device
▪ PartitionIsolated → Select isolated region from device
▪ RenderTransferFilter
▪ AllDevices → Transfer completes on all devices
▪ PrimaryDevice → Transfer completes on the primary device
▪ SecondaryDevices → Transfer completes on all secondary devices
Requested Per Pass →
45. Render Graph
▪ PartitionAll → PartitionAll
▪ Copies full resource on one GPU to full resource on all specified GPUs
▪ PartitionAll → PartitionIsolated
▪ Copies full resource on one GPU to isolated regions on all specified GPUs (partial copies)
▪ PartitionIsolated → PartitionAll
▪ (Invalid configuration)
▪ PartitionIsolated → PartitionIsolated
▪ Copies isolated region on one GPU to isolated regions on all specified GPUs (partial copies)
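The four source-to-destination combinations above amount to a small lookup, which a rules engine can use to reject the invalid pairing before any copy is scheduled. In this sketch `RenderTransferPartition` follows the slide, while `CopyKind` and `classifyTransfer` are hypothetical names.

```cpp
enum class RenderTransferPartition { PartitionAll, PartitionIsolated };

// Hypothetical classification of what a transfer ends up copying.
enum class CopyKind {
    Invalid,             // PartitionIsolated -> PartitionAll
    FullToFull,          // full resource -> full resource on each target GPU
    FullToIsolated,      // full resource -> isolated regions (partial copies)
    IsolatedToIsolated,  // isolated region -> isolated regions (partial copies)
};

inline CopyKind classifyTransfer(RenderTransferPartition source,
                                 RenderTransferPartition destination) {
    const bool srcAll = source == RenderTransferPartition::PartitionAll;
    const bool dstAll = destination == RenderTransferPartition::PartitionAll;
    if (srcAll && dstAll) return CopyKind::FullToFull;
    if (srcAll && !dstAll) return CopyKind::FullToIsolated;
    if (!srcAll && dstAll) return CopyKind::Invalid;
    return CopyKind::IsolatedToIsolated;
}
```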
46. Devices this pass will run on
Schedule transfers in or out
Scaling work dimensions for each GPU
49. Render Graph
▪ Some bugs were subtle
▪ Weird cel shading? ☺
▪ Incorrect transfers?
▪ Transfers in (input data)
▪ Transfers out (result data)
▪ Incorrect scheduling?
▪ Pass not running
▪ Pass running when it shouldn’t
▪ Partition window
51. Render Graph
▪ Implicit data flow via explicit scopes
▪ “Long-distance” extensible parameter passing
▪ Scope given to each render pass
▪ Can create nested scope for sub-graph
▪ Results stored into scope
▪ Hygiene via nesting and shadowing
{
    gbuffer <- render_opaque()
    gbuffer <- render_decals(gbuffer)
    {
        gbuffer <- render_opaque()
        render_lighting(gbuffer)
    } -> envmap
    apply_envmap(gbuffer, envmap)
}
struct RenderGraphAreaLight
{
    RenderGraphResource triangleLightList;
    uint32 triangleCount;
};
52. Render Graph
▪ Lookup by type
▪ scope.get<T>() -> &T
▪ Parameters in “plain old data” structs
▪ RenderGraphResource, RenderHandle
▪ float, int, mat4, etc.
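Type-keyed lookup with nesting and shadowing can be sketched with `std::type_index` and `std::any` (C++17). This is only an illustration of the idea: the class name `RenderGraphScope`, the `set`/`find`/`get` methods, and the stand-in `RenderGraphResource` and `GBuffer` structs are assumptions, not Halcyon's interface.

```cpp
#include <any>
#include <cassert>
#include <cstdint>
#include <typeindex>
#include <typeinfo>
#include <unordered_map>
#include <utility>

// Stand-ins for illustration; the real types are not shown in the slides.
struct RenderGraphResource { std::uint32_t id; };
struct GBuffer { RenderGraphResource albedo; };
struct RenderGraphAreaLight {
    RenderGraphResource triangleLightList;
    std::uint32_t triangleCount;
};

class RenderGraphScope {
public:
    explicit RenderGraphScope(const RenderGraphScope* parent = nullptr) : parent_(parent) {}

    // Store a plain-old-data parameter struct, keyed by its type.
    template <typename T>
    void set(T value) { values_[std::type_index(typeid(T))] = std::move(value); }

    // Look up by type; an inner scope shadows its parent. Returns nullptr if absent.
    template <typename T>
    const T* find() const {
        const auto it = values_.find(std::type_index(typeid(T)));
        if (it != values_.end()) return std::any_cast<T>(&it->second);
        return parent_ ? parent_->find<T>() : nullptr;
    }

    // Reference-returning variant for parameters a pass requires.
    template <typename T>
    const T& get() const {
        const T* value = find<T>();
        assert(value && "parameter type not present in this scope chain");
        return *value;
    }

private:
    const RenderGraphScope* parent_;
    std::unordered_map<std::type_index, std::any> values_;
};
```

A nested scope created with `RenderGraphScope inner(&outer)` sees the outer `GBuffer` until it stores its own, after which the inner value shadows the outer one without modifying it. That mirrors the nested environment-map sub-graph in the pseudocode on these slides.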
53. Render Graph DSL
▪ Experimental
▪ Macro Magic
55. Virtual Multi-GPU
▪ Most developers have single GPU
▪ Uncommon for 2 GPU machines
▪ Rare for 3+ GPU
▪ Practical for show floor and cranking up to 11
▪ Impractical for regular development ☺
56. Virtual Multi-GPU
▪ Build device indirection table
▪ Virtual device index → adapter index
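A device indirection table can be as simple as an array indexed by virtual device number. The sketch below assumes a round-robin assignment of virtual devices to physical adapters; the slides do not state the actual mapping policy, and `buildDeviceTable` is a hypothetical name.

```cpp
#include <cstdint>
#include <vector>

// Build a table where table[virtualDevice] == physical adapter index.
// Round-robin assignment is an assumption made for this example.
inline std::vector<std::uint32_t> buildDeviceTable(std::uint32_t physicalAdapters,
                                                   std::uint32_t virtualDevices) {
    std::vector<std::uint32_t> table;
    if (physicalAdapters == 0) return table;  // nothing to map onto
    table.reserve(virtualDevices);
    for (std::uint32_t device = 0; device < virtualDevices; ++device) {
        table.push_back(device % physicalAdapters);
    }
    return table;
}
```

With one physical adapter and four virtual devices, every entry is adapter 0, so four device instances are created on the same GPU and the mGPU code paths run unchanged on a single-GPU workstation.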
[Diagram: virtual Devices 0 to 5 mapped onto DX12 Adapters 0 to 2, each adapter holding its own ID3D12Resource]
57. Virtual Multi-GPU
▪ Create multiple instances of a device
▪ Virtual GPUs execute sequentially (WDDM)
58. Virtual Multi-GPU
▪ Increases overall wall time (don’t use for profiling)
▪ Amazing for development and testing!
59. Virtual Multi-GPU
▪ PICA PICA developers all had 1 GPU
▪ Limited testing with 2 GPUs
▪ Show floor at GDC 2018 was 4 GPUs
▪ Virtual-only testing…
▪ Crossed fingers
▪ Worked flawlessly!
60. Virtual Multi-GPU
▪ Develop and debug multi-GPU with only a single GPU
▪ Virtual mGPU reliably reproduces most bugs!
▪ Entire features developed without physical mGPU
▪ e.g. Surfel GI (the night before GDC.. ☺)
62. Render Proxy
▪ Remote render backend
▪ Any API / Any OS
63. Render Proxy
▪ Render API calls are routed remotely
▪ Uses gRPC (high performance RPC framework)
▪ Use an API on an incompatible OS
▪ e.g. Direct3D 12 on macOS or Linux
64. Render Proxy
▪ Scale large workloads with a GPU cluster
▪ Same API as render graph mGPU
▪ Only rendering is routed, scene state is local
▪ Work from the couch!
▪ e.g. DirectX ray tracing on a MacBook ☺
69. Machine Learning
▪ Deep reinforcement learning
▪ Rendering 36 semantic views
▪ Training with TensorFlow
▪ On-premise GPU cluster
▪ Inference with TensorFlow
▪ CPU AVX2
▪ In-process
70. Machine Learning
▪ Adding inferencing with DirectML
▪ Hardware accelerated inferencing operators
▪ Resource management
▪ Schedule ML work explicitly
▪ Interleave ML work with other GPU workloads
▪ Fall back for other APIs
71. Machine Learning
▪ Treat trained ML models like any other 3D asset
▪ Render Graph abstractions
▪ Reference the same render resources
▪ Similar to chaining compute passes
▪ Record “meta” render commands
▪ Backends can fuse or transform, if desired
72. Machine Learning
▪ Provide operators as render commands
▪ Activation
▪ Convolution
▪ Elementwise
▪ FC
▪ GRU
▪ LSTM
▪ MatMul
▪ Normalization
▪ Pooling
▪ Random
▪ RNN
▪ etc.
74. Asset Pipelines
▪ Geometry
▪ Animations
▪ Shaders
▪ Sounds
▪ Music
▪ Textures
▪ Scenes
▪ etc.
75. Asset Pipelines
▪ Everything is content addressable
▪ Hash of data is the identity
▪ SHA-256
▪ Merkle trees!
▪ Dependency evaluation
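The content-addressing idea can be sketched as: an asset's identity is the hash of its bytes, and an asset with dependencies hashes its own bytes together with the identities of its children, Merkle-style. To stay short and dependency-free, the example uses 64-bit FNV-1a as a stand-in; it is not SHA-256 and is not collision-resistant. The names `ContentId`, `hashBlob`, and `hashNode`, and the choice to sort child identities, are assumptions for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

using ContentId = std::uint64_t;

// 64-bit FNV-1a. Stand-in only: a real pipeline would use SHA-256 here.
inline ContentId fnv1a(const void* data, std::size_t size,
                       ContentId seed = 14695981039346656037ull) {
    const auto* bytes = static_cast<const unsigned char*>(data);
    ContentId hash = seed;
    for (std::size_t i = 0; i < size; ++i) {
        hash ^= bytes[i];
        hash *= 1099511628211ull;
    }
    return hash;
}

// Leaf asset: identity is the hash of its data.
inline ContentId hashBlob(const std::string& bytes) {
    return fnv1a(bytes.data(), bytes.size());
}

// Asset with dependencies: hash own data, then chain in each child identity.
// Children are sorted so that dependency order does not change the identity
// (an assumption; an order-sensitive pipeline would skip the sort).
inline ContentId hashNode(const std::string& bytes, std::vector<ContentId> children) {
    std::sort(children.begin(), children.end());
    ContentId hash = hashBlob(bytes);
    for (const ContentId child : children) {
        hash = fnv1a(&child, sizeof(child), hash);
    }
    return hash;
}
```

Changing one texture changes its identity and, through the chain, the identity of every asset that depends on it, while untouched assets keep their identities. That property is what makes dependency evaluation and cache lookups cheap.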
77. Asset Pipelines
▪ Containerized, running on Kubernetes
▪ Google Cloud Platform
▪ On-Premise Cluster
▪ AMD 1950X TR
▪ NV Titan V
▪ Communication using gRPC and Protobuf
▪ Google Cloud Storage
78. Asset Pipelines
▪ Analytics with Prometheus and Grafana
▪ Publish custom metrics
▪ Scraped into rich UI
▪ Collecting data is important!
81. Shaders
▪ Exclusively HLSL
▪ Shader Model 6.X
▪ Majority are compute shaders
▪ Performance is critical
▪ Group shared memory
▪ Wave-ops / Sub-groups
82. Shaders
▪ No reflection
▪ Avoid costly lookups
▪ Only explicit bindings
▪ … except for validation
▪ Extensive use of HLSL spaces
▪ Updates at varying frequency
▪ Bindless
83. Shaders
[Diagram: shader compilation pipeline: HLSL → DXC → DXIL (Direct3D 12) and SPIR-V (Vulkan 1.1); SPIR-V → SPIRV-Cross → MSL (Metal 2) and ISPC (AVX2, …)]
85. Hybrid Rendering
▪ Direct Shadows (ray trace or raster)
▪ Direct Lighting (compute)
▪ Reflections (ray trace or compute)
▪ G-Buffer (raster)
▪ Global Illumination (ray trace and compute)
▪ Post Processing (compute)
▪ Transparency & Translucency (ray trace and compute)
▪ Ambient Occlusion (ray trace or compute)
96. ▪ Launch ray towards light
▪ Handled by Miss Shader (DXR)
▪ Soft shadows?
▪ Random direction from cone [PBRT]
▪ SVGF-style reconstruction [Schied 2017]
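Picking a random direction from a cone is the standard uniform cone sampling described in PBRT. The sketch below samples in a local frame where the cone axis is +Z; rotating the result so that +Z points at the light, and deriving `cosThetaMax` from the light's angular size, are left to the caller. The `Vec3` struct and function name are illustrative.

```cpp
#include <algorithm>
#include <cmath>

struct Vec3 { float x, y, z; };

// Uniformly sample a direction inside a cone around +Z.
// u0, u1 are uniform random numbers in [0, 1); cosThetaMax is the cosine of
// the cone half-angle (1.0 gives a hard shadow ray, smaller values are softer).
inline Vec3 uniformSampleCone(float u0, float u1, float cosThetaMax) {
    const float cosTheta = (1.0f - u0) + u0 * cosThetaMax;
    const float sinTheta = std::sqrt(std::max(0.0f, 1.0f - cosTheta * cosTheta));
    const float phi = u1 * 2.0f * 3.14159265358979f;
    return { std::cos(phi) * sinTheta, std::sin(phi) * sinTheta, cosTheta };
}
```

With one such ray per pixel per frame the raw result is noisy, which is why the slide pairs it with SVGF-style reconstruction.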
97. [Figure panels: Hard Raytraced Shadows; Soft Raytraced Shadows (Unfiltered); Soft Raytraced Shadows (Filtered)]
98. Transparent Shadows
▪ Hard to get right with raster
▪ [McGuire 2017]
▪ Trace towards light
▪ Like opaque shadow tracing
▪ Accumulate absorption
▪ Product of colors
▪ Thin film approximation
▪ Density absorption easy extension
▪ Can also be soft!
99. Transparent Shadows
▪ Keep tracing until
▪ Hit opaque surface or all light is absorbed or miss
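The accumulation loop can be sketched on the CPU as a walk over the surfaces a shadow ray crosses on its way to the light. This is a simplified model under stated assumptions: the hits are given already sorted from the shaded point toward the light, an empty remainder means the ray missed, and the `Color`, `Hit`, and `cutoff` names are hypothetical. In DXR the same logic would be spread across hit and miss shaders.

```cpp
#include <algorithm>
#include <vector>

struct Color { float r, g, b; };

struct Hit {
    bool opaque;          // opaque surface: light is fully blocked
    Color transmittance;  // thin-film approximation: per-channel tint of the surface
};

// Returns how much light reaches the shaded point, per channel.
inline Color shadowTransmittance(const std::vector<Hit>& hitsTowardLight,
                                 float cutoff = 1e-3f) {
    Color light{1.0f, 1.0f, 1.0f};
    for (const Hit& hit : hitsTowardLight) {
        if (hit.opaque) return {0.0f, 0.0f, 0.0f};   // hit opaque surface
        light.r *= hit.transmittance.r;               // product of colors
        light.g *= hit.transmittance.g;
        light.b *= hit.transmittance.b;
        if (std::max({light.r, light.g, light.b}) < cutoff) {
            return {0.0f, 0.0f, 0.0f};               // all light is absorbed
        }
    }
    return light;                                     // miss: ray reached the light
}
```

Replacing the fixed per-surface tint with an exponential of distance travelled inside the medium would give the density-absorption extension mentioned on the previous slide.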
105. Transparency & Translucency
▪ Raytracing enables accurate light scattering
▪ Transparency
▪ Order-independent (OIT)
▪ Multiple index-of-refraction transitions
▪ Variable roughness, refractions and absorption
▪ Translucency
▪ Light scattering inside homogeneous medium
▪ We do this in texture-space
▪ Handle view-dependent terms & dynamic changes to the environment
[Figure: Texture-Space Glass and Translucency]
106. Transparency
Works for clear and rough glass
▪ Clear
▪ No filtering required
▪ Rough
▪ Microfacet Models for Refraction through Rough Surfaces [Walter 2007]
▪ More samples + temporal filtering
▪ Apply phase function & BTDF
107. Translucency
▪ For every valid position & normal
▪ Flip normal and push (ray) inside
▪ Launch rays in uniform sphere distribution
▪ (Importance-sample phase function)
▪ Compute lighting at intersection
▪ Gather all samples
▪ Update value in texture
109. GI
▪ Can’t solve GI every frame
▪ Our GI budget: 250k rays / frame
▪ World space surfels
▪ Easy to accumulate
▪ Distribute on the fly
▪ No parameterization
▪ Smooth result by construction
▪ Position, normal, radius, animation info
▪ Spawn surfels in world from view
▪ Surfel path tracing with incremental irradiance
110. Round 1: Victory!
▪ Initial DXR-enabled games to have a set of ray tracing features
▪ Reflections
▪ Shadows
▪ AO
▪ …
▪ Developers learning how to incorporate ray tracing in their existing pipelines
▪ Adapt existing backends to support the transition
▪ Akin to first round of launch titles on a new console
▪ Let’s get ready for Round 2?
112. Transparency
▪ Transparency is far from being solved!
▪ Glass
▪ Clear/rough + filtered + shadowed
▪ Particles & Volumetric Effects
▪ Can use miss shader to update volumes
▪ Raymarching in hit shaders?
▪ Non-trivial blending & filtering
▪ PICA PICA: texture-space OIT with refractions and scattering
▪ Not perfect, but one step closer
▪ Denoising techniques don’t work so well with transparency (and 1-2 spp)
[Image: Transparency in the Maxwell Renderer]
113. Partial Coverage
▪ Foliage
▪ Can still do alpha test (i.e. any-hit)
▪ Animated becomes a real problem
▪ Defocus effects such as motion blur and depth of field are still intractable
▪ Need partial coverage denoising @ 1-2 spp
[Images: PBRT (Matt Pharr, Wenzel Jakob, Greg Humphreys); Disney / Pixar]
114. Specialized Denoising
▪ PICA PICA: specialized denoising & reconstruction for view-dependent, light-dependent or any other term whose variance you can monitor across frames [Stachowiak 2018]
▪ Can use general denoising on the whole image, but…
▪ Denoising filter around a specific term will achieve greater quality
▪ Potentially converges faster
▪ One algorithm to rule them all?
▪ Lately…
▪ Gradient Estimation for Real-Time Adaptive Temporal Filtering [Schied 2018]
▪ Lots of progress with DL approaches
115. Geometry++
▪ Ray-Triangle intersection is the fast path
▪ Well-known problem [Möller 1997] [Baldwin 2016]
▪ Intersection of procedural geometry currently done via Intersection Shaders
▪ Allows custom/parametric intersection code, for arbitrary shapes
▪ Current (slow) path for procedural geometry
▪ Procedural: hair, water, and particles are still a challenge
▪ Hair: all done parametrically via intersection shaders?
▪ Water: can be done via triangles, but reflections + refractions are challenging
▪ Also, handle view-dependent LODing (e.g. water patches) vs ray-dependent LODing?
▪ DXR: Need ways to accelerate for non-obvious primitives
▪ Cone-BVH for ray-marching? (à-la-Aaltonen in Claybook)
116. Taking Advantage of DXR
▪ DirectX offers easy interoperability between raster, compute and raytracing
▪ Raytracing, rasterization and compute shaders can share code & types
▪ Evaluate your actual HLSL material shaders - directly usable for a hybrid raytracing pipeline
▪ The output from one stage can feed data for another
▪ Write to UAVs, read in next stage
▪ Prepare rays to be launched, and trace on another (e.g. mGPU)
▪ Interop extends opportunities for solving new sparse problems
▪ Use the power of each stage to your advantage
▪ RT opens the door to an entirely new class of techniques for games
▪ DXR provides a playground where research & gamedevs can collaborate!
117. Performance (1/3)
▪ Best tracing performance comes from one massive static BVH?
▪ Viable for a big static scene, but most games are not static…
▪ Animated Geometry: Acceleration Structure construction is fairly fast
▪ But not fast enough to run vertex shaders for all the animated geometry and build AS from scratch every frame
▪ Top Level: has to be rebuilt whenever anything interesting happens
▪ Animated objects, objects inserted/removed, LOD swap
▪ LOD selection/management and how that ties to Shader Table management
▪ Bottom Level: is there a sweet spot?
▪ Lots of individual rigid objects: might have to bake transforms and merge into larger ones
▪ How much memory can you spare for better tracing performance?
118. Performance (2/3)
▪ 2+ level hierarchies?
▪ + : Allows splitting worlds into update frequencies, and avoids updates if nothing has changed
▪ - : Tracking the object-to-world transforms becomes difficult: ray needs a transform stack
▪ - : Apps can end up creating a large instance hierarchy of single-node trees → bad
▪ Idea: have two bottom level BVH's, one for static and one for dynamic geo?
▪ Static: Built & left alone, traverse it first
▪ Dynamic: Rebuilt in the background as a long running task. Refitted every frame
▪ + Supports many animated characters
▪ While not taking a hit for rebuilding BVH for static geometry
▪ - Doesn’t solve massive number of materials that games typically have…
119. Performance (3/3)
▪ Example: 1k materials, fire 100k secondary rays
▪ (Potentially) need to run 1k shaders each on a batch of 100 hits. Viable workload?
▪ Can’t just shove all material variations in the SBT and let IHVs optimize, right?
▪ How aggressively do shader variations / materials in game scenes need to be merged? 1 or 2 orders of magnitude?
▪ Uber Shaders + Texture Space Shading?
▪ PICA PICA Uber Shaders: Rebinding constants quite frequently was not an issue
▪ Use texture-space gbuffers, or is that too much memory?
120. Future Ray Tracing Research
▪ RTRT literature has focused on accelerating “correct” raytracing
▪ Lots of new challenges with game ray tracing
▪ Literature has to adjust to game ray tracing constraints
▪ Games → budgeted amount of rays/frame, rays/pixel, fixed frame times & memory budgets
▪ Games → Light transport caches: surfels, voxels, lightmaps
▪ Q: How many rays-per-pixel needed to have real-time perfect indirect diffuse?
▪ Q: What has to happen to get fully robust PBR real-time ray traced lighting?
▪ Perf R&D: Efficient sampling / integration strategies & reconstruction filtering
▪ Perf R&D: Caching of materials and partial solutions
121. Global Illumination
▪ Open problem even in offline rendering
▪ Variance still too high
▪ Can reduce frequency
▪ Prefiltering, path-space filtering
▪ Denoising & reconstruction
▪ Pinhole GI & lighting is not solved
▪ Incoherent shading → intractable performance
▪ Have to resort to caching to amortize shading
▪ PICA PICA: caching of GI via surfels
▪ Some issues: only spawn surfels from what you see
▪ Need to solve GI for user-generated content → [Ritchell 2011]
122. Summary
▪ Modern graphics abstractions are important for efficiency without verbosity
▪ Real-time ray tracing significantly improves visuals, but there is still a lot to do to bridge offline and real-time
▪ DXR provides a playground where research and gamedevs can collaborate!
123. References
▪ [Andersson 2018] Johan Andersson, Colin Barré-Brisebois. “DirectX: Evolving Microsoft's Graphics Platform”.
available online
▪ [McGuire 2017] McGuire, Morgan and Mara, Michael. “Phenomenological Transparency”.
available online
▪ [O’Donnell 2017] Yuriy O’Donnell. “Frame Graph: Extensible Rendering Architecture in Frostbite”.
available online
▪ [Schied 2017] Schied, Christoph et al. “Spatiotemporal Variance-Guided Filtering: Real-Time Reconstruction for Path-Traced Global Illumination”.
available online
▪ [Stachowiak 2018] Tomasz Stachowiak. “Towards Effortless Photorealism Through Real-Time Raytracing”
available online
▪ [Wihlidal 2018] Graham Wihlidal. “Halcyon + Vulkan”.
available online
124. Thanks
▪ Matthäus Chajdas
▪ Rys Sommefeldt
▪ Timothy Lottes
▪ Tobias Hector
▪ Neil Henning
▪ John Kessenich
▪ Hai Nguyen
▪ Nuno Subtil
▪ Adam Sawicki
▪ Alon Or-bach
▪ Baldur Karlsson
▪ Cort Stratton
▪ Mathias Schott
▪ Rolando Caloca
▪ Sebastian Aaltonen
▪ Hans-Kristian Arntzen
▪ Yuriy O’Donnell
▪ Arseny Kapoulkine
▪ Tex Riddell
▪ Marcelo Lopez Ruiz
▪ Lei Zhang
▪ Greg Roth
▪ Noah Fredriks
▪ Qun Lin
▪ Ehsan Nasiri
▪ Steven Perron
▪ Alan Baker
▪ Diego Novillo
▪ Tomasz Stachowiak
125. Thanks
▪ SEED
▪ Johan Andersson
▪ Jasper Bekkers
▪ Joakim Bergdahl
▪ Ken Brown
▪ Dean Calver
▪ Dirk de la Hunt
▪ Jenna Frisk
▪ Paul Greveson
▪ Henrik Halen
▪ Effeli Holst
▪ Andrew Lauritzen
▪ Magnus Nordin
▪ Niklas Nummelin
▪ Anastasia Opara
▪ Kristoffer Sjöö
▪ Ida Winterhaven
▪ Tomasz Stachowiak
▪ Microsoft
▪ Chas Boyd
▪ Ivan Nevraev
▪ Amar Patel
▪ Matt Sandy
▪ NVIDIA
▪ Tomas Akenine-Möller
▪ Nir Benty
▪ Jiho Choi
▪ Peter Harrison
▪ Alex Hyder
▪ Jon Jansen
▪ Aaron Lefohn
▪ Ignacio Llamas
▪ Henry Moreton
▪ Martin Stich
126. SEED // Search for Extraordinary Experiences Division
Stockholm – Los Angeles – Montréal – Remote
SEED.EA.COM
We’re hiring!