Philip Hammer of DECK13 Interactive GmbH presented techniques used in rendering The Surge. Key points included: using physically based rendering with GGX BRDF; clustered deferred rendering with lighting computed on GPU; deferred decals for details; and optimizing shaders for AMD GCN occupancy. Future work focuses on new deferred approaches like bindless decals, improved materials, and migrating to Vulkan and DX12.
2. Dissecting the Rendering of The Surge
Quo Vadis Berlin 2018
Introduction
● DECK13 Interactive released “The Surge” in 2017
○ New IP, new publisher
○ Overhauled tech (Fledge Engine / 3rd Generation)
○ Award Winning: Best German Game, Best Graphics, Best PC-/Console-Game (Deutscher Entwicklerpreis 2017)
○ The most ambitious game from DECK13 so far
● Team of around 70 people in Frankfurt
○ Tech Department: ~11 people (engine + game code)
○ Art- & Sound-Outsourcing
● Myself
○ Since 2006 @ DECK13
○ Working on rendering / engine / graphics / shaders
○ Worked on The Surge, Lords of the Fallen, Venetica, Ankh, Jack Keane, etc.
● The results and techniques presented here are the work of many people in the DECK13 Tech department.
Tech Evolution
● Fledge Gen 1 (2009: Blood Knights, Tiger & Chicken)
○ PS3, Xbox 360, PC / D3D9, iOS (iPad 2 and up)
○ Deferred Rendering, Direct Lighting only, Minimal Multithreading
● Fledge Gen 2 (2012: Lords of the Fallen)
○ PS4, Xbox One, PC / D3D11
○ Volumetric Lighting, Direct & Indirect Lighting, Task-based multithreaded rendering
● Fledge Gen 3 (2014: The Surge)
○ PS4 (+Pro), Xbox One (+X), PC / D3D11
○ Physically-based rendering, Clustered Deferred Rendering, GPU Particles
● Fledge Gen 4 (2017/2018: The Surge 2)
○ Vulkan, D3D12
○ Currently in the making ... stay tuned
Today’s topics
● The Surge Tech (Gen 3)
○ Stable Framerate across all platforms
PS4: 1080p @ 30 FPS
PS4 Pro: 1620p @ 30 FPS or 1080p @ 60 FPS
Xbox One: 900p @ 30 FPS
Xbox One X: 1800p @ 30 FPS or 1080p @ 60 FPS
○ Physically Based Rendering
○ Clustered Deferred Rendering
○ Volumetric Lighting
○ GPU Particles
○ Screen-space Reflections
○ etc.
● New things in the making (Gen 4) - short peek into the future towards the end
Physically Based Rendering
● Switched from (non-PBR) Blinn-Phong to GGX Cook-Torrance BRDF [1]
○ De-facto industry standard.
○ More material data to drive the BRDF
● Artists needed to adapt (Workflow, Tools, Mindset)
○ Lots of pitfalls (no arbitrary texture data)
○ Adoption went fairly smoothly: most tools (Substance, Marmoset) already provide a PBR workflow
● We use the “Metalness-Workflow”
○ Artists provide Albedo, Normal, Roughness and Metalness textures
○ Metalness is a mask to treat the albedo differently in specular lighting
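To make the metalness workflow concrete, here is a minimal CPU-side sketch in Python of how albedo and metalness are commonly turned into BRDF inputs, plus the GGX normal distribution term from [1]. The 0.04 dielectric reflectance and the `alpha = roughness^2` remapping are widespread conventions (the latter is used in [2]) and are assumptions here; the talk does not state which values the Fledge Engine uses. All function names are hypothetical.

```python
import math

DIELECTRIC_F0 = 0.04  # assumed default reflectance for non-metals, not from the talk


def lerp(a, b, t):
    return a + (b - a) * t


def metalness_to_brdf_inputs(albedo, metalness):
    """Split one albedo texel (r, g, b) into diffuse color and specular F0.

    Metals have no diffuse term and use the albedo as specular color;
    dielectrics keep the albedo as diffuse and get a constant F0.
    """
    diffuse = tuple(c * (1.0 - metalness) for c in albedo)
    f0 = tuple(lerp(DIELECTRIC_F0, c, metalness) for c in albedo)
    return diffuse, f0


def ggx_ndf(n_dot_h, roughness):
    """GGX (Trowbridge-Reitz) normal distribution, assuming alpha = roughness^2."""
    alpha = roughness * roughness
    a2 = alpha * alpha
    d = n_dot_h * n_dot_h * (a2 - 1.0) + 1.0
    return a2 / (math.pi * d * d)
```

With `metalness = 0` the albedo is used purely as diffuse color; with `metalness = 1` it is used purely as specular color, which is the "mask" behavior described above.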
● Direct Lighting: 100% dynamic lights
○ 16 shadowmaps rendered into atlas (4kx4k - 8kx8k, D16_FLOAT)
○ If the cap is reached, a shadowmap is no longer updated and effectively becomes static
● Image-based lighting
○ Precomputed, parallax corrected environment probes (Artist placed)
○ Specular probe is a 256x256 cubemap (BC6_UFLOAT) with a GGX-filtered, importance-sampled mip chain [2]
○ Diffuse lighting is simply the 6th mip level of the probe (“incorrect”, but visually close to proper irradiance)
○ IBL pass can be modified with simple, multiplicative “ambient lights” [3]
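Parallax correction for environment probes is usually done by intersecting the reflection ray with a proxy volume and looking up the cubemap along the direction from the probe center to the hit point. The sketch below assumes an axis-aligned box proxy; the talk does not say which proxy shapes the engine supports, and the function and parameter names are hypothetical.

```python
def parallax_corrected_dir(pos, refl_dir, box_min, box_max, probe_pos):
    """Return the cubemap lookup direction for a parallax-corrected probe.

    pos must lie inside the box and refl_dir must be non-zero.
    The ray pos + t * refl_dir is intersected with the box from the inside,
    and the vector from the probe center to the exit point is returned
    (not normalized, which is fine for a cubemap lookup).
    """
    t_exit = float("inf")
    for p, d, lo, hi in zip(pos, refl_dir, box_min, box_max):
        if d > 0.0:
            t_exit = min(t_exit, (hi - p) / d)
        elif d < 0.0:
            t_exit = min(t_exit, (lo - p) / d)
    hit = tuple(p + d * t_exit for p, d in zip(pos, refl_dir))
    return tuple(h - c for h, c in zip(hit, probe_pos))
```

Without the correction, the lookup would use `refl_dir` directly, which is only right for a surface point located exactly at the probe position.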
● G-Buffer breakdown
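Render target     | X                | Y                | Z            | W
RT0 (8:8:8:8)     | Albedo R         | Albedo G         | Albedo B     | Material-ID
RT1 (10:10:10:2)  | VS Normal X      | VS Normal Y      | VS Normal Z  | -
RT2 (8:8:8:8)     | Roughness        | Metalness        | Occlusion    | [shared]
RT3 (16:16)       | Motion Vector X  | Motion Vector Y  | -            | -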
● Material-ID indexes directly into a StructuredBuffer to query per-material data
○ Saves G-Buffer space
● [shared] - per-pixel, context dependent
○ mutually exclusive material data
○ interpretation based on per-material data
○ Emissive Mask
■ Defines whether or not to interpret albedo as emissive
■ Emissive combined in final “combine” pass
■ Effectively saves dedicated emissive channel
○ Translucency
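The following Python sketch illustrates the idea of a context-dependent [shared] channel: the Material-ID stored in RT0 selects a per-material record, and that record decides how the shared value is interpreted. The record layout (`MaterialData`, `shared_is_emissive`) is purely hypothetical; the talk does not describe the actual contents of the StructuredBuffer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MaterialData:
    """Hypothetical per-material record (stand-in for one StructuredBuffer entry)."""
    shared_is_emissive: bool  # True: [shared] is an emissive mask, False: translucency


def decode_gbuffer(rt0, rt2, materials):
    """rt0 = (r, g, b, material_id), rt2 = (roughness, metalness, occlusion, shared)."""
    r, g, b, material_id = rt0
    roughness, metalness, occlusion, shared = rt2
    mat = materials[material_id]  # direct index, no extra G-Buffer channel needed
    return {
        "albedo": (r, g, b),
        "roughness": roughness,
        "metalness": metalness,
        "occlusion": occlusion,
        # The two interpretations are mutually exclusive per material.
        "emissive_mask": shared if mat.shared_is_emissive else 0.0,
        "translucency": 0.0 if mat.shared_is_emissive else shared,
    }
```

A pixel of an emissive material then reports its shared value as `emissive_mask`, while a pixel of any other material reports it as `translucency`.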
Clustered Deferred Rendering
● Switch from rasterization-based light volume rendering to full (async) compute-based approach
○ Low CPU overhead
■ Light culling runs entirely on GPU
■ Filling a buffer with light info instead of dispatching thousands of drawcalls
○ Advantages on GPU
■ No need to fetch G-Buffer for every light
○ Async Compute: Lighting runs in parallel to shadow rendering (at least on consoles)
○ But: many more optimizations necessary to get better perf
● Could render environment probes in the same pass
○ Environment probes are still clustered but rendered in a separate (pixelshader) pass together with SSR
● Divide view frustum into a 3D grid
○ In our case: 16 x 8 x 24
● Culling: Assign lights to grid cells
○ Upload light culling info to GPU (StructuredBuffer with Position, AABB, etc.)
○ Create list of light indices for each cell (single large uint buffer)
● Dispatch lighting compute shader
○ In fact we dispatch twice: unshadowed and shadowed lights
○ Unshadowed can run in parallel with shadowmap generation
● Can use cluster information also for forward rendering
○ We do this for our lit transparent objects
○ Simply compute grid cell index for a position and query light list
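The "compute grid cell index and query light list" step can be sketched as follows. The 16 x 8 x 24 grid is taken from the talk; the exponential depth slicing and the offset/count layout of the light index buffer are common choices and are assumptions here, as are all function and parameter names.

```python
import math

GRID = (16, 8, 24)  # cells in x, y, z as stated in the talk


def cluster_index(ndc_x, ndc_y, view_z, near, far):
    """Map a position to a flat cluster index.

    ndc_x, ndc_y are in [-1, 1]; view_z is positive linear view depth.
    Depth slices are distributed exponentially between near and far
    (an assumed scheme, not confirmed by the talk).
    """
    gx, gy, gz = GRID
    cx = min(gx - 1, max(0, int((ndc_x * 0.5 + 0.5) * gx)))
    cy = min(gy - 1, max(0, int((ndc_y * 0.5 + 0.5) * gy)))
    slice_f = math.log(view_z / near) / math.log(far / near) * gz
    cz = min(gz - 1, max(0, int(slice_f)))
    return (cz * gy + cy) * gx + cx


def lights_for_cell(cell, cell_offsets, cell_counts, light_indices):
    """Return the light indices of one cell from a single flat index buffer."""
    start = cell_offsets[cell]
    return light_indices[start:start + cell_counts[cell]]
```

The same two functions serve both the deferred lighting pass and forward-lit transparent objects, since both only need a position to find their light list.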
Deferred Decals
● Decals play a major role in our environment art
○ Static: Logos/Signs, Material Layers (Sand, Water Puddles, Rust, etc.), Color Variations
○ Dynamic: Blood, Explosion Marks, etc.
● Extremely flexible
● Break up the uniform look of heavily instanced scenes
● Add a lot of large- and small-scale detail
● Decals modify the G-Buffer by alpha-blending onto it
○ Therefore, lighting is “free” since it’s done afterwards
● 2 methods for tangent-space reconstruction
○ Surface Normal (use G-Buffer normal)
○ Planar (use decal projection direction)
● Full PBR support + many per-decal features (additional mask, UV modifiers, etc.)
● Implementation is rasterization-based
○ Rasterize geometry (boxes) for each decal
○ Becomes a CPU bottleneck with a large number of decals
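For readers unfamiliar with box-projected deferred decals, the core per-pixel step is to transform the reconstructed world position into the decal's local box and derive texture coordinates from it. The Python sketch below shows that step only; the matrix layout, the unit-box convention and the function name are assumptions, not details from the talk.

```python
def decal_uv(world_pos, world_to_decal):
    """Project a world-space position into a decal box.

    world_to_decal is a 3x4 row-major matrix that maps world space into the
    decal's unit box [-0.5, 0.5]^3 (assumed convention).
    Returns (u, v) in [0, 1], or None if the position lies outside the box,
    which corresponds to a clip/discard in the pixel shader.
    """
    x, y, z = world_pos
    local = tuple(r[0] * x + r[1] * y + r[2] * z + r[3] for r in world_to_decal)
    if any(abs(c) > 0.5 for c in local):
        return None
    return (local[0] + 0.5, local[1] + 0.5)
```

For example, a decal box of size 2 centered at (10, 0, 0) would use the rows `(0.5, 0, 0, -5)`, `(0, 0.5, 0, 0)`, `(0, 0, 0.5, 0)`.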
● Common issue with Deferred Decals: Wrong Mip Selection due to screenspace gradients
○ Problem: Texture leaks around depth discontinuities
○ Common solution: Always use the highest-resolution mip (mip 0)
■ Causes flickering in the distance due to undersampling (no mipmapping)
■ Hurts texture cache efficiency
○ Our solution: Force mip 0 only at large depth discontinuities
// Gather two 2x2 depth quads that overlap at the current pixel
const float4 d0 = depthSampler.Gather(sampler_point_clamp, screenUV, int2(-1, -1));
const float4 d1 = depthSampler.Gather(sampler_point_clamp, screenUV, int2(0, 0));
// Depths of the four cross neighbors of the current pixel
const float4 dCross = float4(d0.z, d0.y, d1.y, d1.z);
// Depth at the current pixel (d0.w in the quad layout shown on the slide)
const float dC = d0.w;
// Force mip 0 if any neighbor differs too much in depth (depth discontinuity)
const bool useFirstMip = any(abs(dC.xxxx - dCross) > 0.001);
if (useFirstMip)
    albedoTex.SampleLevel(..); // explicit mip 0
else
    albedoTex.Sample(..); // regular, gradient-based mip selection
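[Figure: texel layout of the two gathered depth quads. d0: (x, y) in the top row, (z, w) in the bottom row; d1: (x, y) in the top row, (z, w) in the bottom row]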
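The decision logic of the snippet above can be checked on the CPU with a small reference function. This Python sketch works on a plain 2D depth array instead of texture gathers; the function name and the array layout are hypothetical, and the 0.001 threshold is the one from the slide.

```python
def use_first_mip(depth, x, y, threshold=0.001):
    """Return True if the decal should sample mip 0 at pixel (x, y).

    depth is a 2D list indexed as depth[y][x]; (x, y) must not lie on the
    image border. The center depth is compared against its four cross
    neighbors, and any large difference marks a depth discontinuity.
    """
    center = depth[y][x]
    cross = (depth[y][x - 1], depth[y - 1][x], depth[y][x + 1], depth[y + 1][x])
    return any(abs(center - d) > threshold for d in cross)
```

On a flat surface the function returns False, so the hardware keeps choosing the mip level and distant decals stay stable; only pixels next to a depth edge fall back to mip 0.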
“Object Decals”
● Alternative Term: “Blend Meshes”
● Alpha-Blend arbitrary meshes on the G-Buffer
○ Artists can create simple plane “decals” with custom UV setup
○ Efficiently add small, high-res details like panels, rivets, LEDs, etc.
○ Works also on skinned objects (e.g. logos on Exo-Gear)
1 Base G-Buffer Pass (solid)
2 Object Decal Pass (alpha-blend)
3 Deferred Decal Pass (alpha-blend)
Next Decals / Fledge Gen 4
● “Bindless” Decals
○ Analogous to clustered deferred lighting
■ Culling & rendering happens entirely on the GPU
■ Collect info about all visible decals in a buffer
■ Render all decals before lighting in the same compute shader
○ Decal info stores texture IDs (UINT32) to index directly into DescriptorSet / DescriptorTable
○ Blending not restricted to alpha-blending anymore (linear interpolation)
■ “Geometric” normal blending possible [4]
■ Replacing layered materials with decals is now feasible
○ Availability of interpolated vertex normals in the G-Buffer improves tangent-space reconstruction
○ Currently in active development
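As an illustration of what "geometric" normal blending can look like, here is a Python sketch of Reoriented Normal Mapping, the technique described in [4]. Whether the engine uses exactly this variant is an assumption; the talk only states that such blending becomes possible once decals are no longer limited to alpha-blending. The function name is hypothetical.

```python
import math


def blend_rnm(base, detail):
    """Reoriented Normal Mapping for two unit tangent-space normals.

    Both normals use z as the "up" axis, and base[2] must be greater than -1.
    The detail normal is rotated so that it follows the base normal instead
    of being linearly interpolated with it.
    """
    t = (base[0], base[1], base[2] + 1.0)
    u = (-detail[0], -detail[1], detail[2])
    k = (t[0] * u[0] + t[1] * u[1] + t[2] * u[2]) / t[2]
    r = tuple(ti * k - ui for ti, ui in zip(t, u))
    length = math.sqrt(sum(c * c for c in r))
    return tuple(c / length for c in r)
```

Two properties make the method easy to sanity-check: a flat detail normal leaves the base normal unchanged, and a flat base normal returns the detail normal unchanged.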
Optimizing for Occupancy / GCN
● GCN hardware wants saturated compute units (CUs)
○ A huge lighting shader uses a lot of general purpose registers if not structured carefully
● Reducing register usage (VGPR/SGPR) can be a huge win
○ Especially for long, ALU heavy shaders such as lighting
○ Minimize register lifetime
○ Look at the data and iterate
■ runtime profilers
■ static shader code analysis statistics
● Goal: at least 40% GCN wave occupancy on PS4 and Xbox One (for the lighting compute shader)
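To relate the 40% goal to register counts, the sketch below estimates VGPR-limited occupancy on GCN: each SIMD can hold up to 10 wavefronts and provides a budget of 256 VGPRs per lane, allocated in steps of 4. It deliberately ignores SGPR and LDS limits, which can lower occupancy further, and it assumes the talk's percentage refers to waves per SIMD out of 10. Function names are hypothetical.

```python
def gcn_waves_per_simd(vgprs, granularity=4, vgpr_budget=256, max_waves=10):
    """VGPR-limited number of wavefronts per SIMD (vgprs must be >= 1)."""
    allocated = -(-vgprs // granularity) * granularity  # round up to granularity
    return min(max_waves, vgpr_budget // allocated)


def gcn_occupancy(vgprs):
    """VGPR-limited occupancy as a fraction of the 10-wave maximum."""
    return gcn_waves_per_simd(vgprs) / 10.0
```

Under these assumptions, a shader has to stay at or below 64 VGPRs to reach 4 waves per SIMD (40%); a single additional register drops it to 3 waves.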
● Example: Split light type loops
○ Different light types use different data
■ Shadowed lights use shadow projection matrices, shadowmaps, etc.
■ Image projectors use image projection matrices, images, etc.
■ Boxlights must check bounds differently
■ etc.
○ A well-structured shader frees up registers
Before (single loop, branching on light type):
for each light in lightbuffer
    if light.type == POINT
        // do pointlight calculations
    else if light.type == SPOT
        // do spotlight calculations
    else if light.type == SPOT_SHADOWED
        // do shadowed spotlight calculations
end
After (one loop per light type):
for each light in lightbuffer_point
    // do pointlight calculations
end
for each light in lightbuffer_spot
    // do spotlight calculations
end
for each light in lightbuffer_spot_shadowed
    // do shadowed spotlight calculations
end
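The split loops require the lights to arrive grouped by type. One way to prepare that is to bucket the lights when the light buffers are filled, as in the Python sketch below. Where and how the engine performs this split is not stated in the talk; the dictionary-based light records and the function name are hypothetical.

```python
from collections import defaultdict


def split_by_type(lights):
    """Group light records by their 'type' key, preserving order within a type.

    Each resulting list corresponds to one per-type light buffer, so that the
    shader can run one tight loop per type without per-light branching.
    """
    buckets = defaultdict(list)
    for light in lights:
        buckets[light["type"]].append(light)
    return dict(buckets)
```

Each loop then only touches the data its light type needs, so values such as shadow matrices are only live in the loop that handles shadowed lights.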
What’s next?
● Currently working on Fledge Gen 4
○ Always improving tech iteratively
○ Always keep existing systems “alive”
○ Parallel Development of new systems / breaking changes
● Spread knowledge
○ Weekly presentation meeting (tech internal)
● Leap to new APIs
○ Vulkan, DirectX 12
● New low-level renderer design
○ Better match for the new APIs (no longer state-driven)
○ More low-level control such as explicit resource syncs, GPU memory management, etc.
○ Async-Compute also on PC
○ More C-style, more data-oriented
○ “Do as little as possible during render-loop” aka “prebake as much as we can”
■ Setting DescriptorSets, Map/Unmap GPU memory, etc.
○ Goal: Rendering must not be a CPU performance bottleneck
● Better ingame-profiling for content creators
● Better tools for artists and game designers
● Improving specific rendering subsystems
○ Switch to physically based inverse square falloff (lumen units)
○ Improved IBL system (e.g. split irradiance and specular probes)
○ Unified volumetric fog / lighting (“lit fog” vs. “volumetric lighting”)
○ Bindless Decals
○ New material system
■ More flexibility for custom shaders / FX Materials
■ Better fit for the new rendering backend interface
○ Improved postprocessing, Antialiasing, HDR tonemapping / color correction
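For the planned inverse square falloff, the standard photometric relation for an isotropic point light is: luminous intensity I = Φ / (4π) in candela for a luminous flux Φ in lumens, and illuminance E = I / d² in lux at distance d. The Python sketch below applies exactly that; it omits any range windowing or clamping an engine would typically add, and the function name is hypothetical.

```python
import math


def point_light_illuminance(luminous_flux_lm, distance_m):
    """Illuminance (lux) of an isotropic point light on a surface facing it."""
    intensity_cd = luminous_flux_lm / (4.0 * math.pi)
    return intensity_cd / (distance_m * distance_m)
```

Doubling the distance reduces the illuminance to a quarter, which is the behavior artists get "for free" once light intensities are authored in physical units.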
Thank you for your attention!
DECK13 is hiring!
● Tools Programmer
● Concept Environment Artist
● VFX Artist
● etc.
@philiphammer0
phammer@deck13.com
linkedin.com/in/philip-hammer-430baa6
References
[1] Walter et al., “Microfacet Models for Refraction through Rough Surfaces”
[2] Karis, “Real Shading in Unreal Engine 4”, SIGGRAPH 2013
[3] Schulz, Mader, “Rendering Techniques in Ryse: Son of Rome”, SIGGRAPH 2014
[4] Barré-Brisebois, Hill, “Blending in Detail”, http://blog.selfshadow.com/publications/blending-in-detail/