TressFX The Fast and The Furry by Nicolas Thibieroz
TRESSFX: THE FAST AND THE FURRY
AMD AND MICROSOFT DEVELOPER DAY, JUNE 2014, STOCKHOLM
NICOLAS THIBIEROZ
WORLDWIDE GAMING ENGINEERING MANAGER, AMD
TRESSFX THE FAST AND THE FURRY | AMD AND MICROSOFT GAME DEVELOPER DAY - JUNE 2 2014, STOCKHOLM
TRESSFX: NEXT-GENERATION HAIR AND FUR RENDERING
The time for next-gen quality is now
Tomb Raider pioneered next-gen hair
‒ Includes PS4/XB1
Users expect this level of quality for next-gen titles
You need to start thinking about this!
WHAT MAKES GOOD HAIR/FUR?
[Figure: Basic Rendering; Antialiasing; Antialiasing + Self Shadowing; Antialiasing + Self Shadowing + Transparency]
Demo
All three components (antialiasing, self-shadowing and transparency) are a must for high quality
Transparency in particular is essential to next-gen visuals
‒ Requires an Order-Independent Transparency (OIT) solution
ISOLINE TESSELLATION FOR HAIR/FUR? 1/2
Isoline tessellation has two tessellation factors
‒ First is line density (lines per invocation)
‒ Second is line detail (segments per line)
In theory, this provides an easy LOD system
‒ Line density and detail can vary with distance by scaling both tessellation factors
[Figure: isolines at Tess = (1,1), (2,1), (2,2), (2,3) and (3,3)]
ISOLINE TESSELLATION FOR HAIR/FUR? 2/2
In practice, isoline tessellation is not cost-effective for this scenario
Lines are always 1-pixel thick
‒ Need Geometry Shader to extrude them into triangles for smooth edges
‒ Major impact on performance!
‒ The alternative is to enable MSAA
‒ Most engines are deferred, so MSAA has a large performance impact
‒ No extrusion for smoothing edges and no MSAA = poor quality!
Bottom line: a pure Vertex Shader solution is faster
‒ Curvature is rarely a problem (it depends on the number of vertices per strand chosen at authoring time)
‒ If needed, LOD can also be implemented in the Vertex Shader
TRESSFX RENDERING PIPELINE
TressFX 2 uses a deferred approach for best performance
Three main steps
STEP 1: Hair simulation
STEP 2: Store fragment properties into buffers
STEP 3: Fetch fragment properties, sort, selective shading and render
‒ Full shading on the K frontmost fragments
‒ “Tail” fragments are shaded with a simpler light equation and shadowing algorithm
TRESSFX RENDERING PIPELINE
STEP 1: HAIR SIMULATION
[Diagram: input geometry (SRV), i.e. pre-simulation line segments in model space, plus simulation parameters → simulation compute shaders (CS → CS → CS) → post-simulation geometry (UAV), i.e. line segments in world space]
Simulation compute shaders
‒ Edge length constraint
‒ Local shape constraint
‒ Global shape constraint (not always needed for fur)
‒ Model transform
‒ Collision shape (not always needed for fur)
‒ External forces (wind, gravity, etc.)
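The edge length constraint listed above can be illustrated with a position-based-dynamics style pass over one strand. This is a minimal CPU sketch in C++, not the TressFX compute shader: the `Float3` type, `ApplyEdgeLengthConstraint`, the inverse-mass convention (0 pins the root) and the per-edge rest lengths are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Float3 { float x, y, z; };

static Float3 Sub(Float3 a, Float3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static float Length(Float3 v) { return std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z); }

// One Gauss-Seidel pass of an edge length constraint along a single strand.
// invMass[i] == 0 pins vertex i (e.g. the root attached to the scalp);
// restLength[i] is the authored length of the segment between vertices i and i + 1.
void ApplyEdgeLengthConstraint(std::vector<Float3>& pos,
                               const std::vector<float>& invMass,
                               const std::vector<float>& restLength) {
    for (std::size_t i = 0; i + 1 < pos.size(); ++i) {
        Float3 d = Sub(pos[i + 1], pos[i]);
        float len = Length(d);
        float wSum = invMass[i] + invMass[i + 1];
        if (len <= 0.0f || wSum <= 0.0f) continue;  // degenerate edge or both ends pinned
        // Length error, pre-divided so that s * d has magnitude |error| / wSum.
        float s = (len - restLength[i]) / (len * wSum);
        // Move both ends along the edge, each in proportion to its inverse mass.
        pos[i].x += invMass[i] * s * d.x;
        pos[i].y += invMass[i] * s * d.y;
        pos[i].z += invMass[i] * s * d.z;
        pos[i + 1].x -= invMass[i + 1] * s * d.x;
        pos[i + 1].y -= invMass[i + 1] * s * d.y;
        pos[i + 1].z -= invMass[i + 1] * s * d.z;
    }
}
```

Repeating the pass several times per frame converges toward the rest lengths while the pinned root never moves; a GPU version would typically process one strand per thread group.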
Input model is a collection of strands made of line segments (each strand composed of up to 64 vertices)
Optionally divided into “master strands” and “slave strands” to optimize simulation performance
‒ Only master strands are simulated (e.g. 1:4 ratio)
‒ Slave strands use master strand simulation results with added noise
‒ Virtually no difference from full-scale simulation but much better simulation performance!
‒ Master:slave simulation ratio can also vary with distance for even better performance
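A minimal C++ sketch of the master/slave idea, assuming each slave strand stores a fixed per-vertex offset (generated once, here with `std::mt19937`) that is added to its master's simulated positions every frame. `SlaveStrand`, `MakeSlave` and `ReconstructSlave` are hypothetical names; the actual TressFX noise scheme may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

struct Float3 { float x, y, z; };

// Hypothetical slave strand: which master it follows plus a fixed offset per vertex.
struct SlaveStrand {
    std::size_t master;
    std::vector<Float3> offsets;
};

// Generate the per-vertex offsets once at load time (deterministic for a given seed).
SlaveStrand MakeSlave(std::size_t masterIndex, std::size_t numVertices, float amplitude, unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_real_distribution<float> noise(-amplitude, amplitude);
    SlaveStrand s{masterIndex, {}};
    s.offsets.reserve(numVertices);
    for (std::size_t i = 0; i < numVertices; ++i) {
        float ox = noise(rng), oy = noise(rng), oz = noise(rng);
        s.offsets.push_back({ox, oy, oz});
    }
    return s;
}

// Each frame, after only the master strands were simulated, rebuild a slave
// by offsetting its master's post-simulation vertices.
std::vector<Float3> ReconstructSlave(const std::vector<Float3>& masterPos, const SlaveStrand& s) {
    std::vector<Float3> out(masterPos.size());
    for (std::size_t i = 0; i < masterPos.size() && i < s.offsets.size(); ++i) {
        out[i] = {masterPos[i].x + s.offsets[i].x,
                  masterPos[i].y + s.offsets[i].y,
                  masterPos[i].z + s.offsets[i].z};
    }
    return out;
}
```

Only the masters pay for the constraint solve; reconstructing a slave costs one addition per vertex, which is why raising the master:slave ratio with distance is cheap.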
Demo
TRESSFX RENDERING PIPELINE
STEP 2: STORE FRAGMENT PROPERTIES INTO BUFFERS
[Diagram: world-space line segments + index buffer → VS (extrusion into triangles) → indexed triangle list]
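The extrusion step can be sketched as follows, assuming a camera-facing ribbon: each strand vertex produces two output vertices offset along the direction perpendicular to both the strand tangent and the view ray. This C++ function mirrors what a vertex shader might compute from its vertex ID; `ExtrudeVertex`, the even/odd side convention and `halfWidth` are assumptions, not the TressFX shader code.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Float3 { float x, y, z; };
static Float3 Add(Float3 a, Float3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
static Float3 Sub(Float3 a, Float3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static Float3 Scale(Float3 v, float s) { return {v.x * s, v.y * s, v.z * s}; }
static Float3 Cross(Float3 a, Float3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
static Float3 Normalize(Float3 v) {
    float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);  // assumes len > 0
    return Scale(v, 1.0f / len);
}

// Output vertices 2i and 2i+1 both come from strand vertex i and are pushed to
// opposite sides of the strand, perpendicular to the tangent and the view ray.
Float3 ExtrudeVertex(const std::vector<Float3>& strandPos, const std::vector<Float3>& strandTangent,
                     unsigned vertexId, Float3 eyePos, float halfWidth) {
    unsigned i = vertexId / 2;                    // source strand vertex
    float side = (vertexId & 1u) ? 1.0f : -1.0f;  // even ids: one side, odd ids: the other
    Float3 p = strandPos[i];
    Float3 view = Normalize(Sub(p, eyePos));
    Float3 right = Normalize(Cross(strandTangent[i], view));  // undefined if tangent is parallel to view
    return Add(p, Scale(right, side * halfWidth));
}
```

Because positions are read from the post-simulation buffer by index, no vertex buffer or Geometry Shader is needed.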
LINE SEGMENT EXTRUSION INTO TRIANGLES
Rendering high-quality hair or fur processes a lot of vertices!
‒ Geometry processing can therefore be a significant bottleneck
In previous versions of TressFX, extrusion was done in the Geometry Shader (don’t do it!), then in the VS with Draw()
Much better performance was obtained with a pure VS solution and precomputed index buffers
‒ Maximizes post vertex cache use!
DrawIndexed() method
‒ Indexed triangle list = { ( 0, 1, 2 ), (2, 1, 3 ), ( 2, 3, 4 ), (4, 3, 5 ), ( … ) };
‒ Strand vertex i expands into vertices 2i and 2i+1, which are shared by adjacent quads
Draw() method
‒ Triangle list = { ( 0, 1, 2 ), ( 3, 4, 5 ), ( 6, 7, 8 ), (9, 10, 11 ), ( … ) };
‒ Every triangle has its own three vertices, so shared vertices are processed again
[Figure: line segments expanded into quads, with vertex numbering for each method]
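The precomputed index buffer for the DrawIndexed() method follows a fixed pattern per strand. A small C++ sketch that generates it, assuming strand vertex s maps to extruded vertices `base + 2s` and `base + 2s + 1` (`BuildStrandIndices` and `base` are hypothetical names):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Index list for one strand with numStrandVertices vertices, where strand vertex s
// expands into extruded vertices base + 2s and base + 2s + 1.
std::vector<std::uint32_t> BuildStrandIndices(std::uint32_t numStrandVertices, std::uint32_t base) {
    std::vector<std::uint32_t> idx;
    if (numStrandVertices < 2) return idx;  // no segment, no quad
    idx.reserve(6u * (numStrandVertices - 1u));
    for (std::uint32_t s = 0; s + 1 < numStrandVertices; ++s) {
        std::uint32_t v = base + 2u * s;
        // Two triangles per segment: (v, v+1, v+2) and (v+2, v+1, v+3),
        // giving (0,1,2), (2,1,3), (2,3,4), (4,3,5), ... for base 0.
        idx.insert(idx.end(), {v, v + 1u, v + 2u, v + 2u, v + 1u, v + 3u});
    }
    return idx;
}
```

For a 4-vertex strand this gives 18 indices referencing only 8 unique vertices; Draw() would run the VS for all 18, while the indexed path lets the post-transform vertex cache reuse shared vertices.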
TRESSFX RENDERING PIPELINE
STEP 2: STORE FRAGMENT PROPERTIES INTO BUFFERS
[Diagram: world-space line segments + index buffer → VS (extrusion into triangles, output in homogeneous clip space) → PS (antialiasing)]
ANTIALIASING
Antialiasing (aka “coverage”) is computed analytically
‒ This is NOT Multisampling Anti-Aliasing!
Compute pixel coverage on the edges of hair strand triangles and convert it to an alpha value
The alpha value fades out with the distance from the pixel centre to the strand axis
Similar principle to Emil Persson’s phone wire Anti-Aliasing
http://www.humus.name/Articles/Persson_GraphicsGemsForGames.pdf
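A minimal sketch of analytic coverage, assuming the alpha ramps linearly from 1 to 0 across one pixel at the strand edge, based on the distance from the pixel centre to the strand axis in screen space. This is an illustrative formula, not TressFX's or Persson's exact code; `StrandCoverage` and `halfWidthPx` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Float2 { float x, y; };

// Distance from point p to the segment [a, b] (all in pixels).
float DistanceToSegment(Float2 p, Float2 a, Float2 b) {
    float abx = b.x - a.x, aby = b.y - a.y;
    float apx = p.x - a.x, apy = p.y - a.y;
    float lenSq = abx * abx + aby * aby;
    float t = lenSq > 0.0f ? (apx * abx + apy * aby) / lenSq : 0.0f;
    t = std::max(0.0f, std::min(t, 1.0f));  // clamp to the segment
    float dx = apx - t * abx, dy = apy - t * aby;
    return std::sqrt(dx * dx + dy * dy);
}

// Coverage (alpha) of a pixel centre by a strand with screen-space half-width halfWidthPx:
// 1 well inside the strand, ramping linearly to 0 across one pixel at the strand edge.
float StrandCoverage(Float2 pixelCentre, Float2 axisStart, Float2 axisEnd, float halfWidthPx) {
    float d = DistanceToSegment(pixelCentre, axisStart, axisEnd);
    return std::max(0.0f, std::min(halfWidthPx + 0.5f - d, 1.0f));
}
```

For strands thinner than a pixel, the phone-wire approach referenced above widens the geometry to at least one pixel and scales alpha by the ratio of true to widened width, rather than letting the strand alias.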
TRESSFX RENDERING PIPELINE
STEP 2: STORE FRAGMENT PROPERTIES INTO BUFFERS
[Diagram: world-space line segments + index buffer → VS (extrusion into triangles, homogeneous clip space) → PS (antialiasing) → null render target + stencil; per-fragment depth, tangent, coverage and next pointer written to the PPLL UAV, head pointers written to the Head UAV]
PER-PIXEL LINKED LISTS
Head UAV
‒ Each pixel location has a “head pointer” to a linked list in the PPLL UAV
PPLL UAV
‒ As new fragments are rendered, they are added to the next open location in the PPLL (using a UAV counter)
‒ A link is created to the fragment pointed to by the head pointer
‒ Head pointer then points to the new fragment
// Allocate a new node: returns the current counter value, then increments it
uint uPixelCount = LinkedListUAV.IncrementCounter();
uint uOldStartOffset;
// Swap the new node index into the head pointer at this pixel location (address = pixel coordinates)
InterlockedExchange(LinkedListHeadUAV[address], uPixelCount, uOldStartOffset);
// Link the new node to the previous head and append it at the end of the Fragment and Link Buffer
Element.uNext = uOldStartOffset;
LinkedListUAV[uPixelCount] = Element;
[Figure: PPLL UAV nodes {depth, tangent, coverage, next}, each list starting from a pointer in the Head UAV]
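The insertion above can be mirrored on the CPU with `std::atomic`: `fetch_add` plays the role of the UAV counter and `exchange` the role of `InterlockedExchange`. This is a sketch under stated assumptions, not TressFX code: the `PPLL`/`PPLLNode` types, the packed tangent field and the out-of-memory check are hypothetical additions.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint32_t kNullPointer = 0xFFFFFFFFu;

// Hypothetical node layout mirroring the slide: depth, tangent, coverage, next.
struct PPLLNode {
    float depth;
    std::uint32_t packedTangent;  // assumed packed encoding of the tangent
    float coverage;
    std::uint32_t next;
};

struct PPLL {
    std::vector<std::atomic<std::uint32_t>> head;  // one head pointer per pixel ("Head UAV")
    std::vector<PPLLNode> nodes;                   // fragment storage ("PPLL UAV")
    std::atomic<std::uint32_t> counter{0};         // plays the role of the UAV counter

    PPLL(std::size_t numPixels, std::size_t capacity) : head(numPixels), nodes(capacity) {
        for (auto& h : head) h.store(kNullPointer);
    }

    // Same three steps as the shader: allocate a slot, swap it into the head, link to the old head.
    bool Insert(std::size_t pixel, PPLLNode n) {
        std::uint32_t slot = counter.fetch_add(1);
        if (slot >= nodes.size()) return false;  // out of memory: fragment is dropped
        n.next = head[pixel].exchange(slot);
        nodes[slot] = n;
        return true;
    }

    // Walk one pixel's list (newest fragment first), as the resolve pass does.
    std::size_t CountFragments(std::size_t pixel) const {
        std::size_t count = 0;
        for (std::uint32_t p = head[pixel].load(); p != kNullPointer; p = nodes[p].next) ++count;
        return count;
    }
};
```

The node is written after the head swap, so lists are only safe to traverse once all insertions have finished; on the GPU this is guaranteed because the resolve runs as a separate pass.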
Memory requirements can be large!
‒ Width * Height * Average overdraw * sizeof (PPLL structure)
‒ Can use tiling approach in memory-constrained situations
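The memory formula above can be evaluated directly. The helpers below are hypothetical; the inputs used in the check (1920×1080, an average overdraw of 8 and a 16-byte node) are assumptions, which give 265,420,800 bytes (about 253 MiB) for the PPLL plus about 8 MB for the head buffer, and a quarter of the PPLL size when resolving in 4 bands.

```cpp
#include <cassert>
#include <cstdint>

// Rough PPLL footprint: Width * Height * average overdraw * sizeof(node).
// All inputs are caller assumptions.
constexpr std::uint64_t PPLLBytes(std::uint64_t width, std::uint64_t height,
                                  std::uint64_t avgOverdraw, std::uint64_t nodeBytes) {
    return width * height * avgOverdraw * nodeBytes;
}

// The head buffer adds one 32-bit pointer per pixel.
constexpr std::uint64_t HeadBytes(std::uint64_t width, std::uint64_t height) {
    return width * height * sizeof(std::uint32_t);
}

// Tiling: process the screen in `tiles` horizontal bands and reuse one band-sized PPLL.
constexpr std::uint64_t TiledPPLLBytes(std::uint64_t width, std::uint64_t height, std::uint64_t tiles,
                                       std::uint64_t avgOverdraw, std::uint64_t nodeBytes) {
    return PPLLBytes(width, (height + tiles - 1) / tiles, avgOverdraw, nodeBytes);
}
```

Because overdraw is far from uniform across hair, individual pixels can exceed the average, so the buffer still needs a policy for the overflow case.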
TRESSFX RENDERING PIPELINE
STEP 3: FETCH FRAGMENTS, SORT, SELECTIVE SHADING AND RENDER
[Diagram: full-screen quad/triangle → VS → PS (stencil test; reads Head UAV and PPLL UAV) → lighting]
LIGHTING
Different options are available:
‒ Kajiya-Kay hair lighting model
‒ Marschner model
‒ Anything else that looks good!
Fragment property storage requirements may limit your options!
The TressFX 2 sample uses an approximation of the Marschner technique, rendering two highlights
‒ Only fragment properties needed for lighting: depth, tangent vector
[Figure: primary highlights; secondary highlights]
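One common way to approximate Marschner's two highlights, shown here as an assumption rather than the sample's exact shader, is a Kajiya-Kay lobe evaluated twice with the tangent shifted by different amounts along a normal. In this C++ sketch `N`, the shift amounts, exponents and strengths are hypothetical parameters.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Float3 { float x, y, z; };
static float Dot(Float3 a, Float3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static Float3 Normalize(Float3 v) {
    float len = std::sqrt(Dot(v, v));  // assumes len > 0
    return {v.x / len, v.y / len, v.z / len};
}

// Kajiya-Kay diffuse: proportional to the sine of the angle between tangent T and light L.
float KajiyaKayDiffuse(Float3 T, Float3 L) {
    float tl = Dot(T, L);
    return std::sqrt(std::max(0.0f, 1.0f - tl * tl));
}

// Kajiya-Kay-style specular lobe around a tangent shifted along N; H is the half vector.
float ShiftedSpecular(Float3 T, Float3 N, Float3 H, float shift, float exponent) {
    Float3 ts = Normalize({T.x + shift * N.x, T.y + shift * N.y, T.z + shift * N.z});
    float th = Dot(ts, H);
    float sinTH = std::sqrt(std::max(0.0f, 1.0f - th * th));
    return std::pow(sinTH, exponent);
}

// Two highlights: each lobe gets its own tangent shift, exponent and strength.
float DualHighlightSpecular(Float3 T, Float3 N, Float3 H,
                            float shift1, float exp1, float strength1,
                            float shift2, float exp2, float strength2) {
    return strength1 * ShiftedSpecular(T, N, H, shift1, exp1) +
           strength2 * ShiftedSpecular(T, N, H, shift2, exp2);
}
```

Everything here is derived from the tangent (plus light and view directions), which matches the slide's point that storing only depth and tangent per fragment is enough for this kind of model.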
TRESSFX RENDERING PIPELINE
STEP 3: FETCH FRAGMENTS, SORT, SELECTIVE SHADING AND RENDER
[Diagram: full-screen quad/triangle → VS → PS (stencil test; reads Head UAV and PPLL UAV) → lighting, shadows]
SHADOWS
Three different cases:
Hair self-shadowing
‒ Essential component for a next-gen, volumetric look
‒ Simplified Deep Shadow Map technique
Hair casting shadows on body & environment
‒ Body: needs a very soft look at close range (blur the shadow map)
‒ Environment: render (possibly simplified) hair geometry into cascaded shadow map
Environment casting shadows on hair
‒ Sample environment shadow map at hair fragment rendering time
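A sketch of a simplified deep-shadow-map term, assuming the shadow darkens with the difference between the fragment depth and the shadow-map depth: the difference is converted into an approximate number of occluding fibres, each absorbing a fixed fraction of light. `fiberSpacing` and `fiberAlpha` are hypothetical parameters standing in for the technique's additional variables; this is not the exact TressFX formula.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Returns light transmittance in [0, 1] (1 = unshadowed).
float HairSelfShadow(float fragmentDepth, float shadowMapDepth, float fiberSpacing, float fiberAlpha) {
    // How far behind the first hair surface seen from the light this fragment lies.
    float depthRange = std::max(0.0f, fragmentDepth - shadowMapDepth);
    // Approximate number of fibres between the light and this fragment.
    float numFibers = depthRange / fiberSpacing;
    // Light transmitted through that many semi-transparent fibres.
    return std::pow(1.0f - fiberAlpha, numFibers);
}
```

Only one depth is read per light, so this is much cheaper than a true deep shadow map, at the cost of assuming roughly uniform hair density along the light direction.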
TRESSFX RENDERING PIPELINE
STEP 3: FETCH FRAGMENTS, SORT, SELECTIVE SHADING AND RENDER
[Diagram: full-screen quad/triangle → VS → PS (stencil test; reads Head UAV and PPLL UAV) → lighting, shadows. K frontmost fragments: full shading, sorting and manual blending. Tail fragments: cheap shading, no sorting and manual blending.]
SELECTIVE FRAGMENT SHADING
THIS IS WHERE THE MEAT OF THE CODE OCCURS!
// Get the start of the linked list from the head pointer
uint pointer = LinkedListHeadSRV[uint2(In.vPosition.xy)];
// Copy the first K fragments from the PPLL into kBuffer[]
// (unused slots are zeroed; real fragments are assumed to have depth > 0)
NODE kBuffer[KBUFFER_SIZE];
int numFragments = 0;
for(int p=0; p<KBUFFER_SIZE; p++)
{
    kBuffer[p] = (NODE)0;
    if (pointer != NULLPOINTER)
    {
        kBuffer[p] = LinkedListSRV[pointer];
        pointer = LinkedListSRV[pointer].uNext;
        numFragments++;
    }
}
float4 fcolor = float4(0, 0, 0, 1);   // rgb = accumulated colour, w = remaining transmittance
float4 fragmentColor;
// Go through the rest of the linked list, and keep the closest K fragments
// (not in sorted order)
[allow_uav_condition]
for(int l=0; l < g_iMaxFragments; l++)
{
    if(pointer == NULLPOINTER) break;
    int id = 0;
    float max_depth = 0;
    // Find the furthest node in the array
    [unroll]for(int i=0; i<KBUFFER_SIZE; i++)
    {
        float fDepth = kBuffer[i].depth;
        if(max_depth < fDepth)
        {
            max_depth = fDepth; id = i;
        }
    }
    // If the linked list node is nearer than the furthest one in the local array,
    // exchange the node in the local array for the one in the linked list
    NODE Node = LinkedListSRV[pointer];
    if (max_depth > Node.depth)
    {
        NODE tmp = Node; Node = kBuffer[id]; kBuffer[id] = tmp;
    }
    // Do simple shading and shadowing for nodes not part of the K closest fragments
    fragmentColor = ComputeSimpleShading(Node);
    // Out-of-order blending
    fcolor.xyz = mad(-fcolor.xyz, fragmentColor.w, fcolor.xyz) +
                 fragmentColor.xyz * fragmentColor.w;
    fcolor.w = mad(-fcolor.w, fragmentColor.w, fcolor.w);
    // Retrieve next node pointer
    pointer = LinkedListSRV[pointer].uNext;
}
// Blend the K nearest layers of fragments from back to front, where K = KBUFFER_SIZE
for(int j=0; j<numFragments; j++)
{
    int id = 0;
    float max_depth = 0;
    // Find the furthest node in the array
    for(int i=0; i<KBUFFER_SIZE; i++)
    {
        float fDepth = kBuffer[i].depth;
        if(max_depth < fDepth)
        {
            max_depth = fDepth; id = i;
        }
    }
    // Take this node out of the next search
    NODE Node = kBuffer[id]; kBuffer[id] = (NODE)0;
    // Do high-quality shading and shadowing
    fragmentColor = ComputeHighQualityShading(Node);
    // Blend fragment color
    fcolor.xyz = mad(-fcolor.xyz, fragmentColor.w, fcolor.xyz) +
                 fragmentColor.xyz * fragmentColor.w;
    fcolor.w = mad(-fcolor.w, fragmentColor.w, fcolor.w);
}
return fcolor;
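To see what the selective approach trades away, the resolve logic can be mirrored on the CPU. This C++ sketch (hypothetical `Fragment`, `Color` and `ResolvePixel` types; shading is omitted and fragment colours are used directly) keeps the K nearest fragments, blends the tail unsorted with the same blend equation, then blends the K nearest back to front.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

struct Fragment { float depth; float r, g, b, a; };  // larger depth = further away
struct Color { float r, g, b, a; };                  // a = remaining transmittance

// Same blend as the shader: dst = dst * (1 - alpha) + src * alpha; transmittance *= (1 - alpha).
static void BlendOver(Color& dst, const Fragment& f) {
    dst.r = dst.r * (1.0f - f.a) + f.r * f.a;
    dst.g = dst.g * (1.0f - f.a) + f.g * f.a;
    dst.b = dst.b * (1.0f - f.a) + f.b * f.a;
    dst.a = dst.a * (1.0f - f.a);
}

Color ResolvePixel(const std::vector<Fragment>& list, std::size_t k) {
    Color out{0.0f, 0.0f, 0.0f, 1.0f};
    std::vector<Fragment> kbuf;
    for (const Fragment& f : list) {
        if (kbuf.size() < k) { kbuf.push_back(f); continue; }  // first K in list order
        Fragment tail = f;
        if (!kbuf.empty()) {
            auto furthest = std::max_element(kbuf.begin(), kbuf.end(),
                [](const Fragment& x, const Fragment& y) { return x.depth < y.depth; });
            if (furthest->depth > f.depth) std::swap(tail, *furthest);  // keep the nearer one
        }
        BlendOver(out, tail);  // tail fragment: cheap shading would go here, order not sorted
    }
    // Back to front: furthest of the K first, nearest last (full shading would go here).
    std::sort(kbuf.begin(), kbuf.end(),
              [](const Fragment& x, const Fragment& y) { return x.depth > y.depth; });
    for (const Fragment& f : kbuf) BlendOver(out, f);
    return out;
}
```

With three half-transparent fragments (red nearest, then green, then blue), K = 3 gives the exact sorted result (r, g, b) = (0.5, 0.25, 0.125). With K = 1 the two tail fragments are blended in list order, so green and blue swap weights, giving (0.5, 0.125, 0.25), while in this example the nearest fragment and the total transmittance are unaffected.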
TRESSFX RENDERING PIPELINE
STEP 3: FETCH FRAGMENTS, SORT, SELECTIVE SHADING AND RENDER
[Diagram: full-screen quad/triangle → VS → PS (stencil test; reads Head UAV and PPLL UAV) → lighting, shadows → render target. K frontmost fragments: full shading, sorting and manual blending. Tail fragments: cheap shading, no sorting and manual blending.]
TRESSFX PERFORMANCE
FAST AND FURRY
A high number of fragments is required for a quality look
Main bottleneck is shading all those fragments
‒ Not per-pixel linked list traversal!
Selective shading approach allows significant performance savings with minor or negligible quality tradeoffs
Technique | Cost
Out of order, no shading | 1.31 ms
Out of order, shading | 2.80 ms
Deferred PPLL, selective shading | 2.13 ms
‒ Shading cost is ~1.5 ms
‒ Selective shading is 24% faster than out-of-order shading
Fur model with ~130,000 fur strands, running on an AMD Radeon HD 7970 at 1080p
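The ~1.5 ms and 24% annotations follow directly from the table:

```latex
\begin{aligned}
\text{shading cost} &\approx 2.80\,\text{ms} - 1.31\,\text{ms} = 1.49\,\text{ms} \approx 1.5\,\text{ms}\\
\text{saving} &= \frac{2.80 - 2.13}{2.80} = \frac{0.67}{2.80} \approx 0.239 \approx 24\%
\end{aligned}
```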
Simulation LOD
Distance | Sim LOD disabled | Sim LOD enabled
Close range | 1.01 ms | 1.01 ms
Medium range | 1.01 ms | 0.70 ms
Long range | 1.01 ms | 0.37 ms
Distance-adaptive shading and simulation LOD further improve performance
The “K frontmost fragments” value can scale inversely with distance
Shading LOD
Distance | Shading LOD disabled | Shading LOD enabled
Close range | 3.26 ms | 3.26 ms
Medium range | 3.23 ms | 1.77 ms
Long range | 2.52 ms | 0.64 ms
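A distance-based LOD policy of the kind described above could look like the following C++ sketch. The policy itself (full quality up to `nearDistance`, then K scaled by `nearDistance / distance` and the master:slave ratio scaled by its inverse, both clamped) and every name in it are assumptions for illustration; the deck does not specify the curves used.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical LOD parameters for one hair/fur asset.
struct HairLOD {
    int kFrontmost;        // K fragments that get full shading and sorting
    int strandsPerMaster;  // rendered strands driven by one simulated master strand
};

// Assumes nearDistance > 0 and positive parameters.
HairLOD SelectHairLOD(float distance, float nearDistance,
                      int minK, int maxK, int baseStrandsPerMaster, int maxStrandsPerMaster) {
    // 1 at close range, tending toward 0 far away.
    float scale = (distance <= nearDistance) ? 1.0f : nearDistance / distance;
    int k = static_cast<int>(maxK * scale + 0.5f);  // K scales inversely with distance
    // Master:slave ratio grows with distance; clamp in float before converting to int.
    float ratioF = std::min(baseStrandsPerMaster / scale, static_cast<float>(maxStrandsPerMaster));
    int ratio = static_cast<int>(ratioF + 0.5f);
    return {std::max(minK, std::min(k, maxK)),
            std::max(baseStrandsPerMaster, std::min(ratio, maxStrandsPerMaster))};
}
```

For example, with `nearDistance` 5, K in [2, 8] and 4 to 32 strands per master, distances 1, 10 and 100 yield (K, ratio) = (8, 4), (4, 8) and (2, 32), the same trend as the medium- and long-range rows in the tables above.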
CONCLUSION AND QUESTIONS?
Next-gen hair/fur look at real-time performance is possible now!
Fast:
‒ Variable ratio master/slave compute simulations
‒ Vertex Shader extrusion of segments into triangles (do not use tessellation + GS)
‒ Deferred rendering with selective shading
‒ Distance-based shading and simulation LOD
‒ Optimized shaders!
Furry:
Full and free access to TressFX 2 SDK sample, code and documentation at:
http://developer.amd.com/tools-and-sdks/graphics-development/amd-radeon-sdk/
@NThibieroz | nicolas.thibieroz@amd.com
Simplified Deep Shadow Map technique: take the difference between the shadow map depth and the fragment depth. The larger this difference, the deeper (darker) the shadow. The technique also uses additional variables.