Abstract: With advances in model acquisition and procedural modeling, geometric models can have billions of polygons and gigabytes of textures. Such model complexity continues to outpace the explosive growth of CPU and GPU processing power, so brute-force rendering cannot achieve interactive frame rates. Even if these massive models could fit into video memory, current GPUs can only process 10-200 million triangles per second. Interactive massive model rendering requires output-sensitive techniques, whose performance is a function of the number of pixels rendered rather than the size of the model. This work surveys such techniques, including visibility culling, level of detail, and memory management. In addition, it presents a new out-of-core rendering algorithm, demonstrated with a variety of HLOD rendering algorithms.
3. Project History
Boeing 777 model: ~350 million polygons.
Image from http://graphics.cs.uni-sb.de/MassiveRT/boeing777.html
00000010 of 01010110
4. Contents
Previous Work
View Frustum and Occlusion Culling
Hardware Occlusion Queries (HOQ)
Level of Detail (LOD)
Hierarchical Level of Detail (HLOD)
Out-of-Core Rendering (OOC)
5. Contents Continued
Implementation Work
Vertex Clustering [Rossignac93]
HLOD Tree Creation
Primary Contribution: OOC Rendering
Results
Future Work
Demos throughout
6. View Frustum Culling
Can be slower than brute force. When?
[Figure: objects marked culled or rendered against the view frustum]
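A per-object test against the six frustum planes can be sketched as follows. This is a minimal C++ sketch assuming axis-aligned bounding boxes and planes whose normals point into the frustum; `Plane`, `AABB`, and `isCulled` are hypothetical names. It also suggests one plausible answer to the slide's question: each object costs up to six plane evaluations, which is pure overhead when nearly every object is inside the frustum.

```cpp
#include <array>
#include <cassert>

// Hypothetical types for illustration.
struct Vec3 { float x, y, z; };

// Plane a*x + b*y + c*z + d = 0, with normal (a, b, c) pointing into the frustum.
struct Plane { float a, b, c, d; };

struct AABB { Vec3 min, max; };

// Returns true if the box lies entirely outside at least one frustum plane.
// Conservative: a box near a frustum corner may be kept even though it is outside.
bool isCulled(const AABB& box, const std::array<Plane, 6>& frustum)
{
    for (const Plane& p : frustum)
    {
        // The box corner farthest along the plane normal (the "positive vertex").
        const Vec3 v{ p.a >= 0 ? box.max.x : box.min.x,
                      p.b >= 0 ? box.max.y : box.min.y,
                      p.c >= 0 ? box.max.z : box.min.z };
        // If even that corner is behind the plane, the whole box is outside.
        if (p.a * v.x + p.b * v.y + p.c * v.z + p.d < 0)
            return true;
    }
    return false;
}
```

In a hierarchy, a node that is culled skips its entire subtree, which is where the savings come from.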
11. Occlusion Culling
From-region or from-point
Most are conservative
Occluder Fusion
Difficult for general scenes with arbitrary occluders, so algorithms make simplifying assumptions:
[Wonka00] – urban environments
[Ohlarik08] – planets and satellites
12. Hardware Occlusion Queries
From-point visibility that handles general scenes with arbitrary occluders and occluder fusion
How?
Use the GPU
15. Hardware Occlusion Queries
Disable color and depth writes
Render BV using HOQ
Enable color and depth writes
[Figure: color buffer and depth buffer]
16. Hardware Occlusion Queries
Disable color and depth writes
Render BV using HOQ
Enable color and depth writes
Render object based on HOQ results
17. Hardware Occlusion Queries
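// Abstract interface over a GPU hardware occlusion query.
class IQueryOcclusion
{
public:
    virtual ~IQueryOcclusion() = default;  // allow deletion through the interface
    // Start counting samples that pass the depth test.
    virtual void Begin() = 0;
    // Stop counting; the result arrives asynchronously.
    virtual void End() = 0;
    // Poll before reading a result to avoid blocking the CPU.
    virtual bool IsResultAvailable() = 0;
    virtual unsigned int NumberOfSamplesPassed() = 0;
    virtual unsigned int NumberOfFragmentsPassed() = 0;
};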
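The steps on the preceding slides can be combined with this interface as follows. A minimal C++ sketch, not the project's renderer: `RenderHooks` and its three callbacks are hypothetical stand-ins for the real draw calls (in OpenGL, color and depth writes are typically toggled with `glColorMask` and `glDepthMask`), and `FixedResultQuery` is a test double that reports a fixed sample count so the code runs without a GPU.

```cpp
#include <cassert>
#include <functional>

// Interface from the slide, with a virtual destructor.
class IQueryOcclusion
{
public:
    virtual ~IQueryOcclusion() = default;
    virtual void Begin() = 0;
    virtual void End() = 0;
    virtual bool IsResultAvailable() = 0;
    virtual unsigned int NumberOfSamplesPassed() = 0;
    virtual unsigned int NumberOfFragmentsPassed() = 0;
};

// Hypothetical rendering hooks supplied by the caller.
struct RenderHooks
{
    std::function<void(bool)> setColorAndDepthWrites;
    std::function<void()> renderBoundingVolume;
    std::function<void()> renderObject;
};

// Query the bounding volume, then draw the object only if any sample passed.
bool renderIfVisible(IQueryOcclusion& query, const RenderHooks& hooks)
{
    hooks.setColorAndDepthWrites(false);  // the BV must not change color or depth
    query.Begin();
    hooks.renderBoundingVolume();
    query.End();
    hooks.setColorAndDepthWrites(true);

    while (!query.IsResultAvailable()) {}  // waiting here is the CPU stall
    const bool visible = query.NumberOfSamplesPassed() > 0;
    if (visible)
        hooks.renderObject();
    return visible;
}

// Test double: the result is always ready and reports a fixed sample count.
class FixedResultQuery : public IQueryOcclusion
{
public:
    explicit FixedResultQuery(unsigned int samples) : samples_(samples) {}
    void Begin() override {}
    void End() override {}
    bool IsResultAvailable() override { return true; }
    unsigned int NumberOfSamplesPassed() override { return samples_; }
    unsigned int NumberOfFragmentsPassed() override { return samples_; }
private:
    unsigned int samples_;
};
```

Reading the result immediately after issuing the query is the simplest correct order, but it is exactly the pattern that causes the CPU stall and GPU starvation discussed on the next slide.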
20. Hardware Occlusion Queries
CPU stalls and GPU starvation
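[Figure: CPU and GPU timelines. Without queries, the CPU issues Draw o1, Draw o2, Draw o3 and the GPU executes each shortly after. With a query, the CPU issues Query o1 and stalls until the result returns before issuing Draw o1, while the GPU starves for work.]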
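One common way to reduce the stall, sketched here under assumptions and not necessarily the approach used in this work, is to issue several queries before reading any result and then consume results as they become ready. `PendingObject`, `drawVisibleWithoutStalling`, and the test double `DelayedQuery` are hypothetical names; the interface is the `IQueryOcclusion` class from the earlier slide.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <vector>

class IQueryOcclusion
{
public:
    virtual ~IQueryOcclusion() = default;
    virtual void Begin() = 0;
    virtual void End() = 0;
    virtual bool IsResultAvailable() = 0;
    virtual unsigned int NumberOfSamplesPassed() = 0;
    virtual unsigned int NumberOfFragmentsPassed() = 0;
};

// Hypothetical per-object record: its query and a callback that draws it.
struct PendingObject
{
    IQueryOcclusion* query;
    std::function<void()> draw;
};

// Issue every query up front, then handle results in whatever order they become
// ready instead of blocking on the first one. If no result is ready the loop
// still spins; a real renderer would do other CPU work there. Returns objects drawn.
std::size_t drawVisibleWithoutStalling(std::vector<PendingObject>& objects,
                                       const std::function<void(PendingObject&)>& issueQuery)
{
    std::deque<PendingObject*> pending;
    for (PendingObject& obj : objects)
    {
        issueQuery(obj);  // e.g. Begin(), draw bounding volume, End()
        pending.push_back(&obj);
    }

    std::size_t drawn = 0;
    while (!pending.empty())
    {
        PendingObject* obj = pending.front();
        pending.pop_front();
        if (!obj->query->IsResultAvailable())
        {
            pending.push_back(obj);  // not ready yet: check the others first
            continue;
        }
        if (obj->query->NumberOfSamplesPassed() > 0)
        {
            obj->draw();
            ++drawn;
        }
    }
    return drawn;
}

// Test double: the result becomes available after a given number of polls.
class DelayedQuery : public IQueryOcclusion
{
public:
    DelayedQuery(unsigned int samples, int pollsUntilReady)
        : samples_(samples), polls_(pollsUntilReady) {}
    void Begin() override {}
    void End() override {}
    bool IsResultAvailable() override { return polls_-- <= 0; }
    unsigned int NumberOfSamplesPassed() override { return samples_; }
    unsigned int NumberOfFragmentsPassed() override { return samples_; }
private:
    unsigned int samples_;
    int polls_;
};
```

The trade-off is that objects queried in the same batch cannot occlude one another, so batching gives up some culling in exchange for keeping both processors busy.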
39. Optimized HLOD Refinement Driven by HOQs [Charalambos07]
Exploit spatial and temporal coherence for scheduling HOQs.
Predict refinement based on the node’s relative visibility from the previous frame
$\mathrm{VMSSE}_i^{\mathrm{est}} = \mathrm{SSE}_i \cdot \mathrm{bias}_{i-1}$
40. Optimized HLOD Refinement Driven by HOQs [Charalambos07]
Example prediction
Refinement stopped for this node in the previous frame
$\mathrm{VMSSE}_i^{\mathrm{est}} < \text{threshold}$ ? Stop : Refine
Stop:
Issue query
Render without checking query
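The prediction rule can be sketched as follows. This is a minimal C++ sketch under stated assumptions, not [Charalambos07]'s implementation: `prevBias` is assumed to be a visibility ratio in [0, 1] measured for the node in the previous frame, and the names `NodeState`, `estimateVMSSE`, `predict`, and `threshold` are hypothetical.

```cpp
#include <cassert>

// Hypothetical per-node state carried between frames.
struct NodeState
{
    double sse;       // screen-space error in the current frame, SSE_i
    double prevBias;  // assumed visibility ratio from the previous frame, bias_{i-1}
};

// Estimated error: VMSSE_i^est = SSE_i * bias_{i-1}.
double estimateVMSSE(const NodeState& n)
{
    assert(n.prevBias >= 0.0 && n.prevBias <= 1.0);
    return n.sse * n.prevBias;
}

enum class Prediction { Stop, Refine };

// Stop if the estimate is below the threshold, otherwise refine. On Stop, the
// caller issues a query (to update the bias for the next frame) and renders the
// node without waiting for the result, so the prediction itself never stalls.
Prediction predict(const NodeState& n, double threshold)
{
    return estimateVMSSE(n) < threshold ? Prediction::Stop : Prediction::Refine;
}
```

A node that was mostly hidden last frame gets a small bias, so its estimated error drops and refinement stops early without waiting on a query.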
41. Implementation Work
Three HLOD algorithms, including [Charalambos07]
Vertex Clustering
HLOD Tree Creation
OOC Rendering
Load/Unload Rules
Rendering
Replacement Policy
Multithreading
42. Vertex Clustering [Rossignac93]
Fast: expected O(n)
Robustness: arbitrary topology
Capable of drastic simplification
“Easy to code”
OOC extensions [Lindstrom00]
43. Vertex Clustering [Rossignac93]
1. Compute per-vertex weights
[Figure: example vertices with weights 1, 1, 0.8, 0.5, 0.5]
2. Assign vertices to clusters
3. Identify the highest-weighted vertex in each cluster
44. Vertex Clustering [Rossignac93]
1. Compute per-vertex weights
[Figure: example vertices with weights 1, 1, 0.8]
2. Assign vertices to clusters
3. Identify the highest-weighted vertex in each cluster
4. Collapse each cluster to that vertex and remove degenerate triangles
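The four steps can be sketched as follows. A minimal C++ (C++17) sketch assuming a uniform grid of cubic cells and weights that are already computed, so step 1 is an input here rather than derived from the geometry; `Mesh`, `simplify`, and `cellSize` are hypothetical names. A hash map from cell to representative keeps the expected running time linear in the number of vertices and triangles.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

struct Vec3 { float x, y, z; };

struct Mesh
{
    std::vector<Vec3> vertices;
    std::vector<float> weights;        // step 1: per-vertex weights (given)
    std::vector<std::size_t> indices;  // 3 indices per triangle
};

// Simplify by clustering vertices on a uniform grid. The result reuses the input
// vertex array; vertices that are no longer referenced are left in place.
Mesh simplify(const Mesh& in, float cellSize)
{
    assert(in.vertices.size() == in.weights.size());

    // Step 2: grid cell of a vertex, packed as 21 bits per axis
    // (assumes the model spans fewer than 2^21 cells per axis).
    auto cellKey = [cellSize](const Vec3& v) {
        auto q = [cellSize](float c) {
            return static_cast<std::uint64_t>(
                static_cast<std::int64_t>(std::floor(c / cellSize)) & 0x1FFFFF);
        };
        return (q(v.x) << 42) | (q(v.y) << 21) | q(v.z);
    };

    // Step 3: the highest-weighted vertex in each cell represents the cluster.
    std::unordered_map<std::uint64_t, std::size_t> representative;
    for (std::size_t i = 0; i < in.vertices.size(); ++i)
    {
        auto [it, inserted] = representative.try_emplace(cellKey(in.vertices[i]), i);
        if (!inserted && in.weights[i] > in.weights[it->second])
            it->second = i;
    }

    // Step 4: collapse each corner to its representative and drop triangles
    // with two or more corners in the same cluster.
    Mesh out{in.vertices, in.weights, {}};
    for (std::size_t t = 0; t + 2 < in.indices.size(); t += 3)
    {
        const std::size_t a = representative[cellKey(in.vertices[in.indices[t]])];
        const std::size_t b = representative[cellKey(in.vertices[in.indices[t + 1]])];
        const std::size_t c = representative[cellKey(in.vertices[in.indices[t + 2]])];
        if (a != b && b != c && a != c)
            out.indices.insert(out.indices.end(), {a, b, c});
    }
    return out;
}
```

Larger cells give more drastic simplification, which suits the coarse levels near the root of an HLOD tree.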
47. HLOD Tree Creation
Input
Model (.ply, .obj)
Target triangles per leaf node
Maximum tree depth
Output
1 file per node
Normals computed at runtime
48. HLOD Tree Creation
Top-down
Root node:
Full AABB
Lowest Detail
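Top-down creation can be sketched as follows. A minimal C++ sketch: it splits each node's box at the midpoint of its longest axis, which is an assumption (splitting planes are listed under future work), and it tracks triangle centroids only. Where a real builder would store a simplified mesh at each interior node, coarsest at the root, this sketch just keeps the node's triangle list. `HlodNode` and `build` are hypothetical names; `targetPerLeaf` and `maxDepth` correspond to the inputs on the previous slide.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

using Vec3 = std::array<float, 3>;
struct AABB { Vec3 min, max; };

// Hypothetical node: a box, the triangles (by centroid) inside it, and children.
struct HlodNode
{
    AABB box;
    std::vector<Vec3> triangleCentroids;
    std::vector<std::unique_ptr<HlodNode>> children;
    int depth = 0;
};

// Split recursively until a node holds at most targetPerLeaf triangles
// or maxDepth is reached.
void build(HlodNode& node, std::size_t targetPerLeaf, int maxDepth)
{
    assert(targetPerLeaf > 0);
    if (node.triangleCentroids.size() <= targetPerLeaf || node.depth >= maxDepth)
        return;  // leaf

    // Longest axis of the node's box.
    int axis = 0;
    for (int a = 1; a < 3; ++a)
        if (node.box.max[a] - node.box.min[a] > node.box.max[axis] - node.box.min[axis])
            axis = a;
    const float split = 0.5f * (node.box.min[axis] + node.box.max[axis]);

    // Two children: below and above the splitting plane.
    for (int side = 0; side < 2; ++side)
    {
        auto child = std::make_unique<HlodNode>();
        child->box = node.box;
        (side == 0 ? child->box.max : child->box.min)[axis] = split;
        child->depth = node.depth + 1;
        for (const Vec3& c : node.triangleCentroids)
            if ((c[axis] < split) == (side == 0))
                child->triangleCentroids.push_back(c);
        build(*child, targetPerLeaf, maxDepth);
        node.children.push_back(std::move(child));
    }
}
```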
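52. Previous Work: Out-of-Core
Based on [Ulrich02]
void visit(Node& node)
{
    // Render this node if it is accurate enough, or if it cannot be refined
    // because some children are not yet in memory.
    if (computeSSE(node) < pixelTolerance || !allChildrenResident(node))
    {
        render(node);
        // Prefetch children so a later frame can refine this node.
        for (Node* child : node.children)
            requestResidency(child);
    }
    else
    {
        // Refine: every child is resident.
        for (Node* child : node.children)
            visit(*child);
    }
}
Prefetch: children are requested while their parent is rendered.
Refinement needs all children resident; otherwise the parent is rendered.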
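`computeSSE` is not defined on the slide. A common formulation, assumed here, projects a node's maximum geometric error $\varepsilon$ (world units) at eye distance $d$ onto the screen: $\rho = \frac{\varepsilon h}{2 d \tan(\theta/2)}$, with $h$ the viewport height in pixels and $\theta$ the vertical field of view. A minimal C++ sketch; the parameter names are hypothetical, and `computeSSE(node)` in `visit` would pass the node's stored error and its distance to the viewer.

```cpp
#include <cassert>
#include <cmath>

// Projected screen-space error, in pixels, of geometry that deviates from the
// original by at most geometricError world units, seen from distance (world
// units) with vertical field of view fovY (radians) in a viewport
// viewportHeight pixels tall.
double computeSSE(double geometricError, double distance,
                  double fovY, double viewportHeight)
{
    assert(distance > 0.0 && fovY > 0.0);
    return geometricError * viewportHeight / (2.0 * distance * std::tan(fovY / 2.0));
}
```

The error falls off with distance, so distant nodes pass the pixel tolerance at coarser levels of the tree.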
53. Previous Work: Out-of-Core
[Varadhan02]
Requires full skeleton in memory
No occlusion culling
No front-to-back sorting
Image From [Varadhan02]
54. Previous Work: Out-of-Core
[Corrêa03]
PLP in separate thread
Requires full skeleton in memory
No LOD
55. Out-of-Core
Replacement Policy?
LRU?
A node can’t be refined once one of its children is removed
Remove the deepest child in the parent’s subtree?
56. OOC Rendering
Benefits of our algorithm
No full HLOD skeleton in memory
Works with HOQs
Refinement with a subset of children
Replacement policy maximizes detail near the viewer
Multithreaded
63. OOC Rendering: Load/Unload
Result:
If a node is not a skeleton, none of its ancestors are skeletons. In other words, if a node has geometry loaded, so do all of its ancestors.
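A load rule and a replacement policy that preserve this invariant can be sketched as follows. A minimal C++ sketch under assumptions: nodes live in a flat array with parent indices, the root is never evicted, and the victim is the farthest evictable node, which is one way to realize the stated goal of maximizing detail near the viewer and not necessarily the exact policy used here. `CacheNode`, `canLoad`, and `chooseVictim` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical node record.
struct CacheNode
{
    int parent;               // -1 for the root
    bool hasGeometry;
    double distanceToViewer;  // updated each frame
};

// Load rule: only the root, or a node whose parent has geometry, may load.
bool canLoad(const std::vector<CacheNode>& nodes, int i)
{
    return nodes[i].parent < 0 || nodes[nodes[i].parent].hasGeometry;
}

// Unload rule: evict the farthest node that has geometry but no child with
// geometry, so no loaded node is left with an unloaded ancestor.
// Returns -1 if nothing can be evicted.
int chooseVictim(const std::vector<CacheNode>& nodes)
{
    std::vector<int> loadedChildren(nodes.size(), 0);
    for (const CacheNode& n : nodes)
        if (n.hasGeometry && n.parent >= 0)
            ++loadedChildren[static_cast<std::size_t>(n.parent)];

    int victim = -1;
    for (std::size_t i = 0; i < nodes.size(); ++i)
    {
        if (!nodes[i].hasGeometry || loadedChildren[i] != 0 || nodes[i].parent < 0)
            continue;  // not loaded, still needed by a child, or the root
        if (victim < 0 || nodes[i].distanceToViewer > nodes[victim].distanceToViewer)
            victim = static_cast<int>(i);
    }
    assert(victim < 0 || nodes[victim].hasGeometry);
    return victim;
}
```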
78. Selected Results (lol)
Load time for 10 blocks of Pompeii (5,646,041 triangles), in seconds:
Full model: 5.2
Out-of-core: 0.05
84. Statistics
Lines of Code
GUI: 420
Unit Tests: 1,720
HLOD Creation: 4,600
Rendering: 4,500
Time Spent
Coding: 8 weeks “full-time” (3 last spring, 5 this fall)
Plus reading, writing, slides, and logistics
85. Future Work
Improve tree creation
Polygonal simplification
Splitting planes
Fill cracks
Optimal disk layout
Better occlusion performance
Multiple volumes or occlusion-preserving low LOD
Optimize use of clipping planes
86. Future Work
Don’t require ancestors to have geometry loaded.
Much better use of memory
More complicated rendering
More rendering artifacts
87. Future Work
Cache Management
Aggressively remove nodes
Replacement Policy: Average detail instead of best up close
89. Future Work
True Usefulness
Textures
Picking on individual objects
Test with truly massive models
90. Future Work
Today
Mad Mex Happy Hour. Now – 6:30pm
Saturday, February 7th
Graduation Party. My House. 3pm.
Speaker Notes
30-60 GB on disk; 12 CDs
Need spatial coherence
Spatial data structures exploit spatial coherence.
Visit nodes in front-to-back order. Useful for early-z and occlusion culling.
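Front-to-back ordering of siblings can be sketched as follows. A minimal C++ sketch that sorts child bounding boxes by the squared distance from the eye to each box center, an approximation assumed here for illustration; `distance2` and `sortFrontToBack` are hypothetical names.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <vector>

using Vec3 = std::array<float, 3>;
struct AABB { Vec3 min, max; };

// Squared distance from the eye to the box center. Only approximates true
// front-to-back order; overlapping or very uneven boxes can sort imperfectly.
float distance2(const AABB& b, const Vec3& eye)
{
    float d2 = 0.0f;
    for (int a = 0; a < 3; ++a)
    {
        const float c = 0.5f * (b.min[a] + b.max[a]) - eye[a];
        d2 += c * c;
    }
    return d2;
}

// Nearest first: near geometry fills the depth buffer early, so farther nodes
// are more likely to fail early-z and occlusion tests.
void sortFrontToBack(std::vector<AABB>& children, const Vec3& eye)
{
    std::sort(children.begin(), children.end(),
              [&eye](const AABB& a, const AABB& b) { return distance2(a, eye) < distance2(b, eye); });
}
```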
[Wonka00] presents per-region visibility with occluder fusion in urban environments. The scene is assumed to be 2.5D: all buildings must be perpendicular to the ground plane and connected to the ground.
More recently, [Ohlarik08] presents efficient from-point occlusion culling for scenes with large spherical occluders (e.g. planets) and small occludees (e.g. satellites). Occlusion culling is reduced to horizon distance tests.
The bounding volume usually has far less geometry than the object.
Expensive shaders required to render the object are not generally required to render the bounding volume.
When only depth testing is enabled, as is the case when rendering the bounding volume, today’s GPUs use a higher-performance rendering path.
Walk up tree to find first ancestor with geometry.
Each node keeps a count of the number of its children that have geometry. Refinement stops at nodes with a count of zero, since refinement cannot improve image quality.
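The two notes above can be sketched together as follows. A minimal C++ sketch assuming a flat node array with parent indices; `TreeNode`, `setGeometry`, `canRefine`, and `firstAncestorWithGeometry` are hypothetical names, and updating the parent's counter on every load and unload is an assumed way of keeping the count current.

```cpp
#include <cassert>
#include <vector>

// Hypothetical node record.
struct TreeNode
{
    int parent;                // -1 for the root
    bool hasGeometry;
    int childrenWithGeometry;  // kept in sync by setGeometry
};

// Mark a node loaded or unloaded and update its parent's counter.
void setGeometry(std::vector<TreeNode>& nodes, int i, bool loaded)
{
    if (nodes[i].hasGeometry == loaded)
        return;
    nodes[i].hasGeometry = loaded;
    if (nodes[i].parent >= 0)
        nodes[nodes[i].parent].childrenWithGeometry += loaded ? 1 : -1;
    assert(nodes[i].parent < 0 || nodes[nodes[i].parent].childrenWithGeometry >= 0);
}

// Refinement cannot improve image quality below a node with no loaded children.
bool canRefine(const TreeNode& n)
{
    return n.childrenWithGeometry > 0;
}

// Walk up the tree to the first ancestor that has geometry; -1 if none.
int firstAncestorWithGeometry(const std::vector<TreeNode>& nodes, int i)
{
    for (int p = nodes[i].parent; p >= 0; p = nodes[p].parent)
        if (nodes[p].hasGeometry)
            return p;
    return -1;
}
```

The counter makes the stop test O(1) per node instead of scanning every child during traversal.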