Presented at SIGGRAPH Asia 2012 in Singapore on Friday, 30 November 14:15 - 16:00 during the "Points and Vectors" session.
Find the paper at http://developer.nvidia.com/game/gpu-accelerated-path-rendering or on Slideshare.
For thirty years, resolution-independent 2D standards (e.g. PostScript, SVG) have relied largely on CPU-based algorithms for the filling and stroking of paths. Learn about our approach to accelerate path rendering with our GPU-based "Stencil, then Cover" programming interface. We've built and productized our OpenGL-based system.
14. Complete Web
Pages Rendered
via OpenGL
without Pre-rendered Glyph Bitmaps and all on GPU
15. No tricks
Every glyph is
rendered from its
outline; no render-to-texture
Not just zoomed & rotated,
also perspective Magnify & minify
with
no transitional
pixelization
or tile popping
artifacts
synced
to refresh
rate; 60 Hz
updates
17. What is path rendering?
A rendering approach
Resolution-independent two-
dimensional graphics
Occlusion & transparency
depend on rendering order
So called “Painter’s
Algorithm”
Basic primitive is a path to be
filled or stroked
Path is a sequence of path
commands
Commands are
– moveto, lineto, curveto,
arcto, closepath, etc.
Standards
Content: PostScript, PDF, TrueType
fonts, Flash, Scalable Vector
Graphics (SVG), HTML5 Canvas,
Silverlight, Office drawings
APIs: Apple Quartz 2D, Khronos
OpenVG, Microsoft Direct2D, Cairo,
Skia, Qt::QPainter, Anti-grain
Graphics
18. Path Rendering Standards
Document Resolution- Immersive 2D Graphics Office
Printing and Independent Web Programming Productivity
Exchange Fonts Experience Interfaces Applications
Java 2D
API
OpenType Flash
QtGui
API
TrueType
Scalable Mac OS X
Vector 2D API Adobe Illustrator
Graphics
Open XML
Paper (XPS) Inkscape
HTML 5 Khronos API Open Source
19. Seminal Path Rendering Paper
John Warnock & Douglas Wyatt, Xerox PARC
Presented SIGGRAPH 1982
Warnock founded Adobe months later
John Warnock
Adobe founder
20. Reasons to
GPU-accelerate Path Rendering
Increasing screen
Multi-touch
resolutions
Increasing screen densities Power wall
More functionality with less
latency…
Immersive 2D web content
…with less power
21. Live Demo
Classic PostScript content
Complex text
rendering
Flash content New York Times rendered from
its resolution-independent form
22. Live demo!
Gradients with
blending
Dashed stroking Complex
gradient
content
Maps with text
Dragon, and
zoomed dragon 3D dice, but really
2D + gradients
23. Last Year’s SIGGRAPH Results in
Real-time
“Digital Micrography” Ron Maharik, Mikhail Bessmeltsev, Alla
Sheffer, Ariel Shamir, and Nathan Carr
SIGGRAPH 2011
“Girl with Words in
Her Hair” scene
591 paths
338,507 commands
1,244,474 scalar
coordinates
24. Our Contributions
A novel “stencil, then cover” programming interface
for path rendering, well-suited to acceleration by
GPUs
Our NV_path_rendering API
Our programming interface’s efficient
implementation within OpenGL to avoid CPU
bottlenecks
Productized, shipping in GeForce/Quadro drivers
Accompanying algorithms to handle
tessellation-free stenciled stroking of paths
standard stroking embellishments such as dashing
clipping paths to arbitrary paths
mixing 3D and path rendering
25. Notable Prior Art
Loop & Blinn 2005: Resolution
independent curve rendering using
programmable graphics hardware
Kokojima, et al. 2006: Resolution
independent rendering of
deformable vector objects using
graphics hardware
Rueda, et al. 2008: GPU-based
rendering of curved polygons
using simplicial coverings
26. CPU vs. GPU at
Rendering Tasks over Time
100% 100%
90% 90%
80% 80%
70% 70%
60% 60%
50% GPU 50% GPU
CPU CPU
40% 40%
30% 30%
20% 20%
10% 10%
0% 0%
1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012
Pipelined 3D Interactive Rendering Path Rendering
Goal of our research is to make path rendering a GPU task
Render all interactive pixels, whether 3D or 2D or web content with the GPU
27. Our Approach Step 1 Step 2:
Stencil Cover
repeat
“Stencil, then Cover” (StC)
Map the path rendering task from a
sequential algorithm…
…to a pipelined and massively parallel task
Break path rendering into two steps
First, “stencil” the path’s coverage into stencil
buffer
Second, conservatively “cover” path
Test against path coverage determined in the 1st step
Shade the path
And reset the stencil value to render next path
28. Our Implemented System:
NV_path_rendering
OpenGL extension to GPU-accelerate path rendering
Uses “stencil, then cover” (StC) approach via OpenGL calls
Create a path object
Step 1: “Stencil” the path object into the stencil buffer
GPU provides fast stenciling of filled or stroked paths
Step 2: “Cover” the path object and stencil test against its
coverage stenciled by the prior step
Application can configure arbitrary shading during the
step
More details later
Supports the union of functionality of all major path rendering
standards
Includes all stroking embellishments
Includes first-class text and font support
Allows functionality to mix with traditional 3D and programmable
shading
30. Stencil Fill Process Visualized
Visualization
of “invisible”
stencil-only
geometry
generated
during
stencil step
Net result
of stencil
increments
and
decrements
is path’s
winding
number
32. Stroking Approach
Stroked line segments are straightforward
Drawn as rectangles into the stencil buffer
Curved stroked segments are involved
Curved segments are broken into stroked quadratic
segments
Hulls are formed around each stroked quadratic segment
An intricate fragment discard shader solves the cubic
equation for every sample to determine the sample’s
containment in the quadratic stroke segment
If contained, the sample’s stencil sample is updated
Caps & joins are also drawn into the stencil buffer
Covering geometry is computed as union of
rectangles, hulls, and cap/join geometry
33. Quadratic Stroking Hulls Visualized
Simple
quadratic
Bezier
segment,
moving
control points
Drawn
with stroking
Non-convex
hull used
for the
stroking stencil
step is
visualized
35. Excellent Geometric Fidelity for
Stroking
Correct stroking is hard
Lots of CPU
implementations GPU-accelerated OpenVG reference
approximate stroking
GPU-accelerated stroking
avoids such short-cuts
GPU has FLOPS to
compute true stroke point
containment Cairo Qt
Stroking with tight end-point curve
36. Combined for a Complex Scenes
With Many Paths
Stencil Fill Geometry Cover Fill Geometry Filling-only Result
Complete Tiger
240 paths
2,510 commands
12,174 coordinates
Stencil Stroke Geometry Cover Stroke Geometry Stroking-only Result
37. Configuration
GPU: GeForce 480 GTX (GF100)
CPU: Core i7 950 @ 3.07 GHz NV_path_rendering
Compared to Alternatives
With Release 300 driver NV_path_rendering Alternative APIs rendering same content
2,000.00 2,000.00
16x Cairo
1,800.00 1,800.00
8x Qt
1,600.00 Skia Bitmap
1,600.00 4x
Skia Ganesh FBO (16x)
2x
1,400.00 1,400.00 Skia Ganesh Aliased (1x)
1x Direct2D GPU
1,200.00 Direct2D WARP
Frames per second
Frames per second
1,200.00
1,000.00 1,000.00
Alternative approaches
800.00
800.00 are all much slower
600.00
600.00
400.00
400.00
200.00
200.00
-
-
100x100
200x200
300x300
400x400
500x500
600x600
700x700
800x800
900x900
1000x1000
1100x1100
100x100
200x200
300x300
400x400
500x500
600x600
700x700
800x800
900x900
1000x1000
1100x1100
Window Resolution in Pixels Window Resolution in Pixels
38. Configuration
GPU: GeForce 480 GTX (GF100)
CPU: Core i7 950 @ 3.07 GHz Detail on Alternatives
Same results, changed Y Axis
Alternative APIs rendering same content
250.00
2,000.00 Cairo
Cairo Qt
1,800.00 Skia Bitmap
Qt
Skia Ganesh FBO (16x)
1,600.00 Skia Bitmap 200.00
Skia Ganesh Aliased (1x)
Skia Ganesh FBO (16x)
Direct2D GPU
1,400.00 Skia Ganesh Aliased (1x)
Direct2D WARP
Direct2D GPU
1,200.00 Direct2D WARP
Frames per second
150.00
Frames per second
1,000.00
Fast, but
800.00
unacceptable
100.00
quality
600.00
400.00
50.00
200.00
-
-
100x100
200x200
300x300
400x400
500x500
600x600
700x700
800x800
900x900
1000x1000
1100x1100
200x200
500x500
800x800
1000x1000
1100x1100
100x100
300x300
400x400
600x600
700x700
900x900
Window Resolution in Pixels
Window Resolution in Pixels
40. Partial Solutions Not Enough
Path rendering has 30 years of heritage and
history
Can’t do a 90% solution and expect
Software to change
John Warnock
Trying to “mix” CPU and GPU methods doesn’t work Adobe founder
Expensive to move software—needs to be an
unambiguous win
Must surpass CPU approaches on all fronts
Performance
Quality
Functionality
Conformance to standards
More power efficient
Enable new applications Inspiration: Perceptive Pixel
41. Dashing Content Examples
Artist made
windows with
Same cake dashed line
missing dashed segment
stroking details
Technical diagrams
and charts often
Frosting on cake is dashed employ dashing
elliptical arcs with round
end caps for “beaded” look;
flowers are also dashing
All content shown
is fully GPU rendered Dashing character outlines for quilted look
42. First-class, Resolution-independent
Font Support
Fonts are a standard, first-class part of all path rendering systems
Foreign to 3D graphics systems such as OpenGL and Direct3D, but natural
for path rendering
Because letter forms in fonts have outlines defined with paths
TrueType, PostScript, and OpenType fonts all use outlines to specify
glyphs
NV_path_rendering makes font support easy
Can specify a range of path objects with
A specified font
Sequence or range of Unicode character points
No requirement for applications use font API to load glyphs
You can also load glyphs “manually” from
your own glyph outlines
43. Rendering Paths Clipped to
Some Other Arbitrary Path
Example clipping the PostScript tiger to a heart
constructed from two cubic Bezier curves
unclipped tiger tiger with pink background clipped to heart
44. Complex Clipping Example
tiger is 240 paths
cowboy clip is
the union of 1,366 paths
result of clipping tiger
to the union of all the cowboy paths
45. NV_path_rendering is more than just
matching CPU vector graphics
3D and vector graphics mix Superior quality
CPU
GPU Competitors
Arbitrary programmable shader on
2D in perspective is free paths— bump mapping
46. Mixing 3D Depth Buffering and
Path Rendering
PostScript tigers surrounding Utah teapot
Plus overlaid TrueType font rendering
No textures involved, no multi-pass
47. Live demo!
Solid
or
wireframe
teapots
Very fast
Teapots + tigers in same 3D scene
Zoom on tigers
All the detail is there
48. Handling Uncommon Path Rendering
Functionality: Projection
Projection “just works”
Because GPU does everything
with perspective-correct
interpolation
49. Example of Bump Mapping on
Path Rendered Text
Phrase “Brick wall!” is path rendered and bump
mapped with a Cg fragment shader
light source position
50. Handling Common Path Rendering
Functionality: Filtering
GPUs are highly efficient at
image filtering Qt
Fast texture mapping
Mipmapping
Anisotropic filtering
Wrap modes
Moiré
CPUs aren't artifacts
really
Cairo
GPU
51. Anti-aliasing Discussion
Good anti-aliasing is a big deal for path rendering
Particularly true for font rendering of small point sizes
Features of glyphs are often on the scale of a pixel or less
NV_path_rendering uses multiple stencil samples per pixel for
reasonable antialiasing
Otherwise, image quality is poor
4 samples/pixel bare minimum
8 or 16 samples/pixel is pretty sufficient
But 16 requires expensive 2x2 supersampling of 4x
multisampling
16x is quite memory intensive
Alternative: quality vs. performance tradeoff
Fast enough to render multiple passes to improve quality
Approaches
Accumulation buffer
Alpha accumulation
52. Real
Flash
Scene
same scene, GPU-rendered
without conflation
conflation
artifacts abound,
rendered by Skia
conflation is aliasing &
edge coverage percents
are un-predicable in general;
means conflated pixels
flicker when animated slowly
53. Improved Color Space:
sRGB Path Rendering
Modern GPUs have native
Radial color gradient example
support for perceptually-correct moving from saturated red to blue
for
sRGB framebuffer blending
sRGB texture filtering
No reason to tolerate
uncorrected linear RGB color
artifacts!
More intuitive for artists to
linear RGB
transition between saturated
control red and saturated blue has
Negligible expense for GPU to dark purple region
perform sRGB-correct
rendering
However quite expensive for
software path renderers to
perform sRGB rendering
Not done in practice
sRGB
perceptually smooth
transition from saturated
red to saturated blue
54. Trying Out
NV_path_rendering
Operating system support
2000, XP, Vista, Windows 7, Linux, FreeBSD, and Solaris
Unfortunately no Mac support
GPU support
GeForce 8 and up (easy rule: all CUDA-capable GPUs)
Most efficient on Fermi and Kepler GPUs
Current performance can be expected to improve
Shipping since NVIDIA’s Release 275 drivers
Available since summer 2011
New Release 300+ drivers have remarkable
NV_path_rendering performance
Try it, you’ll like it
There’s an SDK freely available with example code!
https://developer.nvidia.com/nv-path-rendering
55. Future Work
Using NV_path_rendering in actual web and 2D
applications
Standardizing the programming interface
Moving these algorithms to mobile devices
Path rendering test bed on Nexus 7
Notas del editor
NV_path_rendering provides a new third pipeline—in addition to the vertex and pixel pipelines—for rendering pixels