Mark J. Kilgard, NVIDIA
Kurt Akeley, Microsoft Research
13 December 2008
Singapore
Modern OpenGL:
Its Design and Evolution
3
Introductions
4
Kurt Akeley
• Led development of OpenGL at Silicon Graphics (SGI)
• Co-founded SGI
• Led development of SGI’s high-end graphics hardware
• Co-author of OpenGL specification
• Returned to Stanford University to complete Ph.D.
• Co-developed Cg “C for graphics” language at NVIDIA
• Principal Researcher, Microsoft Research Silicon Valley
• Spent time at Microsoft Research Asia in Beijing
• Member of US National Academy of Engineering
5
Mark Kilgard
• Principal System Software Engineer, NVIDIA, Austin, Texas
• Developed original OpenGL driver for 1st GeForce GPU
• Specified many key OpenGL extensions
• Works on Cg for portable programmable shading
• NVIDIA Distinguished Inventor
• Before NVIDIA, worked at Silicon Graphics
• Worked on X Window System integration for OpenGL
• Developed popular OpenGL Utility Toolkit (GLUT)
• Wrote book on OpenGL and X, co-authored Cg Tutorial
6
Marc Levoy
• Moderator for our facilitated discussion
• Professor of Computer Science and Electrical
Engineering
• Stanford University
• SIGGRAPH Computer Graphics Achievement Award
• ACM Fellow
7
Course Schedule
• Modern OpenGL (Kilgard)
• OpenGL’s evolution: a personal retrospective (Akeley)
• Writing better OpenGL (Kilgard)
• Implementing OpenGL (Kilgard)
• OpenGL’s future evolution (Kilgard)
• OpenGL in Context (Akeley, Kilgard, Levoy)
• Facilitated conversation
– Mid-session break –
8
Check Out the Course Notes (1)
• Look to www.opengl.org web site for our final slides
• New Material
• “An Incomplete History of OpenGL” (Kilgard)
• How the OpenGL graphics system developed
• “Using Vertex Buffer Objects Well” (Kilgard)
• Learn how to use vertex buffer objects for high vertex processing rates
9
Check Out the Course Notes (2)
• Paper Reprints
• OpenGL design rationale from its specification co-authors (Segal, Akeley)
• Realizing OpenGL: two implementations of one
architecture (Kilgard)
• Graphics hardware: GTX, RealityEngine,
InfiniteReality, GeForce 6800
• Key developments in graphics hardware design
over last 20 years
• GPU Programmability: “User-Programmable Vertex Engine” and “Cg” SIGGRAPH papers
• “How GPUs Work” (Luebke, Humphreys)
10
Modern OpenGL
Mark Kilgard
Principal System Software Engineer
NVIDIA
11
Modern OpenGL
• History
• How did OpenGL get where it is now?
• Present
• Version 3.0
• Functionality beyond 3.0
12
An Overview History of OpenGL
• Pre-history 1991
• IRIS GL, a proprietary Graphics Library by SGI
• OpenGL, an open standard for 3D
• Focus: procedural hardware-accelerated 3D graphics
• Governed by Architectural Review Board (ARB)
• Extensibility planned into design
• Competition
• Proprietary APIs (1991-1995)
• PHIGS & PEX for X Window System (1992-1997)
• Microsoft’s Direct3D (1998-)
13
OpenGL’s Pre-history
• 1983: IRIS GL 1 (window system: MEX)
• 1986: IRIS GL 2 (window system: MEX; operating system: UNIX)
• 1988: IRIS GL 3 (window system: NeWS/X11; operating system: IRIX 3.x)
• 1989: First work on GL 5.0 proposal
• 1991: IRIS GL 4 (window system: native X11; operating system: IRIX 4.3)
• 1993: OpenGL 1.0 (window system: native X11 with GLX; operating system: IRIX 5.1)
Dates are for shipping commercial SGI implementation
1983-2008 = 25 years
14
OpenGL’s Design Philosophy
• High-performance
• Assumes hardware acceleration
• Defined by a specification
• Rather than a de-facto implementation
• Rendering state machine
• Procedural
• Not a window system, not a scene graph
• No initial sub-setting
• Extensible
• Data type rich
• Cross-platform
• Window system-independent core
• X Window System, Microsoft Windows, OS/2, OS X, etc.
• Multi-language bindings
• C, FORTRAN, etc.
• Not merely an API, rather a system
15
Timeline of OpenGL’s Development
[Timeline figure, 1992 to 2008]
• Versions: OpenGL 1.0 approved, OpenGL 1.1, OpenGL 1.2, multitexture added (1.2.1), OpenGL 1.3, OpenGL 1.4, OpenGL 1.5, OpenGL 2.0, OpenGL 2.1, OpenGL 3.0
• Milestones marked on the timeline:
• 1st commercial OpenGL implementation (DEC)
• NT 3.51 brings OpenGL to PCs
• OpenGL Utility Toolkit (GLUT) released
• SGI InfiniteReality
• Mesa 3D open source
• 1st GPU for PCs with single-chip transform & lighting for OpenGL (GeForce)
• OpenGL ES for embedded devices
• Khronos controls OpenGL
16
Competitive 3D APIs
• OpenGL has always existed in
competition with other APIs
• Strengthened OpenGL by driving
feature parity
• OpenGL’s competitive strengths:
1. Cross platform, open process
2. API stability, extensibility
3. Clean initial design & specification
[Timeline chart, 1992 to 2008]
• Proprietary Unix workstation 3D APIs: XGL, Doré, Starbase, IRIS GL
• X Consortium 3D standard: PEX
• Microsoft Direct3D: DirectX 3, DirectX 5, DirectX 6, DirectX 7, DirectX 8, DirectX 9, DirectX 10
17
OpenGL 1.0
•Immediate mode
•Vertex transformation and lighting
•Points, lines, polygons
•Stippling, wide points and lines
•Bitmaps, image rectangles, and pixel reads
•Pixel store and transfer
•1D and 2D textures, fog, and scissor
•Display lists and evaluators
•RGBA and color index color models
•Color, depth, stencil, and accumulation buffers
•Selection and feedback modes
•Queries
18
OpenGL State Machine
• From OpenGL 3.0 specification, unchanged since 1.0
19
SGI “Classic” Hardware View of OpenGL
• Entirely fixed-function, no programmability
• High-end SGI hardware manifested functionality in distinct chips
[Pipeline diagram, 1992]
3D Application or Game → OpenGL API → (graphics hardware boundary) → Front End → Vertex Assembly → Vertex Transform & Lighting → Primitive Assembly, Clipping, Setup, and Rasterization → Texture & Fog → Raster Operations
Memory Interface: Texture Fetch, Framebuffer Access
Legend: graphics data flow; memory operations; fixed-function unit; programmable unit
20
OpenGL 1.1
• Vertex arrays
• Texture objects
• Texture internal formats
• Texture sub-image updates
• Texture proxies
• Copy framebuffer-to-texture
• Polygon offset
• RGBA logical operations
21
The Look of OpenGL 1.1
SGI skyfly demo
Stenciled shadow volumes
Ideas in Motion
22
OpenGL 1.2
• 3D textures
• Texture edge clamp wrap mode
• Texture level-of-detail clamping
• BGRA component order
• Packed pixel formats
• Imaging subset (optional)
• Normal rescaling
• Separate specular
• Vertex array draw elements range
23
Akeley’s (Modernized) OpenGL Data Flow
[Data flow diagram]
Blocks: client memory; vertex puller; vertex shading; rasterization & fragment shading; texture; raster operations; framebuffer; pixel unpack; pixel transfer; pixel pack; storage operations
Vertex input: glVertex*, glColor*, glTexCoord*, etc.; glDrawElements, glDrawArrays
Pixel and texture input: glDrawPixels, glBitmap, glCopyPixels; glTex{Sub}Image, glCopyTex{Sub}Image
Framebuffer reads: glReadPixels / glCopyPixels / glCopyTex{Sub}Image
Vertex output: selection / feedback / transform feedback
Raster operations: blending, depth testing, stencil testing, accumulation
24
OpenGL 1.2 Imaging Subset
[Pixel transfer pipeline diagram]
Core functionality:
• RGBA pixels: Scale & Bias, then Look-up Table (RGBA-to-RGBA)
• Index pixels: Shift & Add, then Look-up Table (Index-to-RGBA)
ARB_imaging subset:
• Color Table
• Convolution (separable or general)
• Post-convolve Scale & Bias
• Post-convolve Color Table
• Color Matrix
• Post-color matrix Scale & Bias
• Post-color matrix Color Table
• Histogram (can discard)
• Min-max (can discard)
Output: Pixel Rectangle Rasterization
25
OpenGL 1.2.1
• Multi-texture (optional)
26
Multi-texture Poster Child:
Quake 2 Light Maps
lightmaps only × decal only (modulate) = combined scene
27
GeForce 256 (NV10) View of OpenGL
• Vertex pulling (vertex buffer objects) via DMA
• Dual-texture, cube maps, and register combiners
[Pipeline diagram, 1999]
3D Application or Game → OpenGL API → (CPU – GPU boundary) → GPU Front End → Vertex Assembly → Vertex Transform & Lighting → Primitive Assembly, Clipping, Setup, and Rasterization → Texture & Fog → Raster Operations
Memory Interface: Attribute Fetch, Texture Fetch, Framebuffer Access
28
Hardware Cube Maps
Rendered scene
Dynamically created cube map image
Image credit: “Guts” GeForce 2 GTS demo, Thant Tessman
29
OpenGL 1.3
• Multi-texture (required now)
• Cube map texturing
• Compressed texture formats
• Texture border clamp
• Texture environment functions
• Add, combine, dot product
• Multisample anti-aliasing
• Transpose matrix
30
GeForce 3 & 4 Ti (NV2x) View of OpenGL
• Programmable vertex processing
• Highly configurable fragment processing
[Pipeline diagram, 2001]
3D Application or Game → OpenGL API → (CPU – GPU boundary) → GPU Front End → Vertex Assembly → Vertex Program → Primitive Assembly, Clipping, Setup, and Rasterization → Multi-texture shaders & Combiners → Raster Operations
Memory Interface: Attribute Fetch, Texture Fetch, Framebuffer Access
31
Vertex Programmability
Paletted matrix skinning
Twister vertex program
Per-vertex cartoon shading
32
Configurable Fragment Processing
Bumpy shiny environment mapping
Chromatic aberration
Offset 2D bump mapping
Depth sprites
33
OpenGL 1.4
• Automatic mipmap generation
• Shadow-mapping
• Depth textures and shadow comparisons
• Texture level-of-detail bias
• Texture mirrored repeat wrap mode
• Multi-texture combination
• Fog coordinate
• Secondary color
• Configurable point size attenuation
• Color blending improvements
• Stencil wrap operations
• Window-space raster position specification
34
Hardware Shadow Mapping
Without shadow mapping
With shadow mapping
Depth map from light source’s view (darker is closer)
Light position
Projective Texturing (1.0) & Polygon Offset (1.1): key enablers
35
Shadow Mapping Explained
Planar distance from light ≤ (less than) Depth map projected onto scene = (equals) True “un-shadowed” region, shown green
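The comparison on this slide can be sketched per fragment as a single test. This is a minimal illustration, not the hardware's implementation: the function name `shadow_map_lit` and the `bias` parameter (standing in for the effect of polygon offset) are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true when the fragment is lit (un-shadowed).
 * frag_depth: the fragment's depth as seen from the light, in [0, 1].
 * map_depth:  the closest depth stored in the light's depth map at the
 *             projected location, in [0, 1].
 * bias:       hypothetical tolerance standing in for polygon offset,
 *             so a surface does not shadow itself. */
bool shadow_map_lit(float frag_depth, float map_depth, float bias)
{
    assert(bias >= 0.0f);
    return frag_depth <= map_depth + bias;
}
```

A fragment whose light-space depth exceeds the stored depth by more than the bias is behind some occluder and is treated as shadowed.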
36
OpenGL 1.5
• Vertex buffer objects (VBOs)
• Occlusion queries
• Generalized shadow mapping functions
37
GeForce FX (NV3x) View of OpenGL
• Programmable fragment processing
• 16 texture units, IEEE 754 32-bit floating-point
• Vertex program branching
[Pipeline diagram, 2003]
3D Application or Game → OpenGL API → (CPU – GPU boundary) → GPU Front End → Vertex Assembly → Vertex Program → Primitive Assembly, Clipping, Setup, and Rasterization → Fragment Program → Raster Operations
Memory Interface: Attribute Fetch, Texture Fetch, Framebuffer Access
38
Floating-point Fragment Programmability
39
OpenGL Fragment Program Flowchart
1. Begin fragment: initialize parameters; temporary registers initialized to 0,0,0,0; output depth & color registers initialized to 0,0,0,1
2. Fetch & decode next instruction (from fragment program instruction memory)
3. Read interpolants and/or registers (primitive interpolants feed this step)
4. Map input values: swizzle, negate, etc.
5. Texture fetch instruction? If yes: compute texture address & level-of-detail, fetch texels (from texture images), and filter texels. If no: perform instruction math / operation
6. Write output register with masking
7. More instructions? If yes: return to step 2 (the fragment program instruction loop). If no: emit output depth & color registers
8. End fragment
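The instruction loop in this flowchart can be sketched as a tiny interpreter. This is an illustrative model only, under stated assumptions: the three opcodes, the eight-entry register file, and all type and function names below are hypothetical, and texture fetch is left out to keep the sketch short. It does follow the flowchart's steps: read sources with swizzle and negate, perform the operation, then write the destination under a component mask.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float c[4]; } Vec4;

typedef enum { OP_MOV, OP_ADD, OP_MUL } Opcode;   /* hypothetical opcode set */

typedef struct {
    int reg;          /* index into the register file */
    int swizzle[4];   /* source component selected for x, y, z, w */
    int negate;       /* nonzero negates every component */
} Src;

typedef struct {
    Opcode op;
    int dst;              /* destination register index */
    unsigned write_mask;  /* bit i set means component i is written */
    Src a, b;             /* b is ignored by OP_MOV */
} Instr;

enum { REG_COUNT = 8, REG_OUT_COLOR = 7 };

/* Temporaries start at 0,0,0,0 and the output color at 0,0,0,1. */
void init_fragment_registers(Vec4 *regs)
{
    for (int r = 0; r < REG_COUNT; ++r)
        for (int i = 0; i < 4; ++i)
            regs[r].c[i] = 0.0f;
    regs[REG_OUT_COLOR].c[3] = 1.0f;
}

/* Map input values: apply swizzle and optional negate. */
static Vec4 read_src(const Vec4 *regs, Src s)
{
    Vec4 v;
    for (int i = 0; i < 4; ++i) {
        float x = regs[s.reg].c[s.swizzle[i]];
        v.c[i] = s.negate ? -x : x;
    }
    return v;
}

/* Fetch, decode, and execute each instruction in order. */
void run_fragment_program(const Instr *prog, size_t count, Vec4 *regs)
{
    for (size_t pc = 0; pc < count; ++pc) {
        const Instr *in = &prog[pc];
        assert(in->dst >= 0 && in->dst < REG_COUNT);
        Vec4 a = read_src(regs, in->a);
        Vec4 b = (in->op == OP_MOV) ? a : read_src(regs, in->b);
        for (int i = 0; i < 4; ++i) {
            float r;
            switch (in->op) {
            case OP_ADD: r = a.c[i] + b.c[i]; break;
            case OP_MUL: r = a.c[i] * b.c[i]; break;
            default:     r = a.c[i];          break;   /* OP_MOV */
            }
            if (in->write_mask & (1u << i))            /* write with masking */
                regs[in->dst].c[i] = r;
        }
    }
}
```

In this model the caller loads interpolants into low-numbered registers before running the program, then reads `regs[REG_OUT_COLOR]` as the emitted color.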
40
Key Trend:
Configurability becomes Programmability
Fixed-function → Simple Configurability → Complex Configurability → Programmable
41
Core OpenGL fragment texturing & coloring
[Diagram]
Inputs: point, line, and polygon rasterization (from primitive assembly); pixel rectangle rasterization (DrawPixels); bitmap rasterization (Bitmap)
Stages: conventional texture fetching (texture units 0 and 1) → texture environment application (texture units 0 and 1) → color sum → fog → coverage application → to raster operations
42
NV1x OpenGL fragment texturing & coloring
[Diagram]
Inputs: point, line, and polygon rasterization (from primitive assembly); pixel rectangle rasterization (DrawPixels); bitmap rasterization (Bitmap)
Conventional stages: conventional texture fetching (texture units 0 and 1) → texture environment application (texture units 0 and 1) → color sum → fog
Selected by the GL_REGISTER_COMBINERS_NV enable: register combiners (general stage 0, general stage 1, final stage)
Then: coverage application → to raster operations
43
NV2x OpenGL fragment texturing & coloring
[Diagram]
Inputs: point, line, and polygon rasterization (from primitive assembly); pixel rectangle rasterization (DrawPixels); bitmap rasterization (Bitmap)
Conventional stages: conventional texture fetching (texture units 0 to 3) → texture environment application (texture units 0 to 3) → color sum → fog
Selected by the GL_TEXTURE_SHADER_NV enable: texture shaders 0 to 3
Selected by the GL_REGISTER_COMBINERS_NV enable: register combiners (general stages 0 to 7, final combiner)
Then: coverage application → to raster operations
44
NV3x OpenGL fragment texturing & coloring
[Diagram]
Inputs: point, line, and polygon rasterization (from primitive assembly); pixel rectangle rasterization (DrawPixels); bitmap rasterization (Bitmap)
Conventional stages: conventional texture fetching (texture units 0 to 3) → texture environment application (texture units 0 to 3) → color sum → fog
Selected by the GL_TEXTURE_SHADER_NV enable: texture shaders 0 to 3
Selected by the GL_REGISTER_COMBINERS_NV enable: register combiners (general stages 0 to 7, final combiner)
Selected by the GL_FRAGMENT_PROGRAM_NV enable: fragment program, instructions 0 to 1023 (!!FP1.0 or !!ARBfp1.0 programs)
Then: coverage application → to raster operations
45
OpenGL 2.0
• Programmable shading
• OpenGL Shading Language (GLSL)
• Multiple color buffer rendering targets
• Non-power-of-two texture dimensions
• Point sprites
• Separate blend equation
• Two-sided stencil testing
46
GeForce 6 & 7 (NV4x/G7x) View of OpenGL
• Limited vertex texturing
• Fragment branching
• Multiple render targets & floating-point blending
[Pipeline diagram, 2004]
3D Application or Game → OpenGL API → (CPU – GPU boundary) → GPU Front End → Vertex Assembly → Vertex Program → Primitive Assembly, Clipping, Setup, and Rasterization → Fragment Program → Raster Operations
Memory Interface: Attribute Fetch, Texture Fetch, Framebuffer Access
47
GeForce 8 & 9 (G8x/G9x) View of OpenGL
• Primitive (geometry) programs
• Parameter reads from buffer objects
• Transform feedback (stream out)
[Pipeline diagram, 2006]
3D Application or Game → OpenGL API → (CPU – GPU boundary) → GPU Front End → Vertex Assembly → Vertex Program → Primitive Assembly → Primitive Program → Clipping, Setup, and Rasterization → Fragment Program → Raster Operations
Memory Interface: Attribute Fetch, Parameter Buffer Read, Texture Fetch, Framebuffer Access
48
OpenGL Pipeline Fixed-function Steps
• Much of functional pipeline remains fixed-function
• Vital to maintaining performance and data flow
• Hard to compete with hard-wired rasterization, Zcull, and pixel compression
[Pipeline diagram, 2006]
GPU Front End → Vertex Assembly → Vertex Program → Primitive Assembly → Primitive Program → Clipping, Setup, and Rasterization → Fragment Program → Raster Operations
Memory Interface: Attribute Fetch, Parameter Buffer Read, Texture Fetch, Framebuffer Access
49
OpenGL Pipeline Programmable Domains
• New geometry shader domain for per-primitive programmable processing
• Unified Streaming Processor Array (SPA) architecture means same capabilities for all domains
[Pipeline diagram, 2006]
GPU Front End → Vertex Assembly → Vertex Program → Primitive Assembly → Primitive Program → Clipping, Setup, and Rasterization → Fragment Program → Raster Operations
Memory Interface: Attribute Fetch, Parameter Buffer Read, Texture Fetch, Framebuffer Access
Programmable domains (vertex, primitive, and fragment programs): can be unified hardware!
50
OpenGL 2.1
• OpenGL Shading Language (GLSL) improvements
• Non-square matrices
• Pixel buffer objects (PBOs)
• sRGB color space texture formats
51
OpenGL 3.0
• OpenGL Shading Language (GLSL) improvements
• New texture fetches
• True integer data types and operators
• switch/case/default flow control statements
• Conditional rendering based on occlusion query results
• Transform feedback
• Vertex array objects
• Floating-point textures, color buffers, and depth buffers
• Half-precision vertex arrays
• Texture arrays
• Integer textures
• Red and red-green texture formats
• Compressed red and red-green formats
• Framebuffer objects (FBOs)
• Packed depth-stencil pixel formats
• Per-color buffer clearing, blending, and masking
• sRGB color space color buffers
• Fine-grain buffer mapping and flushing
52
Areas of 3.0 Functionality Improvement
• Programmability
• Shader Model 4.0 features
• OpenGL Shading Language (GLSL) 1.30
• Texturing
• New texture representations and formats
• Framebuffer operations
• Framebuffer objects
• New formats
• New copy (blit), clear, blend, and masking operations
• Buffer management
• Non-blocking and fine-grain update of buffer object data stores
• Vertex processing
• Vertex array configuration objects
• Conditional rendering for occlusion culling
• New half-precision vertex attribute formats
• Pixel processing
• New half-precision external pixel formats
All brand-new core features
53
OpenGL 3.0 Programmability
• Shader Model 4.0 additions
• True signed & unsigned integer values
• True integer operators: ^, &, |, <<, >>, %, ~
• Texture additions
• Texture arrays
• Base texture size queries
• Texel offsets to fetches
• Explicit LOD and derivative control
• Integer samplers
• Interpolation modifiers: centroid, noperspective, and flat
• Vertex array element number: gl_VertexID
• OpenGL Shading Language (GLSL) improvements
• ## concatenation in pre-processor for macros
• switch/case/default statements
54
OpenGL 3.0 Texturing Functionality
• Texture representation
• Texture arrays: indexed access to a set of 1D or 2D
texture images
• Texture formats
• Floating-point texture formats
• Single-precision (32-bit, IEEE s23e8)
• Half-precision (16-bit, s10e5)
• Red & red/green texture formats
• Intended as FBO framebuffer formats too
• Compressed red & red/green texture formats
• Shared exponent texture formats
• Packed floating-point texture formats
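To make the half-precision (s10e5) layout concrete, here is a small sketch that expands a 16-bit half into a 32-bit float. It assumes the usual IEEE-style interpretation (1 sign bit, 5 exponent bits with bias 15, 10 mantissa bits, with denormals, infinities, and NaNs); the function name `half_to_float` is made up for this example and is not an OpenGL entry point.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* Expand an s10e5 half-precision value to single precision. */
float half_to_float(uint16_t h)
{
    int sign     = (h >> 15) & 0x1;
    int exponent = (h >> 10) & 0x1F;
    int mantissa = h & 0x3FF;
    float magnitude;

    if (exponent == 0) {
        /* Zero or denormal: no implied leading 1, scale is 2^-24. */
        magnitude = ldexpf((float)mantissa, -24);
    } else if (exponent == 31) {
        /* All-ones exponent encodes infinity or NaN. */
        magnitude = mantissa ? NAN : INFINITY;
    } else {
        /* Normal: (1024 + mantissa) * 2^(exponent - 15 - 10). */
        magnitude = ldexpf((float)(mantissa + 1024), exponent - 25);
    }
    return sign ? -magnitude : magnitude;
}
```

For instance, the bit pattern 0x3C00 has exponent 15 and mantissa 0, which the normal branch expands to 1024 × 2^-10.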
55
Texture Arrays
• Conventional texture = One logical pre-filtered image
• Texture array = index-able plurality of pre-filtered images
• Rationale is fewer texture object binds when drawing different objects
• No filtering between mipmap sets in a texture array
• All mipmap sets in array share same format/border & base dimensions
• Both 1D and 2D texture arrays
• Require shaders, no fixed-function support
• Texture image specification
• Use glTexImage3D, glTexSubImage3D, etc. to load 2D texture arrays
• No new OpenGL commands for texture arrays
• 3rd dimension specifies integer array index
• No halving in 3rd dimension for mipmaps
• So 64×128×17 reduces to 32×64×17 all the way to 1×1×17
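The mipmap sizing rule above can be sketched as plain arithmetic. The type and function names below are hypothetical helpers for illustration, not OpenGL calls: width and height halve per level (clamped at 1), while the layer count passed as the 3rd dimension stays fixed.

```c
#include <assert.h>

typedef struct { int width, height, layers; } ArrayLevelSize;

/* Number of mipmap levels for a base size: floor(log2(max(w, h))) + 1. */
int array_mip_level_count(int width, int height)
{
    assert(width > 0 && height > 0);
    int largest = width > height ? width : height;
    int levels = 1;
    while (largest > 1) {
        largest /= 2;
        ++levels;
    }
    return levels;
}

/* Dimensions of one mipmap level of a 2D texture array. */
ArrayLevelSize array_mip_level_size(int width, int height, int layers, int level)
{
    assert(level >= 0 && level < array_mip_level_count(width, height));
    ArrayLevelSize s;
    s.width  = (width  >> level) > 0 ? (width  >> level) : 1;
    s.height = (height >> level) > 0 ? (height >> level) : 1;
    s.layers = layers;   /* the array dimension is never halved */
    return s;
}
```

With the slide's 64×128×17 example, level 1 comes out as 32×64×17 and the last level (level 7) as 1×1×17.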
56
Texture Arrays Example
• Multiple skins packed in texture array
• Motivation: binding to one multi-skin texture array avoids texture
bind per object
[Figure: grid of skin images; horizontal axis is texture array index 0 to 4, vertical axis is mipmap level index 0 to 4]
57
Compact Floating-point Textures
• Shared exponent & packed float representations are ideal for High Dynamic Range (HDR) applications
58
Compact Floating-point Texture Formats
• Packed float format
• No sign bit, independent exponents
• Shared exponent format
• No sign bit, shared exponent, no implied leading 1
Packed float bit layout (bit 31 to bit 0): three components, one with a 5-bit mantissa and 5-bit exponent, and two each with a 6-bit mantissa and 5-bit exponent
Shared exponent bit layout (bit 31 to bit 0): one 5-bit shared exponent and three 9-bit mantissas
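A short sketch of decoding the shared exponent format may help. It assumes the layout and constants of the EXT_texture_shared_exponent extension as best recalled here (exponent in bits 31 to 27, then blue, green, and red mantissas of 9 bits each, exponent bias 15); treat those details as assumptions to check against the extension text. The function name is hypothetical.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* Decode one shared-exponent texel into three floats (red, green, blue).
 * Each component is mantissa * 2^(exponent - 15 - 9); there is no sign
 * bit and no implied leading 1. */
void rgb9e5_decode(uint32_t packed, float rgb[3])
{
    int exponent = (int)(packed >> 27) - 15 - 9;
    float scale = ldexpf(1.0f, exponent);

    rgb[0] = (float)( packed        & 0x1FF) * scale;   /* red:   bits 8..0   */
    rgb[1] = (float)((packed >> 9)  & 0x1FF) * scale;   /* green: bits 17..9  */
    rgb[2] = (float)((packed >> 18) & 0x1FF) * scale;   /* blue:  bits 26..18 */
}
```

Because all three components share one exponent, a texel with one very bright channel loses precision in its dimmer channels, which is the trade-off for fitting HDR color in 32 bits.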
59
1- and 2-component
Block Compression Scheme
• Basic 1-component block compression format
• Borrowed from alpha compression scheme of S3TC DXT5
• Encoded block: 2 min/max values (8-bit A, 8-bit B; 16 bits) + 16 pixels x 3-bit selectors (48 bits) = 64 bits total per block
• Decoded block: 4x4 pixels; 16 pixels x 8-bit/component = 128 bits decoded
• So effectively 2:1 compression
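As an illustration of the block scheme, here is a sketch of decoding one unsigned 1-component block. It assumes the DXT5-alpha-style rules (two 8-bit endpoints followed by sixteen 3-bit selectors stored little-endian, with six or four interpolated values depending on endpoint order); the function name is hypothetical and results are left in the 0 to 255 range as floats rather than normalized.

```c
#include <assert.h>
#include <stdint.h>

/* Decode a 64-bit block into 16 values (row-major 4x4), range 0..255. */
void rgtc1_decode_block(const uint8_t block[8], float out[16])
{
    float r0 = block[0];
    float r1 = block[1];

    /* Gather the 48 selector bits, least significant byte first. */
    uint64_t bits = 0;
    for (int i = 0; i < 6; ++i)
        bits |= (uint64_t)block[2 + i] << (8 * i);

    float palette[8];
    palette[0] = r0;
    palette[1] = r1;
    if (block[0] > block[1]) {
        /* Six values interpolated between the endpoints. */
        for (int c = 2; c < 8; ++c)
            palette[c] = ((8 - c) * r0 + (c - 1) * r1) / 7.0f;
    } else {
        /* Four interpolated values plus explicit minimum and maximum. */
        for (int c = 2; c < 6; ++c)
            palette[c] = ((6 - c) * r0 + (c - 1) * r1) / 5.0f;
        palette[6] = 0.0f;
        palette[7] = 255.0f;
    }

    for (int t = 0; t < 16; ++t)
        out[t] = palette[(bits >> (3 * t)) & 0x7];
}
```

The input is 8 bytes and the output represents 16 one-byte components, which is the 2:1 ratio stated on the slide.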
60
Framebuffer Operations
• Framebuffer objects
• Standardized framebuffer objects (FBOs) for rendering to textures
and renderbuffers
• Render-to-texture
• Multisample renderbuffers for FBOs
• Framebuffer operations
• Copies from one FBO to another, including multisample data
• Per-color attachment color clears, blending, and write masking
• Framebuffer formats
• Floating-point color buffers
• Floating-point depth buffers
• Rendering into framebuffer format with 3 small unsigned floating-
point values packed in a 32-bit value
• Rendering into sRGB color space framebuffers
61
Framebuffer Object Example
• Depth peeling for correctly ordered transparency
• Great render-to-texture application for FBOs
62
Depth Peeling Behind the Scenes
• Depth buffer has closest fragment at all pixels
• Save depth buffer
• Render again, but use depth buffer as
shadow map
• Discard fragment in front of shadow
map’s depth value
• Effectively peels one layer of depth!
• Resulting color buffer is 2nd closest fragment
• And depth buffer for 2nd closest fragments’ depth
• Now repeat peeling more layers
• Use ping-pong depth buffer scheme
• Use occlusion query to detect when no
more fragments to peel
• Composite color layers front-to-back (or back-
to-front)
• Front-to-back peeling can be done during
the peeling process
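The peeling loop can be modeled on the CPU for a single pixel. This is only a sketch of the idea, not GPU code: the `Fragment` struct, the single-channel color, and both function names are assumptions for the example. Each pass finds the nearest fragment strictly behind the previously peeled depth (the role the saved depth buffer plays as a shadow map), and layers are composited front-to-back as they are peeled.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    float depth;   /* window-space depth in [0, 1], smaller is closer */
    float color;   /* single-channel color, for brevity */
    float alpha;   /* opacity in [0, 1] */
} Fragment;

/* One peeling pass: index of the nearest fragment strictly behind
 * prev_depth, or -1 when nothing is left to peel. */
int peel_next_layer(const Fragment *frags, size_t count, float prev_depth)
{
    int best = -1;
    for (size_t i = 0; i < count; ++i) {
        if (frags[i].depth <= prev_depth)
            continue;   /* discard: at or in front of the peeled layer */
        if (best < 0 || frags[i].depth < frags[best].depth)
            best = (int)i;
    }
    return best;
}

/* Peel every layer and composite front-to-back over a background. */
float composite_peeled_pixel(const Fragment *frags, size_t count, float background)
{
    float color = 0.0f;
    float transmittance = 1.0f;   /* fraction of light still passing through */
    float prev_depth = -1.0f;     /* in front of every valid depth */

    for (;;) {
        int i = peel_next_layer(frags, count, prev_depth);
        if (i < 0)
            break;                /* the occlusion query would report zero here */
        color += transmittance * frags[i].alpha * frags[i].color;
        transmittance *= 1.0f - frags[i].alpha;
        prev_depth = frags[i].depth;
    }
    return color + transmittance * background;
}
```

The fragments may be submitted in any order; the peeling passes recover the depth order. Fragments at exactly equal depth collapse into one layer in this sketch.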
63
Delicate Color Fidelity with sRGB
• Problem: PC display devices have non-linear (sRGB) display gamut
—delicate color shading looks wrong
Conventional rendering (uncorrected color): unnaturally deep facial shadows
Gamma correct (sRGB rendered): softer and more natural
NVIDIA’s Adriana GeForce 8 Launch Demo
64
What is sRGB?
• A standard color space
• Intended for monitors, printers, and the Internet
• Created cooperatively by HP and Microsoft
• Non-linear, roughly gamma of 2.2
• Intuitively “encodes more dark values”
• OpenGL 2.1 already added sRGB texture formats
• Texture fetch converts sRGB to linear RGB, then filters
• Result takes more than 8-bit fixed-point to represent in shader
• 3.0 adds complementary sRGB framebuffer support
• “sRGB correct blending” converts framebuffer sRGB to linear,
blend with linear color from shader, then convert back to sRGB
• Works with FrameBuffer Objects (FBOs)
sRGB chromaticity
65
So why sRGB? Standard Windows Display
is Not Gamma Corrected
• 25+ years of PC graphics, icons, and images depend on not gamma
correcting displays
• sRGB textures and color buffers compensate for this
Gamma 1.0: “expected” appearance of Windows desktop & icons, but 3D lighting too dark
Gamma 2.2 (linear color response): washed-out desktop appearance if color response was linear, but 3D lighting is correct
66
Vertex Processing
• Vertex array configuration
• Objects to manage vertex array configuration client
state
• Half-precision floating-point vertex array formats
• Vertex output streaming
• Stream transformed vertex results into buffer object
data stores
• Occlusion culling
• Skip rendering based on occlusion query result
67
Miscellaneous
• Pixel Processing
• Half-precision floating-point pixel external formats
• Buffer Management
• Non-blocking and fine-grain update of buffer object data
stores
68
ARB Extensions to OpenGL 3.0
• OpenGL 3.0 standard provides new ARB extensions
• Extensions go beyond OpenGL 3.0
• Standardized at same time as OpenGL 3.0
• Support features in hardware today
• Specifically
• ARB_geometry_shader4—provides per-primitive programmable
processing
• ARB_draw_instanced—gives shader access to instance ID
• ARB_texture_buffer_object—allows buffer object to be sampled
as a huge 1D unfiltered texture
• Shipping today
• NVIDIA driver provides all three
69
Transform Feedback for Terrain Generation
by Recursive Subdivision
• Geometry shaders + transform feedback
1. Render quads (use 4-vertex line adjacency
primitive) from vertex buffer object
2. Fetch height field
3. Stream subdivided positions and normals
to transform feedback “other” buffer
object
4. Use buffer object as vertex buffer
5. Repeat, ping-pong buffer objects
Computation and data all stays on the GPU!
70
Skin Deformation
• Capture & re-use geometric deformations
Transform
feedback allows
the GPU to
calculate the
interactive,
deforming elastic
skin of the frog
71
Silhouette Edge Rendering
• Uses geometry shader
silhouette
edge
detection
geometry
shader
Complete mesh
Silhouette edges
Useful for non-photorealistic
rendering
Looks like human sketching
72
More Geometry Shader Examples
Shimmering
point sprites
Generate
fins for
lines
Generate
shells for
fur
rendering
73
Improved Interpolation Techniques
•Using geometry shader functionality
Quadratic normal
interpolation
True quadrilateral rendering with
mean value coordinate interpolation
74
“Fair” Quadrilateral Interpolation
glBegin(GL_QUADS);
  glColor3fv(red);
  glVertex3fv(lowerLeft);
  glColor3fv(green);
  glVertex3fv(lowerRight);
  glColor3fv(red);
  glVertex3fv(upperRight);
  glColor3fv(blue);
  glVertex3fv(upperLeft);
glEnd();
• Geometry shader actually operates on 4-vertex GL_LINE_ADJACENCY primitives instead of quads
Wrong: slash triangle split
Wrong: backslash triangle split
Better: mean value coordinates
75
OpenGL 2.x ARB Extensions
• Many OpenGL 3.0 features have corresponding ARB extensions for OpenGL 2.1 implementations to advertise
• Helps get 3.0 functionality out sooner, rather than later
• New ARB extensions for 3.0 functionality
• ARB_framebuffer_object—framebuffer objects (FBOs) for render-to-
texture
• ARB_texture_rg—red and red/green texture formats
• ARB_map_buffer_range—non-blocking and fine-grain update of buffer object data stores
• ARB_instanced_arrays—per-instance vertex attribute arrays for instanced rendering
• ARB_half_float_vertex—half-precision floating-point vertex array formats
• ARB_framebuffer_sRGB—rendering into sRGB color space framebuffers
• ARB_texture_compression_rgtc—compressed red and red/green texture
formats
• ARB_depth_buffer_float—floating-point depth buffers
• ARB_vertex_array_object—objects to manage vertex array configuration
client state
76
Beyond OpenGL 3.0
OpenGL 3.0
• EXT_gpu_shader4
• NV_conditional_render
• ARB_color_buffer_float
• NV_depth_buffer_float
• ARB_texture_float
• EXT_packed_float
• EXT_texture_shared_exponent
• NV_half_float
• ARB_half_float_pixel
• EXT_framebuffer_object
• EXT_framebuffer_multisample
• EXT_framebuffer_blit
• EXT_texture_integer
• EXT_texture_array
• EXT_packed_depth_stencil
• EXT_draw_buffers2
• EXT_texture_compression_rgtc
• EXT_transform_feedback
• APPLE_vertex_array_object
• EXT_framebuffer_sRGB
• APPLE_flush_buffer_range (modified)
In GeForce 8, 9, & 2xx Series
but not yet core
• EXT_geometry_shader4 (now ARB)
• EXT_bindable_uniform
• NV_gpu_program4
• NV_parameter_buffer_object
• EXT_texture_compression_latc
• EXT_texture_buffer_object (now ARB)
• NV_framebuffer_multisample_coverage
• NV_transform_feedback2
• NV_explicit_multisample
• NV_multisample_coverage
• EXT_draw_instanced (now ARB)
• EXT_direct_state_access
• EXT_vertex_array_bgra
• EXT_texture_swizzle
Plenty of proven OpenGL extensions
for OpenGL Working Group
to draw upon for OpenGL 3.1
77
OpenGL Version Evolution
• Now OpenGL is part of Khronos Group
• Previously OpenGL’s evolution was governed by the OpenGL
Architectural Review Board (ARB)
• Now officially a Khronos working group
• Khronos also standardizes OpenCL, OpenVG, etc.
• How OpenGL version updates happen
• OpenGL participants propose extensions
• Successful extensions are polished and incorporated into core
• OpenGL 3.0 is great example of this process
• Roughly 20 extensions folded into “core”
• Just 3 of those previously unimplemented
78
OpenGL Extensions by Source
• 44% of extensions are “core” or multi-vendor
• Lots of vendors have initiated extensions
• Extending OpenGL is industry-wide collaboration
[Pie chart: share of extensions by source]
• Multi-vendor (EXT): 29%
• Silicon Graphics (SGI, SGIS, SGIX): 17%
• Architectural Review Board (ARB): 15%
• NVIDIA (NV): 15%
• ATI: 4%
• Apple: 2%
• Mesa3D: 2%
• Sun Microsystems: 2%
• OpenGL ES: 2%
• OpenML: 2%
• IBM: 2%
• Intense3D: 2%
• Hewlett Packard: 1%
• 3Dfx: 1%
• Other: 4%
Source: http://www.opengl.org/registry (Dec 2008)
79
What’s Driving OpenGL Modernization?
• Human desire for visual intuition and entertainment
• Particularly interactive video games
• Embarrassing parallelism of graphics
• Particularly the hardware-amenable, latency-tolerant nature of rasterization
• Increasing semiconductor density
80
Kurt Akeley
Principal Researcher
Microsoft Research Silicon Valley
OpenGL’s Evolution:
A Personal Retrospective
81
A personal retrospective
• My background:
• Silicon Graphics, 1982-2001
• OpenGL, 1990-2004
• Today’s topics:
• Computer architecture
• Culture and process
• For a more complete coverage see:
• https://graphics.stanford.edu/wikis/cs448-07-spring/
• Mark Kilgard’s excellent course notes
82
Jim Clark and the Geometry Engine
The Geometry Engine: A VLSI Geometry System for Graphics
Computer Graphics, Volume 16, Number 3
(Proceedings of SIGGRAPH 1982) p127-133, 1982
83
Jim’s helpers: the Stanford gang
[Figure labels: IRIS GL, Geometry Engine, hardware front-end, hardware back-end]
84
Success! (in 1995)
85
Computer Architecture
86
What is computer architecture?
• Architecture: “the minimal set of
properties that determine what programs
will run and what results they will produce”
• Implementation: “the logical
organization of the [computer’s] dataflow
and controls”
• Realization: “the physical structure
embodying the implementation”
87
Example: the analog clock
• Architecture
• Circular dial divided into twelfths
• Hour hand (short) and minute hand (long)
• Implementation
• A weight, driving a pendulum, or
• A spring, driving a balance wheel, or
• A battery, driving an oscillator, or ….
• Realization
• Gear ratios, pendulum lengths, battery sizes, ...
Example from Computer Architecture, Concepts and Evolution,
Gerrit A. Blaauw and Frederick P. Brooks, Jr., Addison-Wesley, 1997
[Figure: analog clock face numbered 1 through 12]
88
A useful distinction
• NVIDIA 8800
• SIMD, or
• SPMD ?
[Figure: NVIDIA 8800 block diagram. Application, Data Assembler, Setup / Rasterization / ZCull, Vertex / Primitive / Fragment Thread Issue, Thread Processor, eight clusters of paired SPs each with TF and L1, and six L2 / FB partitions]
• Architecture:
• SPMD
• Implementation:
• SIMD
• Realization:
• ASIC
SIMD = Single Instruction, Multiple Data
SPMD = Single Program, Multiple Data
ASIC = Application Specific Integrated Circuit
89
The mainstream view
• Table of Contents:
• Fundamentals
• Instruction Sets
• Pipelining
• Advanced Pipelining and ILP
• Memory-Hierarchy Design
• Storage Systems
• Interconnection Networks
• Multiprocessors
90
OpenGL is an architecture
Comparison of Blaauw/Brooks with OpenGL, aspect by aspect:
• Different implementations. Blaauw/Brooks: IBM 360 30/40/50/65/75, Amdahl. OpenGL: SGI Indy/Indigo/InfiniteReality, NVIDIA GeForce, ATI Radeon, …
• Compatibility. Blaauw/Brooks: code runs equivalently on all implementations. OpenGL: top-level goal; conformance tests, …
• Intentional design. Blaauw/Brooks: it’s an architecture, whether it was planned or not. OpenGL: carefully planned, though mistakes were made
• Configuration. Blaauw/Brooks: can vary amount of resource (e.g., memory). OpenGL: no feature sub-setting; configuration attributes (e.g., framebuffer)
• Speed. Blaauw/Brooks: not a formal aspect of architecture. OpenGL: no performance queries
• Validity of inputs. Blaauw/Brooks: no undefined operation. OpenGL: all errors specified; no side effects; little undefined operation
• Enforcement. Blaauw/Brooks: when implementation errors are found, they are fixed. OpenGL: specification rules!
91
But OpenGL is an API
(Application Programming Interface)
• Yes, Blaauw and Brooks talk about (computer) architecture
as though it is always expressed as ISA (Instruction-Set
Architecture)
• But …
• API is just a higher-level programming interface
• “Instruction-Set” Architecture implies other types of
computer architectures (such as “API” Architecture)
• OpenGL has evolved to include ISA-like interfaces
(e.g., the interface below GLSL)
92
We didn’t know …
• No mention in spec (even 3.0)
• “We view OpenGL as a state …”
• First use in “ARB”
• Architecture Review Board
• Coined by Bill Glazier from “Palo
Alto Architecture Review Board”
• First formal usage (I know of)
• Mark J. Kilgard, Realizing OpenGL: two implementations of one
architecture, Proceedings of the ACM SIGGRAPH/EUROGRAPHICS
workshop on Graphics hardware, p.45-55, August 03-04, 1997,
Los Angeles, California, United States.
93
Fred is magnanimous
94
What is implied by “programmable”?
• What does it mean to teach programming?
• Does running a microwave oven count?
• Does defining the geometry of a game “level” count?
• Does specifying OpenGL modes count?
• This seems to be a somewhat open question
• Butler Lampson couldn’t tell me.
• Microsoft developers of teaching tools couldn’t tell me.
• An online search wasn’t very helpful.
• Do we just “know it when we see it”?
• Justice Potter Stewart’s definition of pornography
95
My try at some formalization
• Key ideas:
• Composition → choice of placement, sequence
• Non-obvious → semantics are interesting and novel
• Imperative → maybe there are other kinds of programming
“Composition, the organization of elemental
operations into a non-obvious whole, is the
essence of imperative programming.”
-- Kurt Akeley (Foreword to GPU Gems 3)
96
OpenGL has always been programmable
• Follows directly from being an “architecture”
• OpenGL commands are instructions (API as an ISA)
• They can be “composed” to create programs
• Multi-pass rendering is the prototypical example
• But Peercy et al. implemented a RenderMan shader compiler
• Invariance was specified from the start (e.g., same fragments)
• We set out to enable “usage that we didn’t anticipate”
• Obvious for a traditional ISA (e.g., IA32)
• Not so obvious for a graphics API
• Example: texture applies to all primitives, not just triangles
97
Example multi-pass OpenGL “program”
glEnable(GL_DEPTH_TEST);
glDisable(GL_LIGHTING);
glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
glEnable(GL_POLYGON_OFFSET_FILL);
glPolygonOffset(maxwidth/2, 1);
/* draw solid objects (color writes are masked, so only depth is written) */
glDepthMask(GL_FALSE);
glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
glColor3fv(linecolor); /* assumes linecolor is an array of 3 GLfloats */
glDisable(GL_POLYGON_OFFSET_FILL);
glPolygonMode(GL_FRONT_AND_BACK, GL_LINE);
/* draw solid objects again (now as lines) */
glDisable(GL_DEPTH_TEST);
glPolygonMode(GL_FRONT_AND_BACK, GL_FILL);
glDepthMask(GL_TRUE);
Hidden-line rendering
98
Example multi-pass OpenGL “program”
glEnable(GL_DEPTH_TEST);
glDisable(GL_LIGHTING);
glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
glEnable(GL_POLYGON_OFFSET_FILL);
glPolygonOffset(maxwidth/2, 1);
/* draw solid objects */
glDepthMask(GL_FALSE);
glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
glColor3f(1, 1, 1);                /* differs */
glDisable(GL_POLYGON_OFFSET_FILL);
glPolygonMode(GL_FRONT_AND_BACK, GL_LINE);
glEnable(GL_CULL_FACE);            /* differs */
glCullFace(GL_FRONT);              /* differs */
/* draw solid objects again */
/* draw true edges, for a complete hidden-line drawing (differs) */
glDisable(GL_DEPTH_TEST);
glPolygonMode(GL_FRONT_AND_BACK, GL_FILL);
glDepthMask(GL_TRUE);
glDisable(GL_CULL_FACE);           /* differs */
Silhouette rendering: lines that differ from the hidden-line algorithm (previous slide) are marked “differs”
99
Invariance
Corollary 1 Fragment generation is invariant with respect to
the state values marked with • in Rule 2.
100
• Intended to capture complete
sequence of operations
• Also inspired design changes
101
[Figure: OpenGL pipeline block diagram. Vertex pipeline blocks: Application, Vertex assembly, Vertex operations, Primitive assembly, Primitive operations, Rasterization, Fragment operations, Framebuffer, Display. Pixel pipeline blocks: Application, Pixel assembly (unpack), Pixel operations, Texture memory, Pixel pack.]
• All primitives (including pixels) are rasterized
• All vertexes are treated equally (e.g., lighted)
• All fragments are treated equally (e.g., texture mapped and depth-buffered)
• Not a required implementation, but “abstraction distance” matters
102
Culture and Process
103
Suppose …
http://www.opengl.org/registry/
Name
ARB_texture_cube_map
Name Strings
GL_ARB_texture_cube_map
Notice
Copyright OpenGL Architectural Review Board, 1999.
Contact
Michael Gold, NVIDIA (gold 'at' nvidia.com)
Status
Complete. Approved by ARB on 12/8/1999
Version
Last Modified Date: December 14, 1999
Number
ARB Extension #7
Dependencies
None.
Written based on the wording of the OpenGL 1.2.1 specification but not dependent on it.
Overview
This extension provides a new texture generation scheme for cube map textures. Instead of the
current texture providing a 1D, 2D, or 3D lookup into a 1D, 2D, or 3D texture image, the texture is a
set of six 2D images representing the faces of a cube. The (s,t,r) texture coordinates …
104
Complete specification
Name
Name Strings
Notice
Contact
Status
Version
Number
Dependencies
Overview
Issues
New Procedures and Functions
New Tokens
Additions to Chapter 2 of the OpenGL Specification
Additions to Chapter 3 of the OpenGL Specification
Additions to Chapter 4 of the OpenGL Specification
Additions to Chapter 5 of the OpenGL Specification
Additions to Chapter 6 of the OpenGL Specification
Additions to the GLX Specification
Errors
New State (type, query mechanism, initial value, attribute set, specification section)
Usage Examples
105
19 issues
The spec just linearly interpolates the reflection vectors computed
per-vertex across polygons. Is there a problem interpolating
reflection vectors in this way?
Probably. The better approach would be to interpolate the eye
vector and normal vector over the polygon and perform the reflection
vector computation on a per-fragment basis. Not doing so is likely
to lead to artifacts because angular changes in the normal vector
result in twice as large a change in the reflection vector as normal
vector changes. The effect is likely to be reflections that become
glancing reflections too fast over the surface of the polygon.
Note that this is an issue for REFLECTION_MAP_ARB, but not
NORMAL_MAP_ARB.
106
19 issues …
What happens if an (s,t,q) is passed to cube map generation that
is close to (0,0,0), ie. a degenerate direction vector?
RESOLUTION: Leave undefined what happens in this case (but
may not lead to GL interruption or termination).
Note that a vector close to (0,0,0) may be generated as a
result of the per-fragment interpolation of (s,t,r) between
vertices.
107
Trust and integrity
• Lots of collaboration during the initial design
• But final decisions made by a small group
• SGI played fair
• OpenGL 1.0 didn’t favor SGI equipment (our ports were late)
• SGI obeyed all conformance rules
• SGI didn’t adjust the spec to match our equipment
• The ARB avoided marketing tasks such as benchmarks
• We stuck with technical design issues
• We documented rigorously
• Specification, man pages, …
108
Five Kinkos in Austin Texas
The OpenGL Graphics System: A Specification (Version 1.1)
Mark Segal
Kurt Akeley
Editor: Chris Frazier
Copyright © 1992-1997 Silicon Graphics, Inc.
This document contains unpublished information of
Silicon Graphics, Inc.
109
Extension facts
• 442 Vendor and “EXT” extension specifications
• Vendor: specific to a single vendor
• EXT: shared by two or more vendors
• 56 “ARB” extensions
• Standardized, likely to be in the next spec revision
• Lots of text …
Source: OpenGL extension registry, December 2008
110
“Specification” sizes
                      Lines       Words       Chars
56 ARB Extensions    48,674     263,908   2,221,347
All 442 Extensions  209,426   1,076,008   9,079,063
King James Bible    114,535     823,647   5,214,085
New Testament        27,319     188,430   1,197,812
Old Testament        86,783     632,515   3,998,303
111
Beyond the specification
• The ARB (now replaced with Khronos)
• Rules of order, secretary, IP, …
• The extension process
• Categories, token syntax, spec templates, enums,
registry, …
• Licensing
• Conformance
• …
112
Summary
• Many mistakes made (see other presentations for lists)
• Created a sustainable culture that values quality and
rigorous documentation
• Defined and evolved the architecture for interactive 3-D
computer graphics
113
Writing better OpenGL
Mark Kilgard
Principal System Software Engineer
NVIDIA
114
Motivation
• Complex APIs and systems have pitfalls
• After 17 years of designed evolution, OpenGL
certainly has its share
• Normal documentation focus:
• What can you do?
• Rather than: What should you do?
115
Communicating Vertex Data
• The way you learn OpenGL:
• Immediate mode
• glBegin, glColor3f, glVertex3f, glEnd
• Straightforward: no ambiguity about what the vertex data is
• All vertex components are function parameters
• The problem: too function-call intensive
• And all vertex data must flow through the CPU
116
Example Scenario
• An OpenGL application has to render a set of rectangles
• Rectangle with its parameters
• x, y, height, width, left color, right color, depth
[Figure: a rectangle anchored at (x,y) with its width and height, a left side color and a right side color, and a depth order between 0.0 and 1.0]
117
Scene Representation
• Each rectangle specified by following RectInfo structure:
• Array of RectInfo structures describes “scene”
• Simplistic scene for sake of teaching
typedef struct {
  GLfloat x, y, width, height;
  GLfloat depth_order;
  GLfloat left_side_color[3];   // red, green, then blue
  GLfloat right_side_color[3];  // red, green, then blue
} RectInfo;
118
Example Scene and Rendering Result
• Scene of 4 rectangles:
RectInfo rect_list[4] = {
{ 10, 20, 180, 140, 0.5,
{ 1, 1, 1 }, { 1, 0, 1 } },
{ 30, 40, 100, 60, 0.5,
{ 1, 0, 0 }, { 0, 0, 1 } },
{ 140, 60, 100, 80, 0.5,
{ 0, 0, 1 }, { 0, 1, 0 } },
{ 70, 120, 80, 60, 0.7,
{ 1, 1, 0 }, { 0, 1, 1 } },
};
• OpenGL-rendered result
119
Immediate Mode Rectangle Rendering
• Given a RectInfo array and its size, render each rectangle as the four vertices of a quad
void drawRectangles(int count, const RectInfo *list)
{
  glBegin(GL_QUADS);
  for (int i=0; i<count; i++) {  // for each rectangle
    const RectInfo *r = &list[i];
    // 1st vertex
    glColor3fv(r->left_side_color);
    glVertex3f(r->x, r->y, r->depth_order);
    // 2nd vertex
    glColor3fv(r->right_side_color);
    glVertex3f(r->x+r->width, r->y, r->depth_order);
    // 3rd vertex: right_side_color "sticks"
    glVertex3f(r->x+r->width, r->y+r->height, r->depth_order);
    // 4th vertex
    glColor3fv(r->left_side_color);
    glVertex3f(r->x, r->y+r->height, r->depth_order);
  }
  glEnd();
}
120
Critique of Immediate Mode
• Advantages
• Straightforward to code and debug
• Easy-to-understand conceptual model
• Building stream of vertices with OpenGL commands
• Avoids driver & application copies of vertex data
• Flexible, allowing totally dynamic vertex generation
• Disadvantages
• Rendering continuously streams attributes through CPU
• Pollutes CPU cache with vertex data
• Function call intensive
• Unable to saturate fast graphics hardware
• CPUs just too slow
• Contrast with vertex array approach…
121
Vertex Array Approach
• Step 1: Copy vertex attributes into vertex arrays
• From: RectInfo array (CPU memory)
• To: interleaved arrays of vertex attributes (CPU
memory)
• Step 2: To render
• Configure OpenGL vertex array client state
• Use glEnableClientState, glVertexPointer,
glColorPointer
• Render quads based on indices into vertex arrays
• Use glDrawArrays
122
Vertex Array Format
• Interleave vertex attributes in color & position arrays
[Figure: interleaved layout in memory. Each vertex stores color (red, green, blue) followed by position (x, y, z); each float is 4 bytes, so each vertex occupies 24 bytes. Vertex 0 is followed immediately by vertex 1.]
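The 24-byte interleaved layout can be checked with a small sketch in plain C that needs no OpenGL context. The struct `InterleavedVertex` and the helper names are hypothetical, introduced only for illustration, and the sketch assumes a 4-byte `float` standing in for `GLfloat`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical struct mirroring the slide's layout: color first, then position.
   Assumes float is 4 bytes, as GLfloat is on common platforms. */
typedef struct {
    float color[3];     /* red, green, blue */
    float position[3];  /* x, y, z */
} InterleavedVertex;

/* Byte distance between the starts of consecutive vertices (the stride). */
static size_t vertex_stride(void)
{
    return sizeof(InterleavedVertex);
}

/* Byte offsets of each attribute from the start of a vertex. */
static size_t color_offset(void)
{
    return offsetof(InterleavedVertex, color);
}

static size_t position_offset(void)
{
    return offsetof(InterleavedVertex, position);
}

/* Index into a flat float array for one component of one attribute.
   attr_float_offset is 0 for color and 3 for position. */
static size_t float_index(size_t vertex, size_t attr_float_offset, size_t component)
{
    assert(component < 3);
    return vertex * 6 + attr_float_offset + component;
}
```

Under these assumptions the stride is 24 bytes and position starts 12 bytes into each vertex, which is what the stride and the `p+3` pointer in the rendering code on the following slides express.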
123
Step 1:
Copy Rectangle Attributes to Vertex Arrays
void *initVarrayRectangles(int count, const RectInfo *list)
{
void *varray = (char*) malloc(sizeof(GLfloat)*6*4*count);
GLfloat *p = varray;
for (int i=0; i<count; i++, p+=24) {
const RectInfo *r = &list[i];
// quad vertex #1
memcpy(&p[0], r->left_side_color, sizeof(GLfloat)*3);
p[3] = r->x; p[4] = r->y; p[5] = r->depth_order;
// quad vertex #2
memcpy(&p[6], r->right_side_color, sizeof(GLfloat)*3);
p[9] = r->x+r->width; p[10] = r->y; p[11] = r->depth_order;
// quad vertex #3
memcpy(&p[12], r->right_side_color, sizeof(GLfloat)*3);
p[15] = r->x+r->width; p[16] = r->y+r->height; p[17] = r->depth_order;
// quad vertex #4
memcpy(&p[18], r->left_side_color, sizeof(GLfloat)*3);
p[21] = r->x; p[22] = r->y+r->height; p[23] = r->depth_order;
}
return varray;
}
124
Step 2:
Configure & Render from Vertex Arrays
void drawVarrayRectangles(int count, const RectInfo *list)
{
char *varray = initVarrayRectangles(count, list);
const GLfloat *p = (const GLfloat*) varray;
const GLsizei stride = sizeof(GLfloat)*6;//3 RGB floats,3 XYZ floats
glColorPointer(/*rgb*/3, GL_FLOAT, stride, p+0);
glVertexPointer(/*xyz*/3, GL_FLOAT, stride, p+3);
glEnableClientState(GL_COLOR_ARRAY);
glEnableClientState(GL_VERTEX_ARRAY);
glDrawArrays(GL_QUADS, /*firstIndex*/0,
/*indexCount*/count*4);
free(varray);
}
125
Critique of
Simplistic Vertex Array Rendering
• Advantages
• Far fewer OpenGL commands issued
• Disadvantages
• Every render with drawVarrayRectangles calls
initVarrayRectangles
• Allocates, initializes, & frees vertex array memory
every render
• Improve by separating vertex array construction from
rendering
126
Initialize Once, Render Many Approach
• This routine expects base pointer returned by
initVarrayRectangles
void drawInitializedVarrayRectangles(int count, const void *varray)
{
const GLfloat *p = (const GLfloat*) varray;
const GLsizei stride = sizeof(GLfloat)*6; // 3 RGB floats, 3 XYZ floats
glColorPointer(/*rgb*/3, GL_FLOAT, stride, p+0);
glVertexPointer(/*xyz*/3, GL_FLOAT, stride, p+3);
// Assume GL_COLOR_ARRAY and GL_VERTEX_ARRAY are already enabled!
glDrawArrays(GL_QUADS, /*firstIndex*/0, /*indexCount*/count*4);
}
127
Client Memory Vertex Attribute Transfer
[Figure: with client-memory vertex arrays, the CPU reads the vertex array from application (client) memory and writes command + vertex data into the command queue; the GPU then DMA-transfers command + vertex data to its command processor, vertex puller, and hardware rendering pipeline. Vertex data travels through the CPU.]
128
Vertex Buffer Object Vertex Attribute Pulling
[Figure: with vertex buffer objects, the CPU writes only commands + vertex indices into the command queue, and the GPU DMA-transfers the command data to its command processor. The vertex puller fetches the vertex array from the OpenGL (vertex) buffer object by GPU DMA transfer, so the CPU never reads the vertex data.]
129
Initializing Vertex Buffer Objects (VBOs)
• Once using vertex arrays, easy to switch to VBOs
• Make the vertex array as before
• Then bind to buffer object and copy data to the buffer
void initVarrayRectanglesInVBO(GLuint bufferName,
int count, const RectInfo *list)
{
char *varray = initVarrayRectangles(count, list);
const GLsizei stride = sizeof(GLfloat)*6; // 3 RGB floats, 3 XYZ floats
const GLint numVertices = 4*count;
const GLsizeiptr bufferSize = stride*numVertices;
glBindBuffer(GL_ARRAY_BUFFER, bufferName);
glBufferData(GL_ARRAY_BUFFER, bufferSize, varray, GL_STATIC_DRAW);
free(varray);
}
130
Rendering from Vertex Buffer Objects
• Once initialized, glBindBuffer to bind to buffer ahead of
vertex array configuration
• Send offsets instead of pointers
void drawVarrayRectanglesFromVBO(GLuint bufferName,
int count)
{
const char *base = NULL;
const GLsizei stride = sizeof(GLfloat)*6; // 3 RGB floats, 3 XYZ floats
glBindBuffer(GL_ARRAY_BUFFER, bufferName);
glColorPointer(/*rgb*/3, GL_FLOAT, stride, base+0*sizeof(GLfloat));
glVertexPointer(/*xyz*/3, GL_FLOAT, stride, base+3*sizeof(GLfloat));
// Assume GL_COLOR_ARRAY and GL_VERTEX_ARRAY are already enabled!
glDrawArrays(GL_QUADS, /*firstIndex*/0, /*indexCount*/count*4);
}
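The `base+3*sizeof(GLfloat)` expression above encodes a byte offset as a pointer value. A common alternative is a small helper that converts the offset explicitly; the sketch below shows that idea in plain C. The names `buffer_offset`, `color_array_offset`, and `position_array_offset` are hypothetical, and `float` stands in for `GLfloat`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode a byte offset into the bound buffer object as
   the "pointer" argument expected by vertex array specification commands.
   Going through uintptr_t avoids pointer arithmetic on a null pointer. */
static const void *buffer_offset(size_t bytes)
{
    return (const void *)(uintptr_t)bytes;
}

/* Offsets for the interleaved layout used in these slides:
   3 color floats followed by 3 position floats per vertex. */
static const void *color_array_offset(void)
{
    return buffer_offset(0 * sizeof(float));
}

static const void *position_array_offset(void)
{
    return buffer_offset(3 * sizeof(float));
}
```

With a buffer object bound, a call such as `glVertexPointer(3, GL_FLOAT, stride, position_array_offset())` would then pass 12 as the offset of the first position.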
131
Understanding glBindBuffer
• Buffer object bindings are a frequent point of confusion for
programmers
• What does glBindBuffer really do?
• Lots of buffer binding targets:
• GL_ARRAY_BUFFER target: for vertex attribute arrays
• Query with GL_ARRAY_BUFFER_BINDING
• GL_ELEMENT_ARRAY_BUFFER target: for vertex indices,
effectively topology
• Query with GL_ELEMENT_ARRAY_BUFFER_BINDING
• Each vertex array has its own buffer binding, query with
• GL_VERTEX_ARRAY_BUFFER_BINDING
• GL_COLOR_ARRAY_BUFFER_BINDING
• GL_TEXTURE_COORD_ARRAY_BUFFER_BINDING, etc.
132
Bind and Query Buffer Targets
Buffer Bind Tokens (target tokens for glBindBuffer)
• GL_ARRAY_BUFFER
• GL_ELEMENT_ARRAY_BUFFER
Buffer Query Tokens (query tokens to glGetIntegerv)
• GL_ARRAY_BUFFER_BINDING
• GL_ELEMENT_ARRAY_BUFFER_BINDING
• GL_COLOR_ARRAY_BUFFER_BINDING
• GL_VERTEX_ARRAY_BUFFER_BINDING
• GL_FOG_COORD_ARRAY_BUFFER_BINDING
• GL_TEXTURE_COORD_ARRAY_BUFFER_BINDING
Buffer Query Token (query token to glGetVertexAttribiv)
• GL_VERTEX_ATTRIB_ARRAY_BUFFER_BINDING
133
Latched Vertex Array Buffer Bindings
• Here’s the confusing part:
glBindBuffer(GL_ARRAY_BUFFER, 34);
glColorPointer(3, GL_FLOAT, color_stride,
(void*)color_offset);
• The glBindBuffer doesn’t change any vertex array
binding
• The GL_ARRAY_BUFFER_BINDING state that
glBindBuffer sets does not itself affect rendering
• It is the glColorPointer call that latches the array buffer
binding to change the color array’s buffer binding!
• Same with all vertex array buffer bindings
134
Binding Buffer Zero is Special
• By default, vertex arrays don’t access buffer objects
• Instead client memory is accessed
• This is because
• The initial buffer binding for a context is zero
• And zero is special
• Zero means access client memory
• You can always resume client memory vertex array access for a given array like this
glBindBuffer(GL_ARRAY_BUFFER, 0); // use client memory
glColorPointer(3, GL_FLOAT, color_stride, color_pointer);
• Different treatment of the “pointer” parameter to vertex array specification commands
• When the current array buffer binding is zero, the pointer value is a client
memory pointer
• When the current array buffer binding is non-zero (meaning it names a buffer
object), the pointer value is “recast” as an offset from the beginning of the buffer
• Once again
• The glBindBuffer(GL_ARRAY_BUFFER,0) call alone doesn’t change any vertex
array buffer bindings
• It takes a vertex array specification command such as glColorPointer to latch the
zero binding
• A default binding of zero ensures compatibility with pre-VBO OpenGL
135
Texture Coordinate Set Selector
• A selector in OpenGL is
• A state variable that controls what state a subsequent command
updates
• Examples of commands that modify selectors
• glMatrixMode, glActiveTexture, glClientActiveTexture
• A selector is different from latched state
• Latched state is a specified value that is set (or “latched”) when
a subsequent command is called
• Pitfall warning: glTexCoordPointer both
• Relies on the glClientActiveTexture command’s selector
• And latches the current array buffer binding for the selected
texture coordinate vertex array
• Example
glBindBuffer(GL_ARRAY_BUFFER, 34);
glClientActiveTexture(GL_TEXTURE3);
glTexCoordPointer(2, GL_FLOAT, uv_stride, (void*)buffer_offset);
• 34 is the buffer value glTexCoordPointer latches
• GL_TEXTURE3 is the selector glTexCoordPointer uses
136
OpenGL’s Modern Buffer-centric
Processing Model
[Figure: buffer-centric data flow. Buffer types: Vertex Array Buffer Object (VaBO), Array Element Buffer Object (VeBO), Transform Feedback Buffer (XBO), Parameter Buffer (PaBO), Bindable Uniform Buffer (BUB), Texture Buffer Object (TexBO), Pixel Unpack Buffer (PuBO), Pixel Pack Buffer (PpBO). Stages: Vertex Puller, Vertex Shading, Geometry Shading, Fragment Shading, Texturing, Pixel Pipeline, Framebuffer. Labeled flows: vertex data (glBegin, glDrawElements, etc.), pixel data (glDrawPixels, glTexImage2D, etc.; glReadPixels, etc.), texel data, parameter data. One label reads “(not ARB functionality yet)”.]
137
Usages of OpenGL Buffer Objects
• Vertex uses (VBOs)
• Input to GL: Vertex attribute buffer objects
• Color, position, texture coordinate sets, etc.
• Input to GL: Vertex element buffer objects
• Indices
• Output from GL: Transform feedback
• Streaming vertex attributes out
• Texture uses (TexBOs)
• Texturing from: Texture buffer objects
• Pixel uses (PBOs)
• Output from GL: Pixel pack buffer objects
• glReadPixels
• Input to GL: Pixel unpack buffer objects
• glDrawPixels, glBitmap, glTexImage2D, etc.
• Shader uses (PaBOs, UBOs)
• Input to assembly program: Parameter buffer objects
• Input to GLSL program: Bind-able uniform buffer objects
Key point: OpenGL
buffers are containers for
bytes; a buffer is not tied
to any particular usage
138
Continuum of OpenGL Usage
Tweak-able Performance
Immediate
mode
Client vertex
arrays
Vertex buffer
objects (VBOs)
Display lists
139
Mid-session break
15 minutes
140
Implementing OpenGL
Mark Kilgard
Principal System Software Engineer
NVIDIA
141
Topics in OpenGL Implementation
• Dual-core OpenGL driver operation
• What goes into a texture fetch?
• You give me some texture coordinates
• I give you back a color
• Could it be any simpler?
142
OpenGL Drivers for Multi-core CPUs
• Today dual-core processors in PCs are nearly ubiquitous
• 4, 6, 8, and more cores are clearly coming
• How does OpenGL implementation exploit this trend?
• Answer: develop dual-core OpenGL driver
143
Dual-core OpenGL Driver Architecture
[Figure: dual-core driver architecture. For each OpenGL context (Context 1, Context 2), the application rendering thread (application thread A or B) runs the ICD’s app thread (tokenize thread), which feeds a circular command FIFO consumed by a worker thread (server thread; worker thread 1 or 2). Other application threads (C, D, …) are also shown, including an application audio thread that makes no OpenGL calls.]
144
Dual-core Performance Results
• A well-behaved OpenGL application benefiting from a
dual-core mode of OpenGL driver operations
[Bar chart: frames per second (axis from 0 to 250) for three modes of OpenGL driver operation: single core, dual core, and null driver]
145
Good Dual-core Driver Practices
• General advice
• Display lists execute on the driver’s worker thread!
• You want to avoid situations where the application thread must
“sync” with the driver thread
• Specific advice
• Avoid OpenGL state queries
• More on this later
• Avoid querying OpenGL errors in production code
• Bad behavior is detected automatically and leads to exit from the
dual-core mode
• Back to the standard single-core driver mode of operation
• “Do no harm”
146
Consider an OpenGL texture fetch
• Seems very simple
• Input: texture coordinates (s,t,r,q)
• Output: some color (r,g,b,a)
• Just a simple function, written in Cg/HLSL:
uniform sampler2D decal : TEXUNIT2;
float4 texcoord : TEXCOORD3;
float4 rgba = tex2D(decal, texcoord.st);
• Compiles to single instruction:
TEX o[COLR], f[TEX3], TEX2, 2D;
• Implementation is much more involved!
147
Anatomy of a Texture Fetch
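[Figure: texture fetch data flow. A texture coordinate vector and texture parameters enter Texel Selection, which produces texel offsets and combination parameters; texel data is read from the texture images; Texel Combination produces the filtered texel vector.]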
148
Texture Fetch Functionality (1)
• Texture coordinate processing
• Projective texturing (OpenGL 1.0)
• Cube map face selection (OpenGL 1.3)
• Texture array indexing (OpenGL 3.0)
• Coordinate scale: normalization (ARB_texture_rectangle)
• Level-of-detail (LOD) computation
• Log of maximum texture coordinate partial derivative (OpenGL 1.0)
• LOD clamping (OpenGL 1.2)
• LOD bias (OpenGL 1.3)
• Anisotropic scaling of partial derivatives (SGIX_texture_lod_bias)
• Wrap modes
• Repeat, clamp (OpenGL 1.0)
• Clamp to edge (OpenGL 1.2), Clamp to border (OpenGL 1.3)
• Mirrored repeat (OpenGL 1.4)
• Fully generalized clamped mirror repeat (EXT_texture_mirror_clamp)
• Wrap to adjacent cube map face
• Region clamp & mirror (PlayStation 2)
149
Texture Fetch Functionality (2)
• Filter modes
• Minification / magnification transition (OpenGL 1.0)
• Nearest, linear, mipmap (OpenGL 1.0)
• 1D & 2D (OpenGL 1.0), 3D (OpenGL 1.2), 4D (SGIS_texture4D)
• Anisotropic (EXT_texture_filter_anisotropic)
• Fixed-weights: Quincunx, 3x3 Gaussian
• Used for multi-sample resolves
• Detail texture magnification (SGIS_detail_texture)
• Sharpen texture magnification (SGIS_sharpen_texture)
• 4x4 filter (SGIS_texture_filter4)
• Sharp-edge texture magnification (E&S Harmony)
• Floating-point texture filtering (ARB_texture_float, OpenGL 3.0)
150
Texture Fetch Functionality (3)
• Texture formats
• Uncompressed
• Packing: RGBA8, RGB5A1, etc. (OpenGL 1.1)
• Type: unsigned, signed (NV_texture_shader)
• Normalized: fixed-point vs. integer (OpenGL 3.0)
• Compressed
• DXT compression formats (EXT_texture_compression_s3tc)
• 4:2:2 video compression (various extensions)
• 1- and 2-component compression (EXT_texture_compression_latc,
OpenGL 3.0)
• Other approaches: IDCT, VQ, differential encoding, normal maps,
separable decompositions
• Alternate encodings
• RGB9 with 5-bit shared exponent (EXT_texture_shared_exponent)
• Spherical harmonics
• Sum of product decompositions
151
Texture Fetch Functionality (4)
• Pre-filtering operations
• Gamma correction (OpenGL 2.1)
• Table: sRGB / arbitrary
• Shadow map comparison (OpenGL 1.4)
• Compare functions: LEQUAL, GREATER, etc.
(OpenGL 1.5)
• Needs “R” depth value per texel
• Palette lookup (EXT_paletted_texture)
• Thresholding
• Color key
• Generalized thresholding
152
Texture Fetch Functionality (5)
• Optimizations
• Level-of-detail weighting adjustments
• Mid-maps (extra pre-filtered levels in-between existing levels)
• Unconventional uses
• Bitmap textures for fonts with large filters (Direct3D 10)
• Rip-mapping
• Non-uniform texture border color
• Clip-mapping (SGIX_clipmap)
• Multi-texel borders
• Silhouette maps (Pradeep Sen’s work)
• Shadow mapping
• Sharp piecewise linear magnification
153
Phased Data Flow
• Must hide long memory read latency between Selection
and Combination phases
[Figure: the same Texel Selection / Texel Combination data flow, annotated with the memory reads for samples (texel offsets in, texel data out of the texture images) and the FIFOing of combination parameters between the two phases.]
154
What really happens?
• Let’s consider a simple tri-linear mip-mapped 2D
projective texture fetch
• Logically just one instruction
TXP o[COLR], f[TEX3], TEX2, 2D;
• Logically
• Texel selection
• Texel combination
• How many operations are involved?
155
Medium-Level Dissection
of a Texture Fetch
[Figure: stages between the interpolated texture coords vector (plus texture parameters) and the filtered texel vector. Labeled blocks: convert texture coords to texel coords (floating-point scaling and combination); floor / frac producing integer coords and fractional weights; convert texel coords to texel offsets (integer / fixed-point); texel data read from texture images; texel combination using the combination parameters (integer / fixed-point texel intermediates).]
156
Interpolation
• First we need to interpolate (s,t,r,q)
• This is the f[TEX3] part of the TXP instruction
• Projective texturing means we want (s/q, t/q)
• And possibly r/q if shadow mapping
• In order to correct for perspective, hardware actually interpolates
• (s/w, t/w, r/w, q/w)
• If not projective texturing, could linearly interpolate inverse w (or 1/w)
• Then compute its reciprocal to get w
• Since 1/(1/w) equals w
• Then multiply (s/w,t/w,r/w,q/w) times w
• To get (s,t,r,q)
• If projective texturing, we can instead
• Compute reciprocal of q/w to get w/q
• Then multiply (s/w,t/w,r/w) by w/q to get (s/q, t/q, r/q)
• Observe that projective texturing has the same cost as perspective correction
157
Interpolation Operations
• Ax + By + C per scalar linear interpolation
• 2 MADs
• One reciprocal to invert q/w for projective texturing
• Or one reciprocal to invert 1/w for perspective
texturing
• Then 1 MUL per component for s/w * w/q
• Or s/w * w
• For (s,t) means
• 4 MADs, 2 MULs, & 1 RCP
• (s,t,r) requires 6 MADs, 3 MULs, & 1 RCP
• All floating-point operations
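The operation sequence above can be sketched in C for a single coordinate. The `Plane` type and the function names are hypothetical, chosen for illustration; each plane holds the A, B, C coefficients of one linearly interpolated quantity, and the sketch makes no claim about how any particular GPU arranges these steps.

```c
#include <assert.h>

/* Hypothetical plane equation A*x + B*y + C for one interpolated scalar. */
typedef struct {
    float A, B, C;
} Plane;

static Plane make_plane(float A, float B, float C)
{
    Plane p;
    p.A = A;
    p.B = B;
    p.C = C;
    return p;
}

/* Linear interpolation at window position (x, y): two multiply-adds. */
static float plane_eval(Plane p, float x, float y)
{
    return p.A * x + p.B * y + p.C;
}

/* Perspective-correct s: interpolate s/w and 1/w, take one reciprocal
   to recover w, then one multiply. */
static float perspective_s(Plane s_over_w, Plane one_over_w, float x, float y)
{
    float w = 1.0f / plane_eval(one_over_w, x, y);  /* RCP */
    return plane_eval(s_over_w, x, y) * w;          /* MUL */
}

/* Projective s/q: interpolate s/w and q/w, take one reciprocal
   to get w/q, then one multiply. Same cost as the perspective case. */
static float projective_s(Plane s_over_w, Plane q_over_w, float x, float y)
{
    float w_over_q = 1.0f / plane_eval(q_over_w, x, y);  /* RCP */
    return plane_eval(s_over_w, x, y) * w_over_q;        /* MUL */
}
```

Both functions perform one reciprocal and one multiply on top of the plane evaluations, which is the sense in which the projective divide comes at no extra cost.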
158
Texture Space Mapping
• Have interpolated & projected coordinates
• Now need to determine what texels to fetch
• Multiply (s,t) by (width,height) of texture base level
• Could convert (s,t) to fixed-point first
• Or do math in floating-point
• Say the base texture level is 256x256
• So compute (s*256, t*256)=(u,v)
159
Mipmap Level-of-detail Selection
• Tri-linear mip-mapping means compute appropriate
mipmap level
• Hardware rasterizes in 2x2 pixel entities
• Typically called quad-pixels or just quad
• Finite difference with neighbors to get change in u
and v with respect to window space
• Approximation to ∂u/∂x, ∂u/∂y, ∂v/∂x, ∂v/∂y
• Means 4 subtractions per quad (1 per pixel)
• Now compute approximation to gradient length
• p = max(sqrt((∂u/∂x)^2 + (∂v/∂x)^2), sqrt((∂u/∂y)^2 + (∂v/∂y)^2))
[Figure: neighboring pixels of a quad at one-pixel separation]
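A minimal sketch of the finite-difference step, assuming a quad is given as 2x2 nested lists indexed [y][x] (a layout chosen only for this example). The gradient length groups the x-derivatives together and the y-derivatives together, following the scale-factor formula in the OpenGL specification; real hardware is free to use cheaper approximations.

```python
import math


def quad_gradients(u, v):
    """Approximate du/dx, du/dy, dv/dx, dv/dy by differencing
    neighboring pixels of a 2x2 quad (one-pixel separation)."""
    du_dx = u[0][1] - u[0][0]
    du_dy = u[1][0] - u[0][0]
    dv_dx = v[0][1] - v[0][0]
    dv_dy = v[1][0] - v[0][0]
    return du_dx, du_dy, dv_dx, dv_dy


def gradient_length(du_dx, du_dy, dv_dx, dv_dy):
    """Length of the longer of the two screen-space footprint axes."""
    return max(math.hypot(du_dx, dv_dx), math.hypot(du_dy, dv_dy))


# Texel-space coordinates for a quad that steps 3 texels in u and
# 4 texels in v per pixel in x, and not at all in y.
u = [[0.0, 3.0], [0.0, 3.0]]
v = [[0.0, 4.0], [0.0, 4.0]]
print(gradient_length(*quad_gradients(u, v)))
```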
160
Level-of-detail Bias and Clamping
• Convert p length to power-of-two level-of-detail and
apply LOD bias
• λ = log2(p) + lodBias
• Now clamp λ to valid LOD range
• λ’ = max(minLOD, min(maxLOD, λ))
161
Determine Mipmap Levels and
Level Filtering Weight
• Determine lower and upper mipmap levels
• b = floor(λ’) is bottom mipmap level
• t = floor(λ’+1) is top mipmap level
• Determine filter weight between levels
• w = frac(λ’) is filter weight
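The bias, clamp, and level-selection steps can be sketched in a few lines. The function name and argument names are assumptions for illustration; the arithmetic is the λ, λ’, b, t, and w defined above.

```python
import math


def select_levels(p, lod_bias, min_lod, max_lod):
    """Return (bottom level, top level, weight between them) for a
    gradient length p > 0."""
    lam = math.log2(p) + lod_bias                 # power-of-two LOD plus bias
    lam_clamped = max(min_lod, min(max_lod, lam)) # clamp to valid LOD range
    bottom = math.floor(lam_clamped)
    top = bottom + 1
    weight = lam_clamped - bottom                 # frac(lambda')
    return bottom, top, weight


print(select_levels(p=4.0, lod_bias=0.5, min_lod=0, max_lod=8))
```

When λ’ lands exactly on maxLOD the top level is one past the last valid level, but its weight is zero, so a real implementation can skip or clamp that fetch.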
162
Determine Texture Sample Point
• Get (u,v) for selected top and bottom mipmap levels
• Consider a level l which could be either level t or b
• With (u,v) locations (ul,vl)
• Perform GL_CLAMP_TO_EDGE wrap modes (with u and v normalized to [0,1] here)
• uw = max(1/(2*widthOfLevel(l)), min(1 - 1/(2*widthOfLevel(l)), u))
• vw = max(1/(2*heightOfLevel(l)), min(1 - 1/(2*heightOfLevel(l)), v))
• Get integer location (i,j) within each level
• (i,j) = (floor(uw*widthOfLevel(l)), floor(vw*heightOfLevel(l)))
[Figure: texture in (s,t) space showing its border and edge]
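A small sketch of the clamp-to-edge step, assuming normalized coordinates in [0,1] and a level of size width x height. The helper names are made up for this example. The clamp keeps the sample at least half a texel inside the level, so filtering never reaches the border.

```python
import math


def clamp_to_edge(coord, size):
    """Clamp a normalized coordinate to [1/(2*size), 1 - 1/(2*size)]."""
    half_texel = 1.0 / (2 * size)
    return max(half_texel, min(1.0 - half_texel, coord))


def texel_location(u, v, width, height):
    """Integer texel location (i, j) of the clamped sample point."""
    uw = clamp_to_edge(u, width)
    vw = clamp_to_edge(v, height)
    return math.floor(uw * width), math.floor(vw * height)


# Out-of-range coordinates land on the outermost texels of a 4x4 level.
print(texel_location(1.7, -0.3, 4, 4))
```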
163
Determine Texel Locations
• Bilinear sample needs 4 texel locations
• (i0,j0), (i0,j1), (i1,j0), (i1,j1)
• With integer texel coordinates, taken from the unfloored texel-space position (x,y) = (uw*widthOfLevel(l), vw*heightOfLevel(l))
• i0 = floor(x-1/2)
• i1 = floor(x+1/2)
• j0 = floor(y-1/2)
• j1 = floor(y+1/2)
• Also compute fractional weights for bilinear filtering
• a = frac(x-1/2)
• b = frac(y-1/2)
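A sketch of the bilinear footprint computation. It assumes the input is the unfloored texel-space position within the level (texel centers sit at half-integers); feeding it an already-floored integer would make both weights a constant 1/2. The function name is hypothetical.

```python
import math


def bilinear_footprint(x, y):
    """Return ((i0, i1, j0, j1), (a, b)) for a texel-space sample point."""
    i0 = math.floor(x - 0.5)
    j0 = math.floor(y - 0.5)
    a = (x - 0.5) - i0        # fractional weight toward i1
    b = (y - 0.5) - j0        # fractional weight toward j1
    return (i0, i0 + 1, j0, j0 + 1), (a, b)


# A sample a quarter texel right of texel 2's center, on row 1's center.
print(bilinear_footprint(2.75, 1.5))
```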
164
Determine Texel Addresses
• Assuming a texture level image’s base pointer, compute a texel
address of each texel to fetch
• Assume bytesPerTexel = 4 bytes for RGBA8 texture
• Example
• addr00 = baseOfLevel(l) +
bytesPerTexel*(i0+j0*widthOfLevel(l))
• addr01 = baseOfLevel(l) +
bytesPerTexel*(i0+j1*widthOfLevel(l))
• addr10 = baseOfLevel(l) +
bytesPerTexel*(i1+j0*widthOfLevel(l))
• addr11 = baseOfLevel(l) +
bytesPerTexel*(i1+j1*widthOfLevel(l))
• More complicated address schemes are needed for good texture
locality!
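The linear addressing above can be written out directly. This sketch assumes a simple row-major layout with no padding, which is exactly the scheme the slide warns is too naive for good locality; the function names and the base address are illustrative only.

```python
def texel_address(base, i, j, width, bytes_per_texel=4):
    """Row-major byte address of texel (i, j) in a level image."""
    return base + bytes_per_texel * (i + j * width)


def footprint_addresses(base, i0, i1, j0, j1, width, bytes_per_texel=4):
    """Addresses of the four texels of a bilinear footprint.
    The suffix is (i index, j index), matching addr00..addr11."""
    return {
        "addr00": texel_address(base, i0, j0, width, bytes_per_texel),
        "addr01": texel_address(base, i0, j1, width, bytes_per_texel),
        "addr10": texel_address(base, i1, j0, width, bytes_per_texel),
        "addr11": texel_address(base, i1, j1, width, bytes_per_texel),
    }


# RGBA8 level, 256 texels wide, starting at byte 4096.
print(footprint_addresses(4096, 2, 3, 1, 2, width=256))
```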
165
Initiate Texture Reads
• Initiate texture memory reads at the 8 texel addresses
• addr00, addr01, addr10, addr11 for the upper level
• addr00, addr01, addr10, addr11 for the lower level
• Queue the weights a, b, and w
• Latency FIFO in hardware makes these weights
available when texture reads complete
166
Phased Data Flow
• Must hide long memory read latency between Selection
and Combination phases
[Diagram: texture coordinate vector and texture parameters feed Texel Selection, which emits texel offsets and combination parameters; memory reads for samples against the texture images return texel data while the combination parameters wait in a FIFO; Texel Combination consumes both]
167
Texel Combination
• When texel reads are returned, begin filtering
• Assume results are
• Top texels: t00, t01, t10, t11
• Bottom texels: b00, b01, b10, b11
• Per-component filtering math is tri-linear filter
• RGBA8 is four components
• result = (1-a)*(1-b)*(1-w)*b00 +
(1-a)*b*(1-w)*b01 +
a*(1-b)*(1-w)*b10 +
a*b*(1-w)*b11 +
(1-a)*(1-b)*w*t00 +
(1-a)*b*w*t01 +
a*(1-b)*w*t10 +
a*b*w*t11;
• 24 MADs per component, or 96 for RGBA
• Lerp-tree could do 14 MADs per component, or 56 for RGBA
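The two formulations can be compared side by side. This is a sketch for one component, with texels ordered (x00, x01, x10, x11) where the first digit is the i index and the second the j index, as in the address slide; the function names are assumptions. The lerp tree uses seven lerps: three per level for the bilinear filter, plus one between levels.

```python
def lerp(x, y, t):
    """Linear interpolation from x (t = 0) to y (t = 1)."""
    return x + t * (y - x)


def trilinear_sum(bottom, top, a, b, w):
    """Direct eight-term weighted sum."""
    b00, b01, b10, b11 = bottom
    t00, t01, t10, t11 = top
    return ((1 - a) * (1 - b) * (1 - w) * b00 +
            (1 - a) * b * (1 - w) * b01 +
            a * (1 - b) * (1 - w) * b10 +
            a * b * (1 - w) * b11 +
            (1 - a) * (1 - b) * w * t00 +
            (1 - a) * b * w * t01 +
            a * (1 - b) * w * t10 +
            a * b * w * t11)


def trilinear_lerp_tree(bottom, top, a, b, w):
    """Same filter as a tree of seven lerps."""
    def bilinear(x00, x01, x10, x11):
        at_i0 = lerp(x00, x01, b)      # blend along j at column i0
        at_i1 = lerp(x10, x11, b)      # blend along j at column i1
        return lerp(at_i0, at_i1, a)   # blend along i
    return lerp(bilinear(*bottom), bilinear(*top), w)


bottom, top = (8, 16, 24, 32), (40, 48, 56, 64)
print(trilinear_sum(bottom, top, 0.25, 0.5, 0.75))
print(trilinear_lerp_tree(bottom, top, 0.25, 0.5, 0.75))
```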
168
Total Texture Fetch Operations
• Interpolation
• 6 MADs, 3 MULs, & 1 RCP (floating-point)
• Texel selection
• Texture space mapping
• 2 MULs (fixed-point)
• LOD determination (floating-point)
• 1 pixel difference, 2 SQRTs, 4 MULs, 1 LOG2
• LOD bias and clamping (fixed-point)
• 1 ADD, 1 MIN, 1 MAX
• Level determination and level weighting (fixed-point)
• 1 FLOOR, 1 ADD, 1 FRAC
• Texture sample point
• 4 MAXs, 4 MINs, 2 FLOORs (fixed-point)
• Texel locations and bi-linear weights
• 8 FLOORs, 4 FRACs, 8 ADDs (fixed-point)
• Addressing
• 16 integer MADs (integer)
• Texel combination
• 56 fixed-point MADs (fixed-point)
169
Observations about the Texture Fetch
• Lots of ways to implement the math
• Lots of clever ways to be efficient
• Lots more texture operations not considered in this analysis
• Compression
• Anisotropic filtering
• sRGB
• Shadow mapping
• Arguably TEX instructions are “world’s most CISC instructions”
• Texture fetches are incredibly complex instructions
• Good deal of GPU’s superiority at graphics operations over CPUs is
attributable to TEX instruction efficiency
• Good for compute too
170
OpenGL’s Future Evolution
Mark Kilgard
Principal System Software Engineer
NVIDIA
171
What drives OpenGL’s future?
• GPU graphics functionality
• Tessellation & geometry amplification
• Ratio of GPU to single-core CPU performance
• Compatibility
• Direct3Disms
• OpenGLisms
• Deprecation
• Compute support
• OpenCL, CUDA, Stream processing
• Unconventional graphics devices
172
Better Graphics Functionality
• Expect more graphics performance
• Easy prediction
• Rasterization nowhere near peaked
• Ray tracing fans—GPUs make rays and triangles
faster
– Market still values triangles more than rays
• Expect more generalized graphics functionality
• Trend for texture enhancements likely to continue
173
Geometry Amplification
• Tessellation
• Programmable hardware support coming
• True market demand probably not tessellation per se
• Games want visual richness
• Texture and shading have created much richness
– Often “pixel richness” as substitute for geometry richness
• Increasingly “visual richness” means geometric complexity
• Geometry Amplification may be better term
• Tessellation is one way to amplify geometry
– Recognize the limits of bi-variate patches for
representing geometry
174
Programmable Tessellation
• Stunning real-time geometric detail + animation possible
• Programmable tessellation + vertex textured displacements
175
Continuous Level-of-detail for Tessellation
Increasing tessellation level-of-detail
• Same patch mesh for all 3 scenes
176
Adaptive Programmable Tessellation
Programmable level-of-detail determination allows
more tessellation along silhouette edges
177
Limits of Patch Tessellation
• What games tend to want
• Here’s 8 vertices (bounding
box), go draw a fire truck
• Here’s a few vertices, go draw
a tree
178
Tessellation Not New to OpenGL
• At least three different bi-variate patch tessellation schemes have
been added to OpenGL
• Evaluators (OpenGL 1.0)
• NV_evaluators (GeForce 3)
• water-tight
• adaptive level-of-detail
• forward differencing approach
• ATI_pn_triangles Curved PN Triangles (Radeon)
• tessellated triangle based on positions+normals
• None succeeded
• Hard to integrate into art pipelines
• Didn’t offer enough performance advantage
GLUT’s wire-frame
teapot
[Moreton 2001]
[Vlachos 2001]
179
Ratio of CPU core-to-GPU Performance
• Well known computer architecture trends now
• Single-threaded CPU performance trends are stalled
• Multi-core is CPU designer response
• GPU performance continues on-trend
• What does this mean for graphics API design?
• CPUs must generate more visually rich API command
streams to saturate GPUs
• Can’t just send more commands faster
• Single-threaded CPUs can only do so much
• So must send more powerful commands
180
Déjà vu
• We’ve been here before
• Early 1980s: Graphics terminals used to be
connected to minicomputers by slow speed
interconnects
• CPUs themselves far too slow for real-time
rendering
• Resulting rendering model
• Download scene database to graphics terminal
• Adjust viewing and modeling parameters
• Send “redraw scene” command
181
What Happened
• Such “scene processor” hardware not very flexible
• Difficult to animate anything beyond rigid dynamics
• Eventually SGI and others matched CPUs and interconnects to
graphics performance
• Result was IRIS GL’s immediate mode
• CPU fast enough to send geometry every frame
• OpenGL took this model
• Over time added vertex arrays, vertex buffers, texturing,
programmable shading, and more performance
• CPU performance became limiter still
• Better graphics driver tuning helped
• Dual-core drivers help some more
182
OpenGL’s Most Powerful Command
• Available since OpenGL 1.0
• Can render essentially anything OpenGL can render!
• Takes just one parameter
• The command
glCallList(GLuint displayListName);
• Power of display lists comes from
• Playing back arbitrary compiled commands
• Allowing for hierarchical calling of display list
• A display list can contain glCallList or glCallLists
• Ability of application to re-define display lists
• No editing, but can be re-defined
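The playback behavior described above can be mimicked with a toy model. This is not the OpenGL API: the class, its method names, and the tuple command encoding are all hypothetical, chosen only to show hierarchical calls being resolved at execution time and the effect of re-defining a nested list.

```python
class DisplayLists:
    """Toy model of display list compile and playback (not OpenGL itself)."""

    def __init__(self):
        self.lists = {}

    def new_list(self, name, commands):
        # Re-defining a name replaces the old contents wholesale;
        # there is no in-place editing.
        self.lists[name] = list(commands)

    def call_list(self, name, executed):
        # Nested ("call", name) entries are resolved at playback time,
        # so a re-defined inner list changes what the outer list draws.
        for command in self.lists.get(name, []):
            if command[0] == "call":
                self.call_list(command[1], executed)
            else:
                executed.append(command)
        return executed


dl = DisplayLists()
dl.new_list(1, [("draw", "wheel")])
dl.new_list(2, [("state", "red"), ("call", 1), ("call", 1)])
print(dl.call_list(2, []))

dl.new_list(1, [("draw", "tire")])      # re-define the inner list
print(dl.call_list(2, []))
```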
183
Enhanced Display Lists
• OpenGL 1.0 display lists are too inflexible
• Pixel & vertex data “compiled into” display lists
• Binding objects always “by name”
• Rather than “by reference”
• These problems can be fixed
• Modern OpenGL supports buffers for transferring vertices and
pixels
• Compile commands into display lists that defer vertex and
pixel transfers until execute-time
– Rather than compile-time
• Allow objects (textures, buffers, programs) to be bound “by
reference” or “by name”
184
Other Display List Enhancements
• Conditional display list execution
• Relaxed vertex index and command order
• Parallel construction of display lists by multiple threads
General insight: Easier for driver to optimize application’s
graphics command stream if it gets to
1) see the repetition in the command stream clearly
2) take time to analyze and optimize usage
185
Conditional Display List Execution
• Today’s occlusion query
• Application must “query” to learn occlusion result
• Latency too great to respond
• Application can use OpenGL 3.0’s conditional render
capability
• But just skips vertex pulling, not state changes
• Conditional display list execution
• Allow a glCallList to depend on the occlusion result
from an occlusion query object
• Allows in-band occlusion querying
• Skip both vertex pulling and state changes
186
Relaxed Vertex Index and Command Order
• OpenGL today always executes commands “in order”
• Sequential execution requirement
• Provide compile-time specification of re-ordering allowances
• Allows GL implementation to re-order
• Vertex indices within display list’s vertex batch
• Commands within display list
• Key rule: the state vector a rendering command executes in must
match the state it would have seen if commands were executed sequentially
• Allow static or dynamic re-ordering
• Static re-ordering needed for multi-pass invariances
• Past practice
• IRIS Performer would sort rendering by state changes for
performance
• [Sander 2007] shows substantial benefit for vertex ordering
187
Parallel Display List Construction
• Today’s model
• Single thread makes all OpenGL rendering calls
• Minimizes GPU context switch overhead
• Ties command generation rate to single core’s
CPU performance
• Enhanced display list model
• Multiple threads can build display lists in parallel
• Single thread still executes display lists
• Countable semaphore objects used to synchronize
hand-off of display lists built by other threads with
main rendering thread
188
Rethinking Display Lists
• Display lists have been proposed for deprecation
• Right as we really need them!
• Much more interesting to enhance display lists
• Dual-core driver already off-loads display list traversal
to driver’s thread
• Multi-core driver could scan frequently executed
display lists to optimize their order and error
processing
• Includes adding pre-fetching to avoid stalling CPU
on cache misses for object accesses
189
Direct3Disms
• Developing a shader-rich game title costs $$$
• For top titles, often US$ 5,000,000+
• Investment typically amortized over multiple platforms
• Consoles are primary target, then PCs
• PC version typically developed for Direct3D
• Reality: OpenGL is often 3rd or worse priority
• API differences = porting & performance pitfalls
• Stops or slows Direct3D-developed 3D content from
working easily on OpenGL platforms
190
Supporting Direct3D: Not New
• OpenGL has always supported multiple formats well
• OpenGL’s plethora of pixel and vertex formats
• Very first OpenGL extension: EXT_bgra
• Provides a pixel component ordering to match the
color component ordering of Windows for 2D GDI
rendering
• Made core functionality by OpenGL 1.2
• Many OpenGL extensions have embraced Direct3Disms
• Secondary color
• Fog coordinate
• Point sprites
191
Direct3D vs. OpenGL
Coordinate System Conventions
• Window origin conventions
• Direct3D = upper-left origin
• OpenGL = lower-left origin
• Pixel center conventions
• Direct3D 9 = pixel centers at integer locations
• OpenGL (and Direct3D 10) = pixel centers at half-pixel locations
• Clip space conventions
• Direct3D = [-1,+1] for XY, [0,1] for Z
• OpenGL = [-1,+1] range for XYZ
• Affects
• How projection matrix is loaded
• Fragment shaders that access the window position
• Point sprites have upper-left texture coordinate origin
• OpenGL already lets application choose lower-left or upper-left
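A sketch of what these convention differences mean numerically, assuming the ranges and origins listed above. The function names are invented for the example; a real port would usually fold the depth remap into the projection matrix rather than apply it per value.

```python
def gl_ndc_z_to_d3d(z_gl):
    """Remap OpenGL clip-space depth range [-1, +1] to Direct3D's [0, 1]."""
    return 0.5 * z_gl + 0.5


def gl_window_to_d3d9(x, y, window_height):
    """Convert an OpenGL window position (lower-left origin, half-pixel
    centers) to Direct3D 9 (upper-left origin, integer pixel centers)."""
    return x - 0.5, (window_height - y) - 0.5


print(gl_ndc_z_to_d3d(0.0))
# Center of the bottom-left pixel of a 480-pixel-tall window.
print(gl_window_to_d3d9(0.5, 0.5, 480))
```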
192
Direct3D vs. OpenGL
Provoking Vertex Conventions
• Direct3D uses “first” vertex of a triangle or line to
determine which color is used for flat shading
• OpenGL uses “last” vertex for lines, triangles, and quads
• Except for polygon mode (GL_POLYGON), which uses the
first vertex
Direct3D 9
pDev->SetRenderState(
D3DRS_SHADEMODE,
D3DSHADE_FLAT);
OpenGL
glShadeModel(GL_FLAT);
Input triangle strip
with per-vertex colors
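For a triangle strip, the difference can be modeled as picking a different vertex index per triangle. This is a sketch under the conventions stated above, with zero-based indices and made-up function names: triangle k of a strip uses vertices k, k+1, k+2, so "first" selects vertex k and "last" selects vertex k+2.

```python
def provoking_vertex(triangle_index, convention):
    """Index of the strip vertex whose color flat-shades a triangle."""
    if convention == "first":       # Direct3D convention
        return triangle_index
    if convention == "last":        # OpenGL convention for strips
        return triangle_index + 2
    raise ValueError("convention must be 'first' or 'last'")


def flat_shaded_colors(strip_colors, convention):
    """Flat color of every triangle in a strip with per-vertex colors."""
    triangle_count = max(len(strip_colors) - 2, 0)
    return [strip_colors[provoking_vertex(k, convention)]
            for k in range(triangle_count)]


colors = ["red", "green", "blue", "yellow", "cyan"]
print(flat_shaded_colors(colors, "first"))
print(flat_shaded_colors(colors, "last"))
```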
193
BGRA Vertex Array Order
• Direct3D 9’s most common usage for sending per-vertex
colors is 32-bit D3DCOLOR data type:
• Red in bits 16:23
• Green in bits 8:15
• Blue in bits 0:7
• Alpha in bits 24:31
• Laid out in memory on a little-endian machine, looks like BGRA order
• OpenGL assumes RGBA order for all vertex arrays
• Direct3Dism EXT_vertex_array_bgra extension allows:
glColorPointer(GL_BGRA, GL_UNSIGNED_BYTE, stride, pointer);
glSecondaryColorPointer(GL_BGRA, GL_UNSIGNED_BYTE, stride, pointer);
glVertexAttribPointer(index, GL_BGRA, GL_UNSIGNED_BYTE, GL_TRUE, stride, pointer);
[Diagram: 32-bit D3DCOLOR word, from bit 31 down to bit 0: 8-bit alpha, 8-bit red, 8-bit green, 8-bit blue]
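The byte order can be checked with a short sketch. The helper names are assumptions; the bit positions are the ones listed above, and the packing uses Python's standard struct module with an explicit little-endian format.

```python
import struct


def d3dcolor(alpha, red, green, blue):
    """Pack four 8-bit components into a 32-bit D3DCOLOR-style word:
    alpha in bits 24:31, red in 16:23, green in 8:15, blue in 0:7."""
    return (alpha << 24) | (red << 16) | (green << 8) | blue


def memory_bytes(color):
    """Bytes of the 32-bit word as stored on a little-endian machine."""
    return struct.pack("<I", color)


color = d3dcolor(alpha=0xFF, red=0x11, green=0x22, blue=0x33)
print(hex(color))
print(memory_bytes(color).hex())    # lowest address first: B, G, R, A
```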
194
OpenGLisms
• Things about OpenGL’s operation that make it hard for
non-OpenGL applications to port to OpenGL
• Examples
• Selectors
• Linked GLSL program objects
195
Eliminating Selectors from OpenGL
• OpenGL has lots of selectors
• Selectors set state that indicates what state subsequent
commands will update
• Already mentioned selectors: glClientActiveTexture
• Other examples: glActiveTexture, glMatrixMode,
glBindTexture, glBindBuffer, glUseProgram,
glBindProgramARB
• OpenGL is full of selectors
– Partly OpenGL’s extensibility strategy
– Partly because objects are bound into context
» Bind-to-edit objects
» Rather than edit-by-name
• Direct State Access extension: EXT_direct_state_access
• Provides complete selector-free additional API for OpenGL
• Shipping in NVIDIA’s 180.43 drivers
196
Reasons to Eliminate Selectors
• Direct3D has an “edit-by-name” model of operation
• Means Direct3D has no selectors
• Having to manage selectors when porting Direct3D or console
code to OpenGL is awkward
• Requires deferring updates to minimize selector and object
bind changes
• Layered libraries can’t count on selector state
• To be safe when updating state controlled by selectors, such
libraries must use idiom
• Save selector, Set selector, Update state, Restore selector
• Bad for performance, particularly bad for dual-core drivers
since queries are expensive
197
GLSL Program Object Linking
• GLSL requires shader objects from different domains
(vertex, geometry, fragment) to be linked into single
GLSL program object
• Means you can’t mix-and-match shaders easily
• Other APIs don’t have this limitation
• Direct3D
• Prior OpenGL assembly language extensions
• Consoles
• A “separate shader objects” extension could fix this
problem
198
Separate Shader Objects Example
• Combining different GLSL shaders at once
[Figure: rendered tori pairing different GLSL vertex shaders (wobbly torus, smooth torus) with different GLSL fragment shaders (specular brick bump mapping, red diffuse)]
199
Deprecation
• Part of OpenGL 3.0 is a marking of features for deprecation
• LOTS of functionality is marked for deprecation
• I contend no real application today uses the non-deprecated
subset of OpenGL—all apps would have to change due to
deprecation
• Some vendors believe getting rid of features will make OpenGL
better in some way
• NVIDIA does not believe in abandoning API compatibility this
way
• OpenGL is part of a large ecosystem so removing features this way
undermines the substantial investment partners have made in
OpenGL over years
• API compatibility and stability is one of OpenGL’s great
strengths
200
Synergy between OpenGL and OpenCL
• Complementary capabilities
• OpenGL 3.0 = state-of-the-art, cross-platform graphics
• OpenCL 1.0 = state-of-the-art, cross-platform compute
• Computation & Graphics should work together
• Most natural way to intuit compute results is with graphics
• When Compute is done on a GPU, there’s no need to “copy” the
data to see it visualized
• Appendix B of OpenCL specification
• Deals with sharing objects between OpenGL and OpenCL
• Called “GL” and “CL” from here on…
201
Four Kinds of Shared Objects
OpenGL object (parameters) → creation call → OpenCL object
• OpenGL buffer object (GLuint bufferobj) → clCreateFromGLBuffer → OpenCL buffer object (cl_mem)
• OpenGL texture 2D object (GLenum target, GLuint texture, GLint miplevel) → clCreateFromGLTexture2D → OpenCL 2D image object (cl_mem)
• OpenGL texture 3D object (GLenum target, GLuint texture, GLint miplevel) → clCreateFromGLTexture3D → OpenCL 3D image object (cl_mem)
• OpenGL renderbuffer object (GLuint renderbuffer) → clCreateFromGLRenderbuffer → OpenCL 2D image object (cl_mem)
202
OpenGL / OpenCL Sharing
• Requirements for GL object sharing with CL
• CL context must be created with an OpenGL context
• Each platform-specific API will provide its appropriate
way to create an OpenGL-compatible CL context
• For WGL (Windows), CGL (OS X), GLX (X11/Linux),
EGL (OpenGL ES), etc.
• Creating cl_mem for GL Objects does two things
1. Ensures CL has a reference to the GL objects
2. Provides cl_mem handle to acquire GL object for CL’s use
• clRetainMemObject & clReleaseMemObject can create
counted references to cl_mem objects
203
Acquiring GL Objects for Compute Access
• Still must “enqueue acquire” GL objects for compute kernels to
use them
• Otherwise reading or writing GL objects with CL is undefined
• Enqueue acquire and release provide sequential consistency
with GL command processing
• Enqueue commands for GL objects
• clEnqueueAcquireGLObjects
• Takes list of cl_mem objects for GL objects & list of
cl_events that must complete before acquire
• Returns a cl_event for this acquire operation
• clEnqueueReleaseGLObjects
• Takes list of cl_mem objects for GL objects & list of
cl_events that must complete before release
• Returns a cl_event for this release operation
204
Unconventional OpenGL Deployments
• Workstation PCs—Quadro
• Consumer PCs—GeForce
• High-end Visualization—QuadroPlex Visual
Computing Solution (VCS)
• Embedded Applications
• Handheld Devices
• Game Consoles
[Slide annotation: the list runs from conventional PC OpenGL products to unconventional deployments]
OpenGL in Context
A facilitated conversation
with Dr. Marc Levoy, Stanford University
206
Questions?

Más contenido relacionado

La actualidad más candente

NVIDIA's OpenGL Functionality
NVIDIA's OpenGL FunctionalityNVIDIA's OpenGL Functionality
NVIDIA's OpenGL FunctionalityMark Kilgard
 
OpenGL NVIDIA Command-List: Approaching Zero Driver Overhead
OpenGL NVIDIA Command-List: Approaching Zero Driver OverheadOpenGL NVIDIA Command-List: Approaching Zero Driver Overhead
OpenGL NVIDIA Command-List: Approaching Zero Driver OverheadTristan Lorach
 
FrameGraph: Extensible Rendering Architecture in Frostbite
FrameGraph: Extensible Rendering Architecture in FrostbiteFrameGraph: Extensible Rendering Architecture in Frostbite
FrameGraph: Extensible Rendering Architecture in FrostbiteElectronic Arts / DICE
 
GS-4106 The AMD GCN Architecture - A Crash Course, by Layla Mah
GS-4106 The AMD GCN Architecture - A Crash Course, by Layla MahGS-4106 The AMD GCN Architecture - A Crash Course, by Layla Mah
GS-4106 The AMD GCN Architecture - A Crash Course, by Layla MahAMD Developer Central
 
[TGDF 2019] Mali GPU Architecture and Mobile Studio
[TGDF 2019] Mali GPU Architecture and Mobile Studio[TGDF 2019] Mali GPU Architecture and Mobile Studio
[TGDF 2019] Mali GPU Architecture and Mobile StudioOwen Wu
 
Oit And Indirect Illumination Using Dx11 Linked Lists
Oit And Indirect Illumination Using Dx11 Linked ListsOit And Indirect Illumination Using Dx11 Linked Lists
Oit And Indirect Illumination Using Dx11 Linked ListsHolger Gruen
 
[Unite Seoul 2019] Mali GPU Architecture and Mobile Studio
[Unite Seoul 2019] Mali GPU Architecture and Mobile Studio [Unite Seoul 2019] Mali GPU Architecture and Mobile Studio
[Unite Seoul 2019] Mali GPU Architecture and Mobile Studio Owen Wu
 
NVIDIA OpenGL and Vulkan Support for 2017
NVIDIA OpenGL and Vulkan Support for 2017NVIDIA OpenGL and Vulkan Support for 2017
NVIDIA OpenGL and Vulkan Support for 2017Mark Kilgard
 
OpenGL 4.5 Update for NVIDIA GPUs
OpenGL 4.5 Update for NVIDIA GPUsOpenGL 4.5 Update for NVIDIA GPUs
OpenGL 4.5 Update for NVIDIA GPUsMark Kilgard
 
HPG 2018 - Game Ray Tracing: State-of-the-Art and Open Problems
HPG 2018 - Game Ray Tracing: State-of-the-Art and Open ProblemsHPG 2018 - Game Ray Tracing: State-of-the-Art and Open Problems
HPG 2018 - Game Ray Tracing: State-of-the-Art and Open ProblemsElectronic Arts / DICE
 
Frostbite Rendering Architecture and Real-time Procedural Shading & Texturing...
Frostbite Rendering Architecture and Real-time Procedural Shading & Texturing...Frostbite Rendering Architecture and Real-time Procedural Shading & Texturing...
Frostbite Rendering Architecture and Real-time Procedural Shading & Texturing...Johan Andersson
 
The Rendering Pipeline - Challenges & Next Steps
The Rendering Pipeline - Challenges & Next StepsThe Rendering Pipeline - Challenges & Next Steps
The Rendering Pipeline - Challenges & Next StepsJohan Andersson
 
SIGGRAPH 2018 - Full Rays Ahead! From Raster to Real-Time Raytracing
SIGGRAPH 2018 - Full Rays Ahead! From Raster to Real-Time RaytracingSIGGRAPH 2018 - Full Rays Ahead! From Raster to Real-Time Raytracing
SIGGRAPH 2018 - Full Rays Ahead! From Raster to Real-Time RaytracingElectronic Arts / DICE
 
Triangle Visibility buffer
Triangle Visibility bufferTriangle Visibility buffer
Triangle Visibility bufferWolfgang Engel
 
Approaching zero driver overhead
Approaching zero driver overheadApproaching zero driver overhead
Approaching zero driver overheadCass Everitt
 
Practical SPU Programming in God of War III
Practical SPU Programming in God of War IIIPractical SPU Programming in God of War III
Practical SPU Programming in God of War IIISlide_N
 
Understaing Android EGL
Understaing Android EGLUnderstaing Android EGL
Understaing Android EGLSuhan Lee
 
The Rendering Technology of 'Lords of the Fallen' (Game Connection Europe 2014)
The Rendering Technology of 'Lords of the Fallen' (Game Connection Europe 2014)The Rendering Technology of 'Lords of the Fallen' (Game Connection Europe 2014)
The Rendering Technology of 'Lords of the Fallen' (Game Connection Europe 2014)Philip Hammer
 
Custom fabric shader for unreal engine 4
Custom fabric shader for unreal engine 4Custom fabric shader for unreal engine 4
Custom fabric shader for unreal engine 4동석 김
 

La actualidad más candente (20)

NVIDIA's OpenGL Functionality
NVIDIA's OpenGL FunctionalityNVIDIA's OpenGL Functionality
NVIDIA's OpenGL Functionality
 
OpenGL NVIDIA Command-List: Approaching Zero Driver Overhead
OpenGL NVIDIA Command-List: Approaching Zero Driver OverheadOpenGL NVIDIA Command-List: Approaching Zero Driver Overhead
OpenGL NVIDIA Command-List: Approaching Zero Driver Overhead
 
FrameGraph: Extensible Rendering Architecture in Frostbite
FrameGraph: Extensible Rendering Architecture in FrostbiteFrameGraph: Extensible Rendering Architecture in Frostbite
FrameGraph: Extensible Rendering Architecture in Frostbite
 
GS-4106 The AMD GCN Architecture - A Crash Course, by Layla Mah
GS-4106 The AMD GCN Architecture - A Crash Course, by Layla MahGS-4106 The AMD GCN Architecture - A Crash Course, by Layla Mah
GS-4106 The AMD GCN Architecture - A Crash Course, by Layla Mah
 
[TGDF 2019] Mali GPU Architecture and Mobile Studio
[TGDF 2019] Mali GPU Architecture and Mobile Studio[TGDF 2019] Mali GPU Architecture and Mobile Studio
[TGDF 2019] Mali GPU Architecture and Mobile Studio
 
Oit And Indirect Illumination Using Dx11 Linked Lists
Oit And Indirect Illumination Using Dx11 Linked ListsOit And Indirect Illumination Using Dx11 Linked Lists
Oit And Indirect Illumination Using Dx11 Linked Lists
 
[Unite Seoul 2019] Mali GPU Architecture and Mobile Studio
[Unite Seoul 2019] Mali GPU Architecture and Mobile Studio [Unite Seoul 2019] Mali GPU Architecture and Mobile Studio
[Unite Seoul 2019] Mali GPU Architecture and Mobile Studio
 
NVIDIA OpenGL and Vulkan Support for 2017
NVIDIA OpenGL and Vulkan Support for 2017NVIDIA OpenGL and Vulkan Support for 2017
NVIDIA OpenGL and Vulkan Support for 2017
 
OpenGL 4.5 Update for NVIDIA GPUs
OpenGL 4.5 Update for NVIDIA GPUsOpenGL 4.5 Update for NVIDIA GPUs
OpenGL 4.5 Update for NVIDIA GPUs
 
HPG 2018 - Game Ray Tracing: State-of-the-Art and Open Problems
HPG 2018 - Game Ray Tracing: State-of-the-Art and Open ProblemsHPG 2018 - Game Ray Tracing: State-of-the-Art and Open Problems
HPG 2018 - Game Ray Tracing: State-of-the-Art and Open Problems
 
Frostbite Rendering Architecture and Real-time Procedural Shading & Texturing...
Frostbite Rendering Architecture and Real-time Procedural Shading & Texturing...Frostbite Rendering Architecture and Real-time Procedural Shading & Texturing...
Frostbite Rendering Architecture and Real-time Procedural Shading & Texturing...
 
The Rendering Pipeline - Challenges & Next Steps
The Rendering Pipeline - Challenges & Next StepsThe Rendering Pipeline - Challenges & Next Steps
The Rendering Pipeline - Challenges & Next Steps
 
SIGGRAPH 2018 - Full Rays Ahead! From Raster to Real-Time Raytracing
SIGGRAPH 2018 - Full Rays Ahead! From Raster to Real-Time RaytracingSIGGRAPH 2018 - Full Rays Ahead! From Raster to Real-Time Raytracing
SIGGRAPH 2018 - Full Rays Ahead! From Raster to Real-Time Raytracing
 
Triangle Visibility buffer
Triangle Visibility bufferTriangle Visibility buffer
Triangle Visibility buffer
 
Approaching zero driver overhead
Approaching zero driver overheadApproaching zero driver overhead
Approaching zero driver overhead
 
Practical SPU Programming in God of War III
Practical SPU Programming in God of War IIIPractical SPU Programming in God of War III
Practical SPU Programming in God of War III
 
Understaing Android EGL
Understaing Android EGLUnderstaing Android EGL
Understaing Android EGL
 
The Rendering Technology of 'Lords of the Fallen' (Game Connection Europe 2014)
The Rendering Technology of 'Lords of the Fallen' (Game Connection Europe 2014)The Rendering Technology of 'Lords of the Fallen' (Game Connection Europe 2014)
The Rendering Technology of 'Lords of the Fallen' (Game Connection Europe 2014)
 
DirectX 11 Rendering in Battlefield 3
DirectX 11 Rendering in Battlefield 3DirectX 11 Rendering in Battlefield 3
DirectX 11 Rendering in Battlefield 3
 
Custom fabric shader for unreal engine 4
Custom fabric shader for unreal engine 4Custom fabric shader for unreal engine 4
Custom fabric shader for unreal engine 4
 

Destacado

Anatomy of a Texture Fetch
Anatomy of a Texture FetchAnatomy of a Texture Fetch
Anatomy of a Texture FetchMark Kilgard
 
CS 354 Texture Mapping
CS 354 Texture MappingCS 354 Texture Mapping
CS 354 Texture MappingMark Kilgard
 
glut dev c++ membuat nama
glut dev c++ membuat namaglut dev c++ membuat nama
glut dev c++ membuat namaDitta Paski
 
Anthony de Mello - Bewustzijn
Anthony de Mello  - BewustzijnAnthony de Mello  - Bewustzijn
Anthony de Mello - Bewustzijnnonduality01
 
Anthony de-mello-constienta-capcanele-si-sansele-realitatii-160620161428(1)
Anthony de-mello-constienta-capcanele-si-sansele-realitatii-160620161428(1)Anthony de-mello-constienta-capcanele-si-sansele-realitatii-160620161428(1)
Anthony de-mello-constienta-capcanele-si-sansele-realitatii-160620161428(1)Costel Bucur
 
vSphere vStorage: Troubleshooting Performance
vSphere vStorage: Troubleshooting PerformancevSphere vStorage: Troubleshooting Performance
vSphere vStorage: Troubleshooting PerformanceProfessionalVMware
 
2010-JOGL-11-Toon-Shading
2010-JOGL-11-Toon-Shading2010-JOGL-11-Toon-Shading
2010-JOGL-11-Toon-ShadingJohannes Diemke
 
2010-JOGL-09-Texture-Mapping
2010-JOGL-09-Texture-Mapping2010-JOGL-09-Texture-Mapping
2010-JOGL-09-Texture-MappingJohannes Diemke
 
Texture mapping in_opengl
Texture mapping in_openglTexture mapping in_opengl
Texture mapping in_openglManas Nayak
 
CS 354 Blending, Compositing, Anti-aliasing
CS 354 Blending, Compositing, Anti-aliasingCS 354 Blending, Compositing, Anti-aliasing
CS 354 Blending, Compositing, Anti-aliasingMark Kilgard
 
Yoda - HTML5 Content Authoring Tool
Yoda - HTML5 Content Authoring ToolYoda - HTML5 Content Authoring Tool
Yoda - HTML5 Content Authoring ToolHyekyoung Lee
 
Exploring Cultures through Cuisines at the Ultimate Travel Show 2016 - Toronto
Exploring Cultures through Cuisines at the Ultimate Travel Show 2016 - Toronto Exploring Cultures through Cuisines at the Ultimate Travel Show 2016 - Toronto
Exploring Cultures through Cuisines at the Ultimate Travel Show 2016 - Toronto Yashy Murphy
 
Designing for Sensors 
& the Future of Experiences
Designing for Sensors 
& the Future of ExperiencesDesigning for Sensors 
& the Future of Experiences
Designing for Sensors 
& the Future of ExperiencesJeremy Johnson
 
Adobe Digital Publishing Solution
Adobe Digital Publishing SolutionAdobe Digital Publishing Solution
Adobe Digital Publishing Solutionjeon jun
 
VMsoft clairview 제품소개서 (2014.03)
VMsoft clairview 제품소개서 (2014.03)VMsoft clairview 제품소개서 (2014.03)
VMsoft clairview 제품소개서 (2014.03)Daniel Park
 
내손남 Solution
내손남 Solution내손남 Solution
내손남 Solution샬라 박
 

SIGGRAPH Asia 2008 Modern OpenGL

  • 1. 1
  • 2. 2 Mark J. Kilgard, NVIDIA Kurt Akeley, Microsoft Research 13 December 2008 Singapore Modern OpenGL: Its Design and Evolution
  • 4. 4 Kurt Akeley • Led development of OpenGL at Silicon Graphics (SGI) • Co-founded SGI • Led development of SGI’s high-end graphics hardware • Co-author of OpenGL specification • Returned to Stanford University to complete Ph.D. • Co-developed Cg “C for graphics” language at NVIDIA • Principal Researcher, Microsoft Research Silicon Valley • Spent time at Microsoft Research Asia in Beijing • Member of US National Academy of Engineering
  • 5. 5 Mark Kilgard • Principal System Software Engineer, NVIDIA, Austin, Texas • Developed original OpenGL driver for 1st GeForce GPU • Specified many key OpenGL extensions • Works on Cg for portable programmable shading • NVIDIA Distinguished Inventor • Before NVIDIA, worked at Silicon Graphics • Worked on X Window System integration for OpenGL • Developed popular OpenGL Utility Toolkit (GLUT) • Wrote book on OpenGL and X, co-authored Cg Tutorial
  • 6. 6 Marc Levoy • Moderator for our facilitated discussion • Professor of Computer Science and Electrical Engineering • Stanford University • SIGGRAPH Computer Graphics Achievement Award • ACM Fellow
  • 7. 7 Course Schedule • Modern OpenGL (Kilgard) • OpenGL’s evolution: a personal retrospective (Akeley) • Writing better OpenGL (Kilgard) • Implementing OpenGL (Kilgard) • OpenGL’s future evolution (Kilgard) • OpenGL in Context (Akeley, Kilgard, Levoy) • Facilitated conversation – Mid-session break –
  • 8. 8 Check Out the Course Notes (1) • Look to www.opengl.org web site for our final slides • New Material • “An Incomplete History of OpenGL” (Kilgard) • How the OpenGL graphics system developed • “Using Vertex Buffer Objects Well” (Kilgard) • Learn how to use vertex buffer objects for high vertex processing rates
  • 9. 9 Check Out the Course Notes (2) • Paper Reprints • OpenGL design rationale from its specification co-authors (Segal, Akeley) • Realizing OpenGL: two implementations of one architecture (Kilgard) • Graphics hardware: GTX, RealityEngine, InfiniteReality, GeForce 6800 • Key developments in graphics hardware design over last 20 years • GPU Programmability: “User-Programmable Vertex Engine” and “Cg” SIGGRAPH papers • “How GPUs Work” (Luebke, Humphreys)
  • 10. 10 Modern OpenGL Mark Kilgard Principal System Software Engineer NVIDIA
  • 11. 11 Modern OpenGL • History • How did OpenGL get where it is now? • Present • Version 3.0 • Functionality beyond 3.0
  • 12. 12 An Overview History of OpenGL • Pre-history 1991 • IRIS GL, a proprietary Graphics Library by SGI • OpenGL, an open standard for 3D • Focus: procedural hardware-accelerated 3D graphics • Governed by Architecture Review Board (ARB) • Extensibility planned into design • Competition • Proprietary APIs (1991-1995) • PHIGS & PEX for X Window System (1992-1997) • Microsoft’s Direct3D (1998-)
  • 13. 13 OpenGL’s Pre-history • IRIS GL 1 (window system: MEX) • IRIS GL 2 (window system: MEX; operating system: UNIX) • IRIS GL 3 (window system: NeWS/X11; operating system: IRIX 3.x) • IRIS GL 4 (window system: native X11; operating system: IRIX 4.3) • OpenGL 1.0 (window system: native X11 with GLX; operating system: IRIX 5.1) • Years marked on the timeline: 1983, 1986, 1988, 1991, 1993 • First work on GL 5.0 proposal: 1989 • Dates are for shipping commercial SGI implementation • 1983-2008 = 25 years
  • 14. 14 OpenGL’s Design Philosophy • High-performance • Assumes hardware acceleration • Defined by a specification • Rather than a de facto implementation • Rendering state machine • Procedural • Not a window system, not a scene graph • No initial sub-setting • Extensible • Data type rich • Cross-platform • Window system-independent core • X Window System, Microsoft Windows, OS/2, OS X, etc. • Multi-language bindings • C, FORTRAN, etc. • Not merely an API, rather a system
  • 15. 15 Timeline of OpenGL’s Development 1992 1994 1996 1998 2000 2002 2004 2006 2008 OpenGL 1.0 approved OpenGL 1.1 OpenGL 1.2 Multitexture added (1.2.1) OpenGL 1.3 OpenGL 1.4 OpenGL 1.5 OpenGL 2.0 OpenGL 2.1 OpenGL 3.0 SGI InfiniteReality OpenGL Utility Toolkit (GLUT) released Mesa 3D open source Khronos controls OpenGL 1st GPU for PCs with single-chip transform & lighting for OpenGL (GeForce) NT 3.51 brings OpenGL to PCs OpenGL ES for embedded devices 1st commercial OpenGL implementation (DEC)
  • 16. 16 Competitive 3D APIs • OpenGL has always existed in competition with other APIs • Strengthened OpenGL by driving feature parity • OpenGL’s competitive strengths: 1. Cross platform, open process 2. API stability, extensibility 3. Clean initial design & specification • Timeline (1992-2008) of competing APIs: proprietary Unix workstation 3D APIs (XGL, Doré, Starbase, IRIS GL); X Consortium 3D standard (PEX); Microsoft Direct3D (DirectX 3, DirectX 5, DirectX 6, DirectX 7, DirectX 8, DirectX 9, DirectX 10)
  • 17. 17 OpenGL 1.0 • Immediate mode • Vertex transformation and lighting • Points, lines, polygons • Stippling, wide points and lines • Bitmaps, image rectangles, and pixel reads • Pixel store and transfer • 1D and 2D textures, fog, and scissor • Display lists and evaluators • RGBA and color index color models • Color, depth, stencil, and accumulation buffers • Selection and feedback modes • Queries
  • 18. 18 OpenGL State Machine • From OpenGL 3.0 specification, unchanged since 1.0
  • 19. 19 SGI “Classic” Hardware View of OpenGL (1992) • Entirely fixed-function, no programmability • High-end SGI hardware manifested functionality in distinct chips • Diagram stages: 3D Application or Game; OpenGL API; (graphics hardware boundary); Front End; Vertex Assembly; Vertex Transform & Lighting; Primitive Assembly, Clipping, Setup, and Rasterization; Texture & Fog; Texture Fetch; Raster Operations; Framebuffer Access; Memory Interface • Diagram legend: graphics data flow, memory operations, fixed-function unit, programmable unit
  • 20. 20 OpenGL 1.1 • Vertex arrays • Texture objects • Texture internal formats • Texture sub-image updates • Texture proxies • Copy framebuffer-to-texture • Polygon offset • RGBA logical operations
  • 21. 21 The Look of OpenGL 1.1 • SGI skyfly demo • Stenciled shadow volumes • Ideas in Motion
  • 22. 22 OpenGL 1.2 • 3D textures • Texture edge clamp wrap mode • Texture level-of-detail clamping • BGRA component order • Packed pixel formats • Imaging subset (optional) • Normal rescaling • Separate specular • Vertex array draw elements range
  • 23. 23 Akeley’s (Modernized) OpenGL Data Flow vertex shading rasterization & fragment shading texture raster operations framebuffer pixel unpack pixel pack vertex puller client memory pixel transfer glReadPixels / glCopyPixels / glCopyTex{Sub}Image glDrawPixels glBitmap glCopyPixels glTex{Sub}Image glCopyTex{Sub}Image glDrawElements glDrawArrays selection / feedback / transform feedback glVertex* glColor* glTexCoord* etc. blending depth testing stencil testing accumulation storage operations
  • 24. 24 OpenGL 1.2 Imaging Subset Color Table Convolution (separable or general) Post-convolve Scale & Bias Post-convolve Color Table Color Matrix Post-color matrix Scale & Bias Post-color matrix Color Table Histogram Min-max Look-up Table (RGBA-to-RGBA) Look-up Table (Index-to-RGBA) Scale & Bias Shift & Add Index pixels RGBA pixels Pixel Rectangle Rasterization core functionality ARB_imaging subset discard discard
  • 25. 25 OpenGL 1.2.1 • Multi-texture (optional)
  • 26. 26 Multi-texture Poster Child: Quake 2 Light Maps • lightmaps only × (modulate) decal only = combined scene
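The "lightmaps × decal = combined" picture on this slide can be sketched per pixel as below. This is a minimal illustration in plain C, not driver or game code; the `Rgb` type and the `modulate` function are names invented for this example, and colors are assumed to be in the range [0, 1].

```c
#include <assert.h>

/* One RGB color with components in [0, 1] (hypothetical helper type). */
typedef struct { float r, g, b; } Rgb;

/* Modulate-style combine: multiply the decal texel by the light map
 * texel, component by component. */
Rgb modulate(Rgb decal, Rgb lightmap)
{
    /* Light map intensities are assumed to be non-negative. */
    assert(lightmap.r >= 0.0f && lightmap.g >= 0.0f && lightmap.b >= 0.0f);
    Rgb out = { decal.r * lightmap.r,
                decal.g * lightmap.g,
                decal.b * lightmap.b };
    return out;
}
```

With multi-texture, both texels are fetched and combined in a single pass rather than drawing the surface twice and blending.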
  • 27. 27 GeForce 256 (NV10) View of OpenGL 3D Application or Game • Vertex pulling (vertex buffer objects) via DMA • Dual-texture, cube maps, and register combiners OpenGL API GPU Front End Vertex Assembly Vertex Transform & Lighting Primitive Assembly, Clipping, Setup, and Rasterization Texture & Fog Texture Fetch Raster Operations Framebuffer Access Memory Interface CPU – GPU Boundary 1999 Attribute Fetch
  • 28. 28 Hardware Cube Maps • Rendered scene • Dynamically created cube map image • Image credit: “Guts” GeForce 2 GTS demo, Thant Thessman
  • 29. 29 OpenGL 1.3 • Multi-texture (required now) • Cube map texturing • Compressed texture formats • Texture border clamp • Texture environment functions • Add, combine, dot product • Multisample anti-aliasing • Transpose matrix
  • 30. 30 GeForce 3 & 4 Ti (NV2x) View of OpenGL 3D Application or Game • Programmable vertex processing • Highly configurable fragment processing OpenGL API GPU Front End Vertex Assembly Vertex Program Primitive Assembly, Clipping, Setup, and Rasterization Multi-texture shaders & Combiners Texture Fetch Raster Operations Framebuffer Access Memory Interface CPU – GPU Boundary 2001 Attribute Fetch
  • 31. 31 Vertex Programmability • Paletted matrix skinning • Twister vertex program • Per-vertex cartoon shading
  • 32. 32 Configurable Fragment Processing • Bumpy shiny environment mapping • Chromatic aberration • Offset 2D bump mapping • Depth sprites
  • 33. 33 OpenGL 1.4 • Automatic mipmap generation • Shadow-mapping • Depth textures and shadow comparisons • Texture level-of-detail bias • Texture mirrored repeat wrap mode • Multi-texture combination • Fog coordinate • Secondary color • Configurable point size attenuation • Color blending improvements • Stencil wrap operations • Window-space raster position specification
  • 34. 34 Hardware Shadow Mapping • Without shadow mapping • With shadow mapping • Depth map from light source’s view (darker is closer) • Light position • Projective Texturing (1.0) & Polygon Offset (1.1) key enablers
  • 35. 35 Shadow Mapping Explained • Planar distance from light ≤ (“less than”) Depth map projected onto scene = (“equals”) True “un-shadowed” region shown green
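The comparison on this slide can be written as a one-line test per fragment. The sketch below is an illustration only: the function name is hypothetical, both depths are assumed to be measured from the light and normalized to [0, 1], and the small bias that polygon offset would normally provide is left out.

```c
#include <assert.h>

/* Returns 1.0 when the fragment is lit and 0.0 when it is in shadow.
 * fragment_depth_from_light: planar distance of this fragment from the light.
 * depth_map_value: nearest depth the light "saw" along the same ray. */
float shadow_compare_lequal(float fragment_depth_from_light,
                            float depth_map_value)
{
    /* Depth map values are assumed to be normalized. */
    assert(depth_map_value >= 0.0f && depth_map_value <= 1.0f);
    return (fragment_depth_from_light <= depth_map_value) ? 1.0f : 0.0f;
}
```

A fragment that is farther from the light than the stored depth has something between it and the light, so it is shadowed.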
  • 36. 36 OpenGL 1.5 • Vertex buffer objects (VBOs) • Occlusion queries • Generalized shadow mapping functions
  • 37. 37 GeForce FX (NV3x) View of OpenGL 3D Application or Game • Programmable fragment processing • 16 texture units, IEEE 754 32-bit floating-point • Vertex program branching OpenGL API GPU Front End Vertex Assembly Vertex Program Primitive Assembly, Clipping, Setup, and Rasterization Fragment Program Texture Fetch Raster Operations Framebuffer Access Memory Interface CPU – GPU Boundary 2003 Attribute Fetch
  • 39. 39 OpenGL Fragment Program Flowchart More Instructions? Read Interpolants and/or Registers Map Input values: Swizzle, Negate, etc. Perform Instruction Math / Operation Write Output Register with Masking Begin Fragment Fetch & Decode Next Instruction Temporary Registers initialized to 0,0,0,0 Output Depth & Color Registers initialized to 0,0,0,1 Initialize Parameters Emit Output Registers as Transformed Vertex End Fragment Fragment Program Instruction Loop Fragment Program Instruction Memory Texture Fetch Instruction? yes no no Compute Texture Address & Level- of-detail & Fetch Texels Filter Texels yes Texture Images Primitive Interpolants
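The instruction loop in this flowchart can be sketched as a tiny interpreter. Everything below is a simplified assumption made for illustration: the three opcodes, the eight-register file, and the choice of register 7 as the output color are invented here and do not match the real !!FP1.0 or !!ARBfp1.0 instruction sets. Texture fetch instructions are omitted.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float v[4]; } Vec4;
typedef enum { OP_MOV, OP_ADD, OP_MUL } Opcode;   /* hypothetical subset */

/* Source operand: register index, per-component swizzle, optional negate. */
typedef struct {
    int reg;
    unsigned char swizzle[4];   /* each entry 0..3 selects x, y, z, or w */
    int negate;
} Src;

typedef struct {
    Opcode op;
    int dst;               /* destination register index */
    unsigned write_mask;   /* bit i set means component i is written */
    Src a, b;              /* b is ignored by OP_MOV */
} Instr;

#define NUM_REGS  8
#define OUT_COLOR 7        /* assumption: register 7 is the output color */

/* "Map input values": apply swizzle and negate to a source register. */
static Vec4 read_src(const Vec4 *regs, Src s)
{
    Vec4 r;
    assert(s.reg >= 0 && s.reg < NUM_REGS);
    for (int i = 0; i < 4; i++) {
        assert(s.swizzle[i] < 4);
        float x = regs[s.reg].v[s.swizzle[i]];
        r.v[i] = s.negate ? -x : x;
    }
    return r;
}

Vec4 run_fragment_program(const Instr *prog, size_t count,
                          const Vec4 *interpolants, size_t n_interp)
{
    Vec4 regs[NUM_REGS] = {{{0.0f}}};     /* temporaries start at (0,0,0,0) */
    assert(n_interp < OUT_COLOR);
    for (size_t i = 0; i < n_interp; i++)
        regs[i] = interpolants[i];        /* primitive interpolants */
    regs[OUT_COLOR].v[3] = 1.0f;          /* output starts at (0,0,0,1) */

    for (size_t pc = 0; pc < count; pc++) {          /* fetch and decode */
        const Instr *in = &prog[pc];
        assert(in->dst >= 0 && in->dst < NUM_REGS);
        Vec4 a = read_src(regs, in->a);
        Vec4 b = (in->op == OP_MOV) ? a : read_src(regs, in->b);
        for (int i = 0; i < 4; i++) {
            float r = (in->op == OP_ADD) ? a.v[i] + b.v[i]
                    : (in->op == OP_MUL) ? a.v[i] * b.v[i]
                    : a.v[i];
            if (in->write_mask & (1u << i))          /* write with masking */
                regs[in->dst].v[i] = r;
        }
    }
    return regs[OUT_COLOR];               /* emit output register */
}
```

The point of the sketch is the shape of the loop: each instruction reads operands, remaps them, computes, and writes through a mask until no instructions remain.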
  • 40. 40 Key Trend: Configurability becomes Programmability Fixed-function Programmable Simple Configurability Complex Configurability
  • 41. 41 Core OpenGL fragment texturing & coloring Point Rasterization Line Rasterization Polygon Rasterization Pixel Rectangle Rasterization Bitmap Rasterization From Primitive Assembly DrawPixels Bitmap Conventional Texture Fetching Texture Environment Application Color Sum Fog To raster operations Coverage Application Texture Unit 0 Texture Unit 1 Texture Unit 0 Texture Unit 1
  • 42. 42 NV1x OpenGL fragment texturing & coloring Point Rasterization Line Rasterization Polygon Rasterization Pixel Rectangle Rasterization Bitmap Rasterization From Primitive Assembly DrawPixels Bitmap Conventional Texture Fetching Texture Environment Application Color Sum Fog To raster operations Coverage Application Register Combiners Texture Unit 0 General Stage 1 Final Stage Texture Unit 1 General Stage 0 Texture Unit 0 Texture Unit 1 GL_REGISTER_COMBINERS_NV enable
  • 43. 43 NV2x OpenGL fragment texturing & coloring • Rasterization inputs (from Primitive Assembly, DrawPixels, and Bitmap): Point, Line, Polygon, Pixel Rectangle, and Bitmap Rasterization • Conventional path: Texture Fetching (Texture Unit 0, Texture Unit 1, …, Texture Unit 3), Texture Environment Application (Texture Unit 0, Texture Unit 1, …, Texture Unit 3), Color Sum, Fog, Coverage Application, to raster operations • GL_TEXTURE_SHADER_NV enable selects Texture Shaders (Texture Shader 0, Texture Shader 1, …, Texture Shader 3) in place of conventional texture fetching • GL_REGISTER_COMBINERS_NV enable selects Register Combiners (General Stage 0, General Stage 1, …, General Stage 7, Final Combiner)
  • 44. 44 NV3x OpenGL fragment texturing & coloring • Rasterization inputs (from Primitive Assembly, DrawPixels, and Bitmap): Point, Line, Polygon, Pixel Rectangle, and Bitmap Rasterization • Conventional path: Texture Fetching (Texture Unit 0, Texture Unit 1, …, Texture Unit 3), Texture Environment Application (Texture Unit 0, Texture Unit 1, …, Texture Unit 3), Color Sum, Fog, Coverage Application, to raster operations • GL_TEXTURE_SHADER_NV enable selects Texture Shaders (Texture Shader 0, Texture Shader 1, …, Texture Shader 3) • GL_REGISTER_COMBINERS_NV enable selects Register Combiners (General Stage 0, General Stage 1, …, General Stage 7, Final Combiner) • GL_FRAGMENT_PROGRAM_NV enable selects a Fragment Program (Fragment Program Instruction 0, …, Fragment Program Instruction 1023), written as !!FP1.0 or !!ARBfp1.0 programs
  • 45. 45 OpenGL 2.0 • Programmable shading • OpenGL Shading Language (GLSL) • Multiple color buffer rendering targets • Non-power-of-two texture dimensions • Point sprites • Separate blend equation • Two-sided stencil testing
  • 46. 46 GeForce 6 & 7 (NV4x/G7x) View of OpenGL 3D Application or Game • Limited vertex texturing • Fragment branching • Multiple render targets & floating-point blending OpenGL API GPU Front End Vertex Assembly Vertex Program Primitive Assembly, Clipping, Setup, and Rasterization Fragment Program Texture Fetch Raster Operations Framebuffer Access Memory Interface CPU – GPU Boundary 2004 Attribute Fetch
  • 47. 47 GeForce 8 & 9 (G8x/G9x) View of OpenGL • Primitive (geometry) programs • Parameter reads from buffer objects • Transform feedback (stream out) • Pipeline (2006): 3D Application or Game; OpenGL API; CPU – GPU Boundary; GPU Front End; Vertex Assembly; Vertex Program; Primitive Assembly; Primitive Program; Clipping, Setup, and Rasterization; Fragment Program; Raster Operations • Memory operations: Attribute Fetch; Parameter Buffer Read; Texture Fetch; Framebuffer Access; Memory Interface
  • 48. 48 OpenGL Pipeline Fixed-function Steps • Much of functional pipeline remains fixed-function • Vital to maintaining performance and data flow • Hard to compete with hard-wired rasterization, Zcull, and pixel compression • Pipeline (2006): GPU Front End; Vertex Assembly; Vertex Program; Primitive Assembly; Primitive Program; Clipping, Setup, and Rasterization; Fragment Program; Raster Operations • Memory operations: Attribute Fetch; Parameter Buffer Read; Texture Fetch; Framebuffer Access; Memory Interface
  • 49. 49 OpenGL Pipeline Programmable Domains • New geometry shader domain for per-primitive programmable processing • Unified Streaming Processor Array (SPA) architecture means same capabilities for all domains • Can be unified hardware! • Pipeline (2006): GPU Front End; Vertex Assembly; Vertex Program; Primitive Assembly; Primitive Program; Clipping, Setup, and Rasterization; Fragment Program; Raster Operations • Memory operations: Attribute Fetch; Parameter Buffer Read; Texture Fetch; Framebuffer Access; Memory Interface
  • 50. 50 OpenGL 2.1 • OpenGL Shading Language (GLSL) improvements • Non-square matrices • Pixel buffer objects (PBOs) • sRGB color space texture formats
  • 51. 51 OpenGL 3.0 • OpenGL Shading Language (GLSL) improvements • New texture fetches • True integer data types and operators • switch/case/default flow control statements • Conditional rendering based on occlusion query results • Transform feedback • Vertex array objects • Floating-point textures, color buffers, and depth buffers • Half-precision vertex arrays • Texture arrays • Integer textures • Red and red-green texture formats • Compressed red and red-green formats • Framebuffer objects (FBOs) • Packed depth-stencil pixel formats • Per-color buffer clearing, blending, and masking • sRGB color space color buffers • Fine-grain buffer mapping and flushing
  • 52. 52 Areas of 3.0 Functionality Improvement • Programmability • Shader Model 4.0 features • OpenGL Shading Language (GLSL) 1.30 • Texturing • New texture representations and formats • Framebuffer operations • Framebuffer objects • New formats • New copy (blit), clear, blend, and masking operations • Buffer management • Non-blocking and fine-grain update of buffer object data stores • Vertex processing • Vertex array configuration objects • Conditional rendering for occlusion culling • New half-precision vertex attribute formats • Pixel processing • New half-precision external pixel formats All Brand New Core Features
  • 53. 53 OpenGL 3.0 Programmability • Shader Model 4.0 additions • True signed & unsigned integer values • True integer operators: ^, &, |, <<, >>, %, ~ • Texture additions • Texture arrays • Base texture size queries • Texel offsets to fetches • Explicit LOD and derivative control • Integer samplers • Interpolation modifiers: centroid, noperspective, and flat • Vertex array element number: gl_VertexID • OpenGL Shading Language (GLSL) improvements • ## concatenation in pre-processor for macros • switch/case/default statements
  • 54. 54 OpenGL 3.0 Texturing Functionality • Texture representation • Texture arrays: indexed access to a set of 1D or 2D texture images • Texture formats • Floating-point texture formats • Single-precision (32-bit, IEEE s23e8) • Half-precision (16-bit, s10e5) • Red & red/green texture formats • Intended as FBO framebuffer formats too • Compressed red & red/green texture formats • Shared exponent texture formats • Packed floating-point texture formats
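To make the half-precision (16-bit, s10e5) layout concrete, the sketch below decodes one such value into a 32-bit float. It is an illustration written for these notes, not code from any OpenGL implementation; `half_to_float` and `pow2i` are names chosen here, and the usual IEEE-style conventions (exponent bias 15, denormals, infinities, NaNs) are assumed.

```c
#include <assert.h>
#include <math.h>     /* INFINITY and NAN macros only; no libm calls */
#include <stdint.h>

/* 2^e for a small integer e, by repeated exact multiplication. */
static float pow2i(int e)
{
    assert(e >= -126 && e <= 127);   /* stay within normal float range */
    float r = 1.0f;
    for (; e > 0; e--) r *= 2.0f;
    for (; e < 0; e++) r *= 0.5f;
    return r;
}

/* Decode a 16-bit s10e5 value: 1 sign bit, 5 exponent bits, 10 mantissa bits. */
float half_to_float(uint16_t h)
{
    int sign     = (h >> 15) & 0x1;
    int exponent = (h >> 10) & 0x1F;
    int mantissa = h & 0x3FF;
    float value;

    if (exponent == 0)            /* zero or denormal: no implied leading 1 */
        value = ((float)mantissa / 1024.0f) * pow2i(-14);
    else if (exponent == 31)      /* infinity or NaN */
        value = mantissa ? NAN : INFINITY;
    else                          /* normal: implied leading 1, bias 15 */
        value = (1.0f + (float)mantissa / 1024.0f) * pow2i(exponent - 15);

    return sign ? -value : value;
}
```

Every finite half value is exactly representable as a float, so this decode loses nothing.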
  • 55. 55 Texture Arrays • Conventional texture = One logical pre-filtered image • Texture array = index-able plurality of pre-filtered images • Rationale is fewer texture object binds when drawing different objects • No filtering between mipmap sets in a texture array • All mipmap sets in array share same format/border & base dimensions • Both 1D and 2D texture arrays • Require shaders, no fixed-function support • Texture image specification • Use glTexImage3D, glTexSubImage3D, etc. to load 2D texture arrays • No new OpenGL commands for texture arrays • 3rd dimension specifies integer array index • No halving in 3rd dimension for mipmaps • So 64×128×17 reduces to 32×64×17 all the way to 1×1×17
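The "no halving in the 3rd dimension" rule can be checked with a few lines of arithmetic. The helper names below are hypothetical; the sketch only computes sizes and does not call OpenGL.

```c
#include <assert.h>

typedef struct { int width, height, layers; } ArrayLevel;

/* Size of mipmap level `level` of a 2D texture array: width and height
 * halve (never below 1), the layer count stays fixed. */
ArrayLevel array_level_size(int width, int height, int layers, int level)
{
    assert(width > 0 && height > 0 && layers > 0);
    assert(level >= 0 && level < 31);
    ArrayLevel out;
    out.width  = (width  >> level) > 0 ? (width  >> level) : 1;
    out.height = (height >> level) > 0 ? (height >> level) : 1;
    out.layers = layers;
    return out;
}

/* Number of levels in a complete mipmap chain, down to 1x1. */
int array_level_count(int width, int height)
{
    assert(width > 0 && height > 0);
    int largest = width > height ? width : height;
    int count = 1;
    while (largest > 1) { largest >>= 1; count++; }
    return count;
}
```

For the slide's 64×128×17 example this gives eight levels, with level 1 at 32×64×17 and the last level at 1×1×17.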
  • 56. 56 Texture Arrays Example • Multiple skins packed in texture array • Motivation: binding to one multi-skin texture array avoids texture bind per object • Figure axes: texture array index (0 to 4) by mipmap level index (0 to 4)
  • 57. 57 Compact Floating-point Textures • Shared exponent & packed float representations are ideal for High Dynamic Range (HDR) applications
  • 58. 58 Compact Floating-point Texture Formats • Packed float format • No sign bit, independent exponents • 32 bits (bit 31 to bit 0): one component with 5-bit mantissa and 5-bit exponent, two components with 6-bit mantissa and 5-bit exponent each • Shared exponent format • No sign bit, shared exponent, no implied leading 1 • 32 bits (bit 31 to bit 0): 5-bit shared exponent and three 9-bit mantissas
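As a sketch of how the shared exponent format unpacks, the function below decodes one 32-bit texel into three floats. The bit positions (exponent in the top 5 bits, then blue, green, and red mantissas) and the exponent bias of 15 follow the published EXT_texture_shared_exponent layout; the function name is made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Decode a shared-exponent texel: three 9-bit mantissas with no sign bit
 * and no implied leading 1, scaled by one 5-bit exponent (bias 15). */
void decode_rgb9e5(uint32_t packed, float rgb[3])
{
    assert(rgb);
    uint32_t exponent = packed >> 27;                 /* 0..31 */
    /* scale = 2^(exponent - 15 - 9) = 2^exponent / 2^24, computed exactly */
    float scale = (float)(1u << exponent) / 16777216.0f;
    rgb[0] = (float)( packed        & 0x1FFu) * scale;   /* red   */
    rgb[1] = (float)((packed >>  9) & 0x1FFu) * scale;   /* green */
    rgb[2] = (float)((packed >> 18) & 0x1FFu) * scale;   /* blue  */
}
```

Because all three components share one exponent, precision is set by the brightest component, which suits HDR colors whose channels have similar magnitude.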
  • 59. 59 1- and 2-component Block Compression Scheme • Basic 1-component block compression format • Borrowed from alpha compression scheme of S3TC • Encoded block: 64 bits total per block = 2 min/max values (8-bit A and 8-bit B, 16 bits) + 48 bits of per-pixel selectors • Decoded block: 4×4 pixels, 16 pixels × 8-bit/component = 128 bits decoded, so effectively 2:1 compression
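The 1-component scheme can be sketched as a decoder for one 64-bit block. This follows the unsigned red layout documented for RGTC (two 8-bit endpoints followed by sixteen 3-bit selectors, stored little-endian); the function name and the float output convention are choices made for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Decode one 8-byte unsigned red block into 16 values in [0, 1],
 * texels ordered row by row within the 4x4 block. */
void decode_rgtc1_block(const uint8_t block[8], float texels[16])
{
    assert(block && texels);
    float red0 = block[0] / 255.0f;
    float red1 = block[1] / 255.0f;
    float palette[8];

    palette[0] = red0;
    palette[1] = red1;
    if (block[0] > block[1]) {
        /* Six values interpolated between the endpoints. */
        for (int i = 1; i <= 6; i++)
            palette[i + 1] = ((7 - i) * red0 + i * red1) / 7.0f;
    } else {
        /* Four interpolated values, plus explicit minimum and maximum. */
        for (int i = 1; i <= 4; i++)
            palette[i + 1] = ((5 - i) * red0 + i * red1) / 5.0f;
        palette[6] = 0.0f;
        palette[7] = 1.0f;
    }

    /* Gather the 48 selector bits, least significant byte first. */
    uint64_t bits = 0;
    for (int i = 0; i < 6; i++)
        bits |= (uint64_t)block[2 + i] << (8 * i);

    for (int t = 0; t < 16; t++)
        texels[t] = palette[(bits >> (3 * t)) & 0x7u];
}
```

The 2-component format simply stores two such blocks side by side, one for red and one for green.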
  • 60. 60 Framebuffer Operations • Framebuffer objects • Standardized framebuffer objects (FBOs) for rendering to textures and renderbuffers • Render-to-texture • Multisample renderbuffers for FBOs • Framebuffer operations • Copies from one FBO to another, including multisample data • Per-color attachment color clears, blending, and write masking • Framebuffer formats • Floating-point color buffers • Floating-point depth buffers • Rendering into framebuffer format with 3 small unsigned floating-point values packed in a 32-bit value • Rendering into sRGB color space framebuffers
  • 61. 61 Framebuffer Object Example • Depth peeling for correctly ordered transparency • Great render-to-texture application for FBOs
  • 62. 62 Depth Peeling Behind the Scenes • Depth buffer has closest fragment at all pixels • Save depth buffer • Render again, but use depth buffer as shadow map • Discard fragment in front of shadow map’s depth value • Effectively peels one layer of depth! • Resulting color buffer is 2nd closest fragment • And depth buffer for 2nd closest fragments’ depth • Now repeat peeling more layers • Use ping-pong depth buffer scheme • Use occlusion query to detect when no more fragments to peel • Composite color layers front-to-back (or back-to-front) • Front-to-back compositing can be done during the peeling process
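The final compositing step can be sketched for a single pixel as follows. This is a CPU-side illustration of front-to-back accumulation, with hypothetical names and non-premultiplied colors assumed; it is not the GPU code used in the demo.

```c
#include <assert.h>

typedef struct { float r, g, b, a; } Rgba;

/* Composite peeled layers for one pixel. layers[0] is the layer nearest
 * the eye. Each layer contributes only through the transparency that is
 * still left over from the layers in front of it. */
Rgba composite_front_to_back(const Rgba *layers, int count)
{
    assert(count == 0 || layers != 0);
    Rgba acc = { 0.0f, 0.0f, 0.0f, 0.0f };
    for (int i = 0; i < count; i++) {
        float weight = (1.0f - acc.a) * layers[i].a;
        acc.r += weight * layers[i].r;
        acc.g += weight * layers[i].g;
        acc.b += weight * layers[i].b;
        acc.a += weight;
    }
    return acc;
}
```

Since each layer is folded in as soon as it is peeled, this order lets accumulation happen during the peeling passes and lets the loop stop early once the accumulated alpha reaches 1.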
  • 63. 63 Delicate Color Fidelity with sRGB • Problem: PC display devices have non-linear (sRGB) display gamut —delicate color shading looks wrong Conventional rendering (uncorrected color) Gamma correct (sRGB rendered) Softer and more natural Unnaturally deep facial shadows NVIDIA’s Adriana GeForce 8 Launch Demo
  • 64. 64 What is sRGB? • A standard color space • Intended for monitors, printers, and the Internet • Created cooperatively by HP and Microsoft • Non-linear, roughly gamma of 2.2 • Intuitively “encodes more dark values” • OpenGL 2.1 already added sRGB texture formats • Texture fetch converts sRGB to linear RGB, then filters • Result takes more than 8-bit fixed-point to represent in shader • 3.0 adds complementary sRGB framebuffer support • “sRGB correct blending” converts framebuffer sRGB to linear, blend with linear color from shader, then convert back to sRGB • Works with FrameBuffer Objects (FBOs) sRGB chromaticity
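For reference, the "roughly gamma 2.2" curve is the piecewise sRGB transfer function. The formulas below are the standard sRGB definitions per color component in [0, 1]; the names decode, encode, and blend are notation chosen here to summarize the "sRGB correct blending" bullet.

```latex
% sRGB-encoded value to linear (what a texture fetch or framebuffer read applies)
\[
\mathrm{decode}(c) =
\begin{cases}
  \dfrac{c}{12.92}, & c \le 0.04045 \\[6pt]
  \left(\dfrac{c + 0.055}{1.055}\right)^{2.4}, & c > 0.04045
\end{cases}
\]

% Linear value back to sRGB encoding (what a framebuffer write applies)
\[
\mathrm{encode}(l) =
\begin{cases}
  12.92\, l, & l \le 0.0031308 \\[6pt]
  1.055\, l^{1/2.4} - 0.055, & l > 0.0031308
\end{cases}
\]

% sRGB-correct blending of linear shader color s with stored sRGB value d
\[
d' = \mathrm{encode}\bigl(\mathrm{blend}(s,\ \mathrm{decode}(d))\bigr)
\]
```

The linear segment near zero avoids an infinite slope at black; above it, the curve allocates more code values to dark tones than a linear encoding would.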
  • 65. 65 So why sRGB? Standard Windows Display is Not Gamma Corrected • 25+ years of PC graphics, icons, and images depend on not gamma correcting displays • sRGB textures and color buffers compensate for this • Gamma 2.2: “Expected” appearance of Windows desktop & icons but 3D lighting too dark • Gamma 1.0 (linear color response): Washed-out desktop appearance if color response was linear but 3D lighting is correct
  • 66. 66 Vertex Processing • Vertex array configuration • Objects to manage vertex array configuration client state • Half-precision floating-point vertex array formats • Vertex output streaming • Stream transformed vertex results into buffer object data stores • Occlusion culling • Skip rendering based on occlusion query result
  • 67. 67 Miscellaneous • Pixel Processing • Half-precision floating-point pixel external formats • Buffer Management • Non-blocking and fine-grain update of buffer object data stores
  • 68. 68 ARB Extensions to OpenGL 3.0 • OpenGL 3.0 standard provides new ARB extensions • Extensions go beyond OpenGL 3.0 • Standardized at same time as OpenGL 3.0 • Support features in hardware today • Specifically • ARB_geometry_shader4—provides per-primitive programmable processing • ARB_draw_instanced—gives shader access to instance ID • ARB_texture_buffer_object—allows buffer object to be sampled as a huge 1D unfiltered texture • Shipping today • NVIDIA driver provides all three
  • 69. 69 Transform Feedback for Terrain Generation by Recursive Subdivision • Geometry shaders + transform feedback 1. Render quads (use 4-vertex line adjacency primitive) from vertex buffer object 2. Fetch height field 3. Stream subdivided positions and normals to transform feedback “other” buffer object 4. Use buffer object as vertex buffer 5. Repeat, ping-pong buffer objects Computation and data all stays on the GPU!
  • 70. 70 Skin Deformation • Capture & re-use geometric deformations Transform feedback allows the GPU to calculate the interactive, deforming elastic skin of the frog
  • 71. 71 Silhouette Edge Rendering • Uses geometry shader silhouette edge detection geometry shader Complete mesh Silhouette edges Useful for non-photorealistic rendering Looks like human sketching
  • 72. 72 More Geometry Shader Examples Shimmering point sprites Generate fins for lines Generate shells for fur rendering
  • 73. 73 Improved Interpolation Techniques •Using geometry shader functionality Quadratic normal interpolation True quadrilateral rendering with mean value coordinate interpolation
  • 74. 74 “Fair” Quadrilateral Interpolation • glBegin(GL_QUADS); • glColor3fv(red); glVertex3fv(lowerLeft); • glColor3fv(green); glVertex3fv(lowerRight); • glColor3fv(red); glVertex3fv(upperRight); • glColor3fv(blue); glVertex3fv(upperLeft); • glEnd(); • Geometry shader actually operates on 4-vertex GL_LINE_ADJACENCY primitives instead of quads Wrong, slash triangle split Wrong, backslash triangle split Better: Mean value coordinates
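Why both triangle splits are "wrong" shows up at the center of the quad. The sketch below assumes a square quad, where by symmetry the mean value coordinates at the center are 1/4 for each corner; the type and function names are invented for this example, and corners are ordered lower-left, lower-right, upper-right, upper-left as in the slide's code.

```c
#include <assert.h>

typedef struct { float r, g, b; } Rgb;

static Rgb weighted(const Rgb *c, const float *w, int n)
{
    Rgb out = { 0.0f, 0.0f, 0.0f };
    for (int i = 0; i < n; i++) {
        out.r += w[i] * c[i].r;
        out.g += w[i] * c[i].g;
        out.b += w[i] * c[i].b;
    }
    return out;
}

/* "Slash" split: the shared edge runs lower-left to upper-right, so the
 * center sees only those two corners. */
Rgb center_slash_split(const Rgb corners[4])
{
    const float w[4] = { 0.5f, 0.0f, 0.5f, 0.0f };
    assert(corners);
    return weighted(corners, w, 4);
}

/* "Backslash" split: the shared edge runs lower-right to upper-left. */
Rgb center_backslash_split(const Rgb corners[4])
{
    const float w[4] = { 0.0f, 0.5f, 0.0f, 0.5f };
    assert(corners);
    return weighted(corners, w, 4);
}

/* Mean value coordinates at the center of a square: all four corners
 * contribute equally. */
Rgb center_mean_value(const Rgb corners[4])
{
    const float w[4] = { 0.25f, 0.25f, 0.25f, 0.25f };
    assert(corners);
    return weighted(corners, w, 4);
}
```

With the slide's red, green, red, blue corners, the slash split yields pure red at the center and the backslash split yields no red at all, while the mean value result blends all four corners.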
  • 75. 75 OpenGL 2.x ARB Extensions • Many OpenGL 3.0 features have corresponding ARB extensions for OpenGL 2.1 implementations to advertise • Helps get 3.0 functionality out sooner, rather than later • New ARB extensions for 3.0 functionality • ARB_framebuffer_object: framebuffer objects (FBOs) for render-to-texture • ARB_texture_rg: red and red/green texture formats • ARB_map_buffer_range: non-blocking and fine-grain update of buffer object data stores • ARB_instanced_arrays: instance ID available to shaders • ARB_half_float_vertex: half-precision floating-point vertex array formats • ARB_framebuffer_sRGB: rendering into sRGB color space framebuffers • ARB_texture_compression_rgtc: compressed red and red/green texture formats • ARB_depth_buffer_float: floating-point depth buffers • ARB_vertex_array_object: objects to manage vertex array configuration client state
  • 76. 76 Beyond OpenGL 3.0 OpenGL 3.0 • EXT_gpu_shader4 • NV_conditional_render • ARB_color_buffer_float • NV_depth_buffer_float • ARB_texture_float • EXT_packed_float • EXT_texture_shared_exponent • NV_half_float • ARB_half_float_pixel • EXT_framebuffer_object • EXT_framebuffer_multisample • EXT_framebuffer_blit • EXT_texture_integer • EXT_texture_array • EXT_packed_depth_stencil • EXT_draw_buffers2 • EXT_texture_compression_rgtc • EXT_transform_feedback • APPLE_vertex_array_object • EXT_framebuffer_sRGB • APPLE_flush_buffer_range (modified) In GeForce 8, 9, & 2xx Series but not yet core • EXT_geometry_shader4 (now ARB) • EXT_bindable_uniform • NV_gpu_program4 • NV_parameter_buffer_object • EXT_texture_compression_latc • EXT_texture_buffer_object (now ARB) • NV_framebuffer_multisample_coverage • NV_transform_feedback2 • NV_explicit_multisample • NV_multisample_coverage • EXT_draw_instanced (now ARB) • EXT_direct_state_access • EXT_vertex_array_bgra • EXT_texture_swizzle Plenty of proven OpenGL extensions for OpenGL Working Group to draw upon for OpenGL 3.1
  • 77. 77 OpenGL Version Evolution • Now OpenGL is part of Khronos Group • Previously OpenGL’s evolution was governed by the OpenGL Architectural Review Board (ARB) • Now officially a Khronos working group • Khronos also standardizes OpenCL, OpenVG, etc. • How OpenGL version updates happen • OpenGL participants proposing extensions • Successful extensions are polished and incorporated into core • OpenGL 3.0 is great example of this process • Roughly 20 extensions folded into “core” • Just 3 of those previously unimplemented
  • 78. 78 OpenGL Extensions by Source • 44% of extensions are “core” or multi-vendor • Lots of vendors have initiated extensions • Extending OpenGL is industry-wide collaboration • Pie chart sources: Multi-vendor, Silicon Graphics, Architectural Review Board, NVIDIA, ATI, Apple, Mesa3D, Sun Microsystems, OpenGL ES, OpenML, IBM, Intense3D, Hewlett Packard, 3Dfx, Other • Pie chart prefixes: EXT, SGI, SGIS, SGIX, ARB, NV, ATI, APPLE, MESA, Others • Pie chart percentages as extracted: 29%, 17%, 15%, 15%, 4%, 2%, 2%, 2%, 2%, 2%, 2%, 2%, 1%, 1%, 4%, 15% • Source: http://www.opengl.org/registry (Dec 2008)
  • 79. 79 What’s Driving OpenGL Modernization? Human desire for Visual Intuition and Entertainment Embarrassing Parallelism of Graphics Increasing Semiconductor Density Particularly the hardware-amenable, latency tolerant nature of rasterization Particularly interactive video games
  • 80. 80 Kurt Akeley Principal Researcher Microsoft Research Silicon Valley OpenGL’s Evolution: A Personal Retrospective
  • 81. 81 A personal retrospective • My background: • Silicon Graphics, 1982-2001 • OpenGL, 1990-2004 • Today’s topics: • Computer architecture • Culture and process • For a more complete coverage see: • https://graphics.stanford.edu/wikis/cs448-07-spring/ • Mark Kilgard’s excellent course notes
  • 82. 82 Jim Clark and the Geometry Engine • The Geometry Engine: A VLSI Geometry System for Graphics, Computer Graphics, Volume 16, Number 3 (Proceedings of SIGGRAPH 1982), p127-133, 1982
  • 83. 83 Jim’s helpers: the Stanford gang • IRIS GL • Geometry Engine • IRIS GL • Hardware back-end • Hardware front-end
  • 86. 86 What is computer architecture? • Architecture: “the minimal set of properties that determine what programs will run and what results they will produce” • Implementation: “the logical organization of the [computer’s] dataflow and controls” • Realization: “the physical structure embodying the implementation”
  • 87. 87 Example: the analog clock • Architecture • Circular dial divided into twelfths • Hour hand (short) and minute hand (long) • Implementation • A weight, driving a pendulum, or • A spring, driving a balance wheel, or • A battery, driving an oscillator, or … • Realization • Gear ratios, pendulum lengths, battery sizes, ... • Example from Computer Architecture, Concepts and Evolution, Gerrit A. Blaauw and Frederick P. Brooks, Jr., Addison-Wesley, 1997
  • 88. 88 A useful distinction • NVIDIA 8800: SIMD, or SPMD? • Architecture: SPMD • Implementation: SIMD • Realization: ASIC • SIMD = Single Instruction, Multiple Data • SPMD = Single Program, Multiple Data • ASIC = Application Specific Integrated Circuit • Block diagram: Application; Data Assembler; Vertex, Primitive, and Fragment Thread Issue; Setup / Rasterization / ZCull; Thread Processor; eight clusters of SP pairs, each with L1 and TF; six L2 / FB partitions
  • 89. 89 The mainstream view • Table of Contents: • Fundamentals • Instruction Sets • Pipelining • Advanced Pipelining and ILP • Memory-Hierarchy Design • Storage Systems • Interconnection Networks • Multiprocessors
  • 90. 90 OpenGL is an architecture • Table columns: aspect; Blaauw/Brooks; OpenGL • Different implementations; IBM 360 30/40/50/65/75, Amdahl; SGI Indy/Indigo/InfiniteReality, NVIDIA GeForce, ATI Radeon, … • Compatibility; Code runs equivalently on all implementations; Top-level goal, Conformance tests, … • Intentional design; It’s an architecture, whether it was planned or not.; Carefully planned, though mistakes were made • Configuration; Can vary amount of resource (e.g., memory); No feature sub-setting, Configuration attributes (e.g., framebuffer) • Speed; Not a formal aspect of architecture; No performance queries • Validity of inputs; No undefined operation; All errors specified, No side effects, Little undefined operation • Enforcement; When implementation errors are found, they are fixed.; Specification rules!
  • 91. 91 But OpenGL is an API (Application Programming Interface) • Yes, Blaauw and Brooks talk about (computer) architecture as though it is always expressed as ISA (Instruction-Set Architecture) • But … • API is just a higher-level programming interface • “Instruction-Set” Architecture implies other types of computer architectures (such as “API” Architecture) • OpenGL has evolved to include ISA-like interfaces (e.g., the interface below GLSL)
  • 92. 92 We didn’t know … • No mention in spec (even 3.0) • “We view OpenGL as a state …” • First use in “ARB” • Architecture Review Board • Coined by Bill Glazier from “Palo Alto Architecture Review Board” • First formal usage (I know of) • Mark J. Kilgard, Realizing OpenGL: two implementations of one architecture, Proceedings of the ACM SIGGRAPH/EUROGRAPHICS workshop on Graphics hardware, p.45-55, August 03-04, 1997, Los Angeles, California, United States.
  • 93. 93 Fred is magnanimous
  • 94. 94 What is implied by “programmable”? • What does it mean to teach programming? • Does running a microwave oven count? • Does defining the geometry of a game “level” count? • Does specifying OpenGL modes count? • This seems to be a somewhat open question • Butler Lampson couldn’t tell me. • Microsoft developers of teaching tools couldn’t tell me. • An online search wasn’t very helpful. • Do we just “know it when we see it”? • Justice Potter Stewart’s definition of pornography
  • 95. 95 My try at some formalization • “Composition, the organization of elemental operations into a non-obvious whole, is the essence of imperative programming.” -- Kurt Akeley (Foreword to GPU Gems 3) • Key ideas: • Composition → choice of placement, sequence • Non-obvious → semantics are interesting and novel • Imperative → maybe there are other kinds of programming
  • 96. 96 OpenGL has always been programmable • Follows directly from being an “architecture” • OpenGL commands are instructions (API as an ISA) • They can be “composed” to create programs • Multi-pass rendering is the prototypical example • But Peercy et al. implemented a RenderMan shader compiler • Invariance was specified from the start (e.g., same fragments) • We set out to enable “usage that we didn’t anticipate” • Obvious for a traditional ISA (e.g., IA32) • Not so obvious for a graphics API • Example: texture applies to all primitives, not just triangles
  • 97. 97 Example multi-pass OpenGL “program”: Hidden-line rendering
      /* Pass 1: fill the depth buffer only, pushed back by polygon offset. */
      glEnable(GL_DEPTH_TEST);
      glDisable(GL_LIGHTING);
      glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
      glEnable(GL_POLYGON_OFFSET_FILL);
      glPolygonOffset(maxwidth/2, 1);
      /* draw solid objects */
      /* Pass 2: draw the same objects as lines, depth-tested but not depth-written. */
      glDepthMask(GL_FALSE);
      glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
      glColor3fv(linecolor);   /* linecolor: array of 3 floats */
      glDisable(GL_POLYGON_OFFSET_FILL);
      glPolygonMode(GL_FRONT_AND_BACK, GL_LINE);
      /* draw solid objects again */
      /* Restore state. */
      glDisable(GL_DEPTH_TEST);
      glPolygonMode(GL_FRONT_AND_BACK, GL_FILL);
      glDepthMask(GL_TRUE);
  • 98. 98 Example multi-pass OpenGL “program”: Silhouette rendering • Additions to the hidden-line algorithm (previous slide) are marked “added”
      glEnable(GL_DEPTH_TEST);
      glDisable(GL_LIGHTING);
      glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
      glEnable(GL_POLYGON_OFFSET_FILL);
      glPolygonOffset(maxwidth/2, 1);
      /* draw solid objects */
      glDepthMask(GL_FALSE);
      glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
      glColor3f(1, 1, 1);
      glDisable(GL_POLYGON_OFFSET_FILL);
      glPolygonMode(GL_FRONT_AND_BACK, GL_LINE);
      glEnable(GL_CULL_FACE);      /* added */
      glCullFace(GL_FRONT);        /* added */
      /* draw solid objects again */
      /* draw true edges, for a complete hidden-line drawing (added) */
      glDisable(GL_DEPTH_TEST);
      glPolygonMode(GL_FRONT_AND_BACK, GL_FILL);
      glDepthMask(GL_TRUE);
      glDisable(GL_CULL_FACE);     /* added */
  • 99. 99 Invariance Corollary 1 Fragment generation is invariant with respect to the state values marked with in Rule 2.
  • 100. 100 • Intended to capture complete sequence of operations • Also inspired design changes
  • 101. 101 [Pipeline diagram labels: Application; vertex pipeline: vertex assembly, vertex operations, primitive assembly, primitive operations; pixel pipeline: pixel assembly (unpack), pixel operations, pixel pack; rasterization; fragment operations; texture memory; framebuffer; display] • All primitives (including pixels) are rasterized • All vertexes are treated equally (e.g., lighted) • All fragments are treated equally (e.g., texture mapped and depth-buffered) • Not a required implementation, but “abstraction distance” matters
  • 103. 103 Suppose … http://www.opengl.org/registry/
      Name: ARB_texture_cube_map
      Name Strings: GL_ARB_texture_cube_map
      Notice: Copyright OpenGL Architectural Review Board, 1999.
      Contact: Michael Gold, NVIDIA (gold 'at' nvidia.com)
      Status: Complete. Approved by ARB on 12/8/1999
      Version: Last Modified Date: December 14, 1999
      Number: ARB Extension #7
      Dependencies: None. Written based on the wording of the OpenGL 1.2.1 specification but not dependent on it.
      Overview: This extension provides a new texture generation scheme for cube map textures. Instead of the current texture providing a 1D, 2D, or 3D lookup into a 1D, 2D, or 3D texture image, the texture is a set of six 2D images representing the faces of a cube. The (s,t,r) texture coordinates …
  • 104. 104 Complete specification • Name • Name Strings • Notice • Contact • Status • Version • Number • Dependencies • Overview • Issues • New Procedures and Functions • New Tokens • Additions to Chapter 2 of the OpenGL Specification • Additions to Chapter 3 of the OpenGL Specification • Additions to Chapter 4 of the OpenGL Specification • Additions to Chapter 5 of the OpenGL Specification • Additions to Chapter 6 of the OpenGL Specification • Additions to the GLX Specification • Errors • New State (type, query mechanism, initial value, attribute set, specification section) • Usage Examples
  • 105. 105 19 issues The spec just linearly interpolates the reflection vectors computed per-vertex across polygons. Is there a problem interpolating reflection vectors in this way? Probably. The better approach would be to interpolate the eye vector and normal vector over the polygon and perform the reflection vector computation on a per-fragment basis. Not doing so is likely to lead to artifacts because angular changes in the normal vector result in twice as large a change in the reflection vector as normal vector changes. The effect is likely to be reflections that become glancing reflections too fast over the surface of the polygon. Note that this is an issue for REFLECTION_MAP_ARB, but not NORMAL_MAP_ARB.
  • 106. 106 19 issues … What happens if an (s,t,q) is passed to cube map generation that is close to (0,0,0), i.e. a degenerate direction vector? RESOLUTION: Leave undefined what happens in this case (but may not lead to GL interruption or termination). Note that a vector close to (0,0,0) may be generated as a result of the per-fragment interpolation of (s,t,r) between vertices.
  • 107. 107 Trust and integrity • Lots of collaboration during the initial design • But final decisions made by a small group • SGI played fair • OpenGL 1.0 didn’t favor SGI equipment (our ports were late) • SGI obeyed all conformance rules • SGI didn’t adjust the spec to match our equipment • The ARB avoided marketing tasks such as benchmarks • We stuck with technical design issues • We documented rigorously • Specification, man pages, …
  • 108. 108 Five Kinkos in Austin, Texas The OpenGL Graphics System: A Specification (Version 1.1) Mark Segal Kurt Akeley Editor: Chris Frazier Copyright © 1992-1997 Silicon Graphics, Inc. This document contains unpublished information of Silicon Graphics, Inc.
  • 109. 109 Extension facts • 442 Vendor and “EXT” extension specifications • Vendor: specific to a single vendor • EXT: shared by two or more vendors • 56 “ARB” extensions • Standardized, likely to be in the next spec revision • Lots of text … Source: OpenGL extension registry, December 2008
  • 110. 110 “Specification” sizes
                            Lines      Words        Chars
      56 ARB Extensions     48,674     263,908      2,221,347
      All 442 Extensions    209,426    1,076,008    9,079,063
      King James Bible      114,535    823,647      5,214,085
      New Testament         27,319     188,430      1,197,812
      Old Testament         86,783     632,515      3,998,303
  • 111. 111 Beyond the specification • The ARB (now replaced with Khronos) • Rules of order, secretary, IP, … • The extension process • Categories, token syntax, spec templates, enums, registry, … • Licensing • Conformance • …
  • 112. 112 Summary • Many mistakes made (see other presentations for lists) • Created a sustainable culture that values quality and rigorous documentation • Defined and evolved the architecture for interactive 3-D computer graphics
  • 113. 113 Writing better OpenGL Mark Kilgard Principal System Software Engineer NVIDIA
  • 114. 114 Motivation • Complex APIs and systems have pitfalls • After 17 years of designed evolution, OpenGL certainly has its share • Normal documentation focus: • What can you do? • Rather than: What should you do?
  • 115. 115 Communicating Vertex Data • The way you learn OpenGL: • Immediate mode • glBegin, glColor3f, glVertex3f, glEnd • Straightforward: no ambiguity about what the vertex data is • All vertex components are function parameters • The problem: too function-call intensive • And all vertex data must flow through the CPU
  • 116. 116 Example Scenario • An OpenGL application has to render a set of rectangles • Rectangle with its parameters • x, y, height, width, left color, right color, depth [Figure: a rectangle anchored at (x,y) with its width and height, left side color and right side color, and a depth order ranging from 0.0 to 1.0]
  • 117. 117 Scene Representation • Each rectangle specified by following RectInfo structure: • Array of RectInfo structures describes “scene” • Simplistic scene for sake of teaching
      typedef struct {
        GLfloat x, y, width, height;
        GLfloat depth_order;
        GLfloat left_side_color[3];   // red, green, then blue
        GLfloat right_side_color[3];  // red, green, then blue
      } RectInfo;
  • 118. 118 Example Scene and Rendering Result • Scene of 4 rectangles:
      RectInfo rect_list[4] = {
        { 10, 20, 180, 140, 0.5, { 1, 1, 1 }, { 1, 0, 1 } },
        { 30, 40, 100, 60, 0.5, { 1, 0, 0 }, { 0, 0, 1 } },
        { 140, 60, 100, 80, 0.5, { 0, 0, 1 }, { 0, 1, 0 } },
        { 70, 120, 80, 60, 0.7, { 1, 1, 0 }, { 0, 1, 1 } },
      };
    • OpenGL-rendered result
  • 119. 119 Immediate Mode Rectangle Rendering • Given sized RectInfo array, render vertices of quads
      void drawRectangles(int count, const RectInfo *list)
      {
        glBegin(GL_QUADS);
        for (int i=0; i<count; i++) {  // for each rectangle
          const RectInfo *r = &list[i];
          // 1st vertex
          glColor3fv(r->left_side_color);
          glVertex3f(r->x, r->y, r->depth_order);
          // 2nd vertex
          glColor3fv(r->right_side_color);
          glVertex3f(r->x+r->width, r->y, r->depth_order);
          // 3rd vertex: right_side_color “sticks”
          glVertex3f(r->x+r->width, r->y+r->height, r->depth_order);
          // 4th vertex
          glColor3fv(r->left_side_color);
          glVertex3f(r->x, r->y+r->height, r->depth_order);
        }
        glEnd();
      }
  • 120. 120 Critique of Immediate Mode • Advantages • Straightforward to code and debug • Easy-to-understand conceptual model • Building stream of vertices with OpenGL commands • Avoids driver & application copies of vertex data • Flexible, allowing totally dynamic vertex generation • Disadvantages • Rendering continuously streams attributes through CPU • Pollutes CPU cache with vertex data • Function call intensive • Unable to saturate fast graphics hardware • CPUs just too slow • Contrast with vertex array approach…
  • 121. 121 Vertex Array Approach • Step 1: Copy vertex attributes into vertex arrays • From: RectInfo array (CPU memory) • To: interleaved arrays of vertex attributes (CPU memory) • Step 2: To render • Configure OpenGL vertex array client state • Use glEnableClientState, glVertexPointer, glColorPointer • Render quads based on indices into vertex arrays • Use glDrawArrays
  • 122. 122 Vertex Array Format • Interleave vertex attributes in color & position arrays • Each vertex stores color (red, green, blue) followed by position (x, y, z) • float = 4 bytes, so 24 bytes per vertex
      vertex 0: red green blue x y z
      vertex 1: red green blue x y z
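The 24-byte interleaved layout on this slide can be checked with a small C sketch. The struct `ColorPositionVertex` and the helper functions are illustrative names, not from the slides, and plain `float` stands in for `GLfloat` (assumed to be 4 bytes) so the sketch compiles without OpenGL headers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical struct mirroring the slide's layout: RGB first, then XYZ. */
typedef struct {
    float color[3];    /* red, green, blue */
    float position[3]; /* x, y, z */
} ColorPositionVertex;

/* Byte distance between consecutive vertices (the "stride"). */
static size_t vertex_stride(void)
{
    return sizeof(ColorPositionVertex);
}

/* Byte offset of the position attribute inside one vertex. */
static size_t position_offset(void)
{
    return offsetof(ColorPositionVertex, position);
}

/* Byte offset, from the start of the array, of one float component of an
   attribute that starts attr_offset bytes into each vertex. */
static size_t component_offset(size_t vertex, size_t attr_offset, size_t component)
{
    assert(component < 3); /* both attributes have three components */
    return vertex * vertex_stride() + attr_offset + component * sizeof(float);
}
```

For example, `component_offset(1, position_offset(), 2)` locates the z coordinate of vertex 1. The stride and the two attribute offsets are the values the later slides pass to `glColorPointer` and `glVertexPointer`.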
  • 123. 123 Step 1: Copy Rectangle Attributes to Vertex Arrays
      // Needs <stdlib.h> for malloc and <string.h> for memcpy; the caller frees the result
      void *initVarrayRectangles(int count, const RectInfo *list)
      {
        // 6 floats per vertex, 4 vertices per rectangle
        GLfloat *varray = (GLfloat*) malloc(sizeof(GLfloat)*6*4*count);
        GLfloat *p = varray;
        for (int i=0; i<count; i++, p+=24) {
          const RectInfo *r = &list[i];
          // quad vertex #1
          memcpy(&p[0], r->left_side_color, sizeof(GLfloat)*3);
          p[3] = r->x; p[4] = r->y; p[5] = r->depth_order;
          // quad vertex #2
          memcpy(&p[6], r->right_side_color, sizeof(GLfloat)*3);
          p[9] = r->x+r->width; p[10] = r->y; p[11] = r->depth_order;
          // quad vertex #3
          memcpy(&p[12], r->right_side_color, sizeof(GLfloat)*3);
          p[15] = r->x+r->width; p[16] = r->y+r->height; p[17] = r->depth_order;
          // quad vertex #4
          memcpy(&p[18], r->left_side_color, sizeof(GLfloat)*3);
          p[21] = r->x; p[22] = r->y+r->height; p[23] = r->depth_order;
        }
        return varray;
      }
  • 124. 124 Step 2: Configure & Render from Vertex Arrays
      void drawVarrayRectangles(int count, const RectInfo *list)
      {
        void *varray = initVarrayRectangles(count, list);
        const GLfloat *p = (const GLfloat*) varray;
        const GLsizei stride = sizeof(GLfloat)*6;  // 3 RGB floats, 3 XYZ floats
        glColorPointer(/*rgb*/3, GL_FLOAT, stride, p+0);
        glVertexPointer(/*xyz*/3, GL_FLOAT, stride, p+3);
        glEnableClientState(GL_COLOR_ARRAY);
        glEnableClientState(GL_VERTEX_ARRAY);
        glDrawArrays(GL_QUADS, /*firstIndex*/0, /*indexCount*/count*4);
        free(varray);
      }
  • 125. 125 Critique of Simplistic Vertex Array Rendering • Advantages • Far fewer OpenGL commands issued • Disadvantages • Every render with drawVarrayRectangles calls initVarrayRectangles • Allocates, initializes, & frees vertex array memory every render • Improve by separating vertex array construction from rendering
  • 126. 126 Initialize Once, Render Many Approach • This routine expects base pointer returned by initVarrayRectangles
      void drawInitializedVarrayRectangles(int count, const void *varray)
      {
        const GLfloat *p = (const GLfloat*) varray;
        const GLsizei stride = sizeof(GLfloat)*6;  // 3 RGB floats, 3 XYZ floats
        glColorPointer(/*rgb*/3, GL_FLOAT, stride, p+0);
        glVertexPointer(/*xyz*/3, GL_FLOAT, stride, p+3);
        // Assume GL_COLOR_ARRAY and GL_VERTEX_ARRAY are already enabled!
        glDrawArrays(GL_QUADS, /*firstIndex*/0, /*indexCount*/count*4);
      }
  • 127. 127 Client Memory Vertex Attribute Transfer [Diagram: the CPU reads the vertex array from application (client) memory and writes commands + vertex data into the command queue; the GPU command processor receives command + vertex data by DMA transfer and feeds the vertex puller and hardware rendering pipeline] • Vertex data travels through the CPU
  • 128. 128 Vertex Buffer Object Vertex Attribute Pulling [Diagram: the CPU writes commands + vertex indices into the command queue; the GPU command processor receives command data by DMA transfer; the vertex puller reads the vertex array from an OpenGL (vertex) buffer object by GPU DMA transfer and feeds the hardware rendering pipeline] • CPU never reads the vertex data
  • 129. 129 Initializing Vertex Buffer Objects (VBOs) • Once using vertex arrays, easy to switch to VBOs • Make the vertex array as before • Then bind to buffer object and copy data to the buffer
      void initVarrayRectanglesInVBO(GLuint bufferName, int count, const RectInfo *list)
      {
        void *varray = initVarrayRectangles(count, list);
        const GLsizei stride = sizeof(GLfloat)*6;  // 3 RGB floats, 3 XYZ floats
        const GLint numVertices = 4*count;
        const GLsizeiptr bufferSize = stride*numVertices;
        glBindBuffer(GL_ARRAY_BUFFER, bufferName);
        glBufferData(GL_ARRAY_BUFFER, bufferSize, varray, GL_STATIC_DRAW);
        free(varray);  // glBufferData copied the data into the buffer object
      }
  • 130. 130 Rendering from Vertex Buffer Objects • Once initialized, glBindBuffer to bind to buffer ahead of vertex array configuration • Send offsets instead of pointers
      void drawVarrayRectanglesFromVBO(GLuint bufferName, int count)
      {
        const char *base = NULL;
        const GLsizei stride = sizeof(GLfloat)*6;  // 3 RGB floats, 3 XYZ floats
        glBindBuffer(GL_ARRAY_BUFFER, bufferName);
        glColorPointer(/*rgb*/3, GL_FLOAT, stride, base+0*sizeof(GLfloat));
        glVertexPointer(/*xyz*/3, GL_FLOAT, stride, base+3*sizeof(GLfloat));
        // Assume GL_COLOR_ARRAY and GL_VERTEX_ARRAY are already enabled!
        glDrawArrays(GL_QUADS, /*firstIndex*/0, /*indexCount*/count*4);
      }
  • 131. 131 Understanding glBindBuffer • Buffer object bindings are a frequent point of confusion for programmers • What does glBindBuffer really do? • Lots of buffer binding targets: • GL_ARRAY_BUFFER target: for vertex attribute arrays • Query with GL_ARRAY_BUFFER_BINDING • GL_ELEMENT_ARRAY_BUFFER target: for vertex indices, effectively topology • Query with GL_ELEMENT_ARRAY_BUFFER_BINDING • Each vertex array has its own buffer, query with • GL_VERTEX_ARRAY_BUFFER_BINDING • GL_COLOR_ARRAY_BUFFER_BINDING • GL_TEXTURE_COORD_ARRAY_BUFFER_BINDING, etc.
  • 132. 132 Bind and Query Buffer Targets • Buffer Bind Tokens (target tokens for glBindBuffer) • GL_ARRAY_BUFFER • GL_ELEMENT_ARRAY_BUFFER • Buffer Query Tokens (query tokens to glGetIntegerv) • GL_ARRAY_BUFFER_BINDING • GL_ELEMENT_ARRAY_BUFFER_BINDING • GL_COLOR_ARRAY_BUFFER_BINDING • GL_VERTEX_ARRAY_BUFFER_BINDING • GL_FOG_COORD_ARRAY_BUFFER_BINDING • GL_TEXTURE_COORD_ARRAY_BUFFER_BINDING • Buffer Query Token (query token to glGetVertexAttribiv) • GL_VERTEX_ATTRIB_ARRAY_BUFFER_BINDING
  • 133. 133 Latched Vertex Array Buffer Bindings • Here’s the confusing part:
      glBindBuffer(GL_ARRAY_BUFFER, 34);
      glColorPointer(3, GL_FLOAT, color_stride, (void*)color_offset);
    • The glBindBuffer doesn’t change any vertex array binding • The GL_ARRAY_BUFFER_BINDING state that glBindBuffer sets does not itself affect rendering • It is the glColorPointer call that latches the array buffer binding to change the color array’s buffer binding! • Same with all vertex array buffer bindings
  • 134. 134 Binding Buffer Zero is Special • By default, vertex arrays don’t access buffer objects • Instead client memory is accessed • This is because • The initial buffer binding for a context is zero • And zero is special • Zero means access client memory (this ensures compatibility with pre-VBO OpenGL) • You can always resume client memory vertex array access for a given array like this
      glBindBuffer(GL_ARRAY_BUFFER, 0);  // use client memory
      glColorPointer(3, GL_FLOAT, color_stride, color_pointer);
    • Different treatment of the “pointer” parameter to vertex array specification commands • When the current array buffer binding is zero, the pointer value is a client memory pointer • When the current array buffer binding is non-zero (meaning it names a buffer object), the pointer value is “recast” as an offset from the beginning of the buffer • Once again • The glBindBuffer(GL_ARRAY_BUFFER,0) call alone doesn’t change any vertex array buffer bindings • It takes a vertex array specification command such as glColorPointer to latch the zero
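The latching rule from the last two slides can be modeled without any OpenGL at all. The following is a toy state machine, not driver code: `ToyContext`, `toy_bind_array_buffer`, and `toy_color_pointer` are hypothetical stand-ins for the context state, `glBindBuffer(GL_ARRAY_BUFFER, ...)`, and `glColorPointer`.

```c
#include <assert.h>
#include <stddef.h>

/* Per-array state: which buffer was latched, plus the pointer-or-offset. */
typedef struct {
    unsigned buffer;     /* 0 means "read from client memory" */
    const void *pointer; /* client pointer if buffer == 0, else a byte offset */
} ToyArrayState;

/* Toy context: one "current" array buffer binding and one color array. */
typedef struct {
    unsigned array_buffer_binding; /* analogue of GL_ARRAY_BUFFER_BINDING */
    ToyArrayState color_array;
} ToyContext;

/* Analogue of glBindBuffer(GL_ARRAY_BUFFER, name): only the current binding
   changes; no vertex array is touched. */
static void toy_bind_array_buffer(ToyContext *ctx, unsigned name)
{
    assert(ctx != NULL);
    ctx->array_buffer_binding = name;
}

/* Analogue of glColorPointer: latches the current binding into the array. */
static void toy_color_pointer(ToyContext *ctx, const void *pointer)
{
    assert(ctx != NULL);
    ctx->color_array.buffer = ctx->array_buffer_binding;
    ctx->color_array.pointer = pointer;
}

/* True when the color array would be sourced from client memory. */
static int toy_color_uses_client_memory(const ToyContext *ctx)
{
    assert(ctx != NULL);
    return ctx->color_array.buffer == 0;
}
```

In this model, binding buffer 34 and then calling `toy_color_pointer` switches the color array to buffer 34. A later `toy_bind_array_buffer(&ctx, 0)` on its own leaves the color array on buffer 34 until the next pointer call latches the zero.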
  • 135. 135 Texture Coordinate Set Selector • A selector in OpenGL is • A state variable that controls what state a subsequent command updates • Examples of commands that modify selectors • glMatrixMode, glActiveTexture, glClientActiveTexture • A selector is different from latched state • Latched state is a specified value that is set (or “latched”) when a subsequent command is called • Pitfall warning: glTexCoordPointer both • Relies on the glClientActiveTexture command’s selector • And latches the current array buffer binding for the selected texture coordinate vertex array • Example
      glBindBuffer(GL_ARRAY_BUFFER, 34);       // buffer value glTexCoordPointer latches
      glClientActiveTexture(GL_TEXTURE3);      // selector glTexCoordPointer uses
      glTexCoordPointer(2, GL_FLOAT, uv_stride, (void*)buffer_offset);
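The difference between a selector and latched state can also be sketched as a toy model. Everything here is hypothetical (`ToyClientState`, the `toy_*` functions, and the limit of 8 texture coordinate sets); it only mirrors the behavior the slide describes for `glClientActiveTexture` and `glTexCoordPointer`.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_TEXCOORD_SETS 8 /* arbitrary limit chosen for this sketch */

typedef struct {
    unsigned array_buffer_binding;  /* value that pointer commands latch */
    unsigned client_active_texture; /* selector: which set gets updated */
    unsigned texcoord_buffer[TOY_MAX_TEXCOORD_SETS]; /* latched per set */
    size_t texcoord_offset[TOY_MAX_TEXCOORD_SETS];
} ToyClientState;

/* Analogue of glBindBuffer(GL_ARRAY_BUFFER, name). */
static void toy_bind_buffer(ToyClientState *st, unsigned name)
{
    st->array_buffer_binding = name;
}

/* Analogue of glClientActiveTexture(GL_TEXTURE0 + unit): sets the selector. */
static void toy_client_active_texture(ToyClientState *st, unsigned unit)
{
    assert(unit < TOY_MAX_TEXCOORD_SETS);
    st->client_active_texture = unit;
}

/* Analogue of glTexCoordPointer: updates the set chosen by the selector and
   latches the current array buffer binding into that set. */
static void toy_tex_coord_pointer(ToyClientState *st, size_t offset)
{
    unsigned unit = st->client_active_texture;
    st->texcoord_buffer[unit] = st->array_buffer_binding;
    st->texcoord_offset[unit] = offset;
}
```

Replaying the slide's example (bind 34, select unit 3, set the pointer) changes only texture coordinate set 3. The other sets keep whatever buffer they latched earlier.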
  • 136. 136 OpenGL’s Modern Buffer-centric Processing Model [Diagram labels. Buffer objects: Vertex Array Buffer Object (VaBO), Array Element Buffer Object (VeBO), Transform Feedback Buffer (XBO), Texture Buffer Object (TexBO), Pixel Unpack Buffer (PuBO), Pixel Pack Buffer (PpBO), Bindable Uniform Buffer (BUB), Parameter Buffer (PaBO). Stages: Vertex Puller, Vertex Shading, Geometry Shading, Fragment Shading, Texturing, Pixel Pipeline, Framebuffer. Data flows: vertex data, texel data, pixel data, parameter data (not ARB functionality yet). Commands: glBegin, glDrawElements, etc.; glDrawPixels, glTexImage2D, etc.; glReadPixels, etc.]
  • 137. 137 Usages of OpenGL Buffer Objects • Vertex uses (VBOs) • Input to GL: Vertex attribute buffer objects • Color, position, texture coordinate sets, etc. • Input to GL: Vertex element buffer objects • Indices • Output from GL: Transform feedback • Streaming vertex attributes out • Texture uses (TexBOs) • Texturing from: Texture buffer objects • Pixel uses (PBOs) • Output from GL: Pixel pack buffer objects • glReadPixels • Input to GL: Pixel unpack buffer objects • glDrawPixels, glBitmap, glTexImage2D, etc. • Shader uses (PaBOs, UBOs) • Input to assembly program: Parameter buffer objects • Input to GLSL program: Bind-able uniform buffer objects Key point: OpenGL buffers are containers for bytes; a buffer is not tied to any particular usage
  • 138. 138 Continuum of OpenGL Usage Tweak-able Performance Immediate mode Client vertex arrays Vertex buffer objects (VBOs) Display lists
  • 140. 140 Implementing OpenGL Mark Kilgard Principal System Software Engineer NVIDIA
  • 141. 141 Topics in OpenGL Implementation • Dual-core OpenGL driver operation • What goes into a texture fetch? • You give me some texture coordinates • I give you back a color • Could it be any simpler?
  • 142. 142 OpenGL Drivers for Multi-core CPUs • Today dual-core processors in PCs are nearly ubiquitous • 4, 6, 8, and more cores are clearly coming • How does an OpenGL implementation exploit this trend? • Answer: develop a dual-core OpenGL driver
  • 143. 143 Dual-core OpenGL Driver Architecture Application thread … Application thread D Context 1 Application thread A Application rendering thread App ICD ICD’s app thread (tokenize thread) Worker thread 1 (server thread) Application thread C Application audio thread (no OpenGL) Context 2 Application thread B Application rendering thread ICD’s app thread (tokenize thread) Worker thread 2 (server thread) Circular command FIFO Circular command FIFO
  • 144. 144 Dual-core Performance Results • A well-behaved OpenGL application benefiting from a dual-core mode of OpenGL driver operations [Bar chart: frames per second (axis 0 to 250) by mode of OpenGL driver operation: single core, dual core, null driver]
  • 145. 145 Good Dual-core Driver Practices • General advice • Display lists execute on the driver’s worker thread! • You want to avoid situations where the application thread must “sync” with the driver thread • Specific advice • Avoid OpenGL state queries • More on this later • Avoid querying OpenGL errors in production code • Bad behavior is detected automatically and leads to exit from the dual-core mode • Back to the standard single-core driver mode of operation • “Do no harm”
  • 146. 146 Consider an OpenGL texture fetch • Seems very simple • Input: texture coordinates (s,t,r,q) • Output: some color (r,g,b,a) • Just a simple function, written in Cg/HLSL:
      uniform sampler2D decal : TEXUNIT2;
      float4 texcoord : TEXCOORD3;
      float4 rgba = tex2D(decal, texcoord.st);
    • Compiles to single instruction:
      TEX o[COLR], f[TEX3], TEX2, 2D;
    • Implementation is much more involved!
  • 147. 147 Anatomy of a Texture Fetch Filtered texel vector Texel Selection Texel Combination Texel offsets Texel data Texture images Combination parameters Texture coordinate vector Texture parameters
  • 148. 148 Texture Fetch Functionality (1) • Texture coordinate processing • Projective texturing (OpenGL 1.0) • Cube map face selection (OpenGL 1.3) • Texture array indexing (OpenGL 3.0) • Coordinate scale: normalization (ARB_texture_rectangle) • Level-of-detail (LOD) computation • Log of maximum texture coordinate partial derivative (OpenGL 1.0) • LOD clamping (OpenGL 1.2) • LOD bias (OpenGL 1.4) • Anisotropic scaling of partial derivatives (SGIX_texture_lod_bias) • Wrap modes • Repeat, clamp (OpenGL 1.0) • Clamp to edge (OpenGL 1.2), Clamp to border (OpenGL 1.3) • Mirrored repeat (OpenGL 1.4) • Fully generalized clamped mirror repeat (EXT_texture_mirror_clamp) • Wrap to adjacent cube map face • Region clamp & mirror (PlayStation 2)
  • 149. 149 Texture Fetch Functionality (2) • Filter modes • Minification / magnification transition (OpenGL 1.0) • Nearest, linear, mipmap (OpenGL 1.0) • 1D & 2D (OpenGL 1.0), 3D (OpenGL 1.2), 4D (SGIS_texture4D) • Anisotropic (EXT_texture_filter_anisotropic) • Fixed-weights: Quincunx, 3x3 Gaussian • Used for multi-sample resolves • Detail texture magnification (SGIS_detail_texture) • Sharpen texture magnification (SGIS_sharpen_texture) • 4x4 filter (SGIS_texture_filter4) • Sharp-edge texture magnification (E&S Harmony) • Floating-point texture filtering (ARB_texture_float, OpenGL 3.0)
  • 150. 150 Texture Fetch Functionality (3) • Texture formats • Uncompressed • Packing: RGBA8, RGB5A1, etc. (OpenGL 1.1) • Type: unsigned, signed (NV_texture_shader) • Normalized: fixed-point vs. integer (OpenGL 3.0) • Compressed • DXT compression formats (EXT_texture_compression_s3tc) • 4:2:2 video compression (various extensions) • 1- and 2-component compression (EXT_texture_compression_latc, OpenGL 3.0) • Other approaches: IDCT, VQ, differential encoding, normal maps, separable decompositions • Alternate encodings • RGB9 with 5-bit shared exponent (EXT_texture_shared_exponent) • Spherical harmonics • Sum of product decompositions
  • 151. 151 Texture Fetch Functionality (4) • Pre-filtering operations • Gamma correction (OpenGL 2.1) • Table: sRGB / arbitrary • Shadow map comparison (OpenGL 1.4) • Compare functions: LEQUAL, GREATER, etc. (OpenGL 1.5) • Needs “R” depth value per texel • Palette lookup (EXT_paletted_texture) • Thresholding • Color key • Generalized thresholding
  • 152. 152 Texture Fetch Functionality (5) • Optimizations • Level-of-detail weighting adjustments • Mid-maps (extra pre-filtered levels in-between existing levels) • Unconventional uses • Bitmap textures for fonts with large filters (Direct3D 10) • Rip-mapping • Non-uniform texture border color • Clip-mapping (SGIX_clipmap) • Multi-texel borders • Silhouette maps (Pradeep Sen’s work) • Shadow mapping • Sharp piecewise linear magnification
  • 153. 153 Phased Data Flow • Must hide long memory read latency between Selection and Combination phases Texel Selection Texel Combination Texel offsets Texel data Texture images Combination parameters Texture coordinate vector Texture parameters Memory reads for samples FIFOing of combination parameters
  • 154. 154 What really happens? • Let’s consider a simple tri-linear mip-mapped 2D projective texture fetch • Logically just one instruction TXP o[COLR], f[TEX3], TEX2, 2D; • Logically • Texel selection • Texel combination • How many operations are involved?
  • 155. 155 Medium-Level Dissection of a Texture Fetch Convert texel coords to texel offsets integer / fixed-point texel combination texel offsets texel data texture images combination parameters interpolated texture coords vector texture parameters Convert texture coords to texel coords filtered texel vector texel coords floor / frac integer coords & fractional weights floating-point scaling and combination integer / fixed-point texel intermediates
  • 156. 156 Interpolation • First we need to interpolate (s,t,r,q) • This is the f[TEX3] part of the TXP instruction • Projective texturing means we want (s/q, t/q) • And possibly r/q if shadow mapping • In order to correct for perspective, hardware actually interpolates • (s/w, t/w, r/w, q/w) • If not projective texturing, could linearly interpolate inverse w (or 1/w) • Then compute its reciprocal to get w • Since 1/(1/w) equals w • Then multiply (s/w,t/w,r/w,q/w) times w • To get (s,t,r,q) • If projective texturing, we can instead • Compute reciprocal of q/w to get w/q • Then multiply (s/w,t/w,r/w) by w/q to get (s/q, t/q, r/q) Observe projective texturing is same cost as perspective correction
  • 157. 157 Interpolation Operations • Ax + By + C per scalar linear interpolation • 2 MADs • One reciprocal to invert q/w for projective texturing • Or one reciprocal to invert 1/w for perspective texturing • Then 1 MUL per component for s/w * w/q • Or s/w * w • For (s,t) means • 4 MADs, 2 MULs, & 1 RCP • (s,t,r) requires 6 MADs, 3 MULs, & 1 RCP • All floating-point operations
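The two interpolation slides can be sketched in C as follows. The struct `Interpolants` and both function names are made up for illustration; the inputs are assumed to be values the rasterizer has already interpolated linearly in window space.

```c
#include <assert.h>
#include <math.h>

/* Linearly interpolated per-pixel values (hypothetical container). */
typedef struct {
    float s_over_w, t_over_w, q_over_w; /* attribute / w */
    float inv_w;                        /* 1 / w */
} Interpolants;

/* Non-projective case: one reciprocal recovers w, then one multiply per
   component recovers (s, t). */
static void perspective_correct_st(const Interpolants *in, float out_st[2])
{
    assert(in->inv_w != 0.0f);
    float w = 1.0f / in->inv_w;   /* 1 RCP */
    out_st[0] = in->s_over_w * w; /* 1 MUL */
    out_st[1] = in->t_over_w * w; /* 1 MUL */
}

/* Projective case: invert q/w instead of 1/w, so the result is already
   divided by q. Same operation count as the non-projective path. */
static void projective_st(const Interpolants *in, float out_st[2])
{
    assert(in->q_over_w != 0.0f);
    float w_over_q = 1.0f / in->q_over_w; /* 1 RCP */
    out_st[0] = in->s_over_w * w_over_q;  /* 1 MUL -> s/q */
    out_st[1] = in->t_over_w * w_over_q;  /* 1 MUL -> t/q */
}
```

Both paths use one reciprocal and two multiplies for (s,t), which is the slide's point that projective texturing costs the same as ordinary perspective correction.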
  • 158. 158 Texture Space Mapping • Have interpolated & projected coordinates • Now need to determine what texels to fetch • Multiply (s,t) by (width,height) of texture base level • Could convert (s,t) to fixed-point first • Or do math in floating-point • Say base texture is 256x256 • So compute (s*256, t*256)=(u,v)
  • 159. 159 Mipmap Level-of-detail Selection • Tri-linear mip-mapping means compute appropriate mipmap level • Hardware rasterizes in 2x2 pixel entities • Typically called quad-pixels or just quad • Finite difference with neighbors (one-pixel separation) to get change in u and v with respect to window space • Approximation to ∂u/∂x, ∂u/∂y, ∂v/∂x, ∂v/∂y • Means 4 subtractions per quad (1 per pixel) • Now compute approximation to gradient length • p = max(sqrt((∂u/∂x)^2 + (∂u/∂y)^2), sqrt((∂v/∂x)^2 + (∂v/∂y)^2))
  • 160. 160 Level-of-detail Bias and Clamping • Convert p length to power-of-two level-of-detail and apply LOD bias • λ = log2(p) + lodBias • Now clamp λ to valid LOD range • λ’ = max(minLOD, min(maxLOD, λ))
  • 161. 161 Determine Mipmap Levels and Level Filtering Weight • Determine lower and upper mipmap levels • b = floor(λ’) is bottom mipmap level • t = floor(λ’+1) is top mipmap level • Determine filter weight between levels • w = frac(λ’) is filter weight
  • 162. 162 Determine Texture Sample Point • Get (u,v) for selected top and bottom mipmap levels • Consider a level l which could be either level t or b • With (u,v) locations (u_l, v_l) • Perform GL_CLAMP_TO_EDGE wrap modes • uw = max(1/(2*widthOfLevel(l)), min(1 - 1/(2*widthOfLevel(l)), u)) • vw = max(1/(2*heightOfLevel(l)), min(1 - 1/(2*heightOfLevel(l)), v)) • Get integer location (i,j) within each level • (i,j) = ( floor(uw*widthOfLevel(l)), floor(vw*heightOfLevel(l)) )
  • 163. 163 Determine Texel Locations • Bilinear sample needs 4 texel locations • (i0,j0), (i0,j1), (i1,j0), (i1,j1) • With integer texel coordinates • i0 = floor(i-1/2) • i1 = floor(i+1/2) • j0 = floor(j-1/2) • j1 = floor(j+1/2) • Also compute fractional weights for bilinear filtering • a = frac(i-1/2) • b = frac(j-1/2)
  • 164. 164 Determine Texel Addresses • Assuming a texture level image’s base pointer, compute a texel address of each texel to fetch • Assume bytesPerTexel = 4 bytes for RGBA8 texture • Example • addr00 = baseOfLevel(l) + bytesPerTexel*(i0+j0*widthOfLevel(l)) • addr01 = baseOfLevel(l) + bytesPerTexel*(i0+j1*widthOfLevel(l)) • addr10 = baseOfLevel(l) + bytesPerTexel*(i1+j0*widthOfLevel(l)) • addr11 = baseOfLevel(l) + bytesPerTexel*(i1+j1*widthOfLevel(l)) • More complicated address schemes are needed for good texture locality!
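The linear addressing formula on this slide is small enough to write out directly. The function name `texel_offset` is illustrative and it returns a byte offset to be added to the level's base pointer; as the slide says, real hardware uses more elaborate layouts for locality, which this sketch does not attempt.

```c
#include <assert.h>
#include <stddef.h>

/* Byte offset of texel (i, j) in a row-major level image:
   bytesPerTexel * (i + j * widthOfLevel). */
static size_t texel_offset(int i, int j, int width, int height,
                           size_t bytes_per_texel)
{
    assert(i >= 0 && i < width);
    assert(j >= 0 && j < height);
    return bytes_per_texel * ((size_t)i + (size_t)j * (size_t)width);
}
```

For an RGBA8 level that is 256 texels wide, texel (1, 2) lands at byte 4*(1 + 2*256) = 2052 from the base of the level.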
  • 165. 165 Initiate Texture Reads • Initiate texture memory reads at the 8 texel addresses • addr00, addr01, addr10, addr11 for the upper level • addr00, addr01, addr10, addr11 for the lower level • Queue the weights a, b, and w • Latency FIFO in hardware makes these weights available when texture reads complete
  • 166. 166 Phased Data Flow • Must hide long memory read latency between Selection and Combination phases Texel Selection Texel Combination Texel offsets Texel data Texture images Combination parameters Texture coordinate vector Texture parameters Memory reads for samples FIFOing of combination parameters
  • 167. 167 Texel Combination • When texel reads are returned, begin filtering • Assume results are • Top texels: t00, t01, t10, t11 • Bottom texels: b00, b01, b10, b11 • Per-component filtering math is tri-linear filter • RGBA8 is four components • result = (1-a)*(1-b)*(1-w)*b00 + (1-a)*b*(1-w)*b01 + a*(1-b)*(1-w)*b10 + a*b*(1-w)*b11 + (1-a)*(1-b)*w*t00 + (1-a)*b*w*t01 + a*(1-b)*w*t10 + a*b*w*t11; • 24 MADs per component, or 96 for RGBA • Lerp-tree could do 14 MADs per component, or 56 for RGBA
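The expanded sum and the lerp tree compute the same value, which the C sketch below shows for a single component. The function names and the array ordering ([0]=00, [1]=01, [2]=10, [3]=11, where the first digit follows weight a and the second follows weight b) are choices made for this sketch. The tree uses seven lerps; counting each lerp as two multiply-adds gives the slide's figure of 14.

```c
#include <assert.h>
#include <math.h>

/* Linear interpolation from x (t = 0) to y (t = 1). */
static float lerpf(float x, float y, float t)
{
    return x + t * (y - x);
}

/* Tri-linear filter written as the eight-term weighted sum. */
static float trilinear_expanded(const float bot[4], const float top[4],
                                float a, float b, float w)
{
    assert(a >= 0.0f && a <= 1.0f && b >= 0.0f && b <= 1.0f);
    assert(w >= 0.0f && w <= 1.0f);
    return (1-a)*(1-b)*(1-w)*bot[0] + (1-a)*b*(1-w)*bot[1]
         + a*(1-b)*(1-w)*bot[2]     + a*b*(1-w)*bot[3]
         + (1-a)*(1-b)*w*top[0]     + (1-a)*b*w*top[1]
         + a*(1-b)*w*top[2]         + a*b*w*top[3];
}

/* Same filter as a tree of seven lerps: bilinear within each level, then a
   final lerp between the two levels. */
static float trilinear_lerp_tree(const float bot[4], const float top[4],
                                 float a, float b, float w)
{
    float bottom = lerpf(lerpf(bot[0], bot[2], a), lerpf(bot[1], bot[3], a), b);
    float upper  = lerpf(lerpf(top[0], top[2], a), lerpf(top[1], top[3], a), b);
    return lerpf(bottom, upper, w);
}
```

For an RGBA8 texel the same routine would run once per component, on values already unpacked from the fetched texels.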
  • 168. 168 Total Texture Fetch Operations • Interpolation • 6 MADs, 3 MULs, & 1 RCP (floating-point) • Texel selection • Texture space mapping • 2 MULs (fixed-point) • LOD determination (floating-point) • 1 pixel difference, 2 SQRTs, 4 MULs, 1 LOG2 • LOD bias and clamping (fixed-point) • 1 ADD, 1 MIN, 1 MAX • Level determination and level weighting (fixed-point) • 1 FLOOR, 1 ADD, 1 FRAC • Texture sample point • 4 MAXs, 4 MINs, 2 FLOORs (fixed-point) • Texel locations and bi-linear weights • 8 FLOORs, 4 FRACs, 8 ADDs (fixed-point) • Addressing • 16 integer MADs (integer) • Texel combination • 56 fixed-point MADs (fixed-point)
  • 169. 169 Observations about the Texture Fetch • Lots of ways to implement the math • Lots of clever ways to be efficient • Lots more texture operations not considered in this analysis • Compression • Anisotropic filtering • sRGB • Shadow mapping • Arguably TEX instructions are “world’s most CISC instructions” • Texture fetches are incredibly complex instructions • Good deal of GPU’s superiority at graphics operations over CPUs is attributable to TEX instruction efficiency • Good for compute too
  • 170. 170 OpenGL’s Future Evolution Mark Kilgard Principal System Software Engineer NVIDIA
  • 171. 171 What drives OpenGL’s future? • GPU graphics functionality • Tessellation & geometry amplification • Ratio of GPU to single-core CPU performance • Compatibility • Direct3Disms • OpenGLisms • Deprecation • Compute support • OpenCL, CUDA, Stream processing • Unconventional graphics devices
  • 172. 172 Better Graphics Functionality • Expect more graphics performance • Easy prediction • Rasterization nowhere near peaked • Ray tracing fans—GPUs make rays and triangles faster – Market still values triangles more than rays • Expect more generalized graphics functionality • Trend for texture enhancements likely to continue
  • 173. 173 Geometry Amplification • Tessellation • Programmable hardware support coming • True market demand probably not tessellation per se • Games want visual richness • Texture and shading have created much richness – Often “pixel richness” as substitute for geometry richness • Increasingly “visual richness” means geometric complexity • Geometry Amplification may be better term • Tessellation is one way to amplify geometry – Recognize the limits of bi-variate patches for representing geometry
  • 174. 174 Programmable Tessellation • Stunning real-time geometric detail + animation possible • Programmable tessellation + vertex textured displacements
  • 175. 175 Continuous Level-of-detail for Tessellation Increasing tessellation level-of-detail • Same patch mesh for all 3 scenes
  • 176. 176 Adaptive Programmable Tessellation Programmable level-of-detail determination allows more tessellation along silhouette edges
  • 177. 177 Limits of Patch Tessellation • What games tend to want • Here’s 8 vertices (bounding box), go draw a fire truck • Here’s a few vertices, go draw a tree
  • 178. 178 Tessellation Not New to OpenGL • At least three different bi-variate patch tessellation schemes have been added to OpenGL • Evaluators (OpenGL 1.0) • NV_evaluators (GeForce 3) • water-tight • adaptive level-of-detail • forward differencing approach • ATI_pn_triangles Curved PN Triangles (Radeon) • tessellated triangle based on positions+normals • None succeeded • Hard to integrate into art pipelines • Didn’t offer enough performance advantage [Figures: GLUT’s wire-frame teapot; [Moreton 2001]; [Vlachos 2001]]
  • 179. 179 Ratio of CPU core-to-GPU Performance • Well known computer architecture trends now • Single-threaded CPU performance trends are stalled • Multi-core is CPU designer response • GPU performance continues on-trend • What does this mean for graphics API design? • CPUs must generate more visually rich API command streams to saturate GPUs • Can’t just send more commands faster • Single-threaded CPUs can only do so much • So must send more powerful commands
  • 180. 180 Déjà vu • We’ve been here before • Early 1980s: Graphics terminals used to be connected to minicomputers by slow speed interconnects • CPUs themselves far too slow for real-time rendering • Resulting rendering model • Download scene database to graphics terminal • Adjust viewing and modeling parameters • Send “redraw scene” command
  • 181. 181 What Happened • Such “scene processor” hardware not very flexible • Difficult to animate anything beyond rigid dynamics • Eventually SGI and others matched CPUs and interconnects to graphics performance • Result was IRIS GL’s immediate mode • CPU fast enough to send geometry every frame • OpenGL took this model • Over time added vertex arrays, vertex buffers, texturing, programmable shading, and more performance • CPU performance became limiter still • Better graphics driver tuning helped • Dual-core drivers help some more
  • 182. 182 OpenGL’s Most Powerful Command • Available since OpenGL 1.0 • Can render essentially anything OpenGL can render! • Takes just one parameter • The command glCallList(GLuint displayListName); • Power of display lists comes from • Playing back arbitrary compiled commands • Allowing for hierarchical calling of display list • A display list can contain glCallList or glCallLists • Ability of application to re-define display lists • No editing, but can be re-defined
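A minimal sketch of why one-parameter playback is so powerful, written as a toy model in plain C rather than real OpenGL. Every name here (`ToyCmd`, `toy_define_list`, `toy_call_list`) is hypothetical. `toy_define_list` stands in for a `glNewList`/`glEndList` pair and `toy_call_list` for `glCallList`. The model assumes nested lists are resolved at execution time, so re-defining a child list changes what every parent list draws.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of hierarchical display lists; this is NOT the OpenGL API.
   All identifiers are hypothetical and exist only for illustration. */
enum { TOY_DRAW, TOY_CALL_LIST };

typedef struct {
    int op;   /* TOY_DRAW or TOY_CALL_LIST */
    int arg;  /* primitive count for TOY_DRAW, list name for TOY_CALL_LIST */
} ToyCmd;

typedef struct {
    const ToyCmd *cmds;
    size_t count;
} ToyList;

#define TOY_MAX_LISTS 8
static ToyList toy_lists[TOY_MAX_LISTS];

/* Stand-in for glNewList..glEndList: replaces the whole list.
   There is no in-place editing, only re-definition. */
void toy_define_list(int name, const ToyCmd *cmds, size_t count) {
    assert(name >= 0 && name < TOY_MAX_LISTS);
    toy_lists[name].cmds = cmds;
    toy_lists[name].count = count;
}

/* Stand-in for glCallList: takes one parameter and replays everything
   reachable from that list. Returns the number of primitives "drawn". */
int toy_call_list(int name) {
    assert(name >= 0 && name < TOY_MAX_LISTS);
    int drawn = 0;
    for (size_t i = 0; i < toy_lists[name].count; ++i) {
        const ToyCmd *c = &toy_lists[name].cmds[i];
        if (c->op == TOY_CALL_LIST)
            drawn += toy_call_list(c->arg);  /* hierarchical call */
        else
            drawn += c->arg;
    }
    return drawn;
}
```

For example, a "car" list that draws a body and calls a "wheel" list twice picks up a new wheel definition on its next playback, without the car list being rebuilt.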
  • 183. 183 Enhanced Display Lists • OpenGL 1.0 display lists are too inflexible • Pixel & vertex data “compiled into” display lists • Binding objects always “by name” • Rather than “by reference” • These problems can be fixed • Modern OpenGL supports buffers for transferring vertices and pixels • Compile commands into display lists that defer vertex and pixel transfers until execute-time – Rather than compile-time • Allow objects (textures, buffers, programs) to be bound “by reference” or “by name”
  • 184. 184 Other Display List Enhancements • Conditional display list execution • Relaxed vertex index and command order • Parallel construction of display lists by multiple threads General insight: Easier for driver to optimize application’s graphics command stream if it gets to 1) see the repetition in the command stream clearly 2) take time to analyze and optimize usage
  • 185. 185 Conditional Display List Execution • Today’s occlusion query • Application must “query” to learn occlusion result • Latency too great to respond • Application can use OpenGL 3.0’s conditional render capability • But just skips vertex pulling, not state changes • Conditional display list execution • Allow a glCallList to depend on the occlusion result from an occlusion query object • Allows in-band occlusion querying • Skip both vertex pulling and state changes
  • 186. 186 Relaxed Vertex Index and Command Order • OpenGL today always executes commands “in order” • Sequential execution is a requirement • Provide compile-time specification of re-ordering allowances • Allows GL implementation to re-order • Vertex indices within display list’s vertex batch • Commands within display list • Key rule: the state vector a rendering command executes in must match the state it would have seen had the commands been executed sequentially • Allow static or dynamic re-ordering • Static re-ordering needed for multi-pass invariances • Past practice • IRIS Performer would sort rendering by state changes for performance • [Sander 2007] shows substantial benefit for vertex ordering
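The state-sorting idea attributed to IRIS Performer above can be sketched in plain C. This is an illustration under stated assumptions, not Performer's or any driver's actual algorithm. `DrawCmd`, `count_state_changes`, and `sort_by_state` are hypothetical names, and each draw is assumed to be order-independent (for example, opaque depth-tested geometry), so re-ordering is allowed. Because every command carries its own state id, each draw still executes in the state it would have seen sequentially, which is the key rule from the slide.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical draw command: which state vector it needs and what it draws. */
typedef struct {
    int state_id;
    int mesh_id;
} DrawCmd;

/* qsort comparator ordering commands by the state they require. */
static int cmp_state(const void *a, const void *b) {
    const DrawCmd *x = a;
    const DrawCmd *y = b;
    return (x->state_id > y->state_id) - (x->state_id < y->state_id);
}

/* Number of times state must be (re)bound while walking the stream. */
int count_state_changes(const DrawCmd *cmds, size_t n) {
    int changes = 0;
    for (size_t i = 0; i < n; ++i)
        if (i == 0 || cmds[i].state_id != cmds[i - 1].state_id)
            ++changes;
    return changes;
}

/* Group commands that share a state so each state is bound once. */
void sort_by_state(DrawCmd *cmds, size_t n) {
    assert(cmds != NULL);
    qsort(cmds, n, sizeof *cmds, cmp_state);
}
```

A stream that alternates between two states four times needs four state binds as submitted and two after sorting.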
  • 187. 187 Parallel Display List Construction • Today’s model • Single thread makes all OpenGL rendering calls • Minimizes GPU context switch overhead • Ties command generation rate to single core’s CPU performance • Enhanced display list model • Multiple threads can build display lists in parallel • Single thread still executes display lists • Countable semaphore objects used to synchronize hand-off of display lists built by other threads with main rendering thread
  • 188. 188 Rethinking Display Lists • Display lists have been proposed for deprecation • Right as we really need them! • Much more interesting to enhance display lists • Dual-core driver already off-loads display list traversal to driver’s thread • Multi-core driver could scan frequently executed display lists to optimize their order and error processing • Includes adding pre-fetching to avoid stalling CPU on cache misses for object accesses
  • 189. 189 Direct3Disms • Developing a shader-rich game title costs $$$ • For top titles, often US$ 5,000,000+ • Investment typically amortized over multiple platforms • Consoles are primary target, then PCs • PC version typically developed for Direct3D • Reality: OpenGL is often 3rd or worse priority • API differences = porting & performance pitfalls • Stops or slows Direct3D-developed 3D content from working easily on OpenGL platforms
  • 190. 190 Supporting Direct3D: Not New • OpenGL has always supported multiple formats well • OpenGL’s plethora of pixel and vertex formats • Very first OpenGL extension: EXT_bgra • Provides a pixel component ordering to match the color component ordering of Windows for 2D GDI rendering • Made core functionality by OpenGL 1.2 • Many OpenGL extensions have embraced Direct3Disms • Secondary color • Fog coordinate • Point sprites
  • 191. 191 Direct3D vs. OpenGL Coordinate System Conventions • Window origin conventions • Direct3D = upper-left origin • OpenGL = lower-left origin • Pixel center conventions • Direct3D9 = pixel centers at integer locations • OpenGL (and Direct3D 10) = pixel centers at half-pixel locations • Clip space conventions • Direct3D = [-1,+1] for XY, [0,1] for Z • OpenGL = [-1,+1] range for XYZ • Affects • How projection matrix is loaded • Fragment shaders that access the window position • Point sprites have upper-left texture coordinate origin • OpenGL already lets application choose lower-left or upper-left
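The three convention differences above reduce to small arithmetic mappings. The sketch below assumes normalized device coordinates for depth and continuous window coordinates for y. The function names are hypothetical helpers, not part of either API.

```c
#include <assert.h>

/* Depth in normalized device coordinates:
   OpenGL uses [-1,+1], Direct3D uses [0,1]. */
double gl_ndc_z_to_d3d(double z_gl)  { return 0.5 * z_gl + 0.5; }
double d3d_ndc_z_to_gl(double z_d3d) { return 2.0 * z_d3d - 1.0; }

/* Window origin: converts a continuous window y coordinate between the
   lower-left (OpenGL) and upper-left (Direct3D) conventions. The mapping
   is its own inverse. */
double flip_window_y(double y, int height) {
    assert(height > 0);
    return (double)height - y;
}

/* Pixel centers: Direct3D 9 places the center of pixel i at i, while
   OpenGL (and Direct3D 10) place it at i + 0.5. */
double d3d9_to_gl_pixel_center(double c) { return c + 0.5; }
```

With a 480-pixel-high window, the center of the bottom row in OpenGL terms (y = 0.5) maps to y = 479.5 in an upper-left convention, which is the center of the last row counted from the top.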
  • 192. 192 Direct3D vs. OpenGL Provoking Vertex Conventions • Direct3D uses “first” vertex of a triangle or line to determine which color is used for flat shading • OpenGL uses “last” vertex for lines, triangles, and quads • Except for polygon (GL_POLYGON) mode, which uses the first vertex • Figure: the same input triangle strip with per-vertex colors, flat shaded by each API • Direct3D 9: pDev->SetRenderState(D3DRS_SHADEMODE, D3DSHADE_FLAT); • OpenGL: glShadeModel(GL_FLAT);
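For a triangle strip, the difference between the two conventions comes down to which vertex index supplies the flat-shaded color. The sketch below assumes zero-based indexing, where triangle i of a strip is built from vertices i, i+1, and i+2. The enum and function names are hypothetical.

```c
#include <assert.h>

typedef enum {
    PROVOKE_FIRST,  /* Direct3D convention */
    PROVOKE_LAST    /* OpenGL convention for lines, triangles, and quads */
} ProvokingConvention;

/* Index of the strip vertex whose color flat-shades triangle `tri`
   (zero-based) of a triangle strip. */
int strip_provoking_vertex(int tri, ProvokingConvention conv) {
    assert(tri >= 0);
    return conv == PROVOKE_FIRST ? tri : tri + 2;
}
```

The same strip therefore shows colors shifted by two vertices between the two APIs, which is why content authored for one convention looks wrong under the other.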
  • 193. 193 BGRA Vertex Array Order • Direct3D 9’s most common usage for sending per-vertex colors is 32-bit D3DCOLOR data type: • Alpha in bits 24:31 • Red in bits 16:23 • Green in bits 8:15 • Blue in bits 0:7 • Bit layout from bit 31 down to bit 0: 8-bit alpha, 8-bit red, 8-bit green, 8-bit blue • Laid out in little-endian memory, looks like BGRA order • OpenGL assumes RGBA order for all vertex arrays • Direct3Dism EXT_vertex_array_bgra extension allows GL_BGRA as the size parameter: glColorPointer(GL_BGRA, GL_UNSIGNED_BYTE, stride, pointer); glSecondaryColorPointer(GL_BGRA, GL_UNSIGNED_BYTE, stride, pointer); glVertexAttribPointer(index, GL_BGRA, GL_UNSIGNED_BYTE, GL_TRUE, stride, pointer);
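The claim that a D3DCOLOR-style value "looks like BGRA" in memory can be checked with a few lines of C. This sketch assumes a little-endian byte order and writes the bytes explicitly so the result does not depend on the host CPU. `pack_argb8` and `store_le32` are hypothetical helpers that mirror the bit layout described above, not Direct3D functions.

```c
#include <assert.h>
#include <stdint.h>

/* Packs 8-bit channels using the layout from the slide:
   alpha in bits 24:31, red in 16:23, green in 8:15, blue in 0:7. */
uint32_t pack_argb8(uint8_t a, uint8_t r, uint8_t g, uint8_t b) {
    return ((uint32_t)a << 24) | ((uint32_t)r << 16) |
           ((uint32_t)g << 8)  |  (uint32_t)b;
}

/* Writes a 32-bit value to memory in little-endian byte order,
   least significant byte first. */
void store_le32(uint32_t v, uint8_t out[4]) {
    assert(out != 0);
    for (int i = 0; i < 4; ++i)
        out[i] = (uint8_t)(v >> (8 * i));
}
```

Reading the four stored bytes in address order yields blue, green, red, alpha, which is the ordering `GL_BGRA` tells OpenGL to expect.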
  • 194. 194 OpenGLisms • Things about OpenGL’s operation that make it hard for non-OpenGL applications to port to OpenGL • Examples • Selectors • Linked GLSL program objects
  • 195. 195 Eliminating Selectors from OpenGL • OpenGL has lots of selectors • Selectors set state that indicates what state subsequent commands will update • Already mentioned selectors: glClientActiveTexture • Other examples: glActiveTexture, glMatrixMode, glBindTexture, glBindBuffer, glUseProgram, glBindProgramARB • OpenGL is full of selectors – Partly OpenGL’s extensibility strategy – Partly because objects are bound into context » Bind-to-edit objects » Rather than edit-by-name • Direct State Access extension: EXT_direct_state_access • Provides complete selector-free additional API for OpenGL • Shipping in NVIDIA’s 180.43 drivers
  • 196. 196 Reasons to Eliminate Selectors • Direct3D has an “edit-by-name” model of operation • Means Direct3D has no selectors • Having to manage selectors when porting Direct3D or console code to OpenGL is awkward • Requires deferring updates to minimize selector and object bind changes • Layered libraries can’t count on selector state • To be safe when updating state controlled by selectors, such libraries must use the idiom • Save selector, Set selector, Update state, Restore selector • Bad for performance, particularly bad for dual-core drivers since queries are expensive
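The save/set/update/restore idiom and its selector-free alternative can be contrasted with a toy context in plain C. This is a model, not OpenGL: `ToyContext` and every function below are hypothetical. In real code the selector calls would correspond to `glActiveTexture`, `glBindTexture`, and a `glGetIntegerv` query, while the direct form corresponds to the style of entry point that EXT_direct_state_access adds. The `queries` counter is there only to make the cost of the idiom visible.

```c
#include <assert.h>

#define TOY_UNITS 4

/* Toy stand-in for a GL context; all names here are hypothetical. */
typedef struct {
    int active_unit;       /* the selector (cf. glActiveTexture) */
    int bound[TOY_UNITS];  /* texture bound to each unit */
    int queries;           /* number of state queries issued */
} ToyContext;

static void toy_active_texture(ToyContext *c, int unit) {
    assert(unit >= 0 && unit < TOY_UNITS);
    c->active_unit = unit;
}

/* Selector style: the bind lands on whichever unit is currently active. */
static void toy_bind_texture(ToyContext *c, int tex) {
    c->bound[c->active_unit] = tex;
}

/* Queries are what make the idiom expensive, so count them. */
static int toy_get_active_texture(ToyContext *c) {
    ++c->queries;
    return c->active_unit;
}

/* What a layered library must do to leave the caller's selector intact. */
void lib_bind_with_selector(ToyContext *c, int unit, int tex) {
    int saved = toy_get_active_texture(c);  /* save selector */
    toy_active_texture(c, unit);            /* set selector */
    toy_bind_texture(c, tex);               /* update state */
    toy_active_texture(c, saved);           /* restore selector */
}

/* Direct-state-access style: the target unit is an explicit argument,
   so no selector is read or modified. */
void lib_bind_direct(ToyContext *c, int unit, int tex) {
    assert(unit >= 0 && unit < TOY_UNITS);
    c->bound[unit] = tex;
}
```

Both functions leave the caller's active unit unchanged, but only the selector version needs a query plus two extra selector changes to achieve that.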
  • 197. 197 GLSL Program Object Linking • GLSL requires shader objects from different domains (vertex, geometry, fragment) to be linked into single GLSL program object • Means you can’t mix-and-match shaders easily • Other APIs don’t have this limitation • Direct3D • Prior OpenGL assembly language extensions • Consoles • A “separate shader objects” extension could fix this problem
  • 198. 198 Separate Shader Objects Example • Combining different GLSL shaders at once Specular brick bump mapping Red diffuse Wobbly torus Smooth torus Different GLSL vertex shaders Different GLSL fragment shaders
  • 199. 199 Deprecation • Part of OpenGL 3.0 is a marking of features for deprecation • LOTS of functionality is marked for deprecation • I contend no real application today uses the non-deprecated subset of OpenGL—all apps would have to change due to deprecation • Some vendors believe getting rid of features will make OpenGL better in some way • NVIDIA does not believe in abandoning API compatibility this way • OpenGL is part of a large ecosystem so removing features this way undermines the substantial investment partners have made in OpenGL over years • API compatibility and stability is one of OpenGL’s great strengths
  • 200. 200 Synergy between OpenGL and OpenCL • Complementary capabilities • OpenGL 3.0 = state-of-the-art, cross-platform graphics • OpenCL 1.0 = state-of-the-art, cross-platform compute • Computation & Graphics should work together • Most natural way to intuit compute results is with graphics • When Compute is done on a GPU, there’s no need to “copy” the data to see it visualized • Appendix B of OpenCL specification • Deals with sharing objects between OpenGL and OpenCL • Called “GL” and “CL” from here on…
  • 201. 201 Four Kinds of Shared Objects • clCreateFromGLBuffer: OpenGL buffer object (GLuint bufferobj) → OpenCL buffer object (cl_mem) • clCreateFromGLTexture2D: OpenGL texture 2D object (GLenum target, GLuint texture, GLint miplevel) → OpenCL 2D image object (cl_mem) • clCreateFromGLTexture3D: OpenGL texture 3D object (GLenum target, GLuint texture, GLint miplevel) → OpenCL 3D image object (cl_mem) • clCreateFromGLRenderbuffer: OpenGL renderbuffer object (GLuint renderbuffer) → OpenCL 2D image object (cl_mem)
  • 202. 202 OpenGL / OpenCL Sharing • Requirements for GL object sharing with CL • CL context must be created with an OpenGL context • Each platform-specific API will provide its appropriate way to create an OpenGL-compatible CL context • For WGL (Windows), CGL (OS X), GLX (X11/Linux), EGL (OpenGL ES), etc. • Creating cl_mem for GL Objects does two things • 1. Ensures CL has a reference to the GL objects • 2. Provides cl_mem handle to acquire GL object for CL’s use • clRetainMemObject & clReleaseMemObject can create counted references to cl_mem objects
  • 203. 203 Acquiring GL Objects for Compute Access • Still must “enqueue acquire” GL objects for compute kernels to use them • Otherwise reading or writing GL objects with CL is undefined • Enqueue acquire and release provide sequential consistency with GL command processing • Enqueue commands for GL objects • clEnqueueAcquireGLObjects • Takes list of cl_mem objects for GL objects & list of cl_events that must complete before acquire • Returns a cl_event for this acquire operation • clEnqueueReleaseGLObjects • Takes list of cl_mem objects for GL objects & list of cl_events that must complete before release • Returns a cl_event for this release operation
  • 204. 204 Unconventional OpenGL Deployments • Workstation PCs: Quadro • Consumer PCs: GeForce • High-end Visualization: QuadroPlex Visual Computing Solution (VCS) • Embedded Applications • Handheld Devices • Game Consoles • Figure labels: “Conventional PC OpenGL Products” and “Unconventional”
  • 205. 205 OpenGL in Context A facilitated conversation with Dr. Marc Levoy, Stanford University

Editor’s notes

  1. An exciting SIGGRAPH for me
  2. Didn’t continue to succeed, though. One of my sorrows is that OpenGL didn’t seem to contribute to success for SGI
  3. Not a required “implementation”, just a concise way to specify the architecture (like ISA registers) Directly inspired changes to the specification (especially to pixel operations, e.g., depth buffer of)