3. Non-Functional Requirements
• Maintainability
• Extensibility
• Security
• Scalability
• Intellectual Manageability
• Availability
• Portability
• Usability
• Performance
Fanafzar Game Studio
4. Performance
The amount of work accomplished by a
computer system compared to the time and
resources used.
• Short response time
• High throughput
• Low utilization of computer resources
• High availability of applications
• Fast data compression and decompression
• High bandwidth/ Short data transmission time
5. Video Games
• Most “-ilities” (quality attributes) are important
– Even more so for game engines (as in enterprise applications)
• Performance is REALLY important!
– For any game or game engine.
6. System Design
• Solution for Functional Requirements
• Solution for Non-Functional Requirements
– Bulk of the technical efforts
– Conflicts in Design!
– Performance as the bad boy in the group
– Performance as the cream of the crop
– Performance being directly experienced by
end user
8. Optimization
• “The process of modifying a software
system to make some aspects of it work
more efficiently or use fewer resources.”
10. Levels of Optimization
• System Level
• Algorithmic Level
• Micro Level
– Branch prediction
– Instruction throughput
– Latency
11. Project Lifecycle and Optimization
• Pre-production
• Production
• Post-production
Optimization from High Level to Low Level
Quake Story: High level architectural
optimization before low level triangle draw
function (Carmack and Abrash)
http://www.bluesnews.com/abrash/
12. Measuring Performance in Games
1. Set Specification
– Performance Goal (FPS, time)
– Hardware Specification
2. Define Line Items
– CPU time, RAM, GPU time, Video Mem
– Rendering, Physics, Sound, Gameplay, Misc.
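A performance goal can be turned into a per-frame time budget and then split across line items. The sketch below assumes a frame-rate target and made-up per-system shares; `LineItem`, `frameBudgetMs` and `fitsBudget` are illustrative names, not something taken from the talk.

```cpp
#include <string>
#include <vector>

// Hypothetical line item: a subsystem and its share of the frame, in milliseconds.
struct LineItem {
    std::string name;
    double budgetMs;
};

// Frame time budget implied by a target frame rate (60 FPS gives about 16.67 ms).
double frameBudgetMs(double targetFps) {
    return 1000.0 / targetFps;
}

// True when the line items together fit inside the frame budget.
bool fitsBudget(const std::vector<LineItem>& items, double targetFps) {
    double total = 0.0;
    for (const LineItem& item : items)
        total += item.budgetMs;
    return total <= frameBudgetMs(targetFps);
}
```

With example shares of 8 ms rendering, 3 ms physics, 1 ms sound, 3 ms gameplay and 1 ms misc., the total of 16 ms fits a 60 FPS budget but not a 120 FPS one.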
13. Memory Management (God of War)
[Diagram: 32 Meg memory map. 16 Meg for levels (split into 2), 4 * 1 Meg for enemies, 1.5 Meg for the exe, plus run-time data and permanent data]
• Establish Hard Rules.
– 16 Meg for Level Data (Split into 2 Levels)
– 4 * 1 Meg for Enemies
• Maintain 60fps
From: Tim Moss 2006 GDC Talk
14. Tools
• Profilers (Intel VTune, VS Profiler, …)
– Total time
– Self time
– Calls
• System Monitors (Nvidia PerfHud, MS PIX,…)
• System Adjusters (Intel GPA, …)
19. Data Access Patterns
• Linear Access Forward
for (i = 0; i < numData; ++i)
memArray[i];
• Linear Access Backward
20. Data Access Patterns Ctd.
• Periodic Access
struct vertex
{
float pos[3];
float norm[3];
float textCoord[3];
};
for (i = 0; i < num; ++i)
vertexArray[i].pos;
• Random Access
23. Strip Mining
for
{
access pos;
}
for
{
access norm;
}
------------------------------------------------------
for
{
access pos;
access norm;
}
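The two loop shapes above can be written out in C++ as a rough sketch. It reuses the vertex layout from the "Periodic Access" slide; the function names and the choice to sum only the first component are assumptions made for illustration. Both variants compute the same result, and which one is faster depends on the data layout and cache behavior, so it should be measured.

```cpp
#include <cstddef>
#include <vector>

// Vertex layout taken from the "Periodic Access" slide.
struct Vertex {
    float pos[3];
    float norm[3];
    float textCoord[3];
};

// Variant 1: two passes, each touching a single field per vertex.
float sumTwoPasses(const std::vector<Vertex>& v) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < v.size(); ++i)
        sum += v[i].pos[0];
    for (std::size_t i = 0; i < v.size(); ++i)
        sum += v[i].norm[0];
    return sum;
}

// Variant 2: one pass, touching both fields of a vertex in the same iteration.
float sumOnePass(const std::vector<Vertex>& v) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < v.size(); ++i) {
        sum += v[i].pos[0];
        sum += v[i].norm[0];
    }
    return sum;
}
```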
24. Memory
• Stack
– Temporal coherence, spatial locality
• Global
– No fragmentation, freed at end
• Heap
– new, delete, malloc, free
– No spatial locality, no temporal coherence,
fragmentation
25. Load-Hit-Store
• Write data to address x and then read the
data from address x -> Large stall
• Writing data all the way to the main
memory through all caches -> 40 to 80
CPU cycle delay
• http://assemblyrequired.crashworks.org/2008/07/08/load-hit-stores-and-the-__restrict-keyword/
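A common source of load-hit-stores is accumulating through a pointer inside a loop. The sketch below contrasts that with accumulating in a local variable; the function names are hypothetical, and `__restrict` is a compiler extension (accepted by MSVC, GCC and Clang) rather than standard C++. Whether a stall actually occurs depends on the target CPU and compiler, so treat this as an illustration only.

```cpp
#include <cstddef>

// Writes through *total on every iteration. The compiler must assume that
// `total` may alias `values`, so it may store and reload each time around.
void sumThroughPointer(const int* values, std::size_t count, int* total) {
    *total = 0;
    for (std::size_t i = 0; i < count; ++i)
        *total += values[i];
}

// Accumulates in a local, which can stay in a register, and stores once.
// __restrict promises the compiler that the two pointers do not overlap.
void sumInLocal(const int* __restrict values, std::size_t count,
                int* __restrict total) {
    int local = 0;
    for (std::size_t i = 0; i < count; ++i)
        local += values[i];
    *total = local;
}
```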
27. Memory Solutions
• Don’t allocate
• Linearize allocations
– Use arrays
• Memory pools
– Coherent
– No fragmentation
– No construction/destruction
• Don’t construct or destruct
– Plain Old Structures (POS)
28. Memory Solutions
• Time scoped pools
– Frame allocator
– Pool for one level content, discarded at the
end
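A frame allocator can be sketched as a linear ("bump") allocator that is reset once per frame. The class below is a minimal illustration under that assumption; `FrameAllocator` and its interface are invented here, and a production allocator would also handle overflow policy, debugging aids and threading.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Minimal frame allocator sketch: bump an offset, reset it once per frame.
class FrameAllocator {
public:
    explicit FrameAllocator(std::size_t capacity) : buffer_(capacity), offset_(0) {}

    // Returns `size` bytes aligned to `alignment` (a power of two),
    // or nullptr when the buffer is exhausted.
    void* allocate(std::size_t size, std::size_t alignment = alignof(std::max_align_t)) {
        const std::uintptr_t base = reinterpret_cast<std::uintptr_t>(buffer_.data());
        const std::uintptr_t current = base + offset_;
        const std::uintptr_t mask = static_cast<std::uintptr_t>(alignment) - 1;
        const std::uintptr_t aligned = (current + mask) & ~mask;
        const std::size_t newOffset = static_cast<std::size_t>(aligned - base) + size;
        if (newOffset > buffer_.size())
            return nullptr;
        offset_ = newOffset;
        return buffer_.data() + (aligned - base);
    }

    // Discards everything allocated this frame. No destructors are run.
    void reset() { offset_ = 0; }

    std::size_t used() const { return offset_; }

private:
    std::vector<std::uint8_t> buffer_;
    std::size_t offset_;
};
```

Because `reset()` never runs destructors, this only suits plain data, which matches the "don't construct or destruct" advice on the previous slide.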
29. Memory Manager
“If you don’t have a custom memory
manager in your game, you’re a fool (or a
PC game developer)”
Christer Ericson, Director of Tools and
Technology, Sony Santa Monica
30. Memory Related Solutions
• Reducing memory footprint at compile time and
runtime
• Algorithms that reduce memory fetching
• Reduce cache miss
– Spatial Locality
– Proper Stride
– Correct Alignment
• Increase Temporal Coherence
• Utilize Pre-fetching
• Avoid worst-case access patterns that break
caching
31. Pitfalls of Object Oriented Programming
Summary of study (Tony Albrecht, 2009)
• Case study for CPU side rendering code
• Just re-organizing data locations was a win
• + pre-fetching is more win
• Can you decouple data from objects?
• Be aware of what the compiler and hardware
are doing, watch the generated assembly!
32. Pitfalls of OOP
• Optimize for data first, then code
– Memory access is going to be your biggest
bottleneck
• Simplify Systems
– KISS
– Easier to optimize, Easier to parallelize
• Keep code and data homogeneous
• Not everything needs to be an object
33. Pitfalls of OOP
• You are writing a game
– You have control over the input data
– Don’t be afraid to pre-format it if needed
• Design for specifics, not generics
34. Data Oriented Design
• Better performance
• Better realization of code optimization
• Often simpler code
• More parallelizable code
35. CPU Bound: Compute
• Time is dominated by arithmetic operations rather than loads and stores
36. CPU Compute: Solutions
• Compiler flags (float: precise/fast)
• Time against Space
– Use of lookup tables
• Memoization
• Function Inlining
• Branch prediction, out of order execution
– Branch mis-prediction is much less costly than
cache miss
• Make branches more predictable
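Trading space for time with a lookup table can be sketched as follows. The table size, the "fraction of a full turn" angle convention and the names `buildSineTable` and `fastSin` are all assumptions chosen for illustration; a table lookup can also lose to direct computation if it causes cache misses, so it should be profiled.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

constexpr std::size_t kTableSize = 256;  // assumed resolution, must be a power of two

// Builds the table once, up front.
std::array<float, kTableSize> buildSineTable() {
    std::array<float, kTableSize> table{};
    const double twoPi = 6.283185307179586;
    for (std::size_t i = 0; i < kTableSize; ++i)
        table[i] = static_cast<float>(std::sin(twoPi * static_cast<double>(i) / kTableSize));
    return table;
}

// Approximates sin() for a non-negative angle given as a fraction of a full turn.
// The mask wraps the index, which is why the table size must be a power of two.
float fastSin(const std::array<float, kTableSize>& table, float turns) {
    const std::size_t index =
        static_cast<std::size_t>(turns * kTableSize) & (kTableSize - 1);
    return table[index];
}
```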
37. CPU Compute: Solutions
• Remove Branches
– if (a) z = c; else z = d;
– z = a * c + (1 - a) * d (a must be 0 or 1)
• Profile Guided Optimization
• Loop unrolling
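The branch removal above can be sketched for 32-bit integers as follows. The function names are hypothetical, and the mask variant is an extra alternative not shown on the slide. Compilers often generate branch-free code for the plain `if` on their own, so it is worth checking the generated assembly before rewriting.

```cpp
#include <cstdint>

// Branching version from the slide: if (a) z = c; else z = d;
std::int32_t selectBranch(bool a, std::int32_t c, std::int32_t d) {
    if (a) return c;
    return d;
}

// Arithmetic version: z = a * c + (1 - a) * d, valid because flag is 0 or 1.
std::int32_t selectArithmetic(bool a, std::int32_t c, std::int32_t d) {
    const std::int32_t flag = static_cast<std::int32_t>(a);
    return flag * c + (1 - flag) * d;
}

// Mask version: the mask is all ones when a is true and all zeros otherwise.
std::int32_t selectMask(bool a, std::int32_t c, std::int32_t d) {
    const std::int32_t mask = -static_cast<std::int32_t>(a);
    return (c & mask) | (d & ~mask);
}
```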
38. Loop Unrolling
for (i = 0; i < 100; ++i)
sum += intArray[i];
------------------------------------------------------
sum1 = sum2 = sum3 = sum4 = 0; // independent partial sums
for (i = 0; i < 100; i+=4)
{
sum1 += intArray[i];
sum2 += intArray[i+1];
sum3 += intArray[i+2];
sum4 += intArray[i+3];
}
sum = sum1+sum2+sum3+sum4;
39. Virtual Functions
• How slow are virtual functions really?
http://assemblyrequired.crashworks.org/2009/01/19/how-slow-are-virtual-functions-really/
• 1000 iterations over 1024 vectors
• 12,288,000 function calls
• Virtual: 159.856 ms
• Direct: 67.962 ms
• Inline: 8.040 ms
40. Slow Virtual Functions
• Problem is not the cost of looking up the
indirect function pointer from vtable.
• The issue lies in “branch prediction” and
the way marshalling parameters for the
calling convention can get in the way of
good instruction scheduling.
41. Micro Optimization
• Bit Tricks
– Bitwise Swap
• X^=Y; Y^=X; X^=Y;
– Bitmasks
• isFlagSet = someInt & MY_FLAG, someInt |= Flag2;
• Example use: Collisions in Physics
– Fast Modulo
• X % Y == X & (Y - 1) when Y is a power of 2 and X is non-negative
– Even and Odd
• (X & 1) == 0; // same as X%2==0
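The tricks above can be collected into small helpers, as a sketch. The flag values and function names are hypothetical; unsigned types are used because the modulo identity only holds for non-negative values. Modern compilers usually apply the modulo and parity rewrites themselves when the divisor is a known constant.

```cpp
#include <cstdint>

constexpr std::uint32_t MY_FLAG = 1u << 0;  // hypothetical flag bits
constexpr std::uint32_t FLAG2 = 1u << 1;

// Bitmask test and set.
bool isFlagSet(std::uint32_t value, std::uint32_t flag) { return (value & flag) != 0; }
std::uint32_t setFlag(std::uint32_t value, std::uint32_t flag) { return value | flag; }

// True when y has exactly one bit set.
bool isPowerOfTwo(std::uint32_t y) { return y != 0 && (y & (y - 1)) == 0; }

// x % y for a power-of-two y. The caller must guarantee isPowerOfTwo(y).
std::uint32_t fastModulo(std::uint32_t x, std::uint32_t y) { return x & (y - 1); }

// Same as x % 2 == 0.
bool isEven(std::uint32_t x) { return (x & 1u) == 0; }

// XOR swap. Guarded because swapping a variable with itself would zero it.
void xorSwap(std::uint32_t& x, std::uint32_t& y) {
    if (&x != &y) {
        x ^= y;
        y ^= x;
        x ^= y;
    }
}
```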
42. Book on Bit Tricks
• Hacker’s Delight (Henry S. Warren,
Addison Wesley, 2003)
43. Other Micro Optimization
• Data type conversion
• SSE Instructions
• Removing loop invariant code
• Loop unrolling
• Cross-.obj optimization
– Whole program optimization
• Hardware Specific Optimizations
44. Vector vs. List
• Random data insertion and deletion into a
c++ vector and list compared
• Data kept sorted in the containers
47. STL iterator debugging
STL Iterator Debugging and Secure SCL
http://channel9.msdn.com/Shows/Going+Deep/STL-Iterator-Debugging-and-Secure-SCL
48. Copy vs. Move
• Vector of strings with 4 dimensions
• 100 x 100 x 100 x 500
• Construction: 564 ms
• Copy Construction: 537 ms
• Move Construction: 0.001 ms
• Empty Destruction: 0.001 ms
• Destruction: 285 ms
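The gap between copy and move construction comes from what each has to touch. The sketch below uses a two-dimensional grid of strings, far smaller than the four-dimensional container measured on the slide; `StringGrid` and the helper names are illustrative assumptions.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

using StringGrid = std::vector<std::vector<std::string>>;

// Builds a rows x cols grid of short strings.
StringGrid makeGrid(std::size_t rows, std::size_t cols) {
    return StringGrid(rows, std::vector<std::string>(cols, "text"));
}

// Copy construction: allocates new storage and copies every row and string.
StringGrid copyGrid(const StringGrid& source) {
    return source;
}

// Move construction: takes over the source's buffer, leaving the source empty.
StringGrid moveGrid(StringGrid& source) {
    return std::move(source);
}
```

The move only transfers ownership of the outer buffer, which is also why destroying the emptied source afterwards costs almost nothing.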
49. GPU Bound
• GPU related issues
– Synchronization
– Capabilities Management
– Resource Management
– Global Ordering
• Reflections/Shadows before scene
• Opaque front to back/Translucent back to front
• Sort by material or texture to reduce state changes
– Instrumentation
– Debugging
50. GPU Optimization Tricks
• State Changes
• Draw Call (Most common issue)
• Instancing and Batching
– Shader Instancing
– Hardware Instancing
• Video RAM
– Device Resets
– Resource uploads/locks
• Minimize Copies
• Minimize Locks
• Double Buffer
51. GPU Optimization Ctd.
• Fragmentation
– Power of 2 allocations help
• Lock culling
– Debug visualization for those culled
• Texture debugging
– Different texture for each mip level
52. GPU Bound?
• Spend a long time in API calls (Draw calls
or swap/present frame buffer)
• Front End / Back End
– Front End: Triangles/Geometry
– Back End: Pixels/Shaders
– Vary each workload and measure
performance
53. Back End
• Fill Rate (ex. 1000 MP/sec)
– FPS, Overdraw, resolution
– Fill Rate / FPS = overdraw * resolution
– Render Target Format (16 / 32 bit)
– Blending
• Transparency instead of translucency
– Shading
• Pixel shaders
– Texture Sampling
• Format, Filter Mode, Count (DXT1)
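The fill-rate relation above (Fill Rate / FPS = overdraw * resolution) can be rearranged to estimate how much overdraw a given fill rate allows. The function name and the example resolution are assumptions; real hardware rarely reaches its quoted peak fill rate, so this is an upper bound at best.

```cpp
// Average overdraw a fill rate can sustain at a given frame rate and resolution.
// Rearranged from: fillRate / fps = overdraw * (width * height).
double sustainableOverdraw(double fillRatePixelsPerSec, double fps,
                           double width, double height) {
    return fillRatePixelsPerSec / (fps * width * height);
}
```

For the slide's example of 1000 MP/sec, a 1280 x 720 target at 60 FPS leaves room for an average overdraw of roughly 18.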
54. Front End
• Bottlenecks
– Vertex Transformation
• Lighting calculations, skinning, …
– Vertex Fetching and caching
• Vertex format, indexes (16/32 bit)
– Tessellation
55. Other GPU factors
• Multi-sample antialiasing (MSAA)
– Downsample from high-res render
– Can significantly affect fill-rate
• Lights and Shadows
– CPU, vertex processing, pixel processing
56. Forward vs. Deferred
• Multiple render targets needed for
deferred
• Lot of fill-rate needed for deferred
• Performance is flattened
57. Shaders
• Memory
• Inter-shader communication
• Texture sampling (biggest problem with
memory)
• Computation
58. Other shader notes
• Shader compilation
• Shader count
– Penalty for many shaders in one scene
– Limits on GPU for shader execution
• Effect framework
– CgFX, ColladaFX (by tools like Nvidia FX
composer)
– Oriented more towards ease of use than performance
– Engines have their own (Unreal 3, Unity, Source,
Torque, Gamebryo)
61. Game Networking Data
• Events
– Guaranteed, Ordered
• State data
– Unordered, Not Guaranteed (opportunities for
optimization)
– Unless using lock step simulation
62. Bandwidth
• Bitstreams and Bit packing
– Flag -> one bit
– Health -> 7 bits
• Encoding on streams
[Diagram: layered stack with TCP/UDP, BitStream, encoders (Decimation, LZW, Huffman), and payload types (Most Recent State, Events)]
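Bit packing as described above (a flag in one bit, health in seven) can be sketched with a small writer and reader. `BitWriter` and `BitReader` are hypothetical classes written for this example, packing least significant bit first; they do no bounds checking and are not the API of any particular networking library.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Appends values to a byte buffer using only as many bits as requested.
class BitWriter {
public:
    // Writes the lowest `bitCount` bits of `value`, least significant bit first.
    void write(std::uint32_t value, int bitCount) {
        for (int i = 0; i < bitCount; ++i) {
            if (bitPos_ % 8 == 0)
                bytes_.push_back(0);  // start a new byte
            if ((value >> i) & 1u)
                bytes_.back() |= static_cast<std::uint8_t>(1u << (bitPos_ % 8));
            ++bitPos_;
        }
    }
    const std::vector<std::uint8_t>& bytes() const { return bytes_; }

private:
    std::vector<std::uint8_t> bytes_;
    std::size_t bitPos_ = 0;
};

// Reads values back in the same order and widths they were written.
class BitReader {
public:
    explicit BitReader(const std::vector<std::uint8_t>& bytes) : bytes_(bytes) {}

    std::uint32_t read(int bitCount) {
        std::uint32_t value = 0;
        for (int i = 0; i < bitCount; ++i) {
            const std::uint32_t bit = (bytes_[bitPos_ / 8] >> (bitPos_ % 8)) & 1u;
            value |= bit << i;
            ++bitPos_;
        }
        return value;
    }

private:
    const std::vector<std::uint8_t>& bytes_;
    std::size_t bitPos_ = 0;
};
```

Writing a one-bit flag followed by a seven-bit health value fills exactly one byte, where a `bool` plus an `int` would typically take five.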
63. Prioritizing Data
• Fill packet with most important data first
• Heuristic for most recent data (ex. how
close to player)
• Only send what you must
– ex. Cull enemy behind the wall
64. Packets
• Smaller than 1400 bytes
• Send packets regularly (Routers allocate
bandwidth to those who use it)
66. Profiling Networking
• Make sure networking code is efficient
– Measure compute and memory
• Expose what the networking layer is doing
– Number of packets
– Bandwidth for each packet
• Be aware of situations in which the client and
server get out of sync.
67. Mass Storage
• Hard Drives
• CD, DVD
• Blu-Ray
• Flash Drives
68. Performance Issues
• Seek Time
• Transfer Rate (ex. 75MB/sec)
• Worst Case
– 8ms delay between blocks on disk
– 4KB blocks
– Loading 1MB -> (1024/4) * 8 = 2048 ms ≈ 2 secs
– Loading 1GB -> 1024 * 2048 ms ≈ 35 min
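The worst-case arithmetic above, where every block pays a full seek, can be written as a one-line estimate. The function name and parameter units are assumptions made for this sketch.

```cpp
// Worst-case load time in milliseconds when every block costs one full seek.
double worstCaseLoadMs(double sizeKB, double blockKB, double seekMs) {
    return (sizeKB / blockKB) * seekMs;
}
```

With 4KB blocks and an 8 ms delay, 1MB (1024KB) takes 2048 ms, and 1GB takes 2,097,152 ms, which is just under 35 minutes.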
69. Rule
• No disk IO in the inner loops
70. IO Profiling is hard
• File systems optimize themselves based on
access patterns
• Disk will rebalance data based on load and
sector failure
• Disk, disk controller, file system and OS will
cache and reorder requests
• User software may intercept the disk access
for virus scanning
• Good idea to test on fresh machines from
time to time
71. Disk IO performance tips
• Limit disk access
• Minimize reads and writes
– Read larger chunks
• Asynchronous Access
• Optimize file order
• Optimize data for fast loading
– Space on disk vs. Time to load (ex.
decompressing a JPG file)
72. Disk IO Tips
• Support development and runtime formats
• Support dynamic reloading
• Automate resource processing
• Centralize resource loading
– Resource Managers
• Preload when appropriate
• Stream
– First second of sound in memory
– Small texture mip levels in memory
– Small mesh LODs in memory
75. Scalability
• The achievable speedup is limited by the
parallelizable portion of an algorithm
• Amdahl’s Law
– S(N) = 1 / ((1 – P) + P/N)
– N: Processors, P: Parallelizable Portion
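Amdahl's Law is easy to evaluate directly. The function below is a direct transcription of the formula above; only the function and parameter names are invented here.

```cpp
// Amdahl's Law: S(N) = 1 / ((1 - P) + P / N)
// parallelPortion is P in [0, 1], processors is N >= 1.
double amdahlSpeedup(double parallelPortion, double processors) {
    return 1.0 / ((1.0 - parallelPortion) + parallelPortion / processors);
}
```

With P = 0.9, eight processors give a speedup of only about 4.7, and no number of processors can push it past 10, because the serial 10% always remains.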
76. Contention
• More than one thread accessing the same
resource
• Some solutions
– Thread Safety (Mutex)
– Redundant Data
– Efficient Synchronization (Locks, Atomic
Operations, …)
79. False Sharing Ctd.
struct vertex
{
float xyz[3]; // data 1
float tutuv[2]; // data 2
};
vertex triList[N];
------------------------------------------------------------
struct vertices
{
float xyz[3][N];
float tutuv[2][N];
};
vertices triList;
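Besides regrouping data as shown above, a widely used way to avoid false sharing is to pad per-thread data so that each thread's item sits on its own cache line. The sketch below assumes a 64-byte cache line, which is common on x86 but should be checked for the target hardware; the struct names are hypothetical.

```cpp
#include <cstddef>

constexpr std::size_t kCacheLineSize = 64;  // assumption, verify for your target CPU

// Unpadded: adjacent counters owned by different threads can share a cache line.
struct Counter {
    long value;
};

// Padded: alignas rounds the struct's size and alignment up to a cache line,
// so neighbouring array elements never share one.
struct alignas(kCacheLineSize) PaddedCounter {
    long value;
};

static_assert(sizeof(PaddedCounter) == kCacheLineSize,
              "each padded counter should occupy one cache line");
```

The cost is memory: an array of padded counters uses a full cache line per element, so this only makes sense for a small number of heavily written items.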
80. Multi-threaded Profiling
• Look for time spent on synchronization
primitives
• Look out for Heisenbugs!
• Assess Amdahl’s Law
• Use multi-threaded profilers
81. No Synchronization is best
• Lock-free algorithms are great.
• Wait-free algorithms are even better!!
Mike Acton notes on wait free coding:
http://cellperformance.beyond3d.com/articles/2009/08/roundup-recent-sketches-on-concurrency-data-design-and-performance.html
82. Managed Languages
• Execute on a runtime
• C#, Java, JavaScript, Lua, Python, PHP,
ActionScript
83. Concerns for Profiling
• Garbage Collector
• Just in Time compiler
• No high accuracy timers
• Allocation can be costly, usually no stack
84. Managed/Unmanaged
• Gameplay code is usually not performance
critical
• Bottlenecks can be replaced with native
code
85. Dealing with GC
• Memory pressure causes GC to run
frequently and cause sudden hitches
• Memory pressure causes big memory
footprint and hurts cache efficiency
• Big total working set needs the GC to
check all the pointers
• Incremental GC behavior is helpful but
high pressure can force GC to collect all
86. Strategies for dealing with GC
• Less data on heap
• Your own memory management
• Memory pooling
• Reusing temporary objects kept as class
members instead of creating new local
objects on every call
87. Dealing with JIT
• JIT activation time is important for
performance (startup, after a few function
calls, …)
• Constructors are usually left out of JIT
compilation (heavy initialization code
needs to be in a helper function)
• JIT might not be available on all platforms
88. Optimizing Animation
• Channel Omission
• Quantization
• Sample Frequency and Key Omission
• Curve Based Compression
• Selective Loading and Streaming
• Hardware Skinning
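Quantization of animation data can be sketched as mapping a float in a known range to a 16-bit integer and back. The 16-bit width, the clamping behavior and the function names are assumptions chosen for illustration; real animation compressors pick the bit width per channel.

```cpp
#include <cmath>
#include <cstdint>

// Maps a value in [minValue, maxValue] to a 16-bit integer, clamping out-of-range input.
std::uint16_t quantize(float value, float minValue, float maxValue) {
    float normalized = (value - minValue) / (maxValue - minValue);
    if (normalized < 0.0f) normalized = 0.0f;
    if (normalized > 1.0f) normalized = 1.0f;
    return static_cast<std::uint16_t>(std::lround(normalized * 65535.0f));
}

// Reconstructs an approximation of the original value.
float dequantize(std::uint16_t quantized, float minValue, float maxValue) {
    return minValue + (static_cast<float>(quantized) / 65535.0f) * (maxValue - minValue);
}
```

This halves the storage of a 32-bit float channel at the cost of a small, bounded error, about half a quantization step of the range.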
89. Misc. Optimization Related Topics
• Mesh LOD
• Animation LOD
• AI LOD
• Collision Detection Spatial Partitioning
• Physics Optimizations (GPU, Sleeps, …)
90. PIX Test Case
• PIX (Performance Investigator for Xbox)
• Part of DirectX SDK
• Used for DirectX based applications
• Used for analyzing Garshasp 1 and
Garshasp: Temple of the Dragon
(Expansion)
91. Using PIX to Analyze Garshasp
95. Garshasp Performance Post-Mortem
• Animation skinning (Intel VTune)
– Switched to Hardware Skinning
• Asset Loading
– Used background thread
• Draw Calls
– Dynamic Far-Clip distance
• High RAM consumption
– Reduced particle quotas
– Reduced Area arrangement (changes in camera
system needed)
– Reduced Texture size
– Better strategies for audio loading/unloading
96. Garshasp Ctd.
• Large Video memory usage
– Changed mesh geometry
– Better seamlessness strategy
• Frame rate drops
– Better use of particles
– Modifications to camera angles and
seamlessness strategy
– Smaller areas for more even distribution of
resource loading.
97. Some un-resolved issues
• Un-optimized animation system
• Overdraw
• Slow Game Object update loop
• No static batching
– Use of vertex color for baked color
• Huge game save data
• Inefficient texture size usage
• No sound/video streaming
• + many more!
106. References
• Video Game Optimization, Ben Garney and Eric Preisz
• “How the left and right brain learned to love one another”, Tim Moss,
http://timmoss.blogspot.com/2007/02/it-seems-reasonable-that-my-very-first.html
• “Optimization is a Full time job”, Maciej Sinilo,
http://msinilo.pl/blog/?p=483
• “Memory Optimization”, Christer Ericson,
http://www.research.scea.com/research/pdfs/GDC2003_Memory_Optimization_18Mar03.pdf
• “A pragmatic approach to optimization”, Niklas Frykholm,
http://bitsquid.blogspot.com/2011/12/pragmatic-approach-to-performance.html
107. References Ctd.
• Hacker’s Delight (Henry S. Warren, Addison Wesley, 2003)
• Advanced Bit Manipulation-fu, Christer Ericson,
http://realtimecollisiondetection.net/blog/?p=78
• Networking for Game Programmers, Glenn Fiedler,
http://gafferongames.com/networking-for-game-programmers/
• Source Multiplayer Networking, Valve Software,
https://developer.valvesoftware.com/wiki/Source_Multiplayer_Networking
108. References Ctd.
• False sharing and its effect on memory performance, William J. Bolosky,
http://static.usenix.org/publications/library/proceedings/sedms4/full_papers/bolosky.txt
• Concurrency, Data Design and Performance, Mike Acton,
http://cellperformance.beyond3d.com/articles/2009/08/roundup-recent-sketches-on-concurrency-data-design-and-performance.html
• Diving down the concurrency rabbit hole, Mike Acton,
http://www.insomniacgames.com/tech/articles/0809/files/concurrency_rabit_hole.pdf
109. References Ctd.
• Scalar Quantization, Jonathan Blow,
http://number-none.com/product/Scalar%20Quantization/index.html
• Are we out of memory, Christian Gyrling,
http://www.swedishcoding.com/2008/08/31/are-we-out-of-memory/
• Practical Efficient Memory Management, Jesus De Santos,
http://entland.homelinux.com/blog/2008/08/19/practical-efficient-memory-management/
110. References Ctd.
• Load Hit Store and the restrict keyword, Elan Ruskin,
http://assemblyrequired.crashworks.org/2008/07/08/load-hit-stores-and-the-__restrict-keyword/
• How slow are virtual functions really, Elan Ruskin,
http://assemblyrequired.crashworks.org/2009/01/19/how-slow-are-virtual-functions-really/
• Current Generation Parallelism in Games, Jon Olick,
http://s08.idav.ucdavis.edu/olick-current-and-next-generation-parallelism-in-games.pdf
111. References Ctd.
• Real Life Performance Pitfalls, Alan Murphy,
http://www.microsoft.com/en-us/download/confirmation.aspx?id=3539
• Graphics Programming Black Book, Michael Abrash
• Zen of Code Optimization, Michael Abrash
• The Free Lunch is Over, Herb Sutter,
http://www.gotw.ca/publications/concurrency-ddj.htm
112. References Ctd.
• Intel Software Optimization Cookbook,
http://www.intel.com/intelpress/sum_swcb2.htm
• Pitfalls of Object Oriented Programming, Tony Albrecht,
http://www.reddit.com/r/programming/comments/ag43j/pitfalls_of_object_oriented_programming_pdf/
• Microsoft PIX,
http://msdn.microsoft.com/en-us/library/ee663275(v=vs.85).aspx
113. References Ctd.
• Top 10 Myths of Video Game Optimization,
http://www.gamasutra.com/view/feature/130296/the_top_10_myths_of_video_game_.php?print=1