2. Group of people from Princeton University, NJ
Published in International Symposium on Code
Generation and Optimization: Feedback-Directed
and Runtime Optimization
Year 2003.
4. INTRODUCTION - PROCESSORS
Become more complex
Incorporate additional computation resources
Compiler can no longer rely on simple instruction
count to guide optimization
It has to balance resource utilization, register usage
and dependences
5. INTRODUCTION - COMPILERS
As a consequence
Compilers become more complex
Use optimizations aggressively
Have to use predictive heuristics in order to decide
where and to what extent optimizations should be
applied
7. PROBLEMS - PREDICTIVE HEURISTICS?
Modern compilers employ predictive heuristics
They try to determine a priori the benefits of a given
optimization
Are tuned by compiler writers to give the highest
average performance
Resulting optimization decisions remain suboptimal
for many individual code segments
Leaving significant potential performance gains
unrealized
9. SOME SOLUTIONS – ITERATIVE COMPILATION
Compiling a program multiple times with different
optimization configurations
Because several optimization configurations are tried and
compared, predictive heuristics are no longer needed
Results are not directly applicable to general-purpose
architectures and applications
Incurs large compile times
11. THEIR SOLUTION – OPTIMIZATION-SPACE
EXPLORATION
General and practical version of iterative
compilation
Explores the space of optimization configurations
through multiple compilations
12. THEIR SOLUTION – OPTIMIZATION-SPACE
EXPLORATION
To address the compile time:
It uses the experience of the compiler writer to prune
the number of configurations that should be explored
Uses a performance estimator instead of evaluating the code
by executing it
Selects a custom configuration for each code segment
Selects the next optimization configuration by examining the
characteristics of the previous configurations
13. SINGLE FIXED CONFIGURATION
A set of fixed heuristics is applied to each code
segment
Only one version of the code exists at any given
time
That version is passed from transformation to
transformation
15. OSE OVER MANY CONFIGURATIONS
OSE compiler simultaneously applies multiple
transformation sequences on each code segment
Each version is optimized using a different
optimization configuration.
The compiler emits the fittest version as determined
by the performance evaluator
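The selection step described on this slide can be sketched roughly as follows. This is only an illustration, not the authors' implementation: `pick_fittest`, `toy_optimize`, `toy_estimate` and the configuration names are hypothetical, and the cost numbers are made up for the example.

```python
from typing import Callable, Iterable, Tuple


def pick_fittest(segment: str,
                 configs: Iterable[str],
                 optimize: Callable[[str, str], str],
                 estimate_cycles: Callable[[str], float]) -> Tuple[str, str]:
    """Optimize `segment` once per configuration and keep the version
    with the lowest estimated cycle count."""
    best = None
    for cfg in configs:
        version = optimize(segment, cfg)   # one version per configuration
        cost = estimate_cycles(version)    # estimated, not measured by execution
        if best is None or cost < best[0]:
            best = (cost, cfg, version)
    if best is None:
        raise ValueError("no configurations supplied")
    return best[1], best[2]


# Toy stand-ins for the optimizer and the performance evaluator (assumptions).
def toy_optimize(segment: str, cfg: str) -> str:
    return f"{segment}@{cfg}"


TOY_COSTS = {"loop@default": 100.0, "loop@unroll4": 80.0, "loop@swp": 90.0}


def toy_estimate(version: str) -> float:
    return TOY_COSTS[version]
```

With the toy data, `pick_fittest("loop", ["default", "unroll4", "swp"], toy_optimize, toy_estimate)` picks the `unroll4` version because it has the lowest estimated cost.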
16. OSE – LIMITING THE SEARCH SPACE
Optimization Space
Derived from a set of optimization parameters
Optimization Parameters
Optimization level
High Level Optimization (HLO) level
Micro-architecture type
Coalesce adjacent loads and stores
HLO phase order
Loop unroll limit
Update dependencies after unrolling
Perform software pipelining
17. OSE – LIMITING THE SEARCH SPACE
Optimization Parameters
Heuristic to disable software pipelining
Allow control speculation during software pipelining
Software pipeline outer loops
Enable if-conversion heuristic for software pipelining
Software pipeline loops with early exits
Enable if-conversion
Enable non-standard predication
Enable pre-scheduling
Scheduler ready criterion
18. COMPILER CONSTRUCTION-TIME PRUNING
Limit the total number of configurations that will be
considered at compile time
Construct a set S with at most N configurations
S is chosen by measuring the impact on a
representative set of code segments C, as follows:
S’ = default configuration + configurations with non-default
parameters
a) Run C compiled with S’ on real hardware and retain in S’
only the valuable configurations
b) Consider the combinations of configurations in S’ as S’’;
repeat a) for S’’ and retain only the best N configurations
Repeat b) until no new configurations can be generated or
the speedup does not improve
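A minimal sketch of this pruning loop is given below, under several assumptions that the slides do not state. A configuration is modelled as a frozenset of non-default `(parameter, value)` pairs. `speedup` is a hypothetical callback standing in for "run C on real hardware". "Valuable" is taken to mean a speedup above `threshold`, and "the speedup does not improve" is read as "the best speedup in S did not increase".

```python
from itertools import combinations
from typing import Callable, FrozenSet, List, Optional, Tuple

Config = FrozenSet[Tuple[str, object]]  # non-default parameter settings


def merge(a: Config, b: Config) -> Optional[Config]:
    """Combine two configurations; None if they disagree on a parameter."""
    merged = dict(a)
    for key, value in b:
        if key in merged and merged[key] != value:
            return None
        merged[key] = value
    return frozenset(merged.items())


def prune(initial: List[Config], speedup: Callable[[Config], float],
          n: int, threshold: float = 1.0) -> List[Config]:
    if n < 1:
        raise ValueError("n must be at least 1")
    default: Config = frozenset()
    # Step a): keep only the configurations judged valuable (assumed criterion).
    kept = [c for c in initial if speedup(c) > threshold]
    seen = set(kept) | {default}
    best = sorted(kept + [default], key=speedup, reverse=True)[:n]
    best_score = speedup(best[0])
    while True:
        # Step b): build new candidates by combining retained configurations.
        candidates = set()
        for a, b in combinations(best, 2):
            m = merge(a, b)
            if m is not None and m not in seen:
                candidates.add(m)
        if not candidates:
            break                      # no new configurations can be generated
        seen |= candidates
        best = sorted(best + list(candidates), key=speedup, reverse=True)[:n]
        if speedup(best[0]) <= best_score:
            break                      # the speedup did not improve
        best_score = speedup(best[0])
    return best


# Toy data (made up): speedups relative to the default configuration.
U = frozenset({("unroll", 4)})
S = frozenset({("swp", True)})
P = frozenset({("presched", True)})
TOY = {frozenset(): 1.0, U: 1.10, S: 1.05, P: 0.95, U | S: 1.20}


def toy_speedup(c: Config) -> float:
    return TOY.get(c, 0.0)
```

In the toy run, `prune([U, S, P], toy_speedup, n=3)` drops `P` in step a), then discovers the combined configuration `U | S` in step b) and ranks it first.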
19. OSE – LIMITING THE SEARCH SPACE
Characterizing Configuration Correlations
Build an optimization configuration tree
Critical configurations = configurations at the same level
1. Construct O = set of the m most important configurations in S
   for all code segments in C
2. Choose all oi in O as the successors of the root node
3. For each configuration oi in O:
4.    Construct Ci = {cj : argmax(pj,k) = i}, k = 1…m
5. Repeat steps 3, 4 to find the successors of oi, limiting
   the code segments to Ci and the configurations to S \ O
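The tree construction above might look roughly like the sketch below. It rests on assumptions: `perf[c][o]` is taken to be the measured speedup of code segment `c` under configuration `o` (the slide's p), "most important" is read as "highest total speedup over the segments considered", and the recursion restarts from step 1 on the reduced sets. The function and data names are hypothetical.

```python
from typing import Dict, List


def build_tree(perf: Dict[str, Dict[str, float]],
               segments: List[str],
               configs: List[str],
               m: int) -> List[dict]:
    """Return the successors of a node as a list of
    {"config": name, "children": [...]} dictionaries."""
    if not segments or not configs:
        return []
    # Step 1: the m most important configurations for these segments.
    ranked = sorted(configs,
                    key=lambda o: sum(perf[c][o] for c in segments),
                    reverse=True)
    top = ranked[:m]
    rest = [o for o in configs if o not in top]   # remaining configurations
    nodes = []
    # Steps 2-3: every configuration in `top` becomes a successor.
    for o in top:
        # Step 4: segments for which `o` is the best configuration in `top`.
        own = [c for c in segments
               if max(top, key=lambda k: perf[c][k]) == o]
        # Step 5: recurse on those segments and the remaining configurations.
        nodes.append({"config": o,
                      "children": build_tree(perf, own, rest, m)})
    return nodes


# Toy measurements (made up for illustration).
PERF = {
    "c1": {"A": 1.3, "B": 1.0, "C": 1.1, "D": 0.9},
    "c2": {"A": 1.0, "B": 1.5, "C": 0.9, "D": 1.3},
    "c3": {"A": 1.2, "B": 1.1, "C": 1.3, "D": 1.0},
}
TREE = build_tree(PERF, ["c1", "c2", "c3"], ["A", "B", "C", "D"], m=2)
```

In the toy data, `B` and `A` become the first-level nodes. Below `B` (best only for `c2`) the next configuration to try is `D`, while below `A` (best for `c1` and `c3`) it is `C`.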
20. OSE – LIMITING THE SEARCH SPACE
Compile-time search
Do a breadth-first search on the optimization
configuration tree
Choose the configuration that yields the best estimated
performance
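Read literally, this compile-time search could be sketched as a plain breadth-first traversal that keeps the configuration with the best estimate. The tree layout (`{"config": ..., "children": [...]}`) and the `estimate` callback are assumptions made for this example, and a real implementation may well visit fewer nodes than this sketch does.

```python
from collections import deque
from typing import Callable, List, Optional


def best_config(tree: List[dict],
                estimate: Callable[[str], float]) -> Optional[str]:
    """Visit the configuration tree level by level and return the
    configuration with the lowest estimated cycle count."""
    queue = deque(tree)                 # successors of the root node
    best_cfg, best_cost = None, float("inf")
    while queue:
        node = queue.popleft()
        cost = estimate(node["config"])
        if cost < best_cost:
            best_cfg, best_cost = node["config"], cost
        queue.extend(node["children"])  # deeper levels are visited later
    return best_cfg


# Toy tree and estimates (made up for illustration).
TOY_TREE = [
    {"config": "A", "children": [{"config": "C", "children": []}]},
    {"config": "B", "children": []},
]
TOY_CYCLES = {"A": 100.0, "B": 90.0, "C": 85.0}
```

For the toy tree, `best_config(TOY_TREE, TOY_CYCLES.get)` visits `A`, `B`, then `C`, and selects `C`.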
21. OSE – LIMITING THE SEARCH SPACE
Limit the OSE application
To hot code segments
Hot code segments are identified through profiling or
hardware performance counters during a program run
22. EVALUATION
OSE Compiler Algorithm
1. Profile the code
2. For each function:
3.    Compile to the high-level IR
4.    Optimize using HLO
5. For each function:
6.    If the function is hot:
7.       Perform OSE on second HLO and CG
8.       Emit the function using the best configuration
9.    If the function is not hot, use the standard configuration
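The hot/cold split in steps 5 to 9 can be illustrated with a small driver. Everything here is a hypothetical stand-in: `profile` maps function names to a fraction of execution time, `hot_threshold` is an invented cut-off, and `optimize` and `estimate` abstract the second HLO pass, code generation and the performance estimator.

```python
from typing import Callable, Dict, List, Tuple


def compile_program(functions: List[str],
                    profile: Dict[str, float],
                    configs: List[str],
                    optimize: Callable[[str, str], str],
                    estimate: Callable[[str], float],
                    hot_threshold: float,
                    default: str = "default") -> Dict[str, Tuple[str, str]]:
    """Return {function: (chosen configuration, emitted code)}."""
    emitted = {}
    for name in functions:
        if profile.get(name, 0.0) >= hot_threshold:
            # Hot function: try every configuration, keep the best estimate.
            versions = {cfg: optimize(name, cfg) for cfg in configs}
            cfg = min(versions, key=lambda c: estimate(versions[c]))
            emitted[name] = (cfg, versions[cfg])
        else:
            # Cold function: standard configuration only, no exploration.
            emitted[name] = (default, optimize(name, default))
    return emitted


# Toy inputs (made up for illustration).
def toy_opt(name: str, cfg: str) -> str:
    return f"{name}@{cfg}"


TOY_EST = {"kernel@default": 100.0, "kernel@unroll4": 70.0}
RESULT = compile_program(["main", "kernel"],
                         {"main": 0.02, "kernel": 0.90},
                         ["default", "unroll4"],
                         toy_opt, TOY_EST.__getitem__,
                         hot_threshold=0.10)
```

Here only `kernel` is explored, so the extra compile time is spent where the profile says it matters. `main` is compiled once with the standard configuration.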
23. COMPILE TIME PERFORMANCE ESTIMATION
Model based on:
Ideal cycle count, T
Data cache performance, L (lambda)
Instruction cache performance, I
Branch mis-prediction, B
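The slide lists the four components of the model but not how they are combined. The sketch below simply adds the penalty terms to the ideal cycle count, which is an assumption made for illustration and not necessarily the formula used in the paper. The class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    ideal_cycles: float    # T: ideal cycle count of the scheduled code
    dcache_stalls: float   # L: cycles lost to data cache misses
    icache_stalls: float   # I: cycles lost to instruction cache misses
    branch_penalty: float  # B: cycles lost to branch mispredictions

    def total(self) -> float:
        """Assumed combination: ideal cycles plus all penalty cycles."""
        return (self.ideal_cycles + self.dcache_stalls
                + self.icache_stalls + self.branch_penalty)


def faster(a: Estimate, b: Estimate) -> Estimate:
    """Return whichever estimate predicts fewer total cycles."""
    return a if a.total() <= b.total() else b
```

An estimate of this kind lets two versions of a code segment be compared at compile time without running either of them.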
28. CONCLUSION
OSE does not incur the prohibitive compile-time
costs of other iterative compilation approaches
Compile time is limited in three ways
OSE is capable of delivering significant
performance benefits, while keeping compile times
reasonable
It achieves more than 20% performance improvement in
some cases on SPEC codes