1. LOOP FUSION FOR MEMORY SPACE
OPTIMIZATION
Antoine Fraboulet, Karen Kodary, Anne Mignotte
Presenter: Tanzir Musabbir
2. The authors are from the National Institute of
Applied Sciences, Lyon, France
Published in the IEEE International Symposium on
System Synthesis, 2001.
4. INTRODUCTION
Multimedia applications are memory intensive
Memory is known to be extremely power
consuming
The paper proposes a new technique to optimize the
behavioral description of a multimedia application
It presents an optimal algorithm to reduce the use
of temporary arrays by loop fusion
Polynomial but efficient
6. PROBLEMS
The paper mainly focuses on data-flow-dominated
embedded systems
These systems consume memory for multidimensional
data storage.
More than half of the surface of this type of
integrated system is occupied by memory.
That’s why control over power consumption is
necessary.
8. SOME SOLUTIONS
Memory optimization can be done in several ways:
reducing the size of the memory and improving
data movement strategies over the memory
hierarchy
It is useful to optimize before hardware-software
partitioning, because once partitioning is done
the memory has already been divided
Optimizing early also makes it possible to optimize
both types of memory: the one included in
hardware and the one controlled by software.
9. SOME SOLUTIONS
In-place mapping allows memory to be shared by
overlapping arrays when possible
Memory allocation can select memory modules
based on several criteria such as size and number
of ports.
By contrast, during silicon compilation the
physical memory is optimized all along the design
chain
11. THEIR PROPOSED SOLUTION
They propose a “for” loop transformation to
optimize memory size
Loop fusion is a program transformation that
collapses several loops into one
In their proposed solution, memory optimization by
loop fusion is obtained by reducing the size of
temporary arrays that are typically used to store
intermediate results during multimedia processing.
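A minimal sketch of the idea, assuming a hypothetical two-stage pixel pipeline (the functions and the arithmetic are illustrative, not taken from the paper). After fusion, each element of the temporary array is consumed in the same iteration that produces it, which is what makes the array a candidate for size reduction:

```python
def unfused(pixels):
    n = len(pixels)
    # Loop 1 fills a full-size temporary array.
    tmp = [0] * n
    for i in range(n):
        tmp[i] = pixels[i] * 2
    # Loop 2 consumes the temporary array.
    out = [0] * n
    for i in range(n):
        out[i] = tmp[i] + 1
    return out


def fused(pixels):
    n = len(pixels)
    tmp = [0] * n
    out = [0] * n
    # One loop: tmp[i] is written and read in the same iteration,
    # so no value of tmp has to survive across iterations.
    for i in range(n):
        tmp[i] = pixels[i] * 2
        out[i] = tmp[i] + 1
    return out
```

Both versions compute the same result; only the lifetime of each temporary value changes.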
12. THEIR PROPOSED SOLUTION
The scalar replacement technique can remove an
entire array from the application’s memory
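As an illustration, assuming a hypothetical fused loop in which every temporary element is produced and consumed in the same iteration, the temporary array can be replaced by a single scalar:

```python
def fused_with_array(pixels):
    n = len(pixels)
    tmp = [0] * n          # temporary array of n elements
    out = [0] * n
    for i in range(n):
        tmp[i] = pixels[i] * 2
        out[i] = tmp[i] + 1
    return out


def fused_with_scalar(pixels):
    n = len(pixels)
    out = [0] * n
    for i in range(n):
        t = pixels[i] * 2  # scalar replaces tmp[i]; the array is gone
        out[i] = t + 1
    return out
```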
13. THEIR PROPOSED SOLUTION
Sometimes dependencies prevent an array from
being removed completely by scalar replacement
Intra-array storage order optimization can be
applied in such cases
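A sketch of one common form of this optimization, assuming a hypothetical fused loop in which iteration i reads the temporary values produced at iterations i, i-1 and i-2. A scalar is not enough, but a three-element window with modulo addressing is (the window technique is an assumption about how the optimization would look here, not the paper's exact procedure):

```python
def smooth_full_array(samples):
    n = len(samples)
    tmp = [0] * n                      # full-size temporary array
    out = [0] * n
    for i in range(n):
        tmp[i] = samples[i] * samples[i]
        total = tmp[i]
        if i >= 1:
            total += tmp[i - 1]
        if i >= 2:
            total += tmp[i - 2]
        out[i] = total
    return out


def smooth_window(samples):
    WINDOW = 3                         # largest dependence distance (2) plus 1
    n = len(samples)
    tmp = [0] * WINDOW                 # only the live values are stored
    out = [0] * n
    for i in range(n):
        # Overwrites the slot of tmp[i - 3], which is no longer needed.
        tmp[i % WINDOW] = samples[i] * samples[i]
        total = tmp[i % WINDOW]
        if i >= 1:
            total += tmp[(i - 1) % WINDOW]
        if i >= 2:
            total += tmp[(i - 2) % WINDOW]
        out[i] = total
    return out
```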
14. THEIR PROPOSED SOLUTION
Loop fusion increases the number of statements
and accessed arrays within a loop nest.
Sometimes loops access more data than can be
handled by a cache.
That’s why further code transformations are needed
to complete the optimization process.
15. THEIR PROPOSED SOLUTION
They consider a wider class of problems where the
loops being considered for fusion need not have
conformable headers.
Loops with non-conformable headers can be fused
by using conditional statements to control the
execution of operations within the loop nest
This works because both scalar replacement and
intra-array storage optimization techniques can
handle conditional control flow.
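A small sketch of such a fusion, assuming two hypothetical loops whose headers differ (0 <= i < n for the producer, 1 <= i < n-1 for the consumer). The fused loop iterates over the union of both ranges and a guard reproduces the second header:

```python
def separate_loops(a):
    n = len(a)
    tmp = [0] * n
    for i in range(n):             # header: 0 <= i < n
        tmp[i] = a[i] * 2
    out = [0] * n
    for i in range(1, n - 1):      # header: 1 <= i < n - 1
        out[i] = tmp[i] + 1
    return out


def fused_with_guard(a):
    n = len(a)
    out = [0] * n
    for i in range(n):             # union of both iteration ranges
        t = a[i] * 2               # scalar replacement still applies
        if 1 <= i < n - 1:         # guard restores the second header
            out[i] = t + 1
    return out
```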
16. THEIR PROPOSED SOLUTION
They approximate the memory size requirement by
the maximum size of time-overlapping arrays
Memory cost function
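The cost function itself is not reproduced in these slides. One possible reading of the approximation, given here purely as an assumption, is the peak over time of the total size of the arrays that are live simultaneously. The `(size, first_use, last_use)` tuple layout is hypothetical:

```python
def memory_cost(arrays):
    """arrays: list of (size, first_use, last_use) tuples, where
    first_use and last_use are positions in the loop sequence."""
    # The peak can only change at the start or end of a lifetime.
    times = sorted({t for _, first, last in arrays for t in (first, last)})
    return max(
        (sum(size for size, first, last in arrays if first <= t <= last)
         for t in times),
        default=0,
    )
```

Under this reading, removing a temporary array by fusion lowers the cost only if that array contributes to the peak.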
20. CONFLICT DETECTION AND RESOLUTION
Some problems can arise when we consider the
removable arrays all together.
21. CONFLICT DETECTION AND RESOLUTION
We need to solve all possible conflicts by reducing
the set of starred arrays without compromising the
global optimality.
The algorithm identifies all possible conflicts in
the graph
It then solves all these conflicts in a globally
optimal way
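A rough sketch of how such a conflict could be detected, under the assumption that a conflict is a dependency cycle created when the producer and consumer loops of every starred array are fused together. The graph encoding and the function name are hypothetical, and the paper's own detection procedure may differ:

```python
def has_fusion_conflict(num_loops, edges, starred):
    """edges: list of (u, v, array) dependences from loop u to loop v.
    starred: set of array names whose endpoint loops are to be fused."""
    parent = list(range(num_loops))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Fuse the two endpoint loops of every starred array.
    for u, v, arr in edges:
        if arr in starred:
            parent[find(u)] = find(v)

    # Build the graph between fused groups.
    succ = {}
    for u, v, _ in edges:
        gu, gv = find(u), find(v)
        if gu != gv:
            succ.setdefault(gu, set()).add(gv)

    # Depth-first search: a back edge means a dependency cycle.
    WHITE, GREY, BLACK = 0, 1, 2
    color = {}

    def visit(g):
        color[g] = GREY
        for h in succ.get(g, ()):
            c = color.get(h, WHITE)
            if c == GREY or (c == WHITE and visit(h)):
                return True
        color[g] = BLACK
        return False

    return any(color.get(g, WHITE) == WHITE and visit(g) for g in list(succ))
```

For example, with dependences 0 -> 1, 1 -> 2 and 0 -> 2, fusing loops 0 and 2 while loop 1 stays separate would place loop 1 both after and before the fused loop, which the sketch reports as a conflict.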
23. INTEGER LINEAR PROGRAMMING CONFLICT
RESOLUTION
An ILP formulation solves the dependency-cycle
conflicts detected in the previous steps
We have to decide which arrays will be taken out of
the set of removable arrays
The formulation associates a binary variable X_ai
with each starred array a_i that could be removed
If X_ai = 0 then the array a_i will be considered
for fusion; otherwise (X_ai = 1) the array will
cease to be starred.
24. INTEGER LINEAR PROGRAMMING CONFLICT
RESOLUTION
For a multi-graph between two nodes u and v, a new
variable X_uv is introduced to summarize the arrays
on this multi-edge.
If X_uv is set to 1 then all associated variables are
also set to 1 and all arrays are unstarred.
Otherwise a variable X_ai can be set to 1 without
interfering with the other arrays on the multi-edge.
The objective of their ILP is to minimize the sum of
the sizes of the arrays that have to be removed from
the set of all possible starred arrays detected in
the previous steps
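The objective can be illustrated with a small exhaustive search that stands in for an ILP solver. The constraint used here (each detected conflict must contain at least one unstarred array) is an assumption about the form of the constraints, the multi-edge variables X_uv are left out, and all names are hypothetical:

```python
from itertools import product


def resolve_conflicts(sizes, conflicts):
    """sizes: dict mapping array name -> size.
    conflicts: list of sets of array names; each set is one conflict.
    Returns (cost, x) where x[a] = 1 means array a is unstarred."""
    names = sorted(sizes)
    best_cost, best_x = None, None
    # Enumerate every 0/1 assignment of the X variables.
    for bits in product((0, 1), repeat=len(names)):
        x = dict(zip(names, bits))
        # Assumed constraint: every conflict loses at least one array.
        if all(any(x[a] for a in conflict) for conflict in conflicts):
            cost = sum(sizes[a] * x[a] for a in names)
            if best_cost is None or cost < best_cost:
                best_cost, best_x = cost, x
    return best_cost, best_x
```

The enumeration is exponential in the number of starred arrays, so it is only meant to show what is being minimized, not how a real solver would proceed.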
26. EXPERIMENTS AND RESULTS
They tested their algorithms using randomly
generated graphs.
They chose to generate graphs that are a lot
more complex
The generated graphs ranged from 10 to 30
nodes.
Each edge had a probability of 1/3 of being fusion
preventing, and all weights were chosen randomly
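A sketch of a generator matching the stated parameters (10 to 30 nodes, probability 1/3 of a fusion-preventing edge, random weights). The edge density `edge_prob`, the weight range `max_weight` and the dictionary layout are assumptions, since the slides do not give them:

```python
import random


def random_loop_graph(rng, min_nodes=10, max_nodes=30,
                      edge_prob=0.2, max_weight=100):
    """Return (node_count, edges) for a random acyclic loop graph."""
    n = rng.randint(min_nodes, max_nodes)
    edges = []
    for u in range(n):
        for v in range(u + 1, n):          # u < v keeps the graph acyclic
            if rng.random() < edge_prob:   # assumed edge density
                edges.append({
                    "src": u,
                    "dst": v,
                    "weight": rng.randint(1, max_weight),
                    "fusion_preventing": rng.random() < 1 / 3,
                })
    return n, edges
```

Passing a seeded `random.Random` instance keeps an experiment reproducible.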