LOOP FUSION FOR MEMORY SPACE
OPTIMIZATION
Antoine Fraboulet, Karen Godary, Anne Mignotte


Presenter: Tanzir Musabbir
• The authors are from the National Institute of Applied Sciences
  (INSA), Lyon, France
• Published in the IEEE International Symposium on System Synthesis
• Year: 2001
OUTLINE
• Introduction
• Problems
• Some Solutions
• Their Proposed Solution
• Experiments and Results
• Conclusion
INTRODUCTION
• Multimedia applications are memory intensive
• Memory is known to consume a large share of system power
• The paper proposes a new technique to optimize the behavioral
  description of a multimedia application
• An optimal algorithm reduces the use of temporary arrays through
  loop fusion
• Not polynomial (conflict resolution relies on integer linear
  programming), but efficient in practice
PROBLEMS
• The paper focuses on data-flow-dominated embedded systems
• These systems consume a large amount of memory to store
  multidimensional data
• More than half of the silicon area of this type of integrated
  system is occupied by memory
• Because memory is also power hungry, keeping memory use and its
  power consumption under control is necessary
SOME SOLUTIONS
• Memory can be optimized in several ways: reducing the size of the
  memory and improving data movement strategies across the memory
  hierarchy
• It is useful to optimize before hardware-software partitioning,
  because once partitioning is done the memory has already been
  divided
• Optimizing early also makes it possible to optimize both kinds of
  memory: the memory included in hardware and the memory controlled
  by software
SOME SOLUTIONS
• In-place mapping lets arrays share memory by overlapping them when
  possible
• Memory allocation selects memory modules according to several
  criteria, such as size and number of ports
• By contrast, during silicon compilation the physical memory is
  optimized all along the design chain
THEIR PROPOSED SOLUTION
• The authors propose a “for” loop transformation to reduce memory
  size
• Loop fusion is a program transformation that merges several loops
  into one
• Memory is saved by shrinking the temporary arrays that typically
  hold intermediate results during multimedia processing
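As a small illustration of the transformation described on this slide, the sketch below fuses two loops that communicate through a temporary array. The function names and the arithmetic are made up for the example and do not come from the paper.

```python
def unfused(a):
    """Two separate loops; tmp must hold every intermediate value."""
    n = len(a)
    tmp = [0] * n              # temporary array of size n
    for i in range(n):         # loop 1: produce tmp
        tmp[i] = 2 * a[i]
    out = [0] * n
    for i in range(n):         # loop 2: consume tmp
        out[i] = tmp[i] + 1
    return out


def fused(a):
    """Same computation after fusing the two loops into one."""
    n = len(a)
    tmp = [0] * n              # still allocated: fusion alone does not remove it
    out = [0] * n
    for i in range(n):
        tmp[i] = 2 * a[i]      # producer statement
        out[i] = tmp[i] + 1    # consumer statement, same iteration
    return out
```

After fusion, each element of tmp is written and read in the same iteration, which is what makes it possible to shrink the array afterwards.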
THEIR PROPOSED SOLUTION
• Scalar replacement can remove an entire array from the
  application's memory
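A minimal sketch of scalar replacement, assuming a fused loop in which each temporary element is produced and consumed in the same iteration. The names and the computation are hypothetical.

```python
def fused_with_array(a):
    """Fused loop that still stores intermediates in a full array."""
    n = len(a)
    tmp = [0] * n
    out = [0] * n
    for i in range(n):
        tmp[i] = a[i] * a[i]
        out[i] = tmp[i] + a[i]
    return out


def fused_with_scalar(a):
    """Same loop after scalar replacement: tmp[i] becomes the scalar t."""
    out = [0] * len(a)
    for i, x in enumerate(a):
        t = x * x              # one scalar replaces the whole tmp array
        out[i] = t + x
    return out
```

The temporary storage drops from n elements to a single value, because no iteration ever reads a tmp element written by another iteration.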
THEIR PROPOSED SOLUTION
• Sometimes dependencies do not allow scalar replacement to remove an
  array completely
• In such cases, intra-array storage order optimization can be applied
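The sketch below shows one common form of this idea, a modulo-addressed window. It assumes a consumer that needs both tmp[i] and tmp[i-1], so a single scalar is not enough; whether the paper uses exactly this mapping is an assumption, and all names are hypothetical.

```python
def full_array(a):
    """Reference version: the consumer reads tmp[i] and tmp[i - 1]."""
    n = len(a)
    tmp = [3 * x for x in a]
    out = [0] * n
    for i in range(n):
        prev = tmp[i - 1] if i > 0 else 0
        out[i] = tmp[i] + prev
    return out


def windowed(a):
    """Fused version: tmp is folded into a 2-element circular buffer."""
    n = len(a)
    buf = [0, 0]                       # window of size 2 instead of size n
    out = [0] * n
    for i in range(n):
        buf[i % 2] = 3 * a[i]          # overwrite the value from iteration i - 2
        prev = buf[(i - 1) % 2] if i > 0 else 0
        out[i] = buf[i % 2] + prev
    return out
```

The dependence distance of one iteration keeps the array alive, but only two of its elements are ever live at once, so storage shrinks from n to 2.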
THEIR PROPOSED SOLUTION
• Loop fusion increases the number of statements and accessed arrays
  within a loop nest
• A fused loop may access more data than the cache can hold
• Further code transformations are therefore needed to complete the
  optimization process
THEIR PROPOSED SOLUTION
• The authors consider a wider class of problems in which the loops
  to be fused need not have conformable headers
• Loops with non-conformable headers can be fused by using
  conditional statements to control which operations execute in each
  iteration of the fused loop nest
• This is acceptable because both scalar replacement and intra-array
  storage order optimization can handle conditional control flow
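To make the guarded fusion concrete, here is a minimal sketch with two loops whose bounds differ. The bounds, names, and arithmetic are invented for illustration only.

```python
def separate_loops(a):
    """Producer runs over [0, n); consumer runs only over [1, n - 1)."""
    n = len(a)
    tmp = [0] * n
    for i in range(0, n):
        tmp[i] = a[i] + 1
    out = [0] * n
    for i in range(1, n - 1):
        out[i] = 2 * tmp[i]
    return out


def fused_with_guard(a):
    """One loop over the union of both ranges; a guard protects the consumer."""
    n = len(a)
    out = [0] * n
    for i in range(0, n):
        t = a[i] + 1                   # producer executes on every iteration
        if 1 <= i < n - 1:             # consumer executes only inside its own range
            out[i] = 2 * t
    return out
```

Because the guard is ordinary conditional control flow, the temporary can still be reduced to the scalar t inside the fused loop.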
THEIR PROPOSED SOLUTION
• The memory size requirement is approximated by the maximum total
  size of arrays whose lifetimes overlap in time
• Memory cost function
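The cost function itself appeared as a figure and is not reproduced in this text. As one possible reading of "maximum size of time-overlapping arrays", the sketch below computes the peak total size of simultaneously live arrays. The lifetime representation (first and last loop step, inclusive) is an assumption made for this example.

```python
def peak_memory(arrays):
    """Peak total size of arrays that are live at the same time.

    arrays maps a name to (size, first_step, last_step), steps inclusive.
    """
    events = []
    for size, first, last in arrays.values():
        events.append((first, size))       # array becomes live
        events.append((last + 1, -size))   # array dies after its last use
    live = peak = 0
    # At equal times, releases (negative deltas) sort before allocations.
    for _, delta in sorted(events):
        live += delta
        peak = max(peak, live)
    return peak


example = {
    "A": (100, 0, 1),
    "B": (50, 1, 2),
    "C": (80, 2, 3),
}
```

For the example lifetimes, A and B overlap at step 1 and B and C overlap at step 2, so the estimate is the larger of the two sums.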
THEIR PROPOSED SOLUTION
THEIR PROPOSED SOLUTION - DFG
THEIR PROPOSED SOLUTION – REMOVABLE
ARRAY DETECTION
CONFLICT DETECTION AND RESOLUTION
• Problems can arise when all the removable arrays are considered
  together
CONFLICT DETECTION AND RESOLUTION
• All conflicts must be resolved by reducing the set of starred
  arrays (arrays marked as removable) without compromising global
  optimality
• The method identifies all possible conflicts in the graph
• It then resolves these conflicts in a globally optimal way
CYCLE DETECTION ALGORITHM
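The algorithm on this slide was shown as a figure and is not reproduced here. The sketch below is a generic stand-in, not the authors' algorithm: it contracts the endpoints of every starred edge (the loops that would be fused) and then checks whether the contracted data-flow graph contains a cycle. The edge format (source, destination, array name) is an assumption.

```python
def fusion_creates_cycle(nodes, edges, starred):
    """Return True if fusing along all starred edges creates a dependency cycle.

    edges is a list of (src, dst, array); starred is a set of array names.
    """
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    # Contract: the endpoints of a starred edge end up in one fused loop.
    for u, v, arr in edges:
        if arr in starred:
            parent[find(u)] = find(v)

    # Build the contracted graph, dropping edges internal to a fused group.
    succ = {find(n): set() for n in nodes}
    for u, v, _ in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            succ[ru].add(rv)

    WHITE, GREY, BLACK = 0, 1, 2
    color = {n: WHITE for n in succ}

    def visit(n):
        color[n] = GREY
        for m in succ[n]:
            if color[m] == GREY or (color[m] == WHITE and visit(m)):
                return True                 # back edge found
        color[n] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in succ)


nodes = ["L1", "L2", "L3"]
edges = [("L1", "L2", "A"), ("L1", "L3", "B"), ("L3", "L2", "C")]
```

With only A starred, L1 and L2 would be fused while L3 must run between them, so the contracted graph has a cycle; starring all three arrays fuses everything and the conflict disappears.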
INTEGER LINEAR PROGRAMMING CONFLICT
RESOLUTION
• An ILP formulation resolves the dependency-cycle conflicts detected
  in the previous steps
• It decides which arrays are taken out of the set of removable arrays
• A binary variable X_ai is associated with each starred array a_i
  that could be removed
• If X_ai = 0, array a_i stays starred and is still considered for
  fusion; otherwise (X_ai = 1) it ceases to be starred
INTEGER LINEAR PROGRAMMING CONFLICT
RESOLUTION
• When several edges connect the same two nodes u and v (a
  multi-edge), a new variable X_uv is introduced to summarize the
  arrays on that multi-edge
• If X_uv is set to 1, all associated variables are also set to 1 and
  all arrays on the multi-edge are unstarred. Otherwise, an individual
  X_ai can still be set to 1 without affecting the other arrays on
  the multi-edge
• The objective of the ILP is to minimize the total size of the
  arrays that have to be removed from the set of starred arrays
  detected in the previous section
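The exact constraints are not given on the slides. The sketch below assumes that every detected cycle must lose at least one of its starred edges, and that a multi-edge counts as broken only when all of its arrays are unstarred (X_ai >= X_uv). It enumerates the binary variables by brute force instead of calling an ILP solver, so it only suits tiny examples; all names and sizes are hypothetical.

```python
from itertools import product


def resolve_conflicts(sizes, cycles, multi_edges=None):
    """Choose which starred arrays to unstar at minimum total size.

    sizes:       array name -> size
    cycles:      list of cycles, each a list of array names or multi-edge keys
    multi_edges: multi-edge key -> list of array names on that multi-edge
    Returns (cost, x) where x[a] == 1 means array a ceases to be starred.
    """
    multi_edges = multi_edges or {}
    names = sorted(sizes)
    best_cost, best_x = None, None
    for bits in product((0, 1), repeat=len(names)):
        x = dict(zip(names, bits))
        # X_uv is 1 only when every array on the multi-edge is unstarred.
        for key, arrs in multi_edges.items():
            x[key] = int(all(x[a] for a in arrs))
        # Assumed constraint: each cycle contains at least one broken edge.
        if all(any(x[item] for item in cycle) for cycle in cycles):
            cost = sum(sizes[a] * x[a] for a in names)
            if best_cost is None or cost < best_cost:
                best_cost, best_x = cost, {a: x[a] for a in names}
    return best_cost, best_x


sizes = {"A": 100, "B": 40, "C": 30, "D": 20}
multi = {"uv": ["C", "D"]}
```

In the usage below, unstarring B alone breaks both cycles in the first case, while in the second case giving up both arrays of the multi-edge (total 50) is cheaper than giving up A (100).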
EXPERIMENTS AND RESULTS
• The algorithms were tested on randomly generated graphs
• The authors deliberately generated graphs that are considerably
  more complex
• The generated graphs ranged from 10 to 30 nodes
• Each edge had a probability of 1/3 of being fusion preventing, and
  all weights were chosen randomly
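A test-graph generator along these lines might look like the sketch below. Only the node range (10 to 30) and the 1/3 fusion-preventing probability come from the slide; the edge density, the weight range, and the forward-only edge orientation that keeps the graph acyclic are assumptions.

```python
import random


def random_fusion_graph(seed=0, min_nodes=10, max_nodes=30,
                        edge_prob=0.2, max_weight=100):
    """Generate a random acyclic data-flow graph for fusion experiments."""
    rng = random.Random(seed)
    n = rng.randint(min_nodes, max_nodes)
    edges = []
    for u in range(n):
        for v in range(u + 1, n):          # forward edges only, so no cycles
            if rng.random() < edge_prob:   # assumed edge density
                edges.append({
                    "src": u,
                    "dst": v,
                    "weight": rng.randint(1, max_weight),        # assumed range
                    "fusion_preventing": rng.random() < 1 / 3,   # from the slide
                })
    return n, edges


n, edges = random_fusion_graph(seed=42)
```

Passing a seed keeps each generated graph reproducible across runs.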
EXPERIMENTS AND RESULTS
