Processor Allocation and Task Scheduling of Matrix Chain Products
on Parallel Systems(Parallelizing matrix chain products)
Heejo Lee, Jong Kim, Sungje Hong, Sunggu Lee
Dept of Computer Science and Engineering Pohang University of
Science and Technology, Korea
2. Outline
1 Introduction
About the paper
The Problem
Notation
Problem Description
2 Processor Allocation for Matrix Products
Processor Allocation
DPA Algorithm
3 Matrix Chain Scheduling Algorithm
Matrix Chain Scheduling Algorithm
Two-Pass Matrix Chain Scheduling Algorithm
4 Conclusion
yen3 () Matrix Chain Scheduling Algorithm March 4, 2009 2 / 30
3. Introduction About the paper
About the paper
Processor Allocation and Task Scheduling of Matrix Chain Products
on Parallel Systems(Parallelizing matrix chain products)
Heejo Lee, Jong Kim, Sungje Hong, Sunggu Lee
Dept of Computer Science and Engineering Pohang University of
Science and Technology, Korea
Technical Report CS-HPC-97-003
April, 2003(December, 1997)
yen3 () Matrix Chain Scheduling Algorithm March 4, 2009 3 / 30
4. Introduction The Problem
MCOP: the Matrix Chain Ordering Problem
MCSP: the Matrix Chain Scheduling Problem
The paper introduce a processor scheduling algorithm for MCSP
which attempts to minimize the evaluations time of a chain of matrix
products on a parallel computer, even at the expense of a slight
increase in the total number of operations.
yen3 () Matrix Chain Scheduling Algorithm March 4, 2009 4 / 30
5. Introduction Notation
A lot of symbol...Orz
P: the number of processors in a parallel system.
M: a matrix chain product with n matrices, i.e,
M = M1 × M2 × · · · × Mn .
Mi : an mi × mi+1 matrix (mi ≥ 1, 1 ≤ i ≤ n).
L: a product sequence subtree of L for a matrix chain M.
Li,j : the sequence subtree of L for (Mi × · · · × Mj ).
yen3 () Matrix Chain Scheduling Algorithm March 4, 2009 5 / 30
6. Introduction Notation
A lot of symbol (cont’d)
C : the minimum amount of computation for evaluating M.
∆C : the amount of increased computation by modifying the current
sequence tree.
pi,j : the number of processor assigned for evaluating (Mi × · · · × Mj )
(mi , mj , mk ): single matrix product for multiplying an mi × mj matrix
by an mj × mk matrix.
Φ(mi , mj , mk , p): the execution time of single matrix product
(mi , mj , mk ) when p processors are allocated.
D(x): the set of divisors of x, i.e., D(x) = {d|d divides x}
yen3 () Matrix Chain Scheduling Algorithm March 4, 2009 6 / 30
7. Introduction Notation
A lot of symbol - 2003 new!
LD(x, y ): the largest divisor in D(x) that is not larger than y .
SD(x, y ): if x > y , then SD(x, y ) is the smallest divisor in D(x) that
is larger then y. Otherwise, SD(x, y ) = x
m: the largest dimension among all of the matrices, i.e.,
m = max1≤i≤n+1 (mi )
yen3 () Matrix Chain Scheduling Algorithm March 4, 2009 7 / 30
8. Introduction Problem Description
The Problem
The problemis finding the optimal schedule with minimum evaluation
time of M = M1 × M2 × · · · × Mn 1 on a P processor parallel system.
The matrix multiplication parallel algorithm is based that
multiplying A by B with p processors, the executions time
Φ(mi , mj , mk , p)2 can be approx imated as follows.
mi mj mk
p if 1 ≤ p ≤ mi mk
Φ(mi , mj , mk , p) ≈ mi mj mk
p log( mip k ) if
m mi mk < p ≤ mi mj mk
1
M: a matrix chain product with n matrices, i.e, M = M1 × M2 × · · · × Mn .
2
Φ(mi , mj , mk , p): the execution time of single matrix product (mi , mj , mk ) when p
processors are allocated.
yen3 () Matrix Chain Scheduling Algorithm March 4, 2009 8 / 30
9. Introduction Problem Description
The Problem - cont’d
The matrix products can divide to two parts. The Time is
3
Ti,j ≈ min (Ti,k (pi,j ) + Tk+1,j (pi,j ) + Φ(mi , mk+1 , mj+1 , pi,j ),
i≤k<j
max(Ti,k (pi,k ), Tk+1,j (pk+1,j )) + Φ(mi , mk+1 , mj+1 , pi,j ))
So, The paper define MCSP: find the product sequence for evaluating
a chain of matrix chain of matrix products and the processor schedule
for the sequence such that the evaluation time is minimized an a
parallel system.
3
pi,j : the number of processor assigned for evaluating (Mi × · · · × Mj )
yen3 () Matrix Chain Scheduling Algorithm March 4, 2009 9 / 30
10. Introduction Problem Description
MCSP Complexity
The paper assume that each matrix multiplication
(mi × mj ) × (mj × mk ) has log(mj ) time complexity mi mj mk
processors.
The problem can be presented
mini≤k<j max(Ti,k , Tk+1,j ) + log(mk+1 ) 1≤i <j ≤n
Ti,j =
0 i = j, 1 ≤ i ≤ n
but, MCSP is NP-hard.
yen3 () Matrix Chain Scheduling Algorithm March 4, 2009 10 / 30
11. Processor Allocation for Matrix Products Processor Allocation
the Concurrent Computations of Multiple Matrix Products
proportional allocation: the algorithm allocates processors
proportional to the computation amount of each task. Trying to
minimize the completion of all tasks by balancing the execution time
of each task.
yen3 () Matrix Chain Scheduling Algorithm March 4, 2009 11 / 30
12. Processor Allocation for Matrix Products Processor Allocation
the Concurrent Computations of Multiple Matrix Products
proportional allocation: the algorithm allocates processors
proportional to the computation amount of each task. Trying to
minimize the completion of all tasks by balancing the execution time
of each task.
For example
We have two matrix multiplication (2, 3, 8), (4, 4, 3) and 20 processor
If we allocate 10 processors to each matrix multiplication, we will
DA (2, 3, 8, 10) = D(2, 3, 8, 8) = 6 unit time
DB (4, 4, 3, 10) = D(4, 4, 3, 6) = 8 unit time
D(DA , DB ) = 8 unit time
If we allocate 8 processors to A, 12 processors to B, we will
DA (2, 3, 8, 8) = D(2, 3, 8, 8) = 6 unit time
DB (4, 4, 3, 12) = D(4, 4, 3, 12) = 4 unit time
D(DA , DB ) = 6 unit time
yen3 () Matrix Chain Scheduling Algorithm March 4, 2009 11 / 30
13. Processor Allocation for Matrix Products DPA Algorithm
DPA for Two Matrix Products- O(P) time
Discrete Processor Allocation for Two Matrix Products(DPA)
Input: Two Matrix products X = (mx , mx+1 , mx+2 ) and
Y = (my , my +1 , my +2 ) and a set of P processors.
Output: The number of processors allocated to the matrix products
X and Y , denoted as Px and Py , which satisfy 1 ≤ Px , Py ≤ P and
Px + Py ≤ P
yen3 () Matrix Chain Scheduling Algorithm March 4, 2009 12 / 30
14. Processor Allocation for Matrix Products DPA Algorithm
DPA for Two Matrix Products- O(P) time
Discrete Processor Allocation for Two Matrix Products(DPA)
mx mx+1 mx+2
1 Set Pprop = mx mx+1 mx+2 +my my +1 my +2 P
yen3 () Matrix Chain Scheduling Algorithm March 4, 2009 13 / 30
15. Processor Allocation for Matrix Products DPA Algorithm
DPA for Two Matrix Products- O(P) time
Discrete Processor Allocation for Two Matrix Products(DPA)
mx mx+1 mx+2
1 Set Pprop = mx mx+1 mx+2 +my my +1 my +2 P
2 Find dx,i in D(mx , mx+2 ) which satisfies dx,i ≤ Pprop
yen3 () Matrix Chain Scheduling Algorithm March 4, 2009 13 / 30
16. Processor Allocation for Matrix Products DPA Algorithm
DPA for Two Matrix Products- O(P) time
Discrete Processor Allocation for Two Matrix Products(DPA)
mx mx+1 mx+2
1 Set Pprop = mx mx+1 mx+2 +my my +1 my +2 P
2 Find dx,i in D(mx , mx+2 ) which satisfies dx,i ≤ Pprop
3 Find dy ,j in D(my , my +2 ) which satisfies dy ,j ≤ P − Pprop
yen3 () Matrix Chain Scheduling Algorithm March 4, 2009 13 / 30
17. Processor Allocation for Matrix Products DPA Algorithm
DPA for Two Matrix Products- O(P) time
Discrete Processor Allocation for Two Matrix Products(DPA)
mx mx+1 mx+2
1 Set Pprop = mx mx+1 mx+2 +my my +1 my +2 P
2 Find dx,i in D(mx , mx+2 ) which satisfies dx,i ≤ Pprop
3 Find dy ,j in D(my , my +2 ) which satisfies dy ,j ≤ P − Pprop
4 If Φ(mx , mx+1 , mx+2 , dx,i ) < Φ(my , my +1 , my +2 , dy ,j ), then
Px = dx,i , Py = P − dx,i . Otherwise Px = P − dy ,j , Py = dy ,j
yen3 () Matrix Chain Scheduling Algorithm March 4, 2009 13 / 30
18. Processor Allocation for Matrix Products DPA Algorithm
DPA description in 2003
mx mx+1 mx+2
1 Pprop = P × mx mx+1 mx+2 +my my +1 my +2
2 di = LD(mx mx+2 , Pprop )
3 di+1 = SD(mx mx+2 , Pprop )
4 dj = LD(mj mj+2 , P − Pprop )
5 dj+1 = SD(mj mj+2 , P − Pprop )
6 if Φ(X , di ) ≥ Φ(Y , dj ) then
1 if max(Φ(X , di+1 ), Φ(Y , LD(P − di+1 )))
Px = di+1 , Py = LD(my my +2 , P − di+1 )
2 else
Px = di , Py = LD(my my +2 , P − di + 1)
3 elseif max(Φ(X , LD(P − dj+1 )), Φ(Y , dj+1 )) < Φ(Y , dj ) then
Px = LD(mx mx+2 , P − dj+1 ), Py = dj+1
4 else
Px = LD(mx mx + 2, P − dj ), Py = dj
yen3 () Matrix Chain Scheduling Algorithm March 4, 2009 14 / 30
19. Processor Allocation for Matrix Products DPA Algorithm
DPA for Two Matrix Products - cont’d
the na¨ algorithm time complexity is O(P) , P is the number of
ıve
processors.(The major reason is step 2,3)
The completion time of two matrix products when processors are
allocated by the DPA algorithm is shorter then or equal to that by the
proportional allocation.
DPA guarantees the minimum completion time for two matrix
products.
DPA can easily extended for k independent matrix products.
yen3 () Matrix Chain Scheduling Algorithm March 4, 2009 15 / 30
20. Processor Allocation for Matrix Products DPA Algorithm
DPA for k Matrix Products
DPA-k guarantees the minimum completion time of k independent
matrix products.
Discrete Processor Allocation for k Matrix Products (DPA-k)
Input: k matrix produets Xi = (mi,1 , mi,2 , mi,3 ) for 1 ≤ i ≤ k are
given on P processos (k ≤ P)
Output: The number of allocated processors Pi for each matrix Xi
which satisfies k Pi ≤ P
i=1
yen3 () Matrix Chain Scheduling Algorithm March 4, 2009 16 / 30
21. Processor Allocation for Matrix Products DPA Algorithm
DPA for k Matrix Products - cont’d
Discrete Processor Allocation for k Matrix Products (DPA-k)
1 For i = 1 to k do
m m m
1 Pprop,i = Pk i,1 i,2 i,3 P
j=1 mj,1 mj,2 mj,3
2 Find the maximum di,j in D(mi,1 , mi,3 ) which satisfies di,j ≤ Pprop, i
3 Let Pi = di,j
yen3 () Matrix Chain Scheduling Algorithm March 4, 2009 17 / 30
22. Processor Allocation for Matrix Products DPA Algorithm
DPA for k Matrix Products - cont’d
Discrete Processor Allocation for k Matrix Products (DPA-k)
1 For i = 1 to k do
m m m
1 Pprop,i = Pk i,1 i,2 i,3 P
j=1 mj,1 mj,2 mj,3
2 Find the maximum di,j in D(mi,1 , mi,3 ) which satisfies di,j ≤ Pprop, i
3 Let Pi = di,j
k
2 While P − i=1 Pi > 0 do
1 Find the product Xi with the maimum Φ(mi,1 , mi,2 , mi,3 , Pi )
2 If (Pi < mi,1 mi,3 ) then
1 Find the minimum di,j in D(mi,1 , mi,3 ) which satisfies di,j > Pi
If (P − k Pi − (di,j − Pi ) > 0) then
P
2 i=1
Pi = di,j Otherwise, stop the algorithm.
else stop the algorithm
yen3 () Matrix Chain Scheduling Algorithm March 4, 2009 17 / 30
23. Matrix Chain Scheduling Algorithm Matrix Chain Scheduling Algorithm
Matrix Chain Scheduling Algorithm - Introduction
Matrix Chain Scheduling Algorithm
The algorithm has three stages.
1 Find optimal product sequence by MCOP
yen3 () Matrix Chain Scheduling Algorithm March 4, 2009 18 / 30
24. Matrix Chain Scheduling Algorithm Matrix Chain Scheduling Algorithm
Matrix Chain Scheduling Algorithm - Introduction
Matrix Chain Scheduling Algorithm
The algorithm has three stages.
1 Find optimal product sequence by MCOP
2 Top-Down Processor Assignment
processors are partitioned and assigned to each subtree to balance the
evaluation time of both matrix product chains.
yen3 () Matrix Chain Scheduling Algorithm March 4, 2009 18 / 30
25. Matrix Chain Scheduling Algorithm Matrix Chain Scheduling Algorithm
Matrix Chain Scheduling Algorithm - Introduction
Matrix Chain Scheduling Algorithm
The algorithm has three stages.
1 Find optimal product sequence by MCOP
2 Top-Down Processor Assignment
processors are partitioned and assigned to each subtree to balance the
evaluation time of both matrix product chains.
3 Bottom-Up Concurrent Execution
The steps executes products independently from the leaf and tries to
modify the product sequence to enhance concurrency so as to reduce
the evaluation time of Ma .
yen3 () Matrix Chain Scheduling Algorithm March 4, 2009 18 / 30
26. Matrix Chain Scheduling Algorithm Matrix Chain Scheduling Algorithm
Matrix Chain Scheduling Algorithm - Introduction
Matrix Chain Scheduling Algorithm
The algorithm has three stages.
1 Find optimal product sequence by MCOP
2 Top-Down Processor Assignment
processors are partitioned and assigned to each subtree to balance the
evaluation time of both matrix product chains.
3 Bottom-Up Concurrent Execution
The steps executes products independently from the leaf and tries to
modify the product sequence to enhance concurrency so as to reduce
the evaluation time of Ma .
a
M: a matrix chain product with n matrices, i.e, M = M1 × M2 × · · · × Mn .
The Algorithm Time Complexity is O(n2 + nP)
yen3 () Matrix Chain Scheduling Algorithm March 4, 2009 18 / 30
27. Matrix Chain Scheduling Algorithm Matrix Chain Scheduling Algorithm
First Step - Find Optimal Product Sequence by MCOP
The optimal product sequence can be found of in O(n log(n)) time
using a sequential algorithm.
Many parallel algorithms have been studied which run in ploylog time
on P processor system.
yen3 () Matrix Chain Scheduling Algorithm March 4, 2009 19 / 30
28. Matrix Chain Scheduling Algorithm Matrix Chain Scheduling Algorithm
First Step - Find Optimal Product Sequence by MCOP
The optimal product sequence can be found of in O(n log(n)) time
using a sequential algorithm.
Many parallel algorithms have been studied which run in ploylog time
on P processor system.
notation
W [i][j]: the minimum number for operations for evaluating Li,j a .
S[i][j]: the matrix index for partitioning the matrix chain
(Mi × · · · × Mj )
a
Li,j : the sequence subtree of L for (Mi × · · · × Mj ).
yen3 () Matrix Chain Scheduling Algorithm March 4, 2009 19 / 30
29. Matrix Chain Scheduling Algorithm Matrix Chain Scheduling Algorithm
Second Step - Top-Down Processor Assigment
pi,j 4 processors were assigned to Li,j 5
W [i][S[i][j]]
pi,j × W [i][S[i][j]]+W [S[i][j]+1][j] processors are assigned to the subtree
Li,S[i][j] .
W [S[i][j]+1][j]
pi,j × W [i][S[i][j]]+W [S[i][j]+1][j] processors are assigned to the subtree
LS[i][j]+1,j
It define the tree recursively.
4
pi,j : the number of processor assigned for evaluating (Mi × · · · × Mj )
5
Li,j : the sequence subtree of L for (Mi × · · · × Mj ).
yen3 () Matrix Chain Scheduling Algorithm March 4, 2009 20 / 30
30. Matrix Chain Scheduling Algorithm Matrix Chain Scheduling Algorithm
Second Step - For Example
For example, given a chain of 8 matrices with dimensions
{3, 8, 9, 5, 3, 3, 3, 4} on 64 processors parallel system.
yen3 () Matrix Chain Scheduling Algorithm March 4, 2009 21 / 30
31. Matrix Chain Scheduling Algorithm Matrix Chain Scheduling Algorithm
Third Step - Bottom-Up Concurrent Execution
There are idle processors in the execution of Li,j , the step try to
modify the product sequence to use these idle processors.
There are two condition.
y > x + 1 and the node associated with My +1 has the left child node
associated with My in the sequence tree.
y < x − 1 and the node associated with My has the right child node
associated with My +1 in the sequence tree.
yen3 () Matrix Chain Scheduling Algorithm March 4, 2009 22 / 30
33. Matrix Chain Scheduling Algorithm Matrix Chain Scheduling Algorithm
If y > z then
∆C = my +1 my +2 (my − mz ) + mz my (my +2 − my +1 )6
If y < z then
∆C = my +1 my +2 (my − mz+1 ) + mz+1 my (my +2 − my +1 )
Lemma
If a leaf product (Mx , Mx+1 ) has a candidate product (My My +1 ) and the
DPA algorithm will allocate px and py processors to the two matrix
products, respectively, then evaluation using the modified sequence
reduces the evaluation time when
∆C < min(Φ(mx , mx+1 , mx+2 , mx mx+2 )×(px +py −mx mx+2 ), my my +1 my +2 )
6
∆C : the amount of increased computation by modifying the current sequence tree.
yen3 () Matrix Chain Scheduling Algorithm March 4, 2009 24 / 30
35. Matrix Chain Scheduling Algorithm Two-Pass Matrix Chain Scheduling Algorithm
Stage-1 MCOP
Two-Pass Matrix Chain Scheduling Algorithm - Stage 1
Stage-1 MCOP
1 Find the optimal product sequence for the MCOP by using a parallel
algorithm.
2 Generate the sequence tree L.
The Dynamic Programing Algorhtm - O(n2 ) to O(n3 )
The sequential algorithm - O(n log(n))
The Parallel Algorithm - O(log3 n)
yen3 () Matrix Chain Scheduling Algorithm March 4, 2009 26 / 30
36. Matrix Chain Scheduling Algorithm Two-Pass Matrix Chain Scheduling Algorithm
Stage-2 Top-Down Processor Assignment
Two-Pass Matrix Chain Scheduling Algorithm - Stage 2
Stage-2 Top-Down Processor Assignment
1 Intialize i = 1, j = n, pi,j = P.
W [i][S[i][j]]
2 If i is not S[i][j], then allocate pi,j × W [i][S[i][j]]+W [S[i][j]+1][j]
processors to Li,S[i][j] .ab
W [S[i][j]+1][j]
3 If j is not S[i][j] + 1, then allocate pi,j × W [i][S[i][j]]+W [S[i][j]+1][j]
processors to LS[i][j]+1,j .
4 If i is j + 1 or j, then finish this stage; otherwise, call this algorithm
recursively, once with i = i, j = S[i][j] and once i = S[i][j] + 1, j = j.
a
W [i][j]: the minimum number for operations for evaluating Li,j .
b
S[i][j]: the matrix index for partitioning the matrix chain (Mi × · · · × Mj )
yen3 () Matrix Chain Scheduling Algorithm March 4, 2009 27 / 30
37. Matrix Chain Scheduling Algorithm Two-Pass Matrix Chain Scheduling Algorithm
Stage-3 Bottom-Up Concurrent Execution
Two-Pass Matrix Chain Scheduling Algorithm - Stage 3
Stage-3 Bottom-Up Concurrent Execution
1. Let (Mk , Mk+1 ) be a leaf product and pk,k+1 be the number of
processors allocated to the leaf product. If pk,k+1 < mk mk+2 , then go
to 5.
2. Find a candidate product by tracing ancestors of the leaf product using
postorder traversal. If there is no such candidate product, go to 5.
3. Let the product (Ml Ml+1 ) be a candidate product found by tracing
ancestors of the leaf product (Mk Mk+1 ). Check whether the candidate
product satisfies Lemma 2. If not, go to 2.
yen3 () Matrix Chain Scheduling Algorithm March 4, 2009 28 / 30
38. Matrix Chain Scheduling Algorithm Two-Pass Matrix Chain Scheduling Algorithm
Stage-3 Bottom-Up Concurrent Execution
Two-Pass Matrix Chain Scheduling Algorithm - Stage 3
Stage-3 Bottom-Up Concurrent Execution
4. Modify the sequence tree such that the candidate product (Ml Ml+1 )
can be executed concurrently with (Mk , Mk+1 ). Relocate pk pk+1
processors using the DPA algorithm and go to 1 for each leaf product
of two split subtrees.
5. Schedule the leaf product on min(pi,j , mk mk+2 ) processors. Set the
parent of the leaf procuctas a new leaf product.
yen3 () Matrix Chain Scheduling Algorithm March 4, 2009 29 / 30
39. Conclusion
Conclusion
We discuss the Matrix Chain Products Problem, DPA Algorithm,
Matrix Chain Scheduling Algorithm.
The paper emphasizes the optimal matrix chain products tree
modification on parallel system.
I will try to reduce my slide XD.
yen3 () Matrix Chain Scheduling Algorithm March 4, 2009 30 / 30