In recent years, the amount of generated and collected data has been increasing at a near-exponential rate. At the same time, this data has been recognized as valuable for the insights it can provide. Extracting that value, however, requires powerful analysis tools, since valuable insights are buried deep in large amounts of noise. Unfortunately, analytic capacities have not scaled with the growing data: many existing tools run only on a single computer, which limits the data size they can handle to the machine's memory. A promising way to deal with large-scale data is to scale systems out and exploit parallelism.
In this presentation, we propose Gilbert, a distributed sparse linear algebra system, to alleviate the imminent shortage of analytic capacity. Gilbert offers a MATLAB-like programming language for linear algebra programs, which are automatically executed in parallel. Transparent parallelization is achieved by first compiling the linear algebra operations into an intermediate representation. This language-independent form enables high-level algebraic optimizations: different optimization strategies are evaluated, and the best one is chosen by a cost-based optimizer. The optimized program is then transformed into a format suitable for parallel execution. Gilbert generates execution plans for Apache Spark and Apache Flink, two massively parallel dataflow systems. Distributed matrices are represented as square blocks to guarantee a well-balanced trade-off between data parallelism and data granularity.
An exhaustive evaluation indicates that Gilbert is able to process varying amounts of data exceeding the memory of a single computer on clusters of different sizes. Two well-known machine learning (ML) algorithms, namely PageRank and Gaussian non-negative matrix factorization (GNMF), are implemented with Gilbert, and their performance is compared to hand-optimized implementations on Spark and Flink. Even though Gilbert is not as fast as the optimized algorithms, it simplifies the development process significantly due to its high-level programming abstraction.
Gilbert: Declarative Sparse Linear Algebra on Massively Parallel Dataflow Systems
Till Rohrmann (Apache Software Foundation)
Sebastian Schelter, Tilmann Rabl, Volker Markl (Technische Universität Berlin)
March 8, 2017
Information Age
Collected data grows exponentially
Valuable information stored in data
Need for scalable analytical methods
Distributed Computing and Data Analytics
Writing parallel algorithms is tedious and error-prone
Huge existing code base in the form of libraries
Need for a parallelization tool
Requirements
Linear algebra is the lingua franca of analytics
Parallelize programs automatically to simplify development
Sparse operations to support sparse problems efficiently
Goal
Development of a distributed sparse linear algebra system
Gilbert Language
Subset of the MATLAB® language
Supports basic linear algebra operations
Fixpoint operator serves as a side-effect-free loop abstraction
Expressive enough to implement a wide variety of machine learning algorithms
A = rand(10, 2);
B = eye(10);
A' * B;
f = @(x) x .^ 2.0;
eps = 0.1;
c = @(p, c) norm(p - c, 2) < eps;
fixpoint(1/2, f, 10, c);
Gilbert Typer
MATLAB is dynamically typed
Dataflow systems require type knowledge at compile time
Automatic type inference using the Hindley-Milner type inference algorithm
Matrix dimensions are also inferred to enable optimizations
A = rand(10, 2)                  : Matrix(Double, 10, 2)
B = eye(10)                      : Matrix(Double, 10, 10)
A' * B                           : Matrix(Double, 2, 10)
f = @(x) x .^ 2.0                : N -> N
eps = 0.1                        : Double
c = @(p, c) norm(p - c, 2) < eps : (N, N) -> Boolean
fixpoint(1/2, f, 10, c)          : Double
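The deck does not show how the typer works internally. As a rough illustration only, not Gilbert's actual implementation, a Hindley-Milner-style unification restricted to matrix dimensions could look like the following Scala sketch; all names are invented:

    // Hypothetical sketch: unification over matrix dimensions.
    sealed trait Dim
    case class Known(n: Int) extends Dim         // concrete dimension, e.g. 10
    case class DimVar(name: String) extends Dim  // dimension variable

    case class MatrixType(rows: Dim, cols: Dim)

    // Equal constants unify; a variable unifies with anything.
    def unify(a: Dim, b: Dim): Option[Dim] = (a, b) match {
      case (Known(x), Known(y)) if x == y => Some(Known(x))
      case (DimVar(_), d)                 => Some(d)
      case (d, DimVar(_))                 => Some(d)
      case _                              => None  // dimension mismatch: type error
    }

    // A * B is well typed iff A's column count unifies with B's row count;
    // the result then has A's rows and B's columns.
    def multType(a: MatrixType, b: MatrixType): Option[MatrixType] =
      unify(a.cols, b.rows).map(_ => MatrixType(a.rows, b.cols))

    // Line 3 of the listing above: A' * B with A' : (2 x 10), B : (10 x 10).
    val result = multType(MatrixType(Known(2), Known(10)),
                          MatrixType(Known(10), Known(10)))
    // result == Some(MatrixType(Known(2), Known(10))), i.e. Matrix(Double, 2, 10)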
Intermediate Representation & Gilbert Optimizer
Language-independent representation of linear algebra programs
Abstraction layer facilitates easy extension with new programming languages (such as R)
Enables language-independent optimizations:
Transpose push-down (see the sketch below)
Matrix multiplication re-ordering
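As an illustration of transpose push-down, the rewrite (A·B)ᵀ → Bᵀ·Aᵀ avoids materializing a product just to transpose it. Here is a minimal Scala sketch over a hypothetical mini-IR; Gilbert's real intermediate representation is considerably richer:

    // Hypothetical mini-IR, only to illustrate the rewrite rule.
    sealed trait Expr
    case class Mat(name: String) extends Expr
    case class Transpose(e: Expr) extends Expr
    case class Mult(a: Expr, b: Expr) extends Expr

    // Transpose push-down: (A * B)' => B' * A' and (A')' => A, recursively.
    def pushDownTranspose(e: Expr): Expr = e match {
      case Transpose(Mult(a, b)) =>
        Mult(pushDownTranspose(Transpose(b)), pushDownTranspose(Transpose(a)))
      case Transpose(Transpose(a)) => pushDownTranspose(a)
      case Transpose(a)            => Transpose(pushDownTranspose(a))
      case Mult(a, b)              => Mult(pushDownTranspose(a), pushDownTranspose(b))
      case m: Mat                  => m
    }

    // (A * B)' is rewritten to B' * A', so A * B is never materialized
    // just to be transposed afterwards.
    val rewritten = pushDownTranspose(Transpose(Mult(Mat("A"), Mat("B"))))
    // rewritten == Mult(Transpose(Mat("B")), Transpose(Mat("A")))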
Distributed Matrices
[Figure: (a) row partitioning vs. (b) quadratic (square-block) partitioning of a matrix]
Which partitioning is better suited for matrix multiplications?
io_cost_row = O(n³)
io_cost_block = O(n² · √n)
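To make the square-block layout concrete, here is a minimal dense Scala sketch of splitting a row-major n × m matrix into blocks keyed by block coordinates; the names are invented, and Gilbert's actual blocks can also hold sparse data:

    // A distributed matrix becomes a dataset of such blocks.
    case class Block(rowIdx: Int, colIdx: Int, rows: Int, cols: Int,
                     values: Array[Double])

    // Split a row-major n x m matrix into blocks of at most bs x bs entries.
    def partition(data: Array[Double], n: Int, m: Int, bs: Int): Seq[Block] =
      for {
        br <- 0 until (n + bs - 1) / bs  // block row index
        bc <- 0 until (m + bs - 1) / bs  // block column index
      } yield {
        val rows = math.min(bs, n - br * bs)  // edge blocks may be smaller
        val cols = math.min(bs, m - bc * bs)
        Block(br, bc, rows, cols,
          Array.tabulate(rows * cols) { i =>
            data((br * bs + i / cols) * m + (bc * bs + i % cols))
          })
      }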
Distributed Operations: Addition
Apache Flink and Apache Spark offer a MapReduce-like API with additional operators: join, coGroup, cross
Blockwise matrix addition maps naturally onto a join on block coordinates, as sketched below
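A minimal sketch of blockwise addition on Spark's RDD API (the Flink DataSet version is analogous), using the hypothetical block representation from the partitioning sketch above, flattened to dense arrays for brevity:

    import org.apache.spark.rdd.RDD

    // Blocks keyed by (blockRow, blockCol); both inputs share one blocking.
    def addMatrices(a: RDD[((Int, Int), Array[Double])],
                    b: RDD[((Int, Int), Array[Double])])
        : RDD[((Int, Int), Array[Double])] =
      a.join(b)                     // pair up blocks with matching coordinates
       .mapValues { case (x, y) =>  // add paired blocks elementwise
         x.zip(y).map { case (u, v) => u + v }
       }

Note that an inner join drops blocks present on only one side; if all-zero blocks are omitted for sparsity, a real implementation would use coGroup (or an outer join) instead.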
Gaussian Non-Negative Matrix Factorization
Given V ∈ ℝ^(d×w), find W ∈ ℝ^(d×t) and H ∈ ℝ^(t×w) such that V ≈ WH
Used in many fields: computer vision, document clustering, topic modeling
Efficient distributed implementation for MapReduce systems
Algorithm
H ← randomMatrix(t, w)
W ← randomMatrix(d, t)
while ‖V − WH‖² > eps do
    H ← H · (WᵀV / WᵀWH)
    W ← W · (VHᵀ / WHHᵀ)
end while
Testing Setup
Set t = 10 and w = 100000
V ∈ ℝ^(d×100000) with sparsity 0.001
Block size 500 × 500
Number of cores: 64
Flink 1.1.2 & Spark 2.0.0
Gilbert implementation: 5 lines
Distributed GNMF on Flink: 70 lines
V = rand($rows, 100000, 0, 1, 0.001);
H = rand(10, 100000, 0, 1);
W = rand($rows, 10, 0, 1);
nH = H .* ((W' * V) ./ (W' * W * H));
nW = W .* (V * nH') ./ (W * nH * nH');
Gilbert Optimizations
[Plot: execution time t in s (0–300) vs. number of rows d of V (10³–10⁴), comparing Optimized Spark, Optimized Flink, Non-optimized Spark, and Non-optimized Flink]
Optimizations Explained
Matrix updates
H ← H · (WᵀV / WᵀWH)
W ← W · (VHᵀ / WHHᵀ)
Non-optimized matrix multiplications: WHHᵀ is evaluated as (WH)Hᵀ, materializing the huge intermediate WH ∈ ℝ^(d×100000), where W ∈ ℝ^(d×10) and H ∈ ℝ^(10×100000)
Optimized matrix multiplications: WHHᵀ is evaluated as W(HHᵀ), keeping the intermediate HHᵀ ∈ ℝ^(10×10) tiny; likewise WᵀWH is computed as (WᵀW)H with WᵀW ∈ ℝ^(10×10)
A rough operation count follows below
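To make the gain concrete, here is a back-of-the-envelope operation count in Scala for the W-update with the experiment's dimensions; this is a naive dense cost model, purely illustrative, not Gilbert's actual cost function:

    // A dense (m x k) * (k x n) product counted as m * k * n multiplications.
    def cost(m: Long, k: Long, n: Long): Long = m * k * n

    val d = 100000L // rows of V (varied in the experiment)
    val w = 100000L // columns of V
    val t = 10L     // factorization rank

    // Optimized order:     W * (H * H') -> tiny t x t intermediate
    val optimized    = cost(t, w, t) + cost(d, t, t)   // 2 * 10^7
    // Non-optimized order: (W * H) * H' -> huge d x w intermediate
    val nonOptimized = cost(d, t, w) + cost(d, w, t)   // 2 * 10^11

Under this model the non-optimized ordering does roughly 10,000 times more work, and it also materializes a dense d × 100000 matrix between the two multiplications.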
GNMF Step: Scaling Problem Size
[Plot: execution time t in s (10¹–10²) vs. number of rows of matrix V (10³–10⁵), for Flink, Spark, the specialized (SP) Flink and Spark implementations, and local execution]
Distributed Gilbert execution handles much larger problem sizes than local execution
Specialized implementation is slightly faster than Gilbert
GNMF Step: Weak Scaling
[Plot: execution time t in s (0–60) vs. number of cores (10⁰–10²) for Flink and Spark]
Both distributed backends show good weak scaling behaviour
PageRank
Ranking for entities that mutually reference each other, such as web pages

PR(pᵢ) = d · Σ_{pⱼ ∈ L(pᵢ)} PR(pⱼ) / D(pⱼ) + (1 − d) / N

N: number of pages
d: damping factor
L(pᵢ): set of pages that link to pᵢ
D(pⱼ): number of outgoing links of pⱼ
M: transition matrix derived from the adjacency matrix

In matrix form: R = d · M · R + ((1 − d) / N) · 1
PageRank Implementation
MATLAB®
it = 10;
d = sum(A, 2);
M = (diag(1 ./ d) * A)';
r = ones(n, 1) / n;
e = ones(n, 1) / n;
for i = 1:it
    r = .85 * M * r + .15 * e;
end

Gilbert
it = 10;
d = sum(A, 2);
M = (diag(1 ./ d) * A)';
r_0 = ones(n, 1) / n;
e = ones(n, 1) / n;
fixpoint(r_0,
         @(r) .85 * M * r + .15 * e,
         it)
PageRank: 10 Iterations
[Plot: execution time t in s (10¹–10⁴) vs. number of vertices n (10⁴–10⁵) for Spark, Flink, SP Flink, and SP Spark]
Gilbert backends show similar performance
Specialized implementation faster because it can fuse operations
Conclusion
Easy-to-use sparse linear algebra environment for people familiar with MATLAB®
Scales to data sizes exceeding the memory of a single computer
High-level linear algebra optimizations improve runtime
Slower than specialized implementations due to abstraction overhead