In recent years, the amount of generated and collected data has been increasing at a near-exponential rate. At the same time, this data has been recognized as valuable for the insights it can provide. Extracting that value, however, requires powerful analysis tools, since valuable insights are buried deep in large amounts of noise. Unfortunately, analytic capacities have not scaled with the growing data: many existing tools run only on a single computer, which limits the data size they can handle to the machine's memory. A promising way to deal with large-scale data is to scale systems out and exploit parallelism.
In this presentation, we propose Gilbert, a distributed sparse linear algebra system, to alleviate the imminent shortage of analytic capacity. Gilbert offers a MATLAB-like programming language for linear algebra programs, which are automatically executed in parallel. Transparent parallelization is achieved by first compiling the linear algebra operations into an intermediate representation. This language-independent form enables high-level algebraic optimizations: different optimization strategies are evaluated, and the best one is chosen by a cost-based optimizer. The optimized program is then transformed into a format suitable for parallel execution. Gilbert generates execution plans for Apache Spark and Apache Flink, two massively parallel dataflow systems. Distributed matrices are represented as square blocks to guarantee a well-balanced trade-off between data parallelism and data granularity.
An exhaustive evaluation indicates that Gilbert is able to process varying amounts of data exceeding the memory of a single computer on clusters of different sizes. Two well-known machine learning (ML) algorithms, namely PageRank and Gaussian non-negative matrix factorization (GNMF), are implemented with Gilbert, and their performance is compared to hand-optimized implementations on Spark and Flink. Even though Gilbert is not as fast as the optimized algorithms, it simplifies the development process significantly due to its high-level programming abstraction.
Gilbert: Declarative Sparse Linear Algebra on Massively Parallel Dataflow Systems
Till Rohrmann (Apache Software Foundation)
Sebastian Schelter, Tilmann Rabl, Volker Markl (Technische Universität Berlin)
March 8, 2017
Information Age
Collected data grows exponentially
Valuable information stored in data
Need for scalable analytical methods
Distributed Computing and Data Analytics
Writing parallel algorithms is tedious and error-prone
Huge existing code base in the form of libraries
Need for a parallelization tool
Requirements
Linear algebra is the lingua franca of analytics
Parallelize programs automatically to simplify development
Sparse operations to support sparse problems efficiently
Goal
Development of a distributed sparse linear algebra system
Gilbert Language
Subset of the MATLAB® language
Supports basic linear algebra operations
Fixpoint operator serves as a side-effect-free loop abstraction
Expressive enough to implement a wide variety of machine learning algorithms
A = rand(10, 2);
B = eye(10);
A' * B;
f = @(x) x .^ 2.0;
eps = 0.1;
c = @(p, c) norm(p - c, 2) < eps;
fixpoint(1/2, f, 10, c);
Gilbert Typer
MATLAB is dynamically typed
Dataflow systems require type knowledge at compile time
Automatic type inference using the Hindley-Milner type inference algorithm
Matrix dimensions are also inferred to enable optimizations
A = rand(10, 2)                  : Matrix(Double, 10, 2)
B = eye(10)                      : Matrix(Double, 10, 10)
A' * B                           : Matrix(Double, 2, 10)
f = @(x) x .^ 2.0                : N -> N
eps = 0.1                        : Double
c = @(p, c) norm(p - c, 2) < eps : (N, N) -> Boolean
fixpoint(1/2, f, 10, c)          : Double
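The deck does not show how the typer works internally. As a rough illustration only, not Gilbert's actual implementation, a Hindley-Milner-style unification restricted to matrix dimensions could look like the following Scala sketch; all names are invented:

    // Hypothetical sketch: unification over matrix dimensions.
    sealed trait Dim
    case class Known(n: Int) extends Dim         // concrete dimension, e.g. 10
    case class DimVar(name: String) extends Dim  // dimension variable

    case class MatrixType(rows: Dim, cols: Dim)

    // Equal constants unify; a variable unifies with anything.
    def unify(a: Dim, b: Dim): Option[Dim] = (a, b) match {
      case (Known(x), Known(y)) if x == y => Some(Known(x))
      case (DimVar(_), d)                 => Some(d)
      case (d, DimVar(_))                 => Some(d)
      case _                              => None  // dimension mismatch: type error
    }

    // A * B is well typed iff A's column count unifies with B's row count;
    // the result then has A's rows and B's columns.
    def multType(a: MatrixType, b: MatrixType): Option[MatrixType] =
      unify(a.cols, b.rows).map(_ => MatrixType(a.rows, b.cols))

    // Line 3 of the listing above: A' * B with A' : (2 x 10), B : (10 x 10).
    val result = multType(MatrixType(Known(2), Known(10)),
                          MatrixType(Known(10), Known(10)))
    // result == Some(MatrixType(Known(2), Known(10))), i.e. Matrix(Double, 2, 10)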
Intermediate Representation & Gilbert Optimizer
Language-independent representation of linear algebra programs
Abstraction layer facilitates easy extension with new programming languages (such as R)
Enables language-independent optimizations:
Transpose push-down (see the sketch below)
Matrix multiplication re-ordering
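As an illustration of transpose push-down, the rewrite (A·B)ᵀ → Bᵀ·Aᵀ avoids materializing a product just to transpose it. Here is a minimal Scala sketch over a hypothetical mini-IR; Gilbert's real intermediate representation is considerably richer:

    // Hypothetical mini-IR, only to illustrate the rewrite rule.
    sealed trait Expr
    case class Mat(name: String) extends Expr
    case class Transpose(e: Expr) extends Expr
    case class Mult(a: Expr, b: Expr) extends Expr

    // Transpose push-down: (A * B)' => B' * A' and (A')' => A, recursively.
    def pushDownTranspose(e: Expr): Expr = e match {
      case Transpose(Mult(a, b)) =>
        Mult(pushDownTranspose(Transpose(b)), pushDownTranspose(Transpose(a)))
      case Transpose(Transpose(a)) => pushDownTranspose(a)
      case Transpose(a)            => Transpose(pushDownTranspose(a))
      case Mult(a, b)              => Mult(pushDownTranspose(a), pushDownTranspose(b))
      case m: Mat                  => m
    }

    // (A * B)' is rewritten to B' * A', so A * B is never materialized
    // just to be transposed afterwards.
    val rewritten = pushDownTranspose(Transpose(Mult(Mat("A"), Mat("B"))))
    // rewritten == Mult(Transpose(Mat("B")), Transpose(Mat("A")))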
Distributed Matrices
[Figure: (a) row partitioning vs. (b) quadratic (square-block) partitioning of a matrix]
Which partitioning is better suited for matrix multiplications?
io_cost_row = O(n³)
io_cost_block = O(n² · √n)
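To make the square-block layout concrete, here is a minimal dense Scala sketch of splitting a row-major n × m matrix into blocks keyed by block coordinates; the names are invented, and Gilbert's actual blocks can also hold sparse data:

    // A distributed matrix becomes a dataset of such blocks.
    case class Block(rowIdx: Int, colIdx: Int, rows: Int, cols: Int,
                     values: Array[Double])

    // Split a row-major n x m matrix into blocks of at most bs x bs entries.
    def partition(data: Array[Double], n: Int, m: Int, bs: Int): Seq[Block] =
      for {
        br <- 0 until (n + bs - 1) / bs  // block row index
        bc <- 0 until (m + bs - 1) / bs  // block column index
      } yield {
        val rows = math.min(bs, n - br * bs)  // edge blocks may be smaller
        val cols = math.min(bs, m - bc * bs)
        Block(br, bc, rows, cols,
          Array.tabulate(rows * cols) { i =>
            data((br * bs + i / cols) * m + (bc * bs + i % cols))
          })
      }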
Distributed Operations: Addition
Apache Flink and Apache Spark offer a MapReduce-like API with additional operators: join, coGroup, cross
Blockwise matrix addition maps naturally onto a join on block coordinates, as sketched below
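A minimal sketch of blockwise addition on Spark's RDD API (the Flink DataSet version is analogous), using the hypothetical block representation from the partitioning sketch above, flattened to dense arrays for brevity:

    import org.apache.spark.rdd.RDD

    // Blocks keyed by (blockRow, blockCol); both inputs share one blocking.
    def addMatrices(a: RDD[((Int, Int), Array[Double])],
                    b: RDD[((Int, Int), Array[Double])])
        : RDD[((Int, Int), Array[Double])] =
      a.join(b)                     // pair up blocks with matching coordinates
       .mapValues { case (x, y) =>  // add paired blocks elementwise
         x.zip(y).map { case (u, v) => u + v }
       }

Note that an inner join drops blocks present on only one side; if all-zero blocks are omitted for sparsity, a real implementation would use coGroup (or an outer join) instead.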
Gaussian Non-Negative Matrix Factorization
Given V ∈ ℝ^(d×w), find W ∈ ℝ^(d×t) and H ∈ ℝ^(t×w) such that V ≈ WH
Used in many fields: computer vision, document clustering, topic modeling
Efficient distributed implementation for MapReduce systems
Algorithm
H ← randomMatrix(t, w)
W ← randomMatrix(d, t)
while ‖V − WH‖² > eps do
    H ← H · (WᵀV / WᵀWH)
    W ← W · (VHᵀ / WHHᵀ)
end while
Testing Setup
Set t = 10 and w = 100000
V ∈ ℝ^(d×100000) with sparsity 0.001
Block size 500 × 500
Number of cores: 64
Flink 1.1.2 & Spark 2.0.0
Gilbert implementation: 5 lines
Distributed GNMF on Flink: 70 lines
V = rand($rows, 100000, 0, 1, 0.001);
H = rand(10, 100000, 0, 1);
W = rand($rows, 10, 0, 1);
nH = H .* ((W' * V) ./ (W' * W * H));
nW = W .* (V * nH') ./ (W * nH * nH');
Gilbert Optimizations
[Plot: execution time t in s (0–300) vs. number of rows d of V (10³–10⁴), comparing Optimized Spark, Optimized Flink, Non-optimized Spark, and Non-optimized Flink]
Optimizations Explained
Matrix updates
H ← H · (WᵀV / WᵀWH)
W ← W · (VHᵀ / WHHᵀ)
Non-optimized matrix multiplications: WHHᵀ is evaluated as (WH)Hᵀ, materializing the huge intermediate WH ∈ ℝ^(d×100000), where W ∈ ℝ^(d×10) and H ∈ ℝ^(10×100000)
Optimized matrix multiplications: WHHᵀ is evaluated as W(HHᵀ), keeping the intermediate HHᵀ ∈ ℝ^(10×10) tiny; likewise WᵀWH is computed as (WᵀW)H with WᵀW ∈ ℝ^(10×10)
A rough operation count follows below
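To make the gain concrete, here is a back-of-the-envelope operation count in Scala for the W-update with the experiment's dimensions; this is a naive dense cost model, purely illustrative, not Gilbert's actual cost function:

    // A dense (m x k) * (k x n) product counted as m * k * n multiplications.
    def cost(m: Long, k: Long, n: Long): Long = m * k * n

    val d = 100000L // rows of V (varied in the experiment)
    val w = 100000L // columns of V
    val t = 10L     // factorization rank

    // Optimized order:     W * (H * H') -> tiny t x t intermediate
    val optimized    = cost(t, w, t) + cost(d, t, t)   // 2 * 10^7
    // Non-optimized order: (W * H) * H' -> huge d x w intermediate
    val nonOptimized = cost(d, t, w) + cost(d, w, t)   // 2 * 10^11

Under this model the non-optimized ordering does roughly 10,000 times more work, and it also materializes a dense d × 100000 matrix between the two multiplications.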
GNMF Step: Scaling Problem Size
[Plot: execution time t in s (10¹–10²) vs. number of rows of matrix V (10³–10⁵), for Flink, Spark, the specialized (SP) Flink and Spark implementations, and local execution]
Distributed Gilbert execution handles much larger problem sizes than local execution
Specialized implementation is slightly faster than Gilbert
GNMF Step: Weak Scaling
[Plot: execution time t in s (0–60) vs. number of cores (10⁰–10²) for Flink and Spark]
Both distributed backends show good weak scaling behaviour
PageRank
Ranking for entities that mutually reference each other, such as web pages

PR(pᵢ) = d · Σ_{pⱼ ∈ L(pᵢ)} PR(pⱼ) / D(pⱼ) + (1 − d) / N

N: number of pages
d: damping factor
L(pᵢ): set of pages that link to pᵢ
D(pⱼ): number of outgoing links of pⱼ
M: transition matrix derived from the adjacency matrix

In matrix form: R = d · M · R + ((1 − d) / N) · 1
PageRank Implementation
MATLAB®
it = 10;
d = sum(A, 2);
M = (diag(1 ./ d) * A)';
r = ones(n, 1) / n;
e = ones(n, 1) / n;
for i = 1:it
    r = .85 * M * r + .15 * e;
end

Gilbert
it = 10;
d = sum(A, 2);
M = (diag(1 ./ d) * A)';
r_0 = ones(n, 1) / n;
e = ones(n, 1) / n;
fixpoint(r_0,
         @(r) .85 * M * r + .15 * e,
         it)
PageRank: 10 Iterations
[Plot: execution time t in s (10¹–10⁴) vs. number of vertices n (10⁴–10⁵) for Spark, Flink, SP Flink, and SP Spark]
Gilbert backends show similar performance
Specialized implementation faster because it can fuse operations
Conclusion
Easy-to-use sparse linear algebra environment for people familiar with MATLAB®
Scales to data sizes exceeding the memory of a single computer
High-level linear algebra optimizations improve runtime
Slower than specialized implementations due to abstraction overhead