Utilizamos tu perfil de LinkedIn y tus datos de actividad para personalizar los anuncios y mostrarte publicidad más relevante. Puedes cambiar tus preferencias de publicidad en cualquier momento.
Próxima SlideShare
Cargando en…5
×

# Sebastian Schelter – Distributed Machine Learing with the Samsara DSL

8.257 visualizaciones

• Full Name
Comment goes here.

Are you sure you want to Yes No
• Sé el primero en comentar

### Sebastian Schelter – Distributed Machine Learing with the Samsara DSL

1. 1. Distributed Machine Learning with the Samsara DSL Sebastian Schelter, Flink Forward 2015
2. 2. About me • about to finish my PhD on „Scaling Data Mining in Massively Parallel Dataflow Systems“ • currently: – Machine Learning Scientist / Post-Doctoral Researcher at Amazon's Berlin-based ML group – senior researcher at the Database Group of TU Berlin • member of the Apache Software Foundation (Mahout/Giraph/Flink)
3. 3. Samsara • Samsara is an easy-to-use domain specific language (DSL) for distributed large-scale machine learning on systems like Apache Spark and Apache Flink • part of the Apache Mahout project • uses Scala as programming/scripting environment • system-agnostic, R-like DSL : val G = B %*% B.t - C - C.t + (ksi dot ksi) * (s_q cross s_q) • algebraic expression optimizer for distributed linear algebra – provides a translation layer to distributed engines T qq TTT ssCCBBG 
4. 4. Data Types • Scalar real values • In-memory vectors – dense – 2 types of sparse • In-memory matrices – sparse and dense – a number of specialized matrices • Distributed Row Matrices (DRM) – huge matrix, partitioned by rows – lives in the main memory of the cluster – provides small set of parallelized operations – lazily evaluated operation execution val x = 2.367 val v = dvec(1, 0, 5) val w = svec((0 -> 1)::(2 -> 5)::Nil) val A = dense((1, 0, 5), (2, 1, 4), (4, 3, 1)) val drmA = drmFromHDFS(...)
5. 5. Features (1) • matrix, vector, scalar operators: in-memory, distributed • slicing operators • assignments (in-memory only) • vector-specific • summaries drmA %*% drmB A %*% x A.t %*% drmB A * B A(5 until 20, 3 until 40) A(5, ::); A(5, 5) x(a to b) A(5, ::) := x A *= B A -=: B; 1 /:= x x dot y; x cross y A.nrow; x.length; A.colSums; B.rowMeans x.sum; A.norm
6. 6. Features (2) • solving linear systems • in-memory decompositions • distributed decompositions • caching of DRMs val x = solve(A, b) val (inMemQ, inMemR) = qr(inMemM) val ch = chol(inMemM) val (inMemV, d) = eigen(inMemM) val (inMemU, inMemV, s) = svd(inMemM) val (drmQ, inMemR) = thinQR(drmA) val (drmU, drmV, s) = dssvd(drmA, k = 50, q = 1) val drmA_cached = drmA.checkpoint() drmA_cached.uncache()
7. 7. Example
8. 8. Cereals Name protein fat carbo sugars rating Apple Cinnamon Cheerios 2 2 10.5 10 29.509541 Cap‘n‘Crunch 1 2 12 12 18.042851 Cocoa Puffs 1 1 12 13 22.736446 Froot Loops 2 1 11 13 32.207582 Honey Graham Ohs 1 2 12 11 21.871292 Wheaties Honey Gold 2 1 16 8 36.187559 Cheerios 6 2 17 1 50.764999 Clusters 3 2 13 7 40.400208 Great Grains Pecan 3 3 13 4 45.811716 http://lib.stat.cmu.edu/DASL/Datafiles/Cereals.html
9. 9. Linear Regression • Assumption: target variable y generated by linear combination of feature matrix X with parameter vector β, plus noise ε • Goal: find estimate of the parameter vector β that explains the data well • Cereals example X = weights of ingredients y = customer rating  Xy
10. 10. Data Ingestion • Usually: load dataset as DRM from a distributed filesystem: val drmData = drmFromHdfs(...) • ‚Mimick‘ a large dataset for our example: val drmData = drmParallelize(dense( (2, 2, 10.5, 10, 29.509541), // Apple Cinnamon Cheerios (1, 2, 12, 12, 18.042851), // Cap'n'Crunch (1, 1, 12, 13, 22.736446), // Cocoa Puffs (2, 1, 11, 13, 32.207582), // Froot Loops (1, 2, 12, 11, 21.871292), // Honey Graham Ohs (2, 1, 16, 8, 36.187559), // Wheaties Honey Gold (6, 2, 17, 1, 50.764999), // Cheerios (3, 2, 13, 7, 40.400208), // Clusters (3, 3, 13, 4, 45.811716)), // Great Grains Pecan numPartitions = 2)
11. 11. Data Preparation • Cereals example: target variable y is customer rating, weights of ingredients are features X • extract X as DRM by slicing, fetch y as in-core vector val drmX = drmData(::, 0 until 4) val y = drmData.collect(::, 4)                             8117164541333 4002084071323 7649995011726 1875593681612 87129221111221 20758232131112 73644622131211 04285118121221 509541291051022 . . . . . . . . .. drmX y
12. 12. Estimating β • Ordinary Least Squares: minimizes the sum of residual squares between true target variable and prediction of target variable • Closed-form expression for estimation of ß as • Computing XTX andXTy is as simple as typing the formulas: val drmXtX = drmX.t %*% drmX val drmXty = drmX %*% y yXXX TT 1 )(ˆ  
13. 13. Estimating β • Solve the following linear system to get least-squares estimate of ß • Fetch XTX andXTy onto the driver and use an in-core solver – assumes XTX fits into memory – uses analogon to R’s solve() function val XtX = drmXtX.collect val Xty = drmXty.collect(::, 0) val betaHat = solve(XtX, Xty) yXXX TT ˆ
14. 14. Estimating β • Solve the following linear system to get least-squares estimate of ß • Fetch XTX andXTy onto the driver and use an in-memory solver – assumes XTX fits into memory – uses analogon to R’s solve() function val XtX = drmXtX.collect val Xty = drmXty.collect(::, 0) val betaHat = solve(XtX, Xty) yXXX TT ˆ → We have implemented distributed linear regression!
15. 15. Goodness of fit • Prediction of the target variable is simple matrix-vector multiplication • Check L2 norm of the difference between true target variable and our prediction val yHat = (drmX %*% betaHat).collect(::, 0) (y - yHat).norm(2) ˆˆ Xy 
16. 16. Adding a bias term • Bias term left out so far – constant factor added to the model, “shifts the line vertically” • Common trick is to add a column of ones to the feature matrix – bias term will be learned automatically                             141333 171323 111726 181612 1111221 1131112 1131211 1121221 11051022 .                             41333 71323 11726 81612 111221 131112 131211 121221 105.1022
17. 17. Adding a bias term • How do we add a new column to a DRM? → mapBlock() allows for custom modifications of the matrix val drmXwithBiasColumn = drmX.mapBlock(ncol = drmX.ncol + 1) { case(keys, block) => // create a new block with an additional column val blockWithBiasCol = block.like(block.nrow, block.ncol+1) // copy data from current block into the new block blockWithBiasCol(::, 0 until block.ncol) := block // last column consists of ones blockWithBiasColumn(::, block.ncol) := 1 keys -> blockWithBiasColumn }
18. 18. Under the covers
19. 19. Underlying systems • prototype on Apache Spark • prototype on h20: • coming up: support for Apache Flink
20. 20. Runtime & Optimization • Execution is defered, user composes logical operators • Computational actions implicitly trigger optimization (= selection of physical plan) and execution • Optimization factors: size of operands, orientation of operands, partitioning, sharing of computational paths • e. g.: matrix multiplication: – 5 physical operators for drmA %*% drmB – 2 operators for drmA %*% inMemA – 1 operator for drm A %*% x – 1 operator for x %*% drmA val C = A.t %*% A I.writeDrm(path); val inMemV =(U %*% M).collect
21. 21. Optimization Example • Computation of ATA in example • Naïve execution 1st pass: transpose A (requires repartitioning of A) 2nd pass: multiply result with A (expensive, potentially requires repartitioning again) • Logical optimization: rewrite plan to use specialized logical operator for Transpose-Times-Self matrix multiplication val C = A.t %*% A
22. 22. Optimization Example • Computation of ATA in example • Naïve execution 1st pass: transpose A (requires repartitioning of A) 2nd pass: multiply result with A (expensive, potentially requires repartitioning again) • Logical optimization: rewrite plan to use specialized logical operator for Transpose-Times-Self matrix multiplication val C = A.t %*% A Transpose A
23. 23. Optimization Example • Computation of ATA in example • Naïve execution 1st pass: transpose A (requires repartitioning of A) 2nd pass: multiply result with A (expensive, potentially requires repartitioning again) • Logical optimization: rewrite plan to use specialized logical operator for Transpose-Times-Self matrix multiplication val C = A.t %*% A Transpose MatrixMult A A C
24. 24. Optimization Example • Computation of ATA in example • Naïve execution 1st pass: transpose A (requires repartitioning of A) 2nd pass: multiply result with A (expensive, potentially requires repartitioning again) • Logical optimization Optimizer rewrites plan to use specialized logical operator for Transpose-Times-Self matrix multiplication val C = A.t %*% A Transpose MatrixMult A A C Transpose- Times-Self A C
25. 25. Transpose-Times-Self • Samsara computes ATA via row-outer-product formulation – executes in a single pass over row-partitioned A   m i T ii T aaAA 0
26. 26. Transpose-Times-Self • Samsara computes ATA via row-outer-product formulation – executes in a single pass over row-partitioned A   m i T ii T aaAA 0 A
27. 27. Transpose-Times-Self • Samsara computes ATA via row-outer-product formulation – executes in a single pass over row-partitioned A   m i T ii T aaAA 0 x AAT
28. 28. Transpose-Times-Self • Samsara computes ATA via row-outer-product formulation – executes in a single pass over row-partitioned A   m i T ii T aaAA 0 x = x AAT a1• a1• T
29. 29. Transpose-Times-Self • Samsara computes ATA via row-outer-product formulation – executes in a single pass over row-partitioned A   m i T ii T aaAA 0 x = x + x AAT a1• a1• T a2• a2• T
30. 30. Transpose-Times-Self • Mahout computes ATA via row-outer-product formulation – executes in a single pass over row-partitioned A   m i T ii T aaAA 0 x = x + +x x AAT a1• a1• T a2• a2• T a3• a3• T
31. 31. Transpose-Times-Self • Samsara computes ATA via row-outer-product formulation – executes in a single pass over row-partitioned A   m i T ii T aaAA 0 x = x + + +x x x AAT a1• a1• T a2• a2• T a3• a3• T a4• a4• T
32. 32. Physical operators for the distributed computation of ATA
33. 33. Physical operators for Transpose-Times-Self • Two physical operators (concrete implementations) available for Transpose-Times-Self operation – standard operator AtA – operator AtA_slim, specialized implementation for tall & skinny matrices • Optimizer must choose – currently: depends on user-defined threshold for number of columns – ideally: cost based decision, dependent on estimates of intermediate result sizes Transpose- Times-Self A C
34. 34. Physical operator AtA           1100 0101 0111 A
35. 35. A2  1100 Physical operator AtA           1100 0101 0111 A1 A worker 1 worker 2       0101 0111
36. 36. A2  1100 Physical operator AtA           1100 0101 0111 A1 A worker 1 worker 2       0101 0111 for 1st partition for 1st partition
37. 37. A2  1100 Physical operator AtA           1100 0101 0111 A1 A worker 1 worker 2       0101 0111  0111 1 1        1100 0 0       for 1st partition for 1st partition
38. 38. A2  1100 Physical operator AtA           1100 0101 0111 A1 A worker 1 worker 2       0101 0111  0111 1 1        1100 0 0       for 1st partition for 1st partition  0101 0 1      
39. 39. A2  1100 Physical operator AtA           1100 0101 0111 A1 A worker 1 worker 2       0101 0111  0111 1 1        1100 0 0       for 1st partition for 1st partition  0101 0 1       for 2nd partition for 2nd partition
40. 40. A2  1100 Physical operator AtA           1100 0101 0111 A1 A worker 1 worker 2       0101 0111  0111 1 1        1100 0 0       for 1st partition for 1st partition  0101 0 1        0111 0 1       for 2nd partition  1100 1 1       for 2nd partition
41. 41. A2  1100 Physical operator AtA           1100 0101 0111 A1 A worker 1 worker 2       0101 0111  0111 1 1        1100 0 0       for 1st partition for 1st partition  0101 0 1        0111 0 1       for 2nd partition  0101 0 1        1100 1 1       for 2nd partition
42. 42. A2  1100 Physical operator AtA           1100 0101 0111 A1 A worker 1 worker 2       0101 0111       0111 0111       0000 0000 for 1st partition for 1st partition       0000 0101       0000 0111 for 2nd partition       0000 0101       1100 1100 for 2nd partition
43. 43. A2  1100 Physical operator AtA           1100 0101 0111 A1 A worker 1 worker 2       0101 0111       0111 0111       0000 0000 for 1st partition for 1st partition       0000 0101       0000 0111 for 2nd partition       0000 0101       1100 1100 for 2nd partition       0111 0212 worker 3       1100 1312 worker 4 ∑ ∑ AT A
44. 44. Physical operator AtA_slim           1100 0101 0111 A
45. 45. A2  1100 Physical operator AtA_slim           1100 0101 0111 A1 A worker 1 worker 2       0101 0111
46. 46. A2 T A2A2  1100                  1 11 000 0000 Physical operator AtA_slim           1100 0101 0111 A1 T A1A1 A worker 1 worker 2       0101 0111                  0 02 011 0212
47. 47. A2 T A2A2  1100                  1 11 000 0000 Physical operator AtA_slim           1100 0101 0111 A1 T A1A1 A C = AT A worker 1 worker 2 A1 T A1 + A2 T A2 driver       0101 0111                  0 02 011 0212               1100 1312 0111 0212
48. 48. Pointers • Contribution to Apache Mahout in progress: https://issues.apache.org/jira/browse/MAHOUT-1570 • Apache Mahout has extensive documentation on Samsara – http://mahout.apache.org/users/environment/in-core-reference.html – https://mahout.apache.org/users/environment/out-of-core-reference.html
49. 49. Thank you. Questions?