June 23, 2015, Mikio L. Braun, Data Flow vs. Procedural Programming, Berlin Flink Meetup
Flink Meetup #8
Data flow vs. procedural programming: How to put your algorithms into Flink
June 23, 2015
Mikio L. Braun
@mikiobraun
Programming how we're used to
● Computing a sum
● Tools at our disposal:
– variables
– control flow (loops, if)
– function calls as basic piece of abstraction
def computeSum(a):
    total = 0
    for i in range(len(a)):
        total += a[i]
    return total
Data Analysis Algorithms
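Let's consider centering, i.e. subtracting the mean from every point:
$x_i \leftarrow x_i - \frac{1}{n}\sum_{j=1}^{n} x_j$
In procedural code this becomes
def centerPoints(xs):
    # xs: NumPy float array with one point per row
    total = xs[0].copy()
    for i in range(1, len(xs)):
        total += xs[i]
    mean = total / len(xs)
    for i in range(len(xs)):
        xs[i] -= mean  # modifies xs in place
    return xs
or even just
xs - xs.mean(axis=0)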
Don't use for-loops
● Put your data into a matrix
● Don't use for loops; use vectorized operations on the whole matrix instead
Least Squares Regression
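● Compute $w = (X^\top X + \lambda I)^{-1} X^\top y$
● Becomes
import numpy as np
def lsr(X, y, lam):
    # Regularized least squares: solve (X'X + lam*I) w = X'y
    d = X.shape[1]
    C = X.T.dot(X) + lam * np.eye(d)
    w = np.linalg.solve(C, X.T.dot(y))
    return w
What you learn is thinking in matrices: breaking down computations in terms of matrix algebra.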
Basic tools
● Basic procedural programming paradigm
● Variables
● Ordered arrays and efficient functions on those
Advantages
– very familiar
– close to math
Disadvantage
– hard to scale
Parallel Data Flow
Often you have computations like
for i in someSet:
    y[i] = f(x[i])
which are inherently easy to scale, because every i can be processed independently.
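Because each y[i] depends only on x[i], the iterations can run on different workers in any order. A minimal sketch of that idea with Python's standard `concurrent.futures`; the per-element function `f` and the worker count are illustrative assumptions. Threads keep the example short; for CPU-bound Python work, a process pool (or a cluster system such as Flink) is what actually yields a speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def f(x):
    # Hypothetical per-element computation: depends only on its own input.
    return x * x


def parallel_map(xs, fn, workers=4):
    # Elements are independent, so they can be handed to a pool of workers.
    # Executor.map returns the results in input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fn, xs))


if __name__ == "__main__":
    print(parallel_map(list(range(10)), f))  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```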
New Paradigm
● Basic building block is an (unordered) set
● Basic operations are inherently parallel
Computing, Data Flow Style
Computing a sum:
sum(xs) = xs.reduce((x, y) => x + y)
Computing a mean:
mean(xs) = xs.map(x => (x, 1))
             .reduce((xc, yc) => (xc._1 + yc._1, xc._2 + yc._2))
             .map(xc => xc._1 / xc._2)
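The mean is computed via (value, count) pairs because partial results from different workers have to be combinable in any order: averaging partial averages gives the wrong answer when partitions differ in size, while adding (sum, count) pairs does not. A plain-Python sketch of the same map/reduce chain; `partitioned_mean` simulates partitioning with a seeded shuffle and is an illustrative assumption, not how Flink actually distributes data.

```python
from functools import reduce
import random


def add_pairs(a, b):
    # Combine two (sum, count) pairs; associative and commutative.
    return (a[0] + b[0], a[1] + b[1])


def ds_sum(xs):
    # sum(xs) = xs.reduce((x, y) => x + y)
    return reduce(lambda x, y: x + y, xs)


def ds_mean(xs):
    # Map to (value, 1), reduce the pairs, divide once at the end.
    total, count = reduce(add_pairs, map(lambda x: (x, 1), xs))
    return total / count


def partitioned_mean(xs, n_parts=3, seed=0):
    # Simulated distributed run: shuffle, split into partitions, reduce each
    # partition locally, then combine the partial (sum, count) pairs.
    xs = list(xs)
    random.Random(seed).shuffle(xs)
    parts = [xs[i::n_parts] for i in range(n_parts)]
    partials = [reduce(add_pairs, [(x, 1) for x in p]) for p in parts if p]
    total, count = reduce(add_pairs, partials)
    return total / count


print(ds_sum([1, 2, 3, 4]))            # 10
print(ds_mean([1, 2, 3, 4]))           # 2.5
print(partitioned_mean(range(1, 11)))  # 5.5
```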
Apache Flink
● Data Flow system
● Basic building block is a DataSet[X]
● For execution, Flink sets up all computing nodes and streams the data through them
Apache Flink: Getting Started
● Use Scala API
● Minimal project with Maven (build tool) or Gradle
● Use an IDE like IntelliJ
● Always import org.apache.flink.api.scala._ (it provides the implicit TypeInformation the Scala API needs)
Centering (First Try)
def computeMean(xs: DataSet[DenseVector]) =
  xs.map(x => (x, 1))
    .reduce((xc, yc) => (xc._1 + yc._1, xc._2 + yc._2))
    .map(xc => xc._1 / xc._2)
def centerPoints(xs: DataSet[DenseVector]) = {
  val mean = computeMean(xs)
  // Does not work: `mean` is itself a DataSet and cannot be used inside map
  xs.map(x => x - mean)
}
You cannot nest DataSet operations!
Sorry, restrictions apply.
● Variables hold (lazy) computations
● You can't use DataSets inside operations (e.g. inside a map function)
● Even if a result is just a single element, it's a DataSet[Elem]
● So what to do?
– cross joins
– broadcast variables
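Why these restrictions exist can be seen with a toy model of lazy evaluation. The hypothetical `LazyDataSet` class below (not a Flink class) only records operations; `collect()` runs the plan. A full reduction still yields a one-element data set, and subtracting such a plan from a number inside `map` fails, which mirrors the nesting restriction.

```python
class LazyDataSet:
    """Toy stand-in for a lazy DataSet (hypothetical, for illustration only):
    operations build a plan, nothing runs until collect() is called."""

    def __init__(self, source, ops=()):
        self._source = source   # input elements
        self._ops = tuple(ops)  # recorded operations, in order

    def map(self, fn):
        # Record the operation instead of executing it.
        return LazyDataSet(self._source, self._ops + (("map", fn),))

    def reduce(self, fn):
        # Even a full reduction yields a (one-element) data set, not a value.
        return LazyDataSet(self._source, self._ops + (("reduce", fn),))

    def collect(self):
        # Execute the recorded plan.
        data = list(self._source)
        for kind, fn in self._ops:
            if kind == "map":
                data = [fn(x) for x in data]
            elif data:
                acc = data[0]
                for x in data[1:]:
                    acc = fn(acc, x)
                data = [acc]
        return data


xs = LazyDataSet([1.0, 2.0, 3.0, 4.0])
mean = (xs.map(lambda x: (x, 1))
          .reduce(lambda a, b: (a[0] + b[0], a[1] + b[1]))
          .map(lambda c: c[0] / c[1]))
print(mean.collect())  # [2.5]
# `mean` is a plan, not a number: xs.map(lambda x: x - mean) raises TypeError
# once executed, which is the "no nesting" restriction in miniature.
```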
Centering (Second Try)
def computeMean(xs: DataSet[DenseVector]) =
  xs.map(x => (x, 1))
    .reduce((xc, yc) => (xc._1 + yc._1, xc._2 + yc._2))
    .map(xc => xc._1 / xc._2)
def centerPoints(xs: DataSet[DenseVector]) = {
  val mean = computeMean(xs)
  // crossWithTiny pairs every point with the single mean element
  xs.crossWithTiny(mean).map(xm => xm._1 - xm._2)
}
This works, but seems excessive because the mean is copied to each data element.
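What the cross join does can be sketched in plain Python, assuming points are tuples of floats: crossing the points with a one-element set pairs every point with the mean, which is exactly the copying the slide calls excessive. `cross` and `center_points` are hypothetical helpers built on `itertools.product`.

```python
from itertools import product


def cross(left, right):
    # All (l, r) pairs, like a DataSet cross; with a one-element `right`
    # this attaches that single element to every element of `left`.
    return list(product(left, right))


def center_points(points):
    n = len(points)
    dim = len(points[0])
    # The mean as a one-element "data set" (a list holding one tuple).
    mean = [tuple(sum(p[d] for p in points) / n for d in range(dim))]
    # Pair every point with the mean, then subtract coordinate-wise.
    return [tuple(x - m for x, m in zip(p, mu)) for p, mu in cross(points, mean)]


print(center_points([(1.0, 2.0), (3.0, 4.0), (5.0, 9.0)]))
# [(-2.0, -3.0), (0.0, -1.0), (2.0, 4.0)]
```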
Broadcast Variables
● Side information sent to all worker nodes
● Can be a DataSet
● Gets accessed as a Java collection
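Broadcast Variables
import org.apache.flink.api.common.functions.RichMapFunction
import org.apache.flink.configuration.Configuration
// Applies fun(element, b), where b is the single element of the broadcast
// set registered under the name "broadcastVariable"
class BroadcastSingleElementMapper[T, B, O](fun: (T, B) => O)
    extends RichMapFunction[T, O] {
  var broadcastVariable: B = _
  // Called once per parallel task, before any call to map()
  @throws(classOf[Exception])
  override def open(configuration: Configuration): Unit = {
    broadcastVariable = getRuntimeContext
      .getBroadcastVariable[B]("broadcastVariable")
      .get(0)
  }
  override def map(value: T): O = {
    fun(value, broadcastVariable)
  }
}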
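The mapper relies on the rich-function lifecycle: `open()` runs once per parallel task before any `map()` call, so the broadcast element is fetched once per task instead of travelling with every data element. On the Flink side, the broadcast DataSet is attached to the operator with `withBroadcastSet(ds, "broadcastVariable")`. A pure-Python simulation of that lifecycle; `RuntimeContext` and `run_task` are hypothetical stand-ins, not Flink classes.

```python
class RuntimeContext:
    # Hypothetical stand-in: holds the broadcast sets by name.
    def __init__(self, broadcast_sets):
        self._broadcast_sets = broadcast_sets

    def get_broadcast_variable(self, name):
        return self._broadcast_sets[name]


class BroadcastSingleElementMapper:
    """Python analogue of the Scala class: open() fetches the single
    broadcast element, map() applies fun to each data element."""

    def __init__(self, fun):
        self.fun = fun
        self.broadcast_variable = None

    def open(self, ctx):
        self.broadcast_variable = ctx.get_broadcast_variable("broadcastVariable")[0]

    def map(self, value):
        return self.fun(value, self.broadcast_variable)


def run_task(mapper, partition, broadcast_sets):
    # What one worker does for its partition: open once, then map each element.
    mapper.open(RuntimeContext(broadcast_sets))
    return [mapper.map(v) for v in partition]


mapper = BroadcastSingleElementMapper(lambda x, m: x - m)
print(run_task(mapper, [1.0, 4.0], {"broadcastVariable": [2.5]}))  # [-1.5, 1.5]
```

Usage mirrors the third centering attempt: the mean is a one-element set, broadcast once, and subtracted from each point inside `map`.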
Centering (Third Try)
def computeMean(xs: DataSet[DenseVector]) =
  xs.map(x => (x, 1))
    .reduce((xc, yc) => (xc._1 + yc._1, xc._2 + yc._2))
    .map(xc => xc._1 / xc._2)
def centerPoints(xs: DataSet[DenseVector]) = {
  val mean = computeMean(xs)
  // mapWithBcVar: presumably a helper wrapping BroadcastSingleElementMapper,
  // not a Flink DataSet method
  xs.mapWithBcVar(mean)((x, m) => x - m)
}
Intermediate Results pattern
Procedural:
x = someComputation()
y = someOtherComputation()
z = someComputationOn(dataSet, x)
result = moreComputationOn(y, z)
Data flow (Flink):
val x = someDataSetComputation()
val y = someOtherDataSetComputation()
val z = dataSet.mapWithBcVar(x)((d, x) => …)
val result = anotherDataSet.mapWithBcVar((y, z)) {
  (d, yz) =>
    val (y, z) = yz
    …
}
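As a concrete instance of the pattern, standardization needs two intermediate results (mean and variance), each a one-element data set. A Python sketch, assuming a hypothetical `map_with_bc_var(ds, bc, fun)` that mirrors `mapWithBcVar`: intermediate results are consumed via broadcast, and two of them are merged into a single broadcast value by crossing them into one pair.

```python
from functools import reduce
from itertools import product
import math


def map_with_bc_var(ds, bc, fun):
    # Hypothetical helper: `bc` must be a one-element data set; its single
    # element is passed to `fun` together with every element of `ds`.
    (b,) = bc
    return [fun(d, b) for d in ds]


def mean_of(ds):
    # (sum, count) reduction; the result is a one-element data set.
    sc = [reduce(lambda a, b: (a[0] + b[0], a[1] + b[1]), [(d, 1) for d in ds])]
    return [s / c for s, c in sc]


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mean = mean_of(data)                                          # x
centered = map_with_bc_var(data, mean, lambda d, m: d - m)    # z uses x
var = mean_of([c * c for c in centered])                      # y
both = list(product(mean, var))                               # one (mean, var) pair
scaled = map_with_bc_var(data, both, lambda d, mv: (d - mv[0]) / math.sqrt(mv[1]))
print(scaled)  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```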
Matrix Algebra
● No ordered sets per se in a data flow context, so vectors and matrices need an explicit encoding.
Vector operations by explicit joins
● Encode vector (a1, a2, …, an) as
{(1, a1), (2, a2), …, (n, an)}
● Addition:
– a.join(b).where(0).equalTo(0)
    .map(ab => (ab._1._1, ab._1._2 + ab._2._2))
after the join: {((1, a1), (1, b1)), ((2, a2), (2, b2)), …}
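The join-based encoding can be sketched in plain Python, with vectors as sets of (index, value) pairs and a dictionary-based equi-join on field 0. `join_on_index`, `add` and the `dot` product (join, multiply, then reduce with +) are illustrative helpers, not Flink calls; note that the sets need not be ordered.

```python
def join_on_index(a, b):
    # Equi-join two index-encoded vectors on field 0 (the index),
    # building a lookup table on one side as a hash join would.
    b_by_index = {}
    for i, v in b:
        b_by_index.setdefault(i, []).append((i, v))
    return [(ea, eb) for ea in a for eb in b_by_index.get(ea[0], [])]


def add(a, b):
    # a.join(b).where(0).equalTo(0).map(ab => (ab._1._1, ab._1._2 + ab._2._2))
    return [(ea[0], ea[1] + eb[1]) for ea, eb in join_on_index(a, b)]


def dot(a, b):
    # Join on the index, multiply matching values, reduce with +.
    return sum(ea[1] * eb[1] for ea, eb in join_on_index(a, b))


a = {(1, 1.0), (2, 2.0), (3, 3.0)}     # encodes (1, 2, 3)
b = {(3, 30.0), (1, 10.0), (2, 20.0)}  # encodes (10, 20, 30)
print(sorted(add(a, b)))  # [(1, 11.0), (2, 22.0), (3, 33.0)]
print(dot(a, b))          # 140.0
```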
Back to Least Squares Regression
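Two operations: computing X'X and X'y
def lsr(xys: DataSet[(DenseVector, Double)]) = {
  // X'X: sum of the outer products x x' over all points
  val XTX = xys.map(xy => xy._1.outer(xy._1)).reduce(_ + _)
  // X'y: sum of x * y over all points
  val XTY = xys.map(xy => xy._1 * xy._2).reduce(_ + _)
  // Both results are tiny: broadcast X'y and solve X'X w = X'y
  // (`\` assumes a Breeze-style linear-solve operator)
  XTX.mapWithBcVar(XTY)((xtx, xty) => xtx \ xty)
}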
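Least squares fits the data flow model because both ingredients are sums of per-point terms: X'X is the sum of the outer products x x' over all points, and X'y is the sum of x·y. Only the small d×d system is solved in one place. A plain-Python sketch of that split, assuming data points are (feature tuple, target) pairs and using Gaussian elimination with partial pivoting as a stand-in for the linear solve; adding λ to the diagonal of `xtx` would give the regularized version from the earlier slide.

```python
from functools import reduce


def outer(x):
    # x x' as a list of rows
    return [[xi * xj for xj in x] for xi in x]


def mat_add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def vec_add(u, v):
    return [a + b for a, b in zip(u, v)]


def solve(A, b):
    # Gaussian elimination with partial pivoting on the small d x d system.
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            M[r] = [mr - f * mc for mr, mc in zip(M[r], M[col])]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (M[r][n] - sum(M[r][c] * w[c] for c in range(r + 1, n))) / M[r][r]
    return w


def lsr(xys):
    # X'X: map each point to its outer product, reduce with +
    xtx = reduce(mat_add, (outer(x) for x, _ in xys))
    # X'y: map each point to x * y, reduce with +
    xty = reduce(vec_add, ([xi * y for xi in x] for x, y in xys))
    return solve(xtx, xty)


points = [((1.0, 0.0), 1.0), ((0.0, 1.0), 2.0), ((1.0, 1.0), 3.0), ((2.0, 1.0), 4.0)]
print(lsr(points))  # [1.0, 2.0]
```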
Summary and Outlook
● Procedural vs. Data Flow
– basic building blocks are elementwise operations on unordered sets
– can't be nested
– combine intermediate results via broadcast vars
● Iterations
● Beware of TypeInformation implicits.
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdfAzure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.js
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesAI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 

Data flow vs. procedural programming: How to put your algorithms into Flink

  • 1. June 23, 2015, Mikio L. Braun, Data Flow vs. Procedural Programming, Berlin Flink Meetup
    Flink Meetup #8
    Data flow vs. procedural programming: How to put your algorithms into Flink
    June 23, 2015
    Mikio L. Braun, @mikiobraun
  • 2. Programming how we're used to
    ● Computing a sum
    ● Tools at our disposal:
      – variables
      – control flow (loops, if)
      – function calls as the basic unit of abstraction

      def computeSum(a):
          sum = 0
          for i in range(len(a)):
              sum += a[i]
          return sum
  • 3. Data Analysis Algorithms
    Let's consider centering, i.e. subtracting the mean from every point: $x_i \leftarrow x_i - \frac{1}{n}\sum_{j=1}^{n} x_j$. In code, this becomes

      def centerPoints(xs):
          sum = xs[0].copy()
          for i in range(1, len(xs)):
              sum += xs[i]
          mean = sum / len(xs)
          for i in range(len(xs)):
              xs[i] -= mean
          return xs

    or, with NumPy, even just

      xs - xs.mean(axis=0)
  • 4. Don't use for-loops
    ● Put your data into a matrix
    ● Don't use for-loops
  • 5. Least Squares Regression
    ● Compute $w = (X^\top X + \lambda I)^{-1} X^\top y$
    ● Becomes

      import numpy as np

      def lsr(X, y, lam):
          d = X.shape[1]
          C = X.T.dot(X) + lam * np.eye(d)
          w = np.linalg.solve(C, X.T.dot(y))
          return w

    What you learn is to think in matrices, breaking computations down in terms of matrix algebra.
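The matrix `C` in the code comes from setting the gradient of the regularized squared error to zero. The standard derivation below assumes the ridge objective, which is what the `lam` times identity term implies:

```latex
\begin{aligned}
L(w) &= \lVert Xw - y \rVert^2 + \lambda \lVert w \rVert^2 \\
\nabla_w L(w) &= 2X^\top (Xw - y) + 2\lambda w = 0 \\
\Rightarrow\; (X^\top X + \lambda I)\, w &= X^\top y
\end{aligned}
```

Solving this linear system directly, as `np.linalg.solve` does, avoids forming the inverse explicitly.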
  • 6. Basic tools
    ● Basic procedural programming paradigm
    ● Variables
    ● Ordered arrays and efficient functions on those
    Advantages
      – very familiar
      – close to math
    Disadvantage
      – hard to scale
  • 7. Parallel Data Flow
    Often you have loops like

      for i in someSet:
          map x[i] to y[i]

    which are inherently easy to scale, because each y[i] depends only on x[i].
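Because each output depends only on its own input, the loop can be split across workers without changing the result. A minimal sketch in plain Python, with threads standing in for cluster nodes; `partition` and `parallel_map` are hypothetical helpers, not Flink API. Round-robin partitioning reorders the elements, so the result is compared as a multiset.

```python
from concurrent.futures import ThreadPoolExecutor


def f(x):
    """The per-element function: it only looks at its own input."""
    return x * x + 1


def partition(data, n_parts):
    """Split data into n_parts chunks, round-robin."""
    return [data[i::n_parts] for i in range(n_parts)]


def parallel_map(fun, data, n_parts=4):
    """Apply fun to every element, one worker per partition."""
    parts = partition(data, n_parts)
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        mapped = list(pool.map(lambda part: [fun(x) for x in part], parts))
    # Flatten; output order carries no meaning in the data flow view.
    return [y for part in mapped for y in part]


xs = list(range(10))
serial = [f(x) for x in xs]
parallel = parallel_map(f, xs)
print(sorted(parallel) == sorted(serial))  # True
```

No worker ever needs to see another worker's elements, which is exactly what makes this pattern easy to scale.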
  • 8. New Paradigm
    ● The basic building block is an (unordered) set.
    ● The basic operations are inherently parallel.
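Since the set is unordered and the runtime may split it across workers in any way, a function used to reduce it should be associative and commutative. The sketch below is plain Python; `reduce_partitioned` is a hypothetical helper that shuffles and partitions the input before reducing, roughly as a parallel runtime might, and it compares addition with subtraction.

```python
import random
from functools import reduce


def reduce_partitioned(op, data, n_parts, seed):
    """Shuffle, split into partitions, reduce each, then reduce the partials.

    Neither the element order nor the grouping into partitions is under
    the programmer's control, mimicking a parallel runtime.
    """
    items = list(data)
    random.Random(seed).shuffle(items)
    parts = [items[i::n_parts] for i in range(n_parts)]
    partials = [reduce(op, p) for p in parts if p]  # skip empty partitions
    return reduce(op, partials)


data = list(range(1, 11))

# Addition is associative and commutative: every execution agrees.
sums = {reduce_partitioned(lambda a, b: a + b, data, 3, s) for s in range(20)}
print(sums)  # {55}

# Subtraction is neither: the answer depends on order and partitioning.
diffs = {reduce_partitioned(lambda a, b: a - b, data, 3, s) for s in range(20)}
print(sorted(diffs))
```

Addition always yields 55, while subtraction produces several different values across the shuffles.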
  • 9. Computing, Data Flow Style
    Computing a sum:

      sum(xs) = xs.reduce((x, y) => x + y)

    Computing a mean:

      mean(xs) = xs.map(x => (x, 1))
                   .reduce((xc, yc) => (xc._1 + yc._1, xc._2 + yc._2))
                   .map(xc => xc._1 / xc._2)
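The two expressions can be mirrored with Python's built-in `map` and `functools.reduce` to check the logic locally; this is an analogue of the data flow operators, not Flink code. The key trick is carrying a count alongside each value: `(sum, count)` pairs can be combined in any order, whereas partial means cannot be averaged directly unless their counts are known.

```python
from functools import reduce


def df_sum(xs):
    # sum(xs) = xs.reduce((x, y) => x + y)
    return reduce(lambda x, y: x + y, xs)


def df_mean(xs):
    # map: pair each element with a count of 1
    pairs = map(lambda x: (x, 1), xs)
    # reduce: add values and counts component-wise
    total, count = reduce(lambda a, b: (a[0] + b[0], a[1] + b[1]), pairs)
    # map: divide, applied once to the single reduced element
    return total / count


print(df_sum([1.0, 2.0, 3.0, 6.0]))   # 12.0
print(df_mean([1.0, 2.0, 3.0, 6.0]))  # 3.0
```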
  • 10. Apache Flink
    ● Data flow system
    ● The basic building block is a DataSet[X]
    ● For execution, Flink sets up all computing nodes and streams the data through them
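"Streaming the data through" can be pictured with a rough single-process analogy that assumes nothing about Flink's internals: each operator is a Python generator, and records pass through the whole chain one at a time instead of materializing an intermediate list per step. The names `source` and `map_op` are hypothetical.

```python
def source(xs, log):
    """Emit records one by one, logging each read."""
    for x in xs:
        log.append(f"read {x}")
        yield x


def map_op(fun, upstream, log, name):
    """A map operator that pulls records from upstream as needed."""
    for x in upstream:
        y = fun(x)
        log.append(f"{name} {x}->{y}")
        yield y


log = []
pipeline = map_op(lambda x: x + 1,
                  map_op(lambda x: x * 10, source([1, 2], log), log, "times10"),
                  log, "plus1")

# Nothing has run yet: generators stay idle until a sink pulls records.
assert log == []

result = list(pipeline)  # the sink drives execution
print(result)  # [11, 21]
print(log)     # record 1 passes all operators before record 2 is read
```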
  • 11. Apache Flink: Getting Started
    ● Use the Scala API
    ● Set up a minimal project with Maven or Gradle (build tools)
    ● Use an IDE like IntelliJ
    ● Always import org.apache.flink.api.scala._ (it brings the implicit TypeInformation generation the Scala API needs into scope)
  • 12. Centering (First Try)

      def computeMean(xs: DataSet[DenseVector]) =
        xs.map(x => (x, 1))
          .reduce((xc, yc) => (xc._1 + yc._1, xc._2 + yc._2))
          .map(xc => xc._1 / xc._2)

      def centerPoints(xs: DataSet[DenseVector]) = {
        val mean = computeMean(xs)
        xs.map(x => x - mean)  // does not work: mean is a DataSet, not a vector
      }

    You cannot nest DataSet operations!
  • 13. Sorry, restrictions apply.
    ● Variables hold (lazy) computations, not data
    ● You can't access other DataSets inside an operation's function
    ● Even if the result is just a single element, it's a DataSet[Elem]
    ● So what to do?
      – cross joins
      – broadcast variables
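These restrictions can be reproduced with a toy lazy set in plain Python. `LazySet` is hypothetical and only mimics two properties from the slide: a variable holds a recipe rather than data, and even a single reduced value stays wrapped in a set. In this sketch, building the nested plan succeeds; the error only surfaces when the plan is executed.

```python
class LazySet:
    """Hypothetical stand-in for a DataSet: holds a recipe, not data."""

    def __init__(self, compute):
        self._compute = compute  # zero-argument function returning a list

    def map(self, fun):
        return LazySet(lambda: [fun(x) for x in self._compute()])

    def reduce(self, fun):
        def run():
            items = self._compute()
            acc = items[0]
            for x in items[1:]:
                acc = fun(acc, x)
            return [acc]  # a single result is still a one-element set
        return LazySet(run)

    def collect(self):
        return self._compute()


xs = LazySet(lambda: [1.0, 2.0, 6.0])
mean = (xs.map(lambda x: (x, 1))
          .reduce(lambda a, b: (a[0] + b[0], a[1] + b[1]))
          .map(lambda sc: sc[0] / sc[1]))
print(mean.collect())  # [3.0]

# The "first try": refer to another set inside an element-wise function.
nested = xs.map(lambda x: x - mean)  # building the plan works...
nested_failed = False
try:
    nested.collect()                 # ...running it does not
except TypeError as err:
    nested_failed = True
    print("cannot nest:", err)
```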
  • 14. Centering (Second Try)

      def computeMean(xs: DataSet[DenseVector]) =
        xs.map(x => (x, 1))
          .reduce((xc, yc) => (xc._1 + yc._1, xc._2 + yc._2))
          .map(xc => xc._1 / xc._2)

      def centerPoints(xs: DataSet[DenseVector]) = {
        val mean = computeMean(xs)
        xs.crossWithTiny(mean).map(xm => xm._1 - xm._2)
      }

    This works, but seems excessive, because the mean is copied to every data element.
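The cross-join workaround can be mimicked with `itertools.product` on plain Python lists, assuming 2-D points stored as tuples; `vec_sub` is a hypothetical helper. The `pairs` list makes the overhead visible: it holds one copy of the mean per point.

```python
from itertools import product


def vec_sub(a, b):
    """Element-wise difference of two equal-length tuples."""
    return tuple(x - y for x, y in zip(a, b))


points = [(1.0, 2.0), (3.0, 4.0), (5.0, 9.0)]
n = len(points)

# The mean as a one-element "set", as a data flow reduce would produce it.
mean_set = [tuple(sum(c) / n for c in zip(*points))]

# crossWithTiny analogue: pair every point with every element of the tiny set.
pairs = list(product(points, mean_set))
centered = [vec_sub(x, m) for x, m in pairs]

print(mean_set)  # [(3.0, 5.0)]
print(centered)  # [(-2.0, -3.0), (0.0, -1.0), (2.0, 4.0)]
```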
  • 15. Broadcast Variables
    ● Side information sent to all worker nodes
    ● Can be a DataSet
    ● Gets accessed as a Java collection
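  • 16. Broadcast Variables

      import org.apache.flink.api.common.functions.RichMapFunction
      import org.apache.flink.configuration.Configuration

      class BroadcastSingleElementMapper[T, B, O](fun: (T, B) => O)
          extends RichMapFunction[T, O] {
        var broadcastVariable: B = _

        // Runs once per parallel instance, before any map() call:
        // fetch the broadcast set and keep its single element.
        @throws(classOf[Exception])
        override def open(configuration: Configuration): Unit = {
          broadcastVariable = getRuntimeContext
            .getBroadcastVariable[B]("broadcastVariable")
            .get(0)
        }

        // Runs once per element, reusing the broadcast value.
        override def map(value: T): O = {
          fun(value, broadcastVariable)
        }
      }

    The broadcast set must be attached under the same name, e.g. xs.map(new BroadcastSingleElementMapper(fun)).withBroadcastSet(mean, "broadcastVariable").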
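The lifecycle that makes broadcast variables cheaper than the cross join can be sketched in Python: `open()` fetches the broadcast value once per parallel instance, and `map()` reuses it for every element. The class only mirrors the Scala mapper's structure, and `run_with_broadcast` is a hypothetical stand-in for the runtime; neither reproduces Flink's behavior in detail.

```python
class BroadcastSingleElementMapper:
    """Python analogue of the Scala class above (hypothetical, not Flink API)."""

    def __init__(self, fun):
        self.fun = fun
        self.broadcast_variable = None

    def open(self, broadcast_sets):
        # Mirrors getBroadcastVariable("broadcastVariable").get(0)
        self.broadcast_variable = broadcast_sets["broadcastVariable"][0]

    def map(self, value):
        return self.fun(value, self.broadcast_variable)


def run_with_broadcast(fun, data, bc_set, n_parts=2):
    """Hypothetical runner: one mapper per partition, one open() call each."""
    out = []
    for i in range(n_parts):
        mapper = BroadcastSingleElementMapper(fun)
        mapper.open({"broadcastVariable": bc_set})
        out.extend(mapper.map(x) for x in data[i::n_parts])
    return out


xs = [1.0, 2.0, 6.0]
mean_set = [sum(xs) / len(xs)]  # one-element set: [3.0]
centered = run_with_broadcast(lambda x, m: x - m, xs, mean_set)
print(sorted(centered))  # [-2.0, -1.0, 3.0]
```

The mean is shipped once per partition rather than once per element, which is the saving over the cross-join version.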
  • 17. Centering (Third Try)

      def computeMean(xs: DataSet[DenseVector]) =
        xs.map(x => (x, 1))
          .reduce((xc, yc) => (xc._1 + yc._1, xc._2 + yc._2))
          .map(xc => xc._1 / xc._2)

      def centerPoints(xs: DataSet[DenseVector]) = {
        val mean = computeMean(xs)
        xs.mapWithBcVar(mean)((x, m) => x - m)
      }

    mapWithBcVar is a helper not defined on the slides; it presumably wraps BroadcastSingleElementMapper in an implicit extension of DataSet.
  • 18. Intermediate Results pattern
    Data flow (Flink):

      val x = someDataSetComputation()
      val y = someOtherDataSetComputation()
      val z = dataSet.mapWithBcVar(x)((d, x) => …)
      val result = anotherDataSet.mapWithBcVar((y, z)) { (d, yz) =>
        val (y, z) = yz
        …
      }

    Procedural equivalent:

      x = someComputation()
      y = someOtherComputation()
      z = someComputationOn(dataSet, x)
      result = moreComputationOn(y, z)
  • 19. Matrix Algebra
    ● No ordered sets per se in a data flow context.
  • 20. Vector operations by explicit joins
    ● Encode the vector (a1, a2, …, an) as the set {(1, a1), (2, a2), …, (n, an)}
    ● Addition:

      a.join(b).where(0).equalTo(0)
        .map(ab => (ab._1._1, ab._1._2 + ab._2._2))

      after the join: {((1, a1), (1, b1)), ((2, a2), (2, b2)), …}
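A plain-Python sketch of index-encoded vectors, using a dictionary-based hash join as a stand-in for `join(...).where(0).equalTo(0)`. The helpers `join_on_key`, `vec_add`, and `dot` are hypothetical; the dot product is an extra illustration of the same join followed by a sum instead of a per-pair addition.

```python
def join_on_key(a, b):
    """Equi-join two sets of (index, value) pairs on the index (hash join)."""
    table = {}
    for k, v in b:
        table.setdefault(k, []).append((k, v))
    return [(left, right) for left in a for right in table.get(left[0], [])]


def vec_add(a, b):
    # a.join(b).where(0).equalTo(0).map(ab => (ab._1._1, ab._1._2 + ab._2._2))
    return [(l[0], l[1] + r[1]) for l, r in join_on_key(a, b)]


def dot(a, b):
    # Same join, then multiply matching entries and sum the products.
    return sum(l[1] * r[1] for l, r in join_on_key(a, b))


# Index-encoded vectors; the element order within each set does not matter.
a = [(1, 1.0), (2, 2.0), (3, 3.0)]
b = [(3, 30.0), (1, 10.0), (2, 20.0)]

print(sorted(vec_add(a, b)))  # [(1, 11.0), (2, 22.0), (3, 33.0)]
print(dot(a, b))              # 140.0
```

Because the index travels with each value, the join recovers positional alignment even though the sets themselves are unordered.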
  • 21. Back to Least Squares Regression
    Two operations: computing X'X and X'y

      def lsr(xys: DataSet[(DenseVector, Double)]) = {
        val XTX = xys.map(xy => xy._1.outer(xy._1)).reduce(_ + _)
        val XTY = xys.map(xy => xy._1 * xy._2).reduce(_ + _)
        XTX.mapWithBcVar(XTY) { (xtx, xty) =>
          xtx \ xty  // solve X'X w = X'y (Breeze-style backslash operator)
        }
      }
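To check the data flow formulation locally, here is a plain-Python sketch of the same two map/reduce steps followed by a local solve. It assumes dense vectors stored as Python lists and uses a small hand-written Gaussian elimination in place of a linear algebra library; `outer`, `mat_add`, `vec_add`, and `solve` are hypothetical helpers. As on the slide, there is no regularization term.

```python
from functools import reduce


def outer(x):
    return [[xi * xj for xj in x] for xi in x]


def mat_add(A, B):
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(A, B)]


def vec_add(u, v):
    return [p + q for p, q in zip(u, v)]


def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    w = [0.0] * n
    for r in reversed(range(n)):
        w[r] = (M[r][n] - sum(M[r][c] * w[c] for c in range(r + 1, n))) / M[r][r]
    return w


def lsr(xys):
    # X'X: map each x to its outer product, then sum them.
    xtx = reduce(mat_add, map(lambda xy: outer(xy[0]), xys))
    # X'y: map each (x, y) to y * x, then sum them.
    xty = reduce(vec_add, map(lambda xy: [xi * xy[1] for xi in xy[0]], xys))
    # Both results are single small elements, so the solve runs locally.
    return solve(xtx, xty)


# Noise-free data from y = 2*x1 - 1*x2.
xys = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 1.0), ([2.0, 1.0], 3.0)]
print([round(v, 6) for v in lsr(xys)])  # [2.0, -1.0]
```

Only the two reductions touch the full data set; the d-by-d solve afterwards is cheap, which is why passing the small results via a broadcast variable is reasonable.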
  • 22. Summary and Outlook
    ● Procedural vs. data flow
      – basic building blocks: elementwise operations on unordered sets
      – can't be nested
      – combine intermediate results via broadcast variables
    ● Iterations
    ● Beware of TypeInformation implicits.