Deep Learning
with Spark
Anastasia Lieva
Fuzzy Humanist, Data-Scientist
@lievAnastazia
Spark is a new Hero
Deep Learning is a new Hero
BigDL is a new epic story
BigDL
High-level deep learning library
Intel MKL
Scale-out w/ Spark
BigDL: Deep Learning on Spark
API: Scala and Python
BUT the disadvantage of all Python APIs is that they are written in Python
API: Scala a̶n̶d̶ ̶P̶y̶t̶h̶o̶n̶
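import com.intel.analytics.bigdl.utils.Engine
import org.apache.spark.sql.SparkSession

// A SparkConf pre-filled with BigDL's recommended properties
val conf = Engine.createSparkConf()
  .setAppName("DeepLearningOnSpark")
  .setMaster("local[3]") // local mode with 3 cores
val sparkSession = SparkSession.builder()
  .config(conf).getOrCreate()
val sqlContext = sparkSession.sqlContext
val sparkContext = sparkSession.sparkContext
// Initialise BigDL's engine once the Spark context exists, before training
Engine.init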
The same configs as Spark
[Figure: Input Data flows through Layer 1 to Layer 5]
Model Architecture
Tensor
https://leonardoaraujosantos.gitbooks.io/artificial-inteligence/content/linear_algebra.html
DATA
Tensor
Sparse Tensor: Tensor(indices, values, shape)
Table: Lua / Torch Tables
Sample: (Tensor of Features, Tensor of Targets)
Mini-batch: Batch of Samples
DataSet: For advanced applications only
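To make these data types concrete, here is a minimal pure-Scala sketch with no BigDL dependency. It shows what each type holds. A dense tensor is flat storage plus a shape. A sparse tensor is (indices, values, shape) in coordinate form. A Sample pairs a feature tensor with a target tensor, and a mini-batch groups several samples. The names `DenseT`, `SparseT`, `SampleT` and `toMiniBatches` are hypothetical illustrations, not BigDL's classes. BigDL's own `Tensor`, `Sample` and `MiniBatch` have much richer APIs and, like Torch, use 1-based indexing.

```scala
// Hypothetical stand-ins for BigDL's data types, for illustration only.

// Dense tensor: values stored row-major in a flat array, plus a shape.
case class DenseT(storage: Array[Double], shape: Array[Int]) {
  require(storage.length == shape.product, "storage size must match shape")
  // Element at a multi-dimensional index (0-based in this sketch).
  def apply(index: Int*): Double = {
    var offset = 0
    for (d <- shape.indices) offset = offset * shape(d) + index(d)
    storage(offset)
  }
}

// Sparse tensor in coordinate form: indices(d)(k) is the coordinate along
// dimension d of the k-th non-zero value.
case class SparseT(indices: Array[Array[Int]], values: Array[Double], shape: Array[Int]) {
  def toDense: DenseT = {
    val storage = new Array[Double](shape.product)
    for (k <- values.indices) {
      var offset = 0
      for (d <- shape.indices) offset = offset * shape(d) + indices(d)(k)
      storage(offset) = values(k)
    }
    DenseT(storage, shape)
  }
}

// Sample: one training example, (tensor of features, tensor of targets).
case class SampleT(features: DenseT, target: DenseT)

// Mini-batch: samples grouped into chunks of batchSize (the last may be smaller).
def toMiniBatches(samples: Seq[SampleT], batchSize: Int): Seq[Seq[SampleT]] =
  samples.grouped(batchSize).toList

val dense = DenseT(Array(1.0, 2.0, 3.0, 4.0, 5.0, 6.0), Array(2, 3))
println(dense(1, 2)) // prints 6.0

// Non-zeros at (row 0, col 2) = 7.0 and (row 1, col 0) = 8.0
val sparse = SparseT(Array(Array(0, 1), Array(2, 0)), Array(7.0, 8.0), Array(2, 3))
println(sparse.toDense.storage.mkString(", ")) // prints 0.0, 0.0, 7.0, 8.0, 0.0, 0.0

val samples = (1 to 7).map(i =>
  SampleT(DenseT(Array(i.toDouble), Array(1)), DenseT(Array(0.0), Array(1))))
println(toMiniBatches(samples, 3).map(_.size)) // prints List(3, 3, 1)
```

The sparse form only pays off when most entries are zero. Bag-of-words features are a typical case.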
More than 100 layers!
Embedding
Pooling
Convolution
Normalization
Recurrent
Dropout
Sparse
… and others
Layers
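At its core, a layer is a function from an input tensor to an output tensor. A model chains layers so that each one consumes the previous one's output. The pure-Scala sketch below shows that composition on plain vectors. It uses hypothetical `Layer`, `LinearLayer`, `ReLULayer` and `SequentialModel` types, which are not BigDL's classes, and the weights are made up for the example.

```scala
// Hypothetical minimal layer abstraction: a layer maps a vector to a vector.
trait Layer {
  def forward(input: Vector[Double]): Vector[Double]
}

// Fully connected layer: output(i) = sum_j weights(i)(j) * input(j) + bias(i).
class LinearLayer(weights: Vector[Vector[Double]], bias: Vector[Double]) extends Layer {
  def forward(input: Vector[Double]): Vector[Double] =
    weights.zip(bias).map { case (row, b) =>
      row.zip(input).map { case (w, x) => w * x }.sum + b
    }
}

// Rectified linear unit: negative activations are clipped to zero.
object ReLULayer extends Layer {
  def forward(input: Vector[Double]): Vector[Double] = input.map(v => math.max(0.0, v))
}

// A sequential model feeds the output of each layer into the next one.
class SequentialModel(layers: Seq[Layer]) extends Layer {
  def forward(input: Vector[Double]): Vector[Double] =
    layers.foldLeft(input)((activation, layer) => layer.forward(activation))
}

val model = new SequentialModel(Seq(
  new LinearLayer(Vector(Vector(1.0, -1.0), Vector(0.5, 0.5)), Vector(0.0, 1.0)),
  ReLULayer,
  new LinearLayer(Vector(Vector(2.0, 1.0)), Vector(-1.0))
))

// (1, 3) -> Linear -> (-2, 3) -> ReLU -> (0, 3) -> Linear -> (2)
println(model.forward(Vector(1.0, 3.0))) // prints Vector(2.0)
```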
[Figure: Input Data flows through Layer 1 to Layer 5 and comes out as a Prediction]
Compare the Prediction with the Ground truth: the difference is the Error
Update weights in every layer w/ an optimization algorithm
Retry prediction with updated weights
Learning by Backpropagation
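The loop on this slide is: predict, measure the error against the ground truth, update the weights, predict again. It can be shown on the smallest possible model, a single linear neuron y = w·x + b trained with mean squared error and plain gradient descent. This is a hand-written sketch in pure Scala, not BigDL code. The gradients are derived by hand with the chain rule, which is exactly what backpropagation automates layer by layer. The data points, learning rate and epoch count are made up.

```scala
// Toy data generated from y = 2x + 1 (made-up ground truth).
val xs = Vector(0.0, 1.0, 2.0, 3.0)
val ys = xs.map(x => 2.0 * x + 1.0)

var w = 0.0 // weight, initialised to zero
var b = 0.0 // bias, initialised to zero
val learningRate = 0.05

def predict(x: Double): Double = w * x + b

def meanSquaredError(): Double =
  xs.zip(ys).map { case (x, y) => math.pow(predict(x) - y, 2) }.sum / xs.size

val initialLoss = meanSquaredError()
for (epoch <- 1 to 2000) {
  // Forward pass: prediction error for every example.
  val errors = xs.zip(ys).map { case (x, y) => predict(x) - y }
  // Backward pass: dLoss/dw and dLoss/db for loss = mean(error^2).
  val gradW = 2.0 * errors.zip(xs).map { case (e, x) => e * x }.sum / xs.size
  val gradB = 2.0 * errors.sum / xs.size
  // Update step (vanilla gradient descent); the next epoch retries the prediction.
  w -= learningRate * gradW
  b -= learningRate * gradB
}
println(f"w = $w%.3f, b = $b%.3f, loss: $initialLoss%.3f -> ${meanSquaredError()}%.6f")
```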
Losses
More than 30 criteria:
mean squared error,
binary cross-entropy,
negative log-likelihood,
KL divergence of the Gaussian distribution...
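For reference, here is how three of the listed criteria are computed, written as plain Scala functions on arrays. This is a sketch, not BigDL's implementation. The negative log-likelihood expects log-probabilities, which is why a classifier ending in LogSoftMax is paired with ClassNLLCriterion: together they amount to the usual cross-entropy. Class indices are 0-based here, whereas BigDL, following Torch, expects 1-based labels. The scores used below are made up.

```scala
// Mean squared error between a prediction and a target of the same length.
def mse(prediction: Array[Double], target: Array[Double]): Double =
  prediction.zip(target).map { case (p, t) => (p - t) * (p - t) }.sum / prediction.length

// Binary cross-entropy: probabilities p in (0, 1), targets in {0, 1}.
def binaryCrossEntropy(p: Array[Double], target: Array[Double]): Double = {
  val total = p.zip(target).map { case (pi, t) =>
    t * math.log(pi) + (1 - t) * math.log(1 - pi)
  }.sum
  -total / p.length
}

// Log-softmax, computed stably by subtracting the maximum score first.
def logSoftMax(scores: Array[Double]): Array[Double] = {
  val m = scores.max
  val logSumExp = m + math.log(scores.map(s => math.exp(s - m)).sum)
  scores.map(_ - logSumExp)
}

// Negative log-likelihood of the true class, given log-probabilities.
def nll(logProbs: Array[Double], trueClass: Int): Double = -logProbs(trueClass)

println(mse(Array(1.0, 2.0), Array(1.0, 4.0))) // prints 2.0
println(binaryCrossEntropy(Array(0.9, 0.2), Array(1.0, 0.0)))
val logProbs = logSoftMax(Array(2.0, 0.5, -1.0)) // scores for Good, Bad, More or Less
println(nll(logProbs, trueClass = 1))
```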
Optimization algorithms
Most popular gradient descent algorithms :
SGD, Adam, Adagrad, Adadelta, AdaMax
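These algorithms differ mainly in how they turn a gradient into a weight update. Below is a minimal sketch of two of them, minimising a one-dimensional quadratic in plain Scala. Vanilla SGD takes a fixed-size step against the gradient. Adagrad divides each step by the square root of the accumulated squared gradients and, in the Torch-style formulation assumed here, also decays the learning rate as lr / (1 + t · learningRateDecay). The objective and hyperparameters are made up; this is not BigDL's code.

```scala
// Objective f(x) = (x - 3)^2, minimised at x = 3; its gradient is 2(x - 3).
def gradient(x: Double): Double = 2.0 * (x - 3.0)

// Vanilla gradient descent (exact gradient here, mini-batch estimate in practice).
def sgd(start: Double, learningRate: Double, steps: Int): Double = {
  var x = start
  for (_ <- 1 to steps) x -= learningRate * gradient(x)
  x
}

// Adagrad: the step size shrinks as squared gradients accumulate.
def adagrad(start: Double, learningRate: Double, learningRateDecay: Double, steps: Int): Double = {
  var x = start
  var accumulatedSquares = 0.0
  for (t <- 0 until steps) {
    val g = gradient(x)
    val currentRate = learningRate / (1.0 + t * learningRateDecay)
    accumulatedSquares += g * g
    x -= currentRate * g / (math.sqrt(accumulatedSquares) + 1e-10)
  }
  x
}

println(sgd(start = 0.0, learningRate = 0.1, steps = 100))
println(adagrad(start = 0.0, learningRate = 0.5, learningRateDecay = 0.0002, steps = 500))
```

With many parameters, Adagrad keeps one accumulator per parameter. Rarely updated weights therefore keep larger steps than frequently updated ones.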
Let’s predict something!
Classes: Good, Bad, More or Less
Spark MLlib: RegexTokenizer() → Word2Vec()
BigDL: Tensor[Vector] → Sample(featureTensor, label)
Preprocess unstructured data
http://intellabs.github.io/RiverTrail/tutorial/
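This pipeline can be approximated in a few lines of plain Scala. First, split a post into lowercase tokens on runs of non-word characters, roughly what RegexTokenizer does with a pattern such as `\W+`. Next, look each token up in a table of word vectors, which a trained Word2Vec model would provide. Finally, pad or truncate to a fixed number of words, so every post becomes a tensor of the same shape with one word vector per row. The tiny 2-dimensional `embeddings` table, the `maxWords` length and the helper names are hypothetical. Judging by TemporalConvolution(100, 20, 5), the deck's model works on 100-dimensional word vectors.

```scala
// Hypothetical word-vector table; a trained Word2Vec model would supply real ones.
val embeddings: Map[String, Array[Double]] = Map(
  "hiring" -> Array(0.9, 0.1),
  "linux" -> Array(0.2, 0.8),
  "devops" -> Array(0.3, 0.7)
)
val embeddingSize = 2
val maxWords = 6 // every post becomes a maxWords x embeddingSize matrix

// Lowercase the text and split on runs of non-word characters.
def tokenize(text: String): Seq[String] =
  text.toLowerCase.split("\\W+").toSeq.filter(_.nonEmpty)

// One row per word: unknown words and padding positions get zero vectors.
def toFeatureMatrix(tokens: Seq[String]): Array[Array[Double]] = {
  val zero = Array.fill(embeddingSize)(0.0)
  tokens.take(maxWords)
    .map(t => embeddings.getOrElse(t, zero))
    .padTo(maxWords, zero)
    .toArray
}

val tokens = tokenize("We're hiring: #Linux, #DevOps!")
val features = toFeatureMatrix(tokens)
println(tokens.mkString(" | ")) // prints we | re | hiring | linux | devops
features.foreach(row => println(row.mkString(" ")))
```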
Convolutional Neural Network
"Hello, we're hiring in Montpellier (#systeme, networking, #Devops, #Linux). Feel free to apply and to share. Thanks a lot. PS: we are not an SSII." (translated from French; an SSII is an IT services consultancy)
Montpellier
#systeme, networking, #Devops, #Linux
not an SSII
$$$$$ ?
Bad
[Figure: Temporal Conv → ReLU → Temporal MaxPool → Linear → Dropout → ReLU → Linear → LogSoftMax]
Model Architecture
import com.intel.analytics.bigdl.nn._

val model = Sequential[Double]()
  .add(TemporalConvolution(inputSize, outputSizeTempConv, kernelSize))
  .add(ReLU())
  .add(TemporalMaxPooling(kernelSizeMaxPooling)) // argument is the pooling window width
  .add(Linear(inputSizeLinearLayer, outputSizeLinearLayer))
  .add(Dropout(0.1))
  .add(ReLU())
  .add(Linear(inputSizeLinearLayer2, outputSizeLinearLayer2))
  .add(LogSoftMax())
Model Architecture
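TemporalConvolution slides a window of `kernelSize` consecutive word vectors along the text and applies the same weights at every position. TemporalMaxPooling then keeps, for each feature, the largest value inside each window. The minimal pure-Scala sketch below runs both forward passes on tiny made-up inputs: one filter of width 2 over 1-dimensional frames, stride 1 and no padding for the convolution. It assumes, as in Torch, that the pooling stride defaults to the window width. It shows the mechanics and the resulting lengths, since n frames convolved with a width-k kernel yield n - k + 1 frames. This is an illustration, not BigDL's implementation.

```scala
// input(t)(i): frame t, feature i. weights(o)(k)(i): output feature o,
// kernel offset k, input feature i. Stride 1, no padding.
def temporalConvolution(
    input: Array[Array[Double]],
    weights: Array[Array[Array[Double]]],
    bias: Array[Double]): Array[Array[Double]] = {
  val kernelSize = weights.head.length
  val outputFrames = input.length - kernelSize + 1
  Array.tabulate(outputFrames, weights.length) { (t, o) =>
    var acc = bias(o)
    for (k <- 0 until kernelSize; i <- input(t + k).indices)
      acc += weights(o)(k)(i) * input(t + k)(i)
    acc
  }
}

def relu(x: Array[Array[Double]]): Array[Array[Double]] =
  x.map(row => row.map(v => math.max(0.0, v)))

// Max over non-overlapping windows of kernelSize frames, feature by feature.
def temporalMaxPooling(input: Array[Array[Double]], kernelSize: Int): Array[Array[Double]] = {
  val outputFrames = (input.length - kernelSize) / kernelSize + 1
  Array.tabulate(outputFrames, input.head.length) { (t, f) =>
    (0 until kernelSize).map(k => input(t * kernelSize + k)(f)).max
  }
}

val input = Array(1.0, 3.0, -2.0, -4.0, 5.0).map(x => Array(x)) // 5 frames, 1 feature
val weights = Array(Array(Array(1.0), Array(1.0)))               // 1 filter of width 2
val bias = Array(-1.0)

val convolved = relu(temporalConvolution(input, weights, bias))
println(convolved.map(_.head).mkString(", "))                              // 4 frames
println(temporalMaxPooling(convolved, 2).map(_.head).mkString(", "))       // 2 frames
println(temporalMaxPooling(convolved, convolved.length).map(_.head).mkString(", ")) // 1 frame
```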
In BigDL
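import com.intel.analytics.bigdl.nn.ClassNLLCriterion
import com.intel.analytics.bigdl.optim.{Adagrad, Optimizer}

// trainData: RDD[Sample[Double]] built in the preprocessing step
val criterion = new ClassNLLCriterion[Double]
val optimizer = Optimizer(model, trainData, criterion, batchSize)
val trainedModel = optimizer
  .setOptimMethod(new Adagrad[Double](learningRate, learningRateDecay))
  .optimize()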
Training model
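Conceptually, `optimize()` repeats the backpropagation loop over mini-batches. Each epoch shuffles the training samples, cuts them into batches of `batchSize`, averages the gradient over each batch and applies one optimizer step per batch. The sketch below reproduces that loop in plain Scala for a one-feature linear model with mean squared error and plain SGD, using the deck's batch size of 6. The data, random seed, learning rate and epoch count are made up. BigDL additionally spreads each batch over the Spark cores, which is presumably why, as far as we know, it asks for a batch size divisible by the total core count.

```scala
import scala.util.Random

// Made-up training set: 24 samples of y = 2x + 1 with x in [0, 1).
val rng = new Random(42)
val trainData: Seq[(Double, Double)] = Seq.fill(24) {
  val x = rng.nextDouble()
  (x, 2.0 * x + 1.0)
}

val batchSize = 6
val learningRate = 0.3
var w = 0.0
var b = 0.0

for (epoch <- 1 to 300) {
  // Shuffle, then split into mini-batches of batchSize samples.
  for (batch <- rng.shuffle(trainData).grouped(batchSize)) {
    // Prediction error for each sample, kept alongside its input x.
    val errors = batch.map { case (x, y) => (w * x + b - y, x) }
    // MSE gradient averaged over the mini-batch.
    val gradW = 2.0 * errors.map { case (e, x) => e * x }.sum / batch.size
    val gradB = 2.0 * errors.map(_._1).sum / batch.size
    // One optimizer step per mini-batch.
    w -= learningRate * gradW
    b -= learningRate * gradB
  }
}
println(f"w = $w%.4f, b = $b%.4f")
```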
In BigDL
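import com.intel.analytics.bigdl.optim.{Adagrad, Optimizer}
import com.intel.analytics.bigdl.visualization.{TrainSummary, ValidationSummary}

val optimizer = Optimizer.apply(model, trainData, criterion, 6)
// Summaries are written under logdir; point TensorBoard at logdir
val logdir = "mylogdir"
val appName = "job-offers-filter"
val trainSummary = TrainSummary(logdir, appName)
val validationSummary = ValidationSummary(logdir, appName)
optimizer.setTrainSummary(trainSummary)
// Only receives data if validation is configured (optimizer.setValidation)
optimizer.setValidationSummary(validationSummary)
optimizer
  .setOptimMethod(new Adagrad[Double](learningRate = 0.01, learningRateDecay = 0.0002))
  .optimize()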
Config for TensorBoard
BigDL & TensorBoard
Spark Pipelines Integration
Preprocess unstructured data
RegexTokenizer()
Word2Vec()
DataFrame
.select("features", "label")
Spark MLlib
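val model = Sequential[Double]()
  .add(TemporalConvolution(100, 20, 5)) // 100-dim word vectors in, 20 filters, window of 5 words
  .add(ReLU())
  .add(TemporalMaxPooling(96))          // max over a window of 96 frames
  .add(Linear(20, 100))
  .add(Dropout(0.1))
  .add(ReLU())
  .add(Linear(100, 3))                  // one score per class: Good, Bad, More or Less
  .add(LogSoftMax())
val criterion = new ClassNLLCriterion[Double]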
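The hyperparameters of this model have to agree with each other. We assume each post is padded or truncated to 100 words of 100-dimensional word vectors. That is our reading of TemporalConvolution(100, 20, 5) followed by TemporalMaxPooling(96); the deck does not state the sequence length. Under that assumption, the width-5 convolution turns 100 frames into 100 - 5 + 1 = 96 frames of 20 features. Pooling over a 96-frame window then leaves a single frame of 20 features, which matches Linear(20, 100), and Linear(100, 3) produces one score per class. A small Scala check of that arithmetic, with a hypothetical helper `framesAfter`:

```scala
// Frames left after sliding a window of kernelSize frames with the given stride.
def framesAfter(inputFrames: Int, kernelSize: Int, stride: Int): Int =
  (inputFrames - kernelSize) / stride + 1

val sequenceLength = 100 // assumption: posts padded/truncated to 100 words
val convFilters = 20     // outputFrameSize of TemporalConvolution(100, 20, 5)

val afterConv = framesAfter(sequenceLength, kernelSize = 5, stride = 1) // convolution, stride 1
val afterPool = framesAfter(afterConv, kernelSize = 96, stride = 96)    // pooling, stride = window
val linearInput = afterPool * convFilters // must equal the input size of Linear(20, 100)

println(s"conv: $afterConv frames, pool: $afterPool frame(s), linear input: $linearInput")
```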
Spark Integration
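// featureSize / labelSize: shape of one feature / label tensor, as Array[Int]
val estimator = new DLEstimator(model, criterion, featureSize, labelSize)
  .setLearningRate(0.01)
  .setBatchSize(6)
val trainedModel = estimator.fit(trainDataframe)        // Estimator: trains the model
val predictions = trainedModel.transform(testDataframe) // Transformer: adds predictions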
Estimator
Transformer
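DLEstimator plugs into Spark ML's Estimator/Transformer contract. `fit` consumes a training DataFrame and returns a fitted model, and that model's `transform` adds predictions to another DataFrame. The pattern itself can be sketched without Spark. The types below are hypothetical simplifications, not Spark's real `Estimator` and `Transformer` API: plain Scala collections stand in for DataFrames, and the "model" is a trivial majority-class classifier.

```scala
// Simplified stand-ins for Spark ML's abstractions (hypothetical).
case class Row(features: Vector[Double], label: Option[String], prediction: Option[String] = None)

trait Transformer {
  def transform(data: Seq[Row]): Seq[Row]
}

trait Estimator[M <: Transformer] {
  def fit(data: Seq[Row]): M
}

// Fitted model: predicts the class it memorised during fit.
class MajorityClassModel(val majority: String) extends Transformer {
  def transform(data: Seq[Row]): Seq[Row] = data.map(_.copy(prediction = Some(majority)))
}

// Estimator: learns the most frequent label in the training data.
object MajorityClassEstimator extends Estimator[MajorityClassModel] {
  def fit(data: Seq[Row]): MajorityClassModel = {
    val majority = data.flatMap(_.label).groupBy(identity).maxBy(_._2.size)._1
    new MajorityClassModel(majority)
  }
}

val trainRows = Seq(
  Row(Vector(0.1), Some("Bad")),
  Row(Vector(0.7), Some("Good")),
  Row(Vector(0.3), Some("Bad"))
)
val testRows = Seq(Row(Vector(0.5), label = None))

val trainedModel = MajorityClassEstimator.fit(trainRows)
val predictions = trainedModel.transform(testRows)
println(predictions.map(_.prediction)) // prints List(Some(Bad))
```

Splitting learning (`fit`) from applying (`transform`) is what lets Spark ML chain such stages in a Pipeline next to RegexTokenizer and Word2Vec.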
Interoperability
Your model
BigDL | Torch | TensorFlow | Caffe | Keras
Post more job offers on comm-montpellier.slack!
https://bit.ly/comm-mtp
Offers properly qualified in terms of domain, technologies and salary range. Or at the very least with a funny pitch ;)
