Online learning, Vowpal Wabbit and Hadoop
Héloïse Nonne
Data Science Consulting
hnonne@quantmetry.com
June 2nd, 2015
Quantmetry
Episode IV: A New Hope
1936: Turing machine: Online processing
The first computers
1956
IBM 350 RAMAC
Capacity: 3.75 MB
Artificial intelligence: almost done!
1943–1960s: multilayer neural networks
• Not enough computing power
• Not enough storage capacity
• Not enough data
• Algorithms are not efficient enough
Episode V: Reality Strikes Back
Episode VI: Return of the Regression
Linear regression
N samples with M features: $x_p \in \mathbb{R}^M$
Result to predict: $y_p \in \mathbb{R}$
Learn a weight vector $w \in \mathbb{R}^M$ such that
$$\hat{y}_w(x) = \sum_i w_i x_i \sim y$$

Logistic regression
N samples with M features: $x_p \in \mathbb{R}^M$
Class to predict: $y_p \in \{0, 1\}$ (blue/red)
Learn a weight vector $w \in \mathbb{R}^M$ such that
$$\hat{y}_w(x) = \frac{1}{1 + \exp(-\sum_i w_i x_i)}$$
is close to $y$
Loss functions

| | Loss function | Model $f_w(x)$ | Meaning of $\hat{y}_w(x)$ |
|---|---|---|---|
| Linear regression | $\frac{1}{2}(\hat{y}_w - y)^2$ | $\sum_i w_i x_i$ | Conditional expectation $E(y \mid x)$ |
| Logistic regression | $\log(1 + \exp(-y\,\hat{y}_w))$ | $\frac{1}{1 + \exp(-\sum_i w_i x_i)}$ | Probability $P(y \mid x)$ |
| Hinge regression | $\max(0, 1 - y\,\hat{y}_w)$ | $\mathrm{sign}(\sum_i w_i x_i)$ | Approximation of $-1$ or $1$ |
The loss function accounts for your model's error
Choose a loss function according to your use case
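As a minimal illustration (a sketch, not code from the talk), the three losses above can be written in Python as follows, assuming labels $y \in \{-1, +1\}$ for the two classification losses and a raw score $\hat{y}_w = \sum_i w_i x_i$:

```python
import numpy as np

def squared_loss(y_hat, y):
    # Linear regression: 1/2 (y_hat - y)^2
    return 0.5 * (y_hat - y) ** 2

def logistic_loss(y_hat, y):
    # Logistic regression, labels in {-1, +1}: log(1 + exp(-y * y_hat))
    return np.log1p(np.exp(-y * y_hat))

def hinge_loss(y_hat, y):
    # Hinge loss, labels in {-1, +1}: max(0, 1 - y * y_hat)
    return np.maximum(0.0, 1.0 - y * y_hat)

# Raw score y_hat = sum_i w_i * x_i for one sample
w = np.array([0.5, -1.2, 0.3])
x = np.array([1.0, 0.4, 2.0])
y = 1.0
y_hat = w @ x
print(squared_loss(y_hat, y), logistic_loss(y_hat, y), hinge_loss(y_hat, y))
```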
Batch learning algorithm
N samples $(x_p, y_p)$
Many iterations, each on the entire dataset

• Predictive model with parameters $w$
• Prediction on $(x, y)$: $\hat{y}_w(x) = f_w(x)$
• Compute the loss and accumulate it over all samples → global loss:
$$E(w) = \frac{1}{N} \sum_p L(y_p, f_w(x_p))$$
• Minimize the global loss to find the best parameters:
$$w_{t+1} = w_t - \frac{\eta}{N} \sum_p \nabla_w L(y_p, f_w(x_p))$$
Complexity: $O(N)$ for each iteration
Same algorithm, seen on the loss surface: the batch gradient direction is orthogonal to the isocontours (the lines are isocontours of the global loss function in weight space).
Source: http://yann.lecun.com/exdb/publis/pdf/lecun-98b.pdf
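To make the batch update concrete, here is a minimal sketch (assuming linear regression with squared loss; not code from the talk):

```python
import numpy as np

def batch_gradient_descent(X, y, eta=0.1, n_iter=100):
    """Batch gradient descent for linear regression with squared loss.

    Every iteration touches all N samples:
        w <- w - (eta / N) * sum_p grad_w L(y_p, f_w(x_p))
    so each iteration costs O(N).
    """
    N, M = X.shape
    w = np.zeros(M)
    for _ in range(n_iter):
        residuals = X @ w - y            # f_w(x_p) - y_p for all p at once
        w -= eta * (X.T @ residuals) / N # average gradient over the dataset
    return w

# Toy usage
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.01 * rng.normal(size=1000)
print(batch_gradient_descent(X, y))  # close to [1.0, -2.0, 0.5]
```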
Episode I: The Big Data Menace
What if
• data does not fit in memory?
• we want to combine features together (polynomials, n-grams)?
→ dataset size inflation
• new samples come with new features?
• the phenomenon we try to model drifts with time?
Online learning algorithm
N samples $(x_p, y_p)$
Many iterations, one sample at a time

• Predictive model with parameters $w$
• Prediction on $(x, y)$: $\hat{y}_w(x) = f_w(x)$
• Compute the individual loss
• Update the parameters to minimize the individual loss:
$$w_{t+1} = w_t - \eta \nabla_w L(y, f_w(x))$$
Complexity: $O(1)$ for each iteration
The update direction is not perpendicular to the isocontours, but it is updated much more often. Having more updates stabilizes the trajectory and approaches the minimum very quickly.
Problem: a lot of noise
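The corresponding online update, as a sketch under the same assumptions as the batch example:

```python
import numpy as np

def sgd(X, y, eta=0.01, n_epochs=1, seed=0):
    """Stochastic (online) gradient descent for linear regression.

    Each update uses a single sample, so one update is O(1) in N:
        w <- w - eta * grad_w L(y_p, f_w(x_p))
    The trajectory is noisy, but the parameters improve far more often.
    """
    rng = np.random.default_rng(seed)
    N, M = X.shape
    w = np.zeros(M)
    for _ in range(n_epochs):
        for p in rng.permutation(N):            # one sample at a time
            w -= eta * (X[p] @ w - y[p]) * X[p]
    return w

# Toy usage
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.01 * rng.normal(size=1000)
print(sgd(X, y))  # already close to [1.0, -2.0, 0.5] after one pass
```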
The time required for convergence
Optimization accuracy against training time for online (SGD) and batch (TRON) learning:
• Online learning requires far less time to approximately converge.
• Once close to the minimum, batch is much faster because it is noiseless.
• WARNING: once batch becomes better, the validation error has already converged anyway.
Bottou, Stochastic gradient descent tricks, Neural Networks: Tricks of the Trade (2012).
An implementation of online learning: Vowpal Wabbit
• Originally developed at Yahoo!, currently at Microsoft
• Led by John Langford
• Written in C++: an efficient, scalable implementation of online learning
• First public version in 2007
• 2015: 4400 commits, 81 contributors, 18 releases
Vowpal Wabbit’s features
Nice features of VW
• MANY algorithms are implemented
• Optimization algorithms (BFGS, Conjugate gradient, etc.)
• Combinations of features, N-grams (NLP)
• Automatic tuning (learning rate, adaptive learning, on-the-fly feature normalization)
• And more (bootstrapping, multi-core CPUs, etc.)
Vowpal Wabbit Input format
Input agnostic
• Binary
• Numerical
• Categorical (hashing trick)
• Can deal with missing values / sparse features
Very little data preparation
1 1.0 |height:1.5 length:2.0 |has stripes
1 1.0 |length:3.0 |has four legs
-1 1.0 |height:0.3 |has wings
-1 1.0 |height:0.9 length:0.6 |has a shell and a nice color
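For illustration, a small helper (hypothetical, not part of VW) that writes lines in this format:

```python
def to_vw_line(label, weight=1.0, **namespaces):
    """Format one example in VW's input format.

    Each keyword argument is a namespace: a dict gives numerical
    features (name:value), a string gives space-separated
    categorical/text features handled via the hashing trick.
    """
    parts = [f"{label} {weight}"]
    for name, feats in namespaces.items():
        if isinstance(feats, dict):
            body = " ".join(f"{k}:{v}" for k, v in feats.items())
        else:
            body = feats
        parts.append(f"|{name} {body}")
    return " ".join(parts)

print(to_vw_line(1, size={"height": 1.5, "length": 2.0}, has="stripes"))
# -> "1 1.0 |size height:1.5 length:2.0 |has stripes"
```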
Vowpal Wabbit
• Fast learning for scoring on large datasets
• Can handle quite raw (unprepared) data
• Great for exploring a new dataset with simple and fast models
• Uncover phenomena
• Figure out what you should do for feature engineering
• Very good tool for experimentation
Yes but … I want to work in parallel
Episode II: Attack of the Clones
Why parallelize?
• Speed-up
• Data does not fit on a single machine
(Subsampling is not always good if you have many features)
• Take advantage of distributed storage and avoid the bottleneck of data transfer
• Take advantage of distributed memory to explore feature combinations (multiplying up to billions of distinct features)
The solution: online + batch learning
1. Each node $k$ makes an online pass over its data (adaptive gradient update rule)
2. Average the weights (see the sketch after this list):
$$\bar{w} = \Big(\sum_{k=1}^{K} G_k\Big)^{-1} \Big(\sum_{k=1}^{K} G_k w_k\Big)$$
3. $\bar{w}$ is broadcast down to all nodes, which continue learning
4. Then iterate with batch learning to make the last few steps to the minimum
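A minimal sketch of the averaging step, assuming the $G_k$ are diagonal (AdaGrad-style per-coordinate scaling) and stored as 1-D arrays:

```python
import numpy as np

def average_weights(G_list, w_list):
    """Weighted average of per-node weight vectors, as in step 2 above.

    Each node k contributes (G_k, w_k), where G_k is the scaling matrix
    accumulated by the adaptive gradient rule on that node (assumed
    diagonal here):
        w_bar = (sum_k G_k)^{-1} (sum_k G_k w_k)
    """
    G_sum = np.sum(G_list, axis=0)
    Gw_sum = np.sum([G * w for G, w in zip(G_list, w_list)], axis=0)
    return Gw_sum / G_sum

# Toy usage: 3 nodes, 4 weights each
G_list = [np.array([1.0, 2.0, 1.0, 3.0]),
          np.array([2.0, 1.0, 1.0, 1.0]),
          np.array([1.0, 1.0, 2.0, 2.0])]
w_list = [np.full(4, 0.9), np.full(4, 1.1), np.full(4, 1.0)]
print(average_weights(G_list, w_list))
```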
How is it implemented in the Vowpal Wabbit project?
| Needs | MPI (Message Passing Interface) | MapReduce |
|---|---|---|
| Effective communication infrastructure | Y: Allreduce is simple. N: data transfers across the network | N: large overhead (job scheduling, some data transfer, data parsing) |
| Data-centric platform (avoid data transfer) | N: lacks internal knowledge of data location | Y: full knowledge of data location |
| Fault-tolerant system | N: little fault tolerance by default | Y: robust and fault tolerant |
| Easy to code / good programming language | Y: standard and portable (Fortran, C/C++, Java) | Y: automatic cleanup of temp files by default. N: rethink and rewrite learning code into map and reduce operations. N: Java eats up a lot of RAM |
| Good optimization approach | Y: no need to rewrite learning code | N: map/reduce operations do not easily allow iterative algorithms |
| Overall time must be minimal | N: data transfers across the network | N: increased number of read/write operations. N: increased time while waiting for free nodes |

Implementation chosen by VW: AllReduce + MapReduce
All Reduce
Allreduce is based on a tree-shaped communication structure (binary trees are easier to implement).
Every node starts with a number:
1. Reduce = sum up the tree
2. Broadcast down the tree
Every node ends up with the sum of the numbers across all the nodes.
[Diagram, over four build slides: a 7-node binary tree starts with root 1, internal nodes 2 and 8, and leaves 9, 5, 8, 1. The reduce phase sums each subtree (the internal nodes become 16 and 17, the root 34); the broadcast phase then pushes 34 back down until all seven nodes hold 34.]
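As an illustration (a Python sketch, not VW's C++ implementation), the two phases on the 7-node tree from the slides:

```python
class Node:
    """One node of a binary reduction tree."""
    def __init__(self, value, children=()):
        self.value = value
        self.children = list(children)

    def reduce_up(self):
        # Phase 1: each node adds up its whole subtree.
        self.value += sum(child.reduce_up() for child in self.children)
        return self.value

    def broadcast_down(self, total=None):
        # Phase 2: the root's total is pushed back down the tree.
        if total is None:
            total = self.value  # we are the root
        self.value = total
        for child in self.children:
            child.broadcast_down(total)

# The 7-node example from the slides
root = Node(1, [Node(2, [Node(9), Node(5)]),
                Node(8, [Node(8), Node(1)])])
print(root.reduce_up())   # 34; the internal nodes became 16 and 17 on the way up
root.broadcast_down()     # every node now holds 34
```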
MapReduce (streaming) / AllReduce: online then batch
1. Start the daemon (communication system)
2. Each node makes an online pass over its data
3. Initialize a tree on the master node
4. Use allreduce to average the weights over all nodes
5. Broadcast the averaged weights down to all nodes
6. Use them to initialize a batch learning step
7. Send back the weights and average with allreduce
8. Iterate further batch steps
Advantages of VW implementation
• Minimal additional programming effort
• Data location knowledge: use mapreduce infrastructure with only one mapper
• Vowpal Wabbit (C/C++) is not RAM-greedy
• Small synchronisation overhead
• time spent in AllReduce operation ≪ computation time
• Reduced stalling time while waiting for other nodes
• delayed initialization of AllReduce's tree to capitalize on Hadoop's speculative execution
• Rapid convergence with online then accuracy with batch
Episode III: Revenge of Hadoop
1. Start the daemon (./spanning_tree.cc)
2. Launch the MapReduce job
3. Kill the spanning tree
hadoop jar /home/hadoop/contrib/streaming/hadoop-streaming.jar \
    -D mapreduce.map.speculative=true \
    -D mapreduce.job.reduces=0 \
    -input $in_directory \
    -output $out_directory \
    -files "/usr/local/bin/vw,/usr/lib64/libboost_program_options.so,/lib64/libz.so.1" \
    -file runvw.sh \
    -mapper runvw.sh \
    -reducer NONE
Running vowpal wabbit on AWS
AWS Best practice: Transient clusters
• Get your data on S3 buckets
• Start an EMR (Elastic Map Reduce) cluster
• Bootstrap actions (install, config, etc.)
• Run your job(s) (steps)
• Shut down your cluster
Pros and cons
• Easy setup / works well
• Minimum maintenance
• Low cost
• Logs ???????
• Debugging can be difficult
→ use an experimental cluster or a VM
Beware of the environment variables
VW needs MapReduce environment variables
• total number of mapper tasks
• the ID number of the map task on each node
• ID of the MapReduce job
• the private DNS of the master node within the cluster
Update the names in VW-cluster code
Hack the environment variables with python
vw --total $nmappers --node $mapper \
   --unique_id $mapred_job_id -d /dev/stdin \
   --span_server $submit_host \
   --loss_function=logistic -f sgd.vwmodel
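A sketch of the Python "hack" above. The exact environment variable names (mapred_map_tasks, mapred_task_id, mapred_job_id, mapreduce_job_submithostname) are assumptions that vary across Hadoop versions, so check printenv inside a mapper:

```python
import os
import subprocess

# Hadoop streaming exports job metadata as environment variables;
# the names below are assumptions that changed between Hadoop versions.
nmappers = os.environ["mapred_map_tasks"]            # total number of map tasks
task_id = os.environ["mapred_task_id"]               # e.g. attempt_..._m_000003_0
mapper = str(int(task_id.split("_")[4]))             # this node's map task ID
job_id = os.environ["mapred_job_id"].split("_")[-1]  # unique ID for the job
submit_host = os.environ["mapreduce_job_submithostname"]  # master's private DNS

subprocess.run(
    ["vw", "--total", nmappers, "--node", mapper,
     "--unique_id", job_id, "-d", "/dev/stdin",
     "--span_server", submit_host,
     "--loss_function=logistic", "-f", "sgd.vwmodel"],
    check=True,
)
```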
Number of splits
You need to force the number of splits for the MapReduce job
• Advice from John Langford / approach in the code
  • compute the minimal split size (total size / number of nodes)
  • use the option -D mapreduce.min.split.size
  → didn't work
• Dirty workaround (sketched below)
  • split the data into as many files as you have nodes
  • store each split in a .gz file (gzip is not splittable, so each file goes to exactly one mapper)
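One way to implement the workaround, as a Python sketch (file names are illustrative):

```python
import gzip

def split_to_gz(path, n_nodes, prefix="shard"):
    """Split a dataset into one .gz file per node.

    A gzip file is not splittable, so Hadoop hands each file to exactly
    one mapper: the number of splits equals the number of files.
    """
    shards = [gzip.open(f"{prefix}-{k:05d}.gz", "wt") for k in range(n_nodes)]
    with open(path) as f:
        for i, line in enumerate(f):
            shards[i % n_nodes].write(line)  # round-robin over the shards
    for shard in shards:
        shard.close()

split_to_gz("train.vw", n_nodes=5)
```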
Running vowpal wabbit on AWS
ip-10-38-138-36.eu-west-1.compute.internal
Starting training
SGD ...
creating quadratic features for pairs: ft tt ff fi ti
final_regressor = sgd.vwmodel
Num weight bits = 18
learning rate = 0.5
initial_t = 0
power_t = 0.5
average since example example current current current
loss last counter weight label predict features
0.693147 0.693147 2 1.0 -1.0000 0.0000 325
0.400206 0.107265 3 2.0 -1.0000 -2.1783 325
[...]
0.414361 0.404726 131073 131072.0 -1.0000 -4.5625 325
0.406345 0.398329 262145 262144.0 1.0000 -1.8379 325
0.388375 0.370405 524289 524288.0 -1.0000 -1.3313 325
Running vowpal wabbit on AWS
0.388375 0.370405 524289 524288.0 -1.0000 -1.3313 325
connecting to 10.38.138.36 = ip-10-38-138-36.eu-west-1.compute.internal:26543
wrote unique_id=2
wrote total=1
wrote node=0
read ok=1
read kid_count=0
read parent_ip=255.255.255.255
read parent_port=65535
Net time taken by process = 8.767000 seconds
finished run
number of examples per pass = 1000001
passes used = 1
weighted example sum = 1000000.000000
weighted label sum = -679560.000000
average loss = 0.380041
total feature number = 325000000
Running vowpal wabbit on AWS
BFGS ...
[...]
num sources = 1
connecting to 10.38.138.36 = ip-10-38-138-36.eu-west-1.compute.internal:26543
wrote unique_id=4
wrote total=1
wrote node=0
[...]
read parent_ip=255.255.255.255
read parent_port=65535
Maximum number of passes reached.
Net time taken by process = 10.55 seconds
finished run
number of examples = 1800002
weighted example sum = 1.8e+06
weighted label sum = -1.22307e+06
average loss = 0.350998 h
total feature number = 585000000
Running vowpal wabbit on AWS
Dataset: 6.4 GB, 50 000 000 samples, 52 974 510 080 features
• On Hadoop with 5 nodes: 6 minutes
• On a single machine: 26 minutes and 30 seconds
Concluding remarks
Less computation time makes it possible to explore more data
• Work on more data
• Include more features to the analysis
• Useful as a platform for research and experimentation
Optimization algorithms
• Lots of interesting papers (Langford, Bottou, Agarwal, LeCun, Duchi, Zinkevich, …)
VW on Hadoop
Learn a lot by doing and debugging :-)
If possible, use online learning when computation time is the bottleneck
Episode VII: The Force Awakens
Coming soon
• Pushes on GitHub
Benchmarks
• How is training time affected by the size of the dataset? (measure overhead)
• Benchmark available approaches on various large datasets and usecases
• Benchmark against Graphlab, MLlib
More VW
• Exhaustive comparison and association with complex models (exploit vw for feature
engineering and feature selection)
• Nonlinear online learning (neural networks, SVM, ...)
A few references
• John Langford – hunch.net – GitHub
• Bottou, Stochastic gradient descent tricks, Neural
Networks: Tricks of the Trade (2012)
• Yann LeCun’s lectures
• http://quantmetry-blog.com
@HeloiseNonne
[Diagram: a master node connected to three worker nodes]
Speaker notes
• Clusters are live for the duration of the job and are shut down when the job is done. The data persists on S3.