Deep Learning through Examples - Kaggle #1

Suggestions:
1) For best quality, download the PDF before viewing.
2) A screencast with audio is available at: http://youtu.be/fdbQreQacIQ


In this talk, we put Deep Learning to the test on real-world data problems.

Data:
- Africa Soil Property Prediction Kaggle challenge: top (#1) position achieved with H2O Deep Learning
- Higgs binary classification dataset (10M rows, 29 cols)
- MNIST 10-class dataset
- Weather categorical dataset
- eBay text classification dataset (8500 cols, 500k rows, 467 classes)
- ECG heartbeat anomaly detection

- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata

Published in: Software


  1.  Deep Learning through Examples. Arno Candel, 0xdata, H2O.ai. Scalable
      In-Memory Machine Learning. Silicon Valley Big Data Science Meetup, Vendavo,
      Mountain View, 9/11/14.
  2.  Who am I? @ArnoCandel. PhD in Computational Physics, 2005, from ETH Zurich,
      Switzerland. 6 years at SLAC - Accelerator Physics Modeling. 2 years at
      Skytree, Inc - Machine Learning. 9 months at 0xdata/H2O - Machine Learning.
      15 years in HPC/Supercomputing/Modeling. Named “2014 Big Data All-Star” by
      Fortune Magazine.
  3.  H2O Deep Learning: Kaggle #1 rank (out of 413), 40 days left. Achieved with
      H2O Deep Learning from R! @matlabulous (Jo-fai Chow, Blend it like a
      Bayesian!) says: “I am 99.99999999999% sure that I can still go further
      with H2O.”
  4.  Outline: Intro & Live Demo (10 mins); Methods & Implementation (20 mins);
      Results & Live Demos (25 mins): Higgs boson detection, MNIST handwritten
      digits, text classification; Q & A (5 mins).
  5.  About H2O (aka 0xdata). Java, Apache v2 Open Source. Join the
      www.h2o.ai/community! #1 Java Machine Learning project on GitHub.
  6.  Customer Demands for Practical Machine Learning. Requirements and the value
      each one delivers:

          Requirement  | Value
          In-Memory    | Fast (Interactive)
          Distributed  | Big Data (No Sampling)
          Open Source  | Ownership of Methods
          API / SDK    | Extensibility

      H2O was developed by 0xdata from scratch to meet these requirements.
  7.  H2O Integration. [Diagram: H2O is accessible from R, JSON, Scala, Python and
      Java; it runs standalone, over YARN, or on Hadoop MRv1, reading data from HDFS.]
  8.  H2O Architecture. [Diagram: prediction engine with a distributed in-memory
      K-V store, columnar compression, memory manager, MapReduce, machine learning
      algorithms (e.g. Deep Learning), an R engine and a nano-fast scoring engine.]
  9.  H2O - The Killer App on Spark.
      http://databricks.com/blog/2014/06/30/sparkling-water-h20-spark.html
  10. H2O DeepLearning on Spark. Brand-Sparkling-New Sneak Preview!

          // Test if we can correctly learn A, B where Y = logistic(A + B*X)
          test("deep learning log regression") {
            val nPoints = 10000
            val A = 2.0
            val B = -1.5

            // Generate testing data
            val trainData = DeepLearningSuite.generateLogisticInput(A, B, nPoints, 42)
            // Create RDD from testing data
            val trainRDD = sc.parallelize(trainData, 2)
            trainRDD.cache()

            import H2OContext._
            // Create H2O data frame (will be implicit in the future)
            val trainH2ORDD = toDataFrame(sc, trainRDD)
            // Create a H2O DeepLearning model
            val dlParams = new DeepLearningParameters()
            dlParams.source = trainH2ORDD
            dlParams.response = trainH2ORDD.lastVec()
            dlParams.classification = true
            val dl = new DeepLearning(dlParams)
            val dlModel = dl.train().get()

            // Score validation data
            val validationData = DeepLearningSuite.generateLogisticInput(A, B, nPoints, 17)
            val validationRDD = sc.parallelize(validationData, 2)
            val validationH2ORDD = toDataFrame(sc, validationRDD)
            val predictionH2OFrame = new DataFrame(dlModel.score(validationH2ORDD))('predict)
            val predictionRDD = toRDD[DoubleHolder](sc, predictionH2OFrame) // will be implicit in the future
            // Validate prediction
            validatePrediction(
              predictionRDD.collect().map(_.predict.getOrElse(Double.NaN)), validationData)
          }
  11. H2O R CRAN package. John Chambers (creator of the S language, R-core member)
      named the H2O R API among the top three promising R projects.
  12. H2O + R = Happy Data Scientist. Machine Learning on Big Data with R: the data
      resides on the H2O cluster!
  13. Higgs Particle Discovery. Large Hadron Collider: largest experiment of
      mankind! $13+ billion, 16.8 miles long, 120 megawatts, -456F, 1 PB/day, etc.
      The Higgs boson discovery (July ’12) led to the 2013 Nobel prize! Higgs vs
      background: http://arxiv.org/pdf/1402.4735v2.pdf (images courtesy CERN / LHC).
      Machine learning meets physics. Or rather: back to the roots (the WWW was
      invented at CERN in ’89…).
  14. Higgs: Binary Classification Problem. Current methods of choice for
      physicists: boosted decision trees and neural networks with 1 hidden layer.
      BUT: one must first add derived high-level features (physics formulae).
      HIGGS UCI dataset: 21 low-level features AND 7 high-level derived features.
      Train: 10M rows, Test: 500k rows. Metric: AUC = area under the ROC curve
      (range: 0.5…1, higher is better). Adding the derived features helps every
      method:

          Algorithm                  | low-level H2O AUC | all features H2O AUC
          Generalized Linear Model   | 0.596             | 0.684
          Random Forest              | 0.764             | 0.840
          Gradient Boosted Trees     | 0.753             | 0.839
          Neural Net, 1 hidden layer | 0.760             | 0.830

  15. Higgs: Can Deep Learning Do Better? Same table, one new row:
      Deep Learning | ? | ? <Your guess goes here>. Reference paper baseline: 0.733.
      Let’s build an H2O Deep Learning model and find out! (That was my last weekend.)
  16. What is Deep Learning? Wikipedia: “Deep learning is a set of algorithms in
      machine learning that attempt to model high-level abstractions in data by
      using architectures composed of multiple non-linear transformations.”
      Example: input data (an image) -> prediction (who is it?). Facebook's
      DeepFace (Yann LeCun) recognises faces as well as humans do.
  17. What is NOT Deep. Linear models are not deep (by definition). Neural nets
      with 1 hidden layer are not deep (only 1 layer, so no feature hierarchy).
      SVMs and kernel methods are not deep (2 layers: kernel + linear).
      Classification trees are not deep (they operate on the original input space;
      no new features are generated).
  18. Deep Learning is Trending. [Google Trends chart: search interest in deep
      learning rising from 2009 to 2013.] Businesses are using Deep Learning
      techniques! Google Brain (Andrew Ng, Jeff Dean & Geoffrey Hinton). FBI FACE:
      $1 billion face recognition project. Chinese search giant Baidu hires the man
      behind the “Google Brain” (Andrew Ng).
  19. Deep Learning History. Slides by Yann LeCun (now at Facebook). Deep Learning
      wins competitions AND makes humans, businesses and machines (cyborgs!?)
      smarter.
  20. Deep Learning in H2O: a 1970s multi-layer feed-forward neural network
      (supervised learning with stochastic gradient descent using
      back-propagation), plus distributed processing for big data (H2O's in-memory
      MapReduce paradigm on distributed data), plus multi-threaded speedup (H2O
      Fork/Join worker threads update the model asynchronously), plus smart
      algorithms for accuracy (weight initialization, adaptive learning rate,
      momentum, dropout regularization, L1/L2 regularization, grid search,
      checkpointing, auto-tuning, model averaging) = a top-notch prediction engine!
  21. Example Neural Network: a “fully connected” directed graph of neurons.
      [Diagram: inputs age, income, employment feed 3 input neurons; hidden layer 1
      has 4 neurons, hidden layer 2 has 3 neurons; the output layer has 2 neurons
      (married, single). Connection counts: 3x4, 4x3, 3x2; information flows from
      input to output.]
  22. Prediction: Forward Propagation. “Neurons activate each other via weighted
      sums.” Inputs x_i (age, income, employment) produce per-class probabilities
      p_l (married, single) with sum(p_l) = 1:

          y_j = tanh(sum_i(x_i*u_ij) + b_j)
          z_k = tanh(sum_j(y_j*v_jk) + c_k)
          p_l = softmax(sum_k(z_k*w_kl) + d_l),  softmax(x_k) = exp(x_k) / sum_k(exp(x_k))

      Activation function: tanh; alternative: x -> max(0,x), the “rectifier”.
      b_j, c_k, d_l are bias values (independent of the inputs). p_l is a
      non-linear function of x_i: with enough layers, this can approximate ANY
      function!
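      To make the forward pass concrete, here is a minimal, self-contained Scala
      sketch of the three equations above, sized for the 3-4-3-2 example network
      from slide 21. The helper names are ours for illustration, not H2O internals:

          object ForwardProp {
            // one fully connected layer with tanh activation:
            // out_j = tanh(sum_i(in_i * w_ij) + b_j)
            def layer(in: Array[Double], w: Array[Array[Double]], b: Array[Double]): Array[Double] =
              b.indices.map(j => math.tanh(in.indices.map(i => in(i) * w(i)(j)).sum + b(j))).toArray

            // softmax: per-class probabilities that sum to 1
            def softmax(v: Array[Double]): Array[Double] = {
              val e = v.map(math.exp)
              e.map(_ / e.sum)
            }

            // forward pass through two hidden layers and the softmax output layer
            def predict(x: Array[Double],
                        u: Array[Array[Double]], b: Array[Double],  // input    -> hidden 1
                        v: Array[Array[Double]], c: Array[Double],  // hidden 1 -> hidden 2
                        w: Array[Array[Double]], d: Array[Double]   // hidden 2 -> output
                       ): Array[Double] = {
              val y = layer(x, u, b)  // y_j = tanh(sum_i(x_i*u_ij) + b_j)
              val z = layer(y, v, c)  // z_k = tanh(sum_j(y_j*v_jk) + c_k)
              val raw = d.indices.map(l => z.indices.map(k => z(k) * w(k)(l)).sum + d(l)).toArray
              softmax(raw)            // p_l = softmax(sum_k(z_k*w_kl) + d_l)
            }
          }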
  23. Data Preparation & Initialization. Neural networks are sensitive to numerical
      noise and operate best in the linear regime (not saturated). H2O
      automatically standardizes the data x_i: mean = 0, stddev = 1. Categorical
      variables are “horizontalized”, e.g. {full-time, part-time, none,
      self-employed} -> {0,1,0} = part-time, {0,0,0} = self-employed. Weights are
      initialized automatically: the poor man’s initialization is random weights
      w_kl; the default (better) is a uniform distribution in
      +/- sqrt(6/(#units + #units_previous_layer)).
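      A small Scala sketch of both preparation steps above (standardize to mean 0 /
      stddev 1; uniform weight init in +/- sqrt(6/(fan_in + fan_out))); the helper
      names are illustrative, not H2O's implementation:

          import scala.util.Random

          object Init {
            // standardize a column to mean 0, stddev 1 (as H2O does automatically)
            def standardize(col: Array[Double]): Array[Double] = {
              val mean = col.sum / col.length
              val sd = math.sqrt(col.map(v => (v - mean) * (v - mean)).sum / col.length)
              col.map(v => if (sd == 0) 0.0 else (v - mean) / sd)
            }

            // uniform weight init in +/- sqrt(6 / (fanIn + fanOut)), the slide's default
            def initWeights(fanIn: Int, fanOut: Int, rng: Random = new Random(42)): Array[Array[Double]] = {
              val bound = math.sqrt(6.0 / (fanIn + fanOut))
              Array.fill(fanIn, fanOut)((rng.nextDouble() * 2 - 1) * bound)
            }
          }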
  24. Training: Update Weights & Biases. For each training row, we make a
      prediction and compare it with the actual label (supervised learning), e.g.
      predicted 0.8 vs actual 1 for “married”, and 0.2 vs 0 for “single”.
      Objective: minimize the prediction error (MSE or cross-entropy).

          Mean Square Error = (0.2^2 + 0.2^2)/2   “penalize differences per class”
          Cross-entropy = -log(0.8)               “strongly penalize non-1-ness”

      Stochastic gradient descent: update weights and biases via the gradient of
      the error (computed by back-propagation): w <- w - rate * ∂E/∂w
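      A quick Scala check of the slide's two error values for the predicted/actual
      pair above; a sketch, not H2O code:

          object Loss {
            // MSE over the per-class differences (predicted vs actual)
            def mse(p: Array[Double], a: Array[Double]): Double =
              p.indices.map(i => math.pow(p(i) - a(i), 2)).sum / p.length

            // cross-entropy for the true class: strongly penalizes predictions far from 1
            def crossEntropy(pTrueClass: Double): Double = -math.log(pTrueClass)

            def main(args: Array[String]): Unit = {
              println(mse(Array(0.8, 0.2), Array(1.0, 0.0)))  // (0.2^2 + 0.2^2)/2 = 0.04
              println(crossEntropy(0.8))                      // -log(0.8) ≈ 0.223
            }
          }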
  25. Backward Propagation. How do we compute ∂E/∂w_i for
      w_i <- w_i - rate * ∂E/∂w_i? Naive: for every i, evaluate E twice at
      (w_1,…,w_i±Δ,…,w_N)… slow! Backprop: compute ∂E/∂w_i via the chain rule,
      going backwards:

          net = sum_i(w_i*x_i) + b
          y = activation(net)
          E = error(y)
          ∂E/∂w_i = ∂E/∂y * ∂y/∂net * ∂net/∂w_i
                  = ∂(error(y))/∂y * ∂(activation(net))/∂net * x_i
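      The chain rule above, sketched in Scala for a single tanh neuron with a
      squared-error objective E = (y - target)^2 / 2, so that ∂E/∂y = y - target
      (names are illustrative):

          object Backprop {
            // one SGD step for a single neuron: net = sum_i(w_i*x_i) + b, y = tanh(net)
            def sgdStep(w: Array[Double], b: Double, x: Array[Double],
                        target: Double, rate: Double): (Array[Double], Double) = {
              val net = w.indices.map(i => w(i) * x(i)).sum + b
              val y = math.tanh(net)
              // chain rule: dE/dw_i = dE/dy * dy/dnet * dnet/dw_i
              val dEdy = y - target    // derivative of (y - target)^2 / 2
              val dydnet = 1 - y * y   // derivative of tanh
              val grad = x.map(xi => dEdy * dydnet * xi)
              (w.indices.map(i => w(i) - rate * grad(i)).toArray,
               b - rate * dEdy * dydnet)
            }
          }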
  26. H2O Deep Learning Architecture. Nodes/JVMs communicate synchronously via the
      K-V store and HTTPD; threads run asynchronously. Starting from an initial
      model (weights and biases w), each MapReduce iteration does:
      map: each node trains a copy of the weights and biases on (some* or all of)
      its local data, with asynchronous Fork/Join threads;
      reduce: model averaging: average the weights and biases from all nodes, e.g.
      w* = (w1+w2+w3+w4)/4 on 4 nodes; the speedup is at least #nodes/log(#rows)
      (arxiv:1209.4129v3). The updated model w* lives in H2O's atomic in-memory
      K-V store; query & display it via JSON and the web UI. Keep iterating over
      the data (“epochs”), scoring from time to time.
      *Auto-tuned (default) or user-specified number of points per MapReduce
      iteration.
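      The reduce step is plain element-wise averaging; a Scala sketch of the
      slide's 4-node picture, with each node's weights flattened into one array
      (illustrative, not H2O's actual reduce code):

          object ModelAveraging {
            // w* = (w1 + w2 + ... + wN) / N, element-wise over per-node weight vectors
            def average(perNodeWeights: Seq[Array[Double]]): Array[Double] = {
              val n = perNodeWeights.length
              perNodeWeights.reduce { (a, b) =>
                a.indices.map(i => a(i) + b(i)).toArray
              }.map(_ / n)
            }
          }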
  27. “Secret” Sauce to Higher Accuracy. Adaptive learning rate - ADADELTA
      (Google): automatically set the learning rate for each neuron based on its
      training history. Regularization: L1 penalizes non-zero weights, L2 penalizes
      large weights, Dropout randomly ignores certain inputs. Grid search and
      checkpointing: run a grid search to scan many hyper-parameters, then continue
      training the most promising model(s).
  28. Detail: Adaptive Learning Rate. Compute the moving average of Δw_i^2 at time
      t for window length rho, then its RMS with smoothing epsilon:

          E[Δw_i^2]_t = rho * E[Δw_i^2]_(t-1) + (1-rho) * Δw_i^2
          RMS[Δw_i]_t = sqrt( E[Δw_i^2]_t + epsilon )

      Do the same for ∂E/∂w_i, then obtain the per-weight learning rate
      (cf. the ADADELTA paper):

          rate(w_i, t) = RMS[Δw_i]_(t-1) / RMS[∂E/∂w_i]_t

      Adaptive acceleration / momentum: accumulate previous weight updates, but
      over a window of time. Adaptive annealing / progress: gradient-dependent
      learning rate; the moving window prevents “freezing” (unlike ADAGRAD, which
      has no window).
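      A compact Scala sketch of the per-weight ADADELTA update built from the three
      formulas above; state is kept per weight, and the rho/epsilon defaults are
      illustrative:

          // per-weight ADADELTA state: moving averages of squared gradients and updates
          class AdaDelta(rho: Double = 0.95, eps: Double = 1e-6) {
            private var avgSqGrad = 0.0   // E[(∂E/∂w)^2]_t
            private var avgSqDelta = 0.0  // E[Δw^2]_t

            def update(w: Double, grad: Double): Double = {
              avgSqGrad = rho * avgSqGrad + (1 - rho) * grad * grad
              // rate(w, t) = RMS[Δw]_(t-1) / RMS[∂E/∂w]_t
              val rate = math.sqrt(avgSqDelta + eps) / math.sqrt(avgSqGrad + eps)
              val delta = -rate * grad
              avgSqDelta = rho * avgSqDelta + (1 - rho) * delta * delta
              w + delta
            }
          }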
  29. Detail: Dropout Regularization. Training: for each hidden neuron, for each
      training sample, for each iteration, ignore (zero out) a different random
      fraction p of the input activations. Testing: use all activations, but reduce
      them by a factor p (to “simulate” the missing activations during training).
      Cf. Geoff Hinton's paper.
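      A Scala sketch of the two dropout phases, assuming p is the fraction dropped
      during training, so test-time activations are scaled by the keep fraction
      (1 - p); a sketch, not H2O's implementation:

          import scala.util.Random

          object Dropout {
            // training: zero out a random fraction p of the incoming activations
            def train(activations: Array[Double], p: Double, rng: Random): Array[Double] =
              activations.map(a => if (rng.nextDouble() < p) 0.0 else a)

            // testing: keep everything, but scale by the keep fraction (1 - p)
            // to match the expected activation level seen during training
            def test(activations: Array[Double], p: Double): Array[Double] =
              activations.map(_ * (1 - p))
          }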
  30. MNIST: Digits Classification. MNIST = digitized handwritten digits database
      (Yann LeCun). Yann LeCun: “Yet another advice: don't get fooled by people who
      claim to have a solution to Artificial General Intelligence. Ask them what
      error rate they get on MNIST or ImageNet.” Data: 28x28 = 784 pixels with
      (gray-scale) values in 0…255. Train: 60,000 rows, 784 integer columns,
      10 classes. Test: 10,000 rows, 784 integer columns, 10 classes. Standing
      world record: without distortions or convolutions, the best-ever published
      test set error rate is 0.83% (Microsoft). Let’s see how H2O does on the MNIST
      dataset!
  31. H2O Deep Learning on MNIST: 0.87% test set error (so far). Test set error:
      1.5% after 10 mins, 1.0% after 1.5 hours, 0.87% after 4 hours. Frequent
      errors: confusing 2/7 and 4/9. World-class results! No pre-training, no
      distortions, no convolutions, no unsupervised training. Running on 4 nodes
      with 16 cores each.
  32. Weather Dataset. Predict “RainTomorrow” from Temperature, Humidity, Wind,
      Pressure, etc.
  33. Live Demo: Weather Prediction. 5-fold cross validation; interactive ROC curve
      with real-time updates. With 3 hidden Rectifier layers, Dropout and an L1
      penalty, the 12.7% 5-fold cross-validation error is at least as good as that
      of GBM/RF/GLM models.
  34. Live Demo: Grid Search. How did I find those parameters? Grid search! (It
      works for multiple hyper-parameters at once.) Then continue training the best
      model. A sketch of the idea follows below.
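      A hedged sketch of what such a grid search looks like in code, reusing the
      DeepLearning / DeepLearningParameters classes and the trainH2ORDD frame from
      slide 10. The hidden and l1 field names are assumptions for illustration, not
      verified H2O API; H2O's built-in grid search does this from the web UI:

          // loop over two hyper-parameters, train one model per combination
          val grid = for {
            hidden <- Seq(Array(50), Array(200), Array(500, 500))
            l1     <- Seq(0.0, 1e-5, 1e-3)
          } yield {
            val p = new DeepLearningParameters()
            p.source = trainH2ORDD              // training frame from slide 10
            p.response = trainH2ORDD.lastVec()
            p.classification = true
            p.hidden = hidden                   // assumed field: hidden layer sizes
            p.l1 = l1                           // assumed field: L1 penalty
            (hidden.mkString("x"), l1, new DeepLearning(p).train().get())
          }
          // then pick the most promising model (e.g. by validation AUC) and keep training it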
  35. Text Classification. Goal: predict the item from the seller’s text
      description, e.g. “Vintage 18KT gold Rolex 2 Tone in great condition”.
      Data: a binary word vector, e.g. 0,0,1,0,0,0,0,0,1,0,0,0,1,…,0 with 1s at the
      positions for “vintage”, “gold” and “condition”. Train: 578,361 rows,
      8,647 cols, 467 classes. Test: 64,263 rows, 8,647 cols, 143 classes. Let’s
      see how H2O does on the eBay dataset! (An encoding sketch follows below.)
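      A Scala sketch of the binary word-vector encoding; the 5-word vocabulary here
      is illustrative, while the real dataset uses 8,647 columns:

          object BinaryWordVector {
            // turn a description into a 0/1 vector over a fixed vocabulary
            def encode(text: String, vocab: Array[String]): Array[Double] = {
              val words = text.toLowerCase.split("\\W+").toSet
              vocab.map(w => if (words.contains(w)) 1.0 else 0.0)
            }

            def main(args: Array[String]): Unit = {
              val vocab = Array("vintage", "gold", "condition", "rolex", "silver")
              println(encode("Vintage 18KT gold Rolex 2 Tone in great condition", vocab).mkString(","))
              // -> 1.0,1.0,1.0,1.0,0.0
            }
          }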
  36. Text Classification. Train: 578,361 rows, 8,647 cols, 467 classes. Test:
      64,263 rows, 8,647 cols, 143 classes. Out of the box: 11.6% test set error
      after 10 epochs! The model predicts the correct class (out of 143) 88.4% of
      the time! Note 1: H2O's columnar-compressed in-memory store needs only 60 MB
      to store 5 billion values (a dense CSV needs 18 GB). Note 2: no tuning was
      done (results are for illustration only).
  37. Parallel Scalability (for 64 epochs on MNIST, with the “0.87%” parameters).
      [Charts: speedup and training time in minutes versus number of H2O nodes
      (1, 2, 4, 8, 16, 32, 63); training time drops to 2.7 mins. 4 cores per node,
      1 epoch per node per MapReduce.]
  38. Deep Learning Auto-Encoders for Anomaly Detection. Toy example: find the
      anomaly in ECG heart beat data. First, train a model on what’s “normal”:
      20 time-series samples of 210 data points each. A deep auto-encoder learns
      the low-dimensional non-linear “structure” of the data that allows the
      original data to be reconstructed. Also works for categorical data!
  39. Deep Learning Auto-Encoders for Anomaly Detection (cont.). Model of what’s
      “normal” + test set with anomaly => anomaly detection: the test set
      prediction is the reconstruction, which looks “normal”; the anomaly is found
      by its large reconstruction error. (A scoring sketch follows below.)
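      A Scala sketch of the scoring idea: compute each row's reconstruction error
      against the auto-encoder's output and flag rows far above the “normal” range.
      The reconstruct function stands in for a trained auto-encoder's forward pass,
      and the threshold would be chosen from training-data errors (both are
      assumptions for illustration):

          object AnomalyDetection {
            // per-row reconstruction error: MSE between input and reconstruction
            def reconstructionError(row: Array[Double],
                                    reconstruct: Array[Double] => Array[Double]): Double = {
              val r = reconstruct(row)
              row.indices.map(i => math.pow(row(i) - r(i), 2)).sum / row.length
            }

            // flag rows whose error is far above what "normal" training data produced
            def anomalies(rows: Seq[Array[Double]],
                          reconstruct: Array[Double] => Array[Double],
                          threshold: Double): Seq[Int] =
              rows.zipWithIndex.collect {
                case (row, i) if reconstructionError(row, reconstruct) > threshold => i
              }
          }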
  40. H2O brings Deep Learning to R. R vignette with example R scripts:
      http://0xdata.com/h2o/algorithms/ All parameters are available from R…
  41. POJO Model Export for Production Scoring. Plain old Java code is
      auto-generated to take your H2O Deep Learning models into production!
  42. Higgs Particle Discovery with H2O. How well did H2O Deep Learning do?
      <Your guess goes here> (reference paper results). Any guesses for the AUC on
      low-level features? AUC = 0.76 was the best for RF/GBM/NN (H2O). Let’s see
      how H2O did in the past 30 minutes!
  43. H2O Steam: Scoring Platform. http://server:port/steam/index.html
      Higgs dataset demo on a 10-node cluster. Let’s score all our H2O models and
      compare them! Live demo.
  44. Scoring Higgs Models in H2O Steam. Live demo on a 10-node cluster:
      <10 minutes runtime for all algos! Better than the LHC baseline of AUC = 0.73!
  45. Higgs Particle Detection with H2O. HIGGS UCI dataset: 21 low-level features
      AND 7 high-level derived features. Train: 10M rows, Test: 500k rows.
      Parameters (not heavily tuned), H2O running on 10 nodes.
      *Nature paper: http://arxiv.org/pdf/1402.4735v2.pdf

          Algorithm                      | Paper’s l-l AUC | low-level H2O AUC | all features H2O AUC | Parameters
          Generalized Linear Model       | -               | 0.596             | 0.684                | default, binomial
          Random Forest                  | -               | 0.764             | 0.840                | 50 trees, max depth 50
          Gradient Boosted Trees         | 0.73            | 0.753             | 0.839                | 50 trees, max depth 15
          Neural Net, 1 layer            | 0.733           | 0.760             | 0.830                | 1x300 Rectifier, 100 epochs
          Deep Learning, 3 hidden layers | 0.836           | 0.850             | -                    | 3x1000 Rectifier, L2=1e-5, 40 epochs
          Deep Learning, 4 hidden layers | 0.868           | 0.869             | -                    | 4x500 Rectifier, L1=L2=1e-5, 300 epochs
          Deep Learning, 6 hidden layers | 0.880           | running           | -                    | 6x500 Rectifier, L1=L2=1e-5

      Deep Learning on low-level features alone beats everything else! H2O's
      preliminary results compare well with the paper’s results* (TMVA & Theano).
  46. Tips for H2O Deep Learning.
      General: more layers for more complex functions (exponentially more
      non-linearity); more neurons per layer to detect finer structure in the data
      (“memorizing”); add some regularization for less overfitting (lower
      validation set error).
      Specifically: do a grid search to get a feel for convergence, then continue
      training. Try Tanh/Rectifier; try max_w2=10…50, L1=1e-5…1e-3 and/or
      L2=1e-5…1e-3. Try Dropout (input: up to 20%, hidden: up to 50%) with a
      test/validation set; input dropout is recommended for noisy high-dimensional
      input.
      Distributed: more training samples per iteration are faster, but possibly
      less accurate. With ADADELTA: try epsilon = 1e-4, 1e-6, 1e-8, 1e-10 and
      rho = 0.9, 0.95, 0.99. Without ADADELTA: try rate = 1e-4…1e-2,
      rate_annealing = 1e-5…1e-9, momentum_start = 0.5…0.9,
      momentum_stable = 0.99, momentum_ramp = 1/rate_annealing. Try
      balance_classes = true for datasets with large class imbalance. Enable
      force_load_balance for small datasets. Enable replicate_training_data if each
      node can hold all the data.
  47. Extensions for H2O Deep Learning:
      - Vision: convolutional & pooling layers (PUB-644)
      - Anomaly detection (PUB-806)
      - Pre-training: stacked auto-encoders (PUB-1014)
      - Faster training: GPGPU support (PUB-1013)
      - Language/sequences: recurrent neural networks
      - Benchmark vs other Deep Learning packages
      - Investigate other optimization algorithms
      Contribute to H2O! Add your own JIRA tickets!
  48. Key Take-Aways. H2O is a distributed in-memory data science platform. It was
      designed for high-performance machine learning applications on big data.
      H2O Deep Learning is ready to take your advanced analytics to the next
      level - try it on your data! Join our community and meetups!
      https://github.com/h2oai http://docs.h2o.ai www.h2o.ai/community @h2oai
      Thank you!
