Distributed Deep Learning on Hadoop Clusters
Andy Feng & Jun Shi
Yahoo! Inc.

  1. Distributed Deep Learning on Hadoop Clusters
     Andy Feng & Jun Shi, Yahoo! Inc.
  2. Our Talks @ Hadoop Summit
     • Storm on YARN (2013): http://bit.ly/1W02tZy
     • Spark on YARN (2014): http://bit.ly/1W03dxE
     • Machine Learning on Hadoop/Spark (2015): http://bit.ly/1NW3GvO
  3. Agenda
     • Why Deep Learning on Hadoop?
     • CaffeOnSpark
       – Architecture
       – API: Scala + Python
     • Demo
       – CaffeOnSpark + Python Notebook
  4. Deep Learning
  5. Use Case: Flickr Magic View (flickr.com/cameraroll)
  6. Yahoo Use Case: Yahoo Weather
     • Beauty
       › Computationally assessed
     • Relevant
       › Location
       › Time
       › Cloudy
       › Shower
       › …
     (Screenshot labels: Weather App; Yahoo Weather App)
  7. Yahoo Vision Kit: Demo
  8. Flickr DL/ML Pipeline
     (1) Prepare Datasets @ Scale
         * 10 billion photos
         * 7.5 million per day
     (2) Deep Learning @ Scale
     (3) Non-deep Learning @ Scale
     (4) Apply ML Model @ Scale
     * http://bit.ly/1KIDfof by Pierre Garrigues, Deep Learning Summit 2015
  9. Deep Learning vs. Hadoop
  10. Machine Learning & Deep Learning on Hadoop
  11. Hadoop Cluster Enhanced
      • GPU servers added
        › 4 Tesla K80 cards
          – 2 GK210 GPUs, 24GB memory
      • Network interface enhanced
        › InfiniBand for direct access to GPU memory
        › Ethernet for external communication
  12. Deep Learning Frameworks
      • Caffe
        › Available since Sept. 2013, 6.3k forks
        › Popular in vision community & Yahoo
      • TensorFlow
        › Released in Nov. 2015, 9.8k forks
      • Theano, Torch, DL4J, etc.
  13. CaffeOnSpark Open Sourced (github.com/yahoo/CaffeOnSpark)
      • Released in Feb. 2016
      • Apache 2.0 license
      • Distributed deep learning
        – GPU or CPU
        – Ethernet or InfiniBand
      • Easily deployed on public cloud or private cloud
  14. CaffeOnSpark: Scalable Architecture
  15. CaffeOnSpark: 19x Speedup (est.)
      [Chart: top-5 validation error vs. training latency (hours)]
  16. CaffeOnSpark: Deployment Options
      • Single node
        – spark-submit --master local
      • Multiple nodes
        – spark-submit --master URL -connection ethernet
          (e.g., EC2)
        – spark-submit --master URL -connection infiniband
          (e.g., Yahoo Hadoop cluster)
  17. CaffeOnSpark: DL Made Easy
      Spark CLI
      • spark-submit
            --num-executors #_Processes
            --class com.yahoo.ml.CaffeOnSpark
            caffe-on-spark.jar
            -devices #_gpus_per_proc
            -conf solver_config_file
            -model model_file
            -train | -test | -feature
      Caffe Configuration
          layer {
            name: "data"
            type: "MemoryData"
            source_class: "com.yahoo.ml.caffe.LMDB"
            memory_data_param {
              source: "hdfs:///mnist/trainingdata/"
              batch_size: 64
              channels: 1
              height: 28
              width: 28
            }
            …
          }
  18. CaffeOnSpark: One Program (Scala)
      http://bit.ly/21ZY1c2
      Slide labels: Deep Learning (steps 1 and 2), Non-deep Learning (step 3)

          import org.apache.spark.ml.classification.LogisticRegression

          // ctx is the SparkContext; args are the command-line arguments
          val cos = new CaffeOnSpark(ctx)
          val conf = new Config(ctx, args).init()

          // (1) train the DL model
          val dl_train_source = DataSource.getSource(conf, true)
          cos.train(dl_train_source)

          // (2) extract features via DL
          val lr_raw_source = DataSource.getSource(conf, false)
          val ext_df = cos.features(lr_raw_source)

          // (3) apply ML: convert the float columns to doubles, then fit logistic regression
          val lr_input_df = ext_df
            .withColumn("L", cos.floats2doubleUDF(ext_df(conf.label)))
            .withColumn("F", cos.floats2doublesUDF(ext_df(conf.features(0))))
          val lr = new LogisticRegression().setLabelCol("L").setFeaturesCol("F")
          val lr_model = lr.fit(lr_input_df)
  19. CaffeOnSpark: One Notebook (Python)
      http://bit.ly/1REZ0cN
  20. CaffeOnSpark: UI & Logs
  21. Demo: CaffeOnSpark on EC2
      • https://github.com/yahoo/CaffeOnSpark/wiki
        › Get started on EC2
        › Python for CaffeOnSpark
  22. CaffeOnSpark: What’s Next?
      • Validation within training
      • Enhanced data layer
      • RNN and LSTM
      • Java API
      • Asynchronous distributed training
  23. Related Work: SparkNet & DL4J
      1) [driver] sc.broadcast(model) to executors
      2) [executor] apply DL training against a mini-batch of dataset to update models locally
      3) [driver] aggregate(models) to produce a new model
      Repeat.
  24. Summary
      • Yahoo Hadoop clusters enhanced for deep learning
        › GPU nodes + CPU nodes
        › InfiniBand network for fast communication
      • CaffeOnSpark open sourced
        › Empower Flickr and other Yahoo services
          – In production since Q3 2015
          – Reduced training latency, and improved accuracy
        › Scalable deep learning made easy
          – spark-submit on your Spark cluster
  25. Thank You!
      Repo: github.com/yahoo/CaffeOnSpark
      Email: caffeonspark-users@googlegroups.com
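
Slide 23 outlines a broadcast / local-train / aggregate loop. The sketch below simulates that loop in plain Python, with no Spark involved. It makes several assumptions that are not on the slide: a tiny linear model stands in for a deep network, `aggregate` is taken to mean element-wise averaging, and the function names (`local_update`, `aggregate`, `train`) are made up for illustration. It is not the SparkNet or DL4J implementation.

```python
from typing import List, Sequence, Tuple

Model = List[float]                              # weights of a linear model y = w . x
Batch = Sequence[Tuple[Sequence[float], float]]  # (features, target) pairs


def local_update(model: Model, batch: Batch, lr: float) -> Model:
    """Step 2 [executor]: one SGD pass over a local mini-batch, starting from the broadcast model."""
    w = list(model)  # copy, so the broadcast model is not mutated
    for x, y in batch:
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w


def aggregate(models: Sequence[Model]) -> Model:
    """Step 3 [driver]: combine executor models; averaging is an assumption of this sketch."""
    n = len(models)
    return [sum(ws) / n for ws in zip(*models)]


def train(model: Model, partitions: Sequence[Batch], rounds: int, lr: float = 0.1) -> Model:
    for _ in range(rounds):
        # Step 1 [driver]: the broadcast is simulated by handing the same model to every partition.
        local_models = [local_update(model, part, lr) for part in partitions]
        model = aggregate(local_models)
    return model


# Two simulated executors, each holding samples of y = 2*x0 + 1*x1.
partitions = [
    [((1.0, 0.0), 2.0), ((0.0, 1.0), 1.0)],
    [((1.0, 1.0), 3.0), ((2.0, 1.0), 5.0)],
]
final_model = train([0.0, 0.0], partitions, rounds=200)
print(final_model)
```

Every round pays for one broadcast and one aggregation through the driver, which is the cost this pattern trades against the simplicity of staying inside ordinary Spark operations.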
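
The spark-submit invocation on slide 17 can be assembled programmatically. The helper below is a hypothetical convenience function, not part of CaffeOnSpark; it only builds the argument list and does not launch anything. The flag names and class name are copied from the slide and have not been checked against a current CaffeOnSpark release, and the file names in the usage lines are placeholders.

```python
from typing import List


def build_submit_args(num_executors: int, devices: int, solver_conf: str,
                      model_file: str, mode: str = "train",
                      jar: str = "caffe-on-spark.jar") -> List[str]:
    """Assemble the spark-submit argument list in the order shown on slide 17."""
    if mode not in ("train", "test", "feature"):
        raise ValueError(f"mode must be train, test, or feature, got {mode!r}")
    return [
        "spark-submit",
        "--num-executors", str(num_executors),  # number of Spark executor processes
        "--class", "com.yahoo.ml.CaffeOnSpark",  # class name as written on the slide
        jar,
        "-devices", str(devices),                # GPUs per process
        "-conf", solver_conf,                    # Caffe solver configuration file
        "-model", model_file,
        f"-{mode}",
    ]


# Placeholder values: 4 processes with 2 GPUs each, i.e. 8 GPUs in total.
args = build_submit_args(4, 2, "solver.prototxt", "hdfs:///mnist/model")
print(" ".join(args))
```

Returning a list rather than a single string keeps the result safe to pass to `subprocess.run` without shell quoting.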
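
The chart on slide 15 reports top-5 validation error. As a reminder of what that metric measures, here is a minimal sketch: a prediction counts as correct when the true label is among the five highest-scoring classes. The function name and the toy scores are invented for illustration and are not data from the talk.

```python
from typing import Sequence


def top_k_error(scores: Sequence[Sequence[float]], labels: Sequence[int], k: int = 5) -> float:
    """Fraction of samples whose true label is not among the k highest-scoring classes."""
    misses = 0
    for row, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score, truncated to the top k.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        if label not in top_k:
            misses += 1
    return misses / len(labels)


# Toy example: 3 samples, 4 classes.
scores = [
    [0.10, 0.50, 0.25, 0.15],
    [0.70, 0.05, 0.10, 0.15],
    [0.25, 0.24, 0.30, 0.21],
]
labels = [1, 3, 0]
error_top1 = top_k_error(scores, labels, k=1)
error_top2 = top_k_error(scores, labels, k=2)
print(error_top1, error_top2)
```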
