We took TensorFlow, Google's recently open-sourced deep learning library, which ships as a single-machine implementation, and built a distributed implementation on Spark using an abstraction layer called the Distributed DataFrame (DDF). TensorFlow started as an internal research project at Google and has since spread across the organization. Google open-sourced the project in November 2015 but kept its distributed version closed. Other deep learning frameworks have been implemented on Spark, but TensorFlow's API is elegant and powerful and has proven to work at Google scale. Putting TensorFlow on Spark makes it easier for companies to harness the power of deep learning in their own businesses without sacrificing scalability.
KEY TAKEAWAYS: (1) Google's TensorFlow release is a single-machine implementation; (2) we have built a distributed implementation on Spark that allows TensorFlow to scale horizontally.
This talk was presented at the Spark Summit East 2016; watch it here: https://www.youtube.com/watch?v=-QtcP3yRqyM&index=5&list=PLMpF-Q2WyoGudT3z-VPGs-6nnv45cod_R
Distributed TensorFlow: Scaling Google's Deep Learning Library on Spark
1. @pentagoniac @arimoinc #SparkSummit #DDF arimo.com
Christopher Nguyen, PhD. | Chris Smith | Ushnish De | Vu Pham | Nanda Kishore
DISTRIBUTED TENSORFLOW: SCALING GOOGLE'S DEEP LEARNING LIBRARY ON SPARK
Timeline:
★ Nov 2015: Distributed DL on Spark + Tachyon (Strata NYC)
★ Feb 2016: Distributed TensorFlow on Spark (Spark Summit East)
★ June 2016: TensorFlow Ensembles on Spark (Hadoop Summit)
Why TensorFlow?
Google uses TensorFlow for the following:
★ Search
★ Advertising
★ Speech Recognition
★ Photos
★ Maps
★ StreetView (blurring out house numbers)
★ Translate
★ YouTube
DistBelief
(Large Scale Distributed Deep Networks, Jeff Dean et al., NIPS 2012)
1. Compute gradients
2. Update params (descent)
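The two-step DistBelief cycle, compute gradients on data shards, then apply a descent update to shared parameters, can be illustrated with a minimal NumPy sketch. This is a toy simulation (a noiseless linear-regression problem with made-up names like `compute_gradients` and `update_params`), not the talk's actual code or the DistBelief implementation:

```python
import numpy as np

# Toy loss: L(w) = ||X w - y||^2 / n  (noiseless linear regression)
def compute_gradients(w, X, y):
    """Step 1: each replica computes gradients on its data shard."""
    n = len(y)
    return 2.0 / n * X.T @ (X @ w - y)

def update_params(w, grad, lr=0.1):
    """Step 2: the parameter server applies the descent update."""
    return w - lr * grad

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w

# Data-parallel loop: shards take turns pushing gradients to shared params.
w = np.zeros(3)
shards = np.array_split(np.arange(100), 4)
for _ in range(200):
    for idx in shards:
        g = compute_gradients(w, X[idx], y[idx])
        w = update_params(w, g)

print(np.round(w, 2))  # converges toward true_w
```

In the real system the shards push gradients asynchronously rather than in turn, which is what introduces the staleness issues discussed later in the deck.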
Arimo's 3 Pillars
App Development: Big Apps
★ Predictive Analytics
★ Natural Interfaces
★ Collaboration
Big Data + Big Compute
Spark & Tachyon Architectural Options
★ Spark-Only: model as broadcast variable
★ Tachyon-Storage: model stored as Tachyon file
★ Param-Server: model hosted in HTTP server
★ Tachyon-CoProc: model stored and updated by Tachyon
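To make the Param-Server option concrete, here is a minimal, hypothetical sketch using only the Python standard library: the model lives in an HTTP server, workers GET parameters and POST gradients, and the server applies the descent step. The handler, route shapes, and learning rate are all illustrative assumptions, not the tensorspark implementation:

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import Request, urlopen

PARAMS = {"w": [0.0, 0.0]}  # the "model hosted in HTTP server"
LOCK = threading.Lock()
LR = 0.5  # illustrative learning rate

class ParamHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Workers fetch the current parameters.
        with LOCK:
            body = json.dumps(PARAMS).encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_POST(self):
        # Workers push a JSON gradient; apply one descent step server-side.
        length = int(self.headers["Content-Length"])
        grad = json.loads(self.rfile.read(length))["w"]
        with LOCK:
            PARAMS["w"] = [p - LR * g for p, g in zip(PARAMS["w"], grad)]
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):  # silence per-request logging
        pass

server = HTTPServer(("127.0.0.1", 0), ParamHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}"

def worker_step(grad):
    """A worker pushes one (fake) gradient to the server."""
    urlopen(Request(url, data=json.dumps({"w": grad}).encode())).read()

worker_step([1.0, -1.0])
latest = json.loads(urlopen(url).read())
server.shutdown()
print(latest)  # parameters after one update: w = [-0.5, 0.5]
```

The tradeoff this option buys, relative to Spark broadcast variables, is that parameters can be updated in place between tasks instead of being re-broadcast each iteration, at the cost of running and scaling a separate service.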
GPU vs. CPU
★ On a single machine: GPU is 8–10x faster than CPU
★ Distributed on Spark: GPU is 2–3x faster, with larger gains on larger datasets
★ GPU memory is limited; hitting the limit hurts performance
★ AWS offers older GPUs, so TensorFlow must be built from source
Distributed Gradients
★ An initial warm-up phase on the parameter server is needed
★ A "wobble" effect in convergence may still occur due to mini-batches; mitigations:
✦ Drop "obsolete" (stale) gradients
✦ Reduce the learning rate and initial parameters
✦ Reduce the mini-batch size
★ Work to maximize network throughput, e.g., model compression
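The "drop obsolete gradients" mitigation can be sketched as version tracking on the parameter server: each gradient carries the model version it was computed against, and gradients that are too stale are discarded. The class, field names, and staleness threshold below are illustrative assumptions, not the talk's code:

```python
MAX_STALENESS = 2  # illustrative threshold

class ParamServer:
    """Toy parameter server that drops stale (obsolete) gradients."""

    def __init__(self, dim, lr=0.1):
        self.w = [0.0] * dim
        self.version = 0   # bumped on every applied update
        self.lr = lr
        self.dropped = 0

    def push(self, grad, computed_at_version):
        staleness = self.version - computed_at_version
        if staleness > MAX_STALENESS:
            self.dropped += 1  # obsolete gradient: skip the update
            return False
        self.w = [p - self.lr * g for p, g in zip(self.w, grad)]
        self.version += 1
        return True

ps = ParamServer(dim=2)
ps.push([1.0, 1.0], computed_at_version=0)  # staleness 0: applied
ps.push([1.0, 1.0], computed_at_version=0)  # staleness 1: applied
ps.push([1.0, 1.0], computed_at_version=0)  # staleness 2: applied
ps.push([1.0, 1.0], computed_at_version=0)  # staleness 3 > 2: dropped
print(ps.version, ps.dropped)  # 3 applied, 1 dropped
```

Dropping stale gradients damps the wobble because an update computed against long-outdated parameters can point away from the current descent direction.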
Deployment Configuration
★ Run only one TensorFlow instance per machine
★ Compress gradients/parameters (e.g., as binary blobs) to reduce network overhead
★ Batch-size tradeoff: convergence vs. communication overhead
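One simple way to realize the compression point is to cast gradients to a lower-precision dtype and zlib-compress the raw bytes before sending them over the wire. This is a generic sketch of the idea, under the assumption that a small loss of gradient precision is tolerable, not necessarily the scheme the talk used:

```python
import zlib
import numpy as np

def encode(grad: np.ndarray) -> bytes:
    """Cast float64 -> float16, then zlib-compress the raw bytes."""
    return zlib.compress(grad.astype(np.float16).tobytes())

def decode(blob: bytes) -> np.ndarray:
    """Inverse of encode: decompress and cast back up to float64."""
    return np.frombuffer(zlib.decompress(blob), dtype=np.float16).astype(np.float64)

rng = np.random.default_rng(0)
grad = rng.normal(scale=1e-2, size=100_000)  # fake gradient vector

blob = encode(grad)
restored = decode(blob)

print(f"{grad.nbytes} -> {len(blob)} bytes")          # at least 4x smaller
print("max abs error:", np.abs(grad - restored).max())
```

The dtype cast alone gives a 4x reduction; zlib adds more when gradients are sparse or repetitive. The precision loss is one reason this interacts with the batch-size tradeoff above: fewer, larger, slightly lossy messages versus many exact ones.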
Conclusions
★ This implementation is best for large datasets and small models
✦ For large models, we're looking into a) model parallelism and b) model compression
★ If the data fits on a single machine, a single-machine GPU is best (if you have one)
★ For large datasets, leverage both speed and scale using a Spark cluster with GPUs
★ Future work: a) Tachyon, b) Ensembles (Hadoop Summit, June 2016)
Top 10 World's Most Innovative Companies in Data Science
We're open sourcing our work at https://github.com/adatao/tensorspark
Visit us at Booth K8