Simplify Distributed TensorFlow Training for Fast Image Categorization at Starbucks

STARBUCKS TECHNOLOGY
Simplifying Deep Learning with HorovodRunner at Starbucks

About the presenters
Denny Lee
Denny Lee is a Technology Evangelist with Databricks; he is a hands-on data sciences engineer with more than 15 years of experience developing internet-scale infrastructure, data platforms, and distributed systems for both on-premises and cloud. His key focuses surround solving complex large-scale data problems, providing not only architectural direction but also the hands-on implementation of these systems.

Vishwanath Subramanian
Vishwanath Subramanian is a Director of Data and Analytics Engineering at Starbucks. Vishwanath has over 15 years of experience with a background in distributed systems, product management, software engineering and analytics. At Starbucks, his key focus is on providing Next Generation Analytics platforms and enabling large-scale data processing and machine learning to enable Business Intelligence and Data Services across Starbucks.

Scenarios
• On-demand, one-click provisioning of a seamlessly integrated infrastructure bill of materials for data science and intelligent apps.
• Secured connectivity to the Enterprise Data Platform, completely abstracted from analytics teams.
• Solution template containing the organization of deployments to enable ad hoc experiments, shared data engineering and intelligent app development
• Smarter checkout experiences
• Predicting customer traffic
• Planogram analysis
• And more…

Current State
• Solving complex / streaming image and video analytics problems is hard
• It also typically involves distributing the problem across multiple nodes
• But how do I run Keras + TensorFlow in a distributed environment?

Convolutional Neural Networks
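[Figure: CNN architecture. Feature extraction: 28 x 28 input, convolution with 32 filters (28 x 28), convolution with 64 filters (28 x 28), subsampling with stride (2,2) down to 14 x 14. Classification: fully connected and dropout layers, with outputs for the classes 0 through 9.]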
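
The sizes in the diagram (28 x 28, 28 x 28, 14 x 14) can be checked with a little arithmetic. The sketch below assumes 3 x 3 kernels with "same" padding for the two convolutions and a 2 x 2 window for the subsampling step; none of these are stated on the slide, and the helper name is illustrative.

```python
import math


def output_size(size, kernel, stride=1, padding="same"):
    """Spatial output size of a convolution or pooling layer along one axis."""
    if padding == "same":
        # With 'same' padding the output depends only on the stride.
        return math.ceil(size / stride)
    if padding == "valid":
        # No padding: the window must fit entirely inside the input.
        return (size - kernel) // stride + 1
    raise ValueError("padding must be 'same' or 'valid'")


# Assumed hyperparameters (not given on the slide): 3x3 kernels, 2x2 pooling.
side = 28
after_conv1 = output_size(side, kernel=3)           # convolution, 32 filters
after_conv2 = output_size(after_conv1, kernel=3)    # convolution, 64 filters
after_pool = output_size(after_conv2, kernel=2, stride=2, padding="valid")

print(after_conv1, after_conv2, after_pool)  # 28 28 14
```

With "valid" padding instead, each 3 x 3 convolution would shrink the side by 2 pixels, so the same stack would end at 12 x 12 rather than 14 x 14.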

DEMO
Running Keras CNNs Standalone
Keras, TensorFlow, HorovodRunner, and MLflow: https://dbricks.co/2D58PDw

Introducing HorovodRunner
• HorovodRunner is a general API to run distributed learning workloads on Databricks using Uber’s Horovod framework
• Combining Horovod with Apache Spark’s barrier mode allows longer-running deep learning training jobs
• A Horovod MPI job is embedded as a Spark job using barrier execution mode

HorovodRunner
• HorovodRunner takes a Python method that contains DL training code with Horovod hooks; the method is pickled on the driver and sent to the workers.
• The first executor collects the IP addresses of all the task executors using BarrierTaskContext.
• It then triggers a Horovod job using mpirun.
• Each Python MPI process loads the pickled program back, deserializes it, and runs it.
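
From a notebook, the steps above are usually hidden behind two calls. The following is a minimal sketch, assuming a Databricks ML runtime where `sparkdl` and Horovod are installed; `train_hvd` and `launch` are illustrative names, and the training body is left as a placeholder.

```python
def train_hvd(learning_rate=1.0):
    """Training function shipped to each worker process (illustrative)."""
    # Imports live inside the function so they resolve on the workers.
    import horovod.tensorflow.keras as hvd

    hvd.init()  # initialize Horovod in this MPI process
    # ... build and fit the model here, using the Horovod hooks ...
    print("rank %d of %d, lr=%s" % (hvd.rank(), hvd.size(), learning_rate))


def launch(num_processes=2):
    """Run train_hvd as a Horovod job with `num_processes` processes."""
    from sparkdl import HorovodRunner  # assumed available on the cluster

    # A positive np runs on the cluster's workers; a negative np spawns
    # that many processes on the driver, which is handy for debugging.
    hr = HorovodRunner(np=num_processes)
    return hr.run(train_hvd, learning_rate=1.0)
```

Calling `launch(2)` in a notebook cell would start the job; nothing runs at import time, so the functions can be defined and inspected anywhere.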

HorovodRunner
[Diagram: driver and workers]
def runCNN():
    model.add(Conv2D(32, …))
    model.add(Conv2D(64, …))
    model.add(MaxPooling2D(…))
    model.add(Dense(128, …))
    model.add(Dense(10, activation='softmax'))
    optimizer = keras.optimizers.Adadelta(1.0)
In standalone or hvd local mode, the code runs on the driver.

HorovodRunner
[Diagram: driver, workers, and variables]
def runCNN_hvd():
    hvd.init()
    config = tf.ConfigProto()
    # Original code
    runCNN()
    callbacks = []
With HorovodRunner, we wrap the original code; the code and variables are pushed to the workers.

HorovodRunner
[Diagram: driver and workers]
Variables are transferred from the driver to the workers.
Code is executed on the workers.

Migrate to HorovodRunner
# Primary code differences are noted below
+ hvd.init()
+ config = tf.ConfigProto()
+ config.gpu_options.allow_growth = True
+ config.gpu_options.visible_device_list = str(hvd.local_rank())
+ epochs = int(math.ceil(12.0 / hvd.size()))
+ callbacks = [
+     hvd.callbacks.BroadcastGlobalVariablesCallback(0),
+ ]
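
The `epochs` line divides the work among the Horovod processes. The sketch below reproduces that arithmetic without Horovod: `num_workers` stands in for `hvd.size()`, and the 12 is taken to be the epoch count of the standalone run, which the slide implies but does not state.

```python
import math


def epochs_per_worker(total_epochs, num_workers):
    """Epochs each process runs when the total is divided among the workers."""
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    # Round up so no part of the requested training is dropped.
    return int(math.ceil(float(total_epochs) / num_workers))


# num_workers plays the role of hvd.size() in the slide's code.
for num_workers in (1, 2, 4, 5, 8):
    print(num_workers, epochs_per_worker(12, num_workers))
# 1 -> 12, 2 -> 6, 4 -> 3, 5 -> 3, 8 -> 2
```

Because of the rounding up, worker counts that do not divide 12 evenly (5 or 8 here) do slightly more total work than the standalone run.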

Comparing the runs using MLflow

DEMO
Object Detection
Keras, TensorFlow, HorovodRunner, and MLflow

Object Detection Approaches
R-CNN (2014)
• Region proposal algorithms give you a set of regions in the image that are likely to contain objects.
• Run the image regions inside those bounding boxes through a pre-trained AlexNet to compute the features for each bounding box.
• Use a support vector machine to classify the object in each box.
• Run the box through a linear regression model to output tighter coordinates for the box.
• R-CNN -> Fast R-CNN -> Faster R-CNN
Rich feature hierarchies for accurate object detection and semantic segmentation - Girshick, Donahue, Darrell, Malik
Fast R-CNN - Girshick
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks - Ren, He, Girshick, Sun
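
The last R-CNN step, tightening a box with a regression model, can be sketched as follows. The offsets follow the center/width/height parameterization described in the R-CNN paper; the function name and the numbers are made up for illustration, and in practice the offsets come from the trained regressor.

```python
import math


def refine_box(box, deltas):
    """Apply regression offsets to a proposal box given as (cx, cy, w, h)."""
    cx, cy, w, h = box
    dx, dy, dw, dh = deltas
    return (
        cx + w * dx,        # shift the center, scaled by the box width
        cy + h * dy,        # shift the center, scaled by the box height
        w * math.exp(dw),   # rescale the width (log-space offset)
        h * math.exp(dh),   # rescale the height (log-space offset)
    )


proposal = (50.0, 40.0, 20.0, 10.0)  # hypothetical region proposal

# Zero offsets leave the proposal unchanged.
print(refine_box(proposal, (0.0, 0.0, 0.0, 0.0)))

# Hypothetical regressor output: move right and up, shrink the width.
print(refine_box(proposal, (0.1, -0.2, -0.1, 0.0)))
```

Predicting the width and height offsets in log space keeps the refined sizes positive whatever the regressor outputs.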

Object Detection Approaches (contd.)
• YOLO: detection as a regression problem
• Not a traditional classifier
• Divide the image into a grid; each cell is responsible for predicting n bounding boxes
• Output a confidence score that each predicted bounding box contains an object
• Gives a probability distribution over all the classes it's trained on
• The confidence score and class prediction are combined into a score for object classification
• Based on a threshold, we determine the relevant boxes.
• All the boxes are predicted by the neural network at once, in a single pass.
You Only Look Once: Unified, Real-Time Object Detection - Redmon, Divvala, Girshick, Farhadi
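
The scoring and thresholding steps in the list above can be sketched in a few lines. The box names, confidences, class probabilities and the 0.5 threshold are invented for illustration; a real detector would produce them per grid cell.

```python
def class_scores(box_confidence, class_probs):
    """Combine a box's confidence with its class probabilities."""
    return [box_confidence * p for p in class_probs]


def keep_boxes(boxes, threshold=0.5):
    """Keep boxes whose best class score reaches the threshold.

    boxes: list of (name, box_confidence, class_probs) tuples.
    Returns a list of (name, best_class_index, best_score) tuples.
    """
    kept = []
    for name, confidence, probs in boxes:
        scores = class_scores(confidence, probs)
        best = max(range(len(scores)), key=scores.__getitem__)
        if scores[best] >= threshold:
            kept.append((name, best, scores[best]))
    return kept


# Hypothetical predictions over three classes.
boxes = [
    ("box_a", 0.9, [0.8, 0.1, 0.1]),    # confident box, clear class 0
    ("box_b", 0.3, [0.5, 0.4, 0.1]),    # low confidence, dropped
    ("box_c", 0.7, [0.1, 0.85, 0.05]),  # confident box, clear class 1
]

for name, cls, score in keep_boxes(boxes, threshold=0.5):
    print(name, cls, round(score, 3))
```

Here box_a and box_c pass the threshold and box_b is discarded, since its best combined score is only 0.3 x 0.5.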

TALENTED TECHNOLOGISTS
DELIVERING TODAY
LEADING INTO THE FUTURE
https://www.starbucks.com/careers/
