"We all know the unprecedented potential impact for Machine Learning. But how do you take advantage of the myriad of data and ML tools now available? How do you streamline processes, speed up discovery, share knowledge, and scale up implementations for real-life scenarios?
In this talk, we'll cover some of the latest innovations brought into the Databricks Unified Analytics Platform for Machine Learning. In particular, we will show you how to:
- Get started quickly using the Databricks Runtime for Machine Learning, which provides preconfigured Databricks clusters including the most popular ML frameworks and libraries, Conda support, performance optimizations, and more.
- Get up and running with the most popular deep learning frameworks within minutes, and go deep with state-of-the-art DL model diagnostic tools.
- Scale deep learning training workloads from a single machine to large clusters for the most demanding applications with ease, using the new HorovodRunner.
- Expose all of these ML frameworks to large, distributed datasets using Databricks Runtime for Machine Learning."
4. Broad Adoption of ML
#UnifiedAnalytics #SparkAISummit
Disruptive innovations are affecting most enterprises on the planet:
- Internet of Things
- Digital Personalization
- Healthcare and Genomics
- Fraud Prevention
…and many more customers in different industries and segments.
5. Hidden Tech Debt in ML Systems
[Diagram: the small "ML Code" box sits at the center, surrounded by much larger boxes for Configuration, Data Collection, Data Verification, Feature Extraction, Machine Resource Management, Analysis Tools, Process Management Tools, Serving Infrastructure, and Monitoring.]
Only a small fraction of a real-world ML system is the ML code itself, shown as the small green box in the middle. The required surrounding infrastructure is vast and complex.
“Hidden Technical Debt in Machine Learning Systems,” Google NIPS 2015
7. ML Runtime: Job To Be Done
• As an ML practitioner:
1. I want to start my ML project quickly.
• Today I have to spend many hours setting up environments.
2. I want a single runtime for all steps of my work.
• I don't want to move data and code around.
9. What is Databricks Runtime for ML?
A ready-to-use environment for machine learning and data science
Built on top of, and updated with, every Databricks Runtime release
APIs for distributed deep learning on Spark (HorovodRunner)
Performance improvements for popular distributed algorithms in Spark
(GraphFrames, logistic regression, and tree classifiers)
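The HorovodRunner pattern — every worker computes gradients on its own data shard, and an allreduce averages them before each parameter update — can be sketched in plain Python. This is a conceptual stand-in only (the toy model, data, and function names are made up); the real HorovodRunner wraps Horovod on Spark clusters:

```python
# Conceptual stand-in for HorovodRunner-style data-parallel training
# (toy model and names are made up; the real API wraps Horovod on Spark).
# Each "worker" computes the gradient of a squared loss on its own data
# shard, and an allreduce averages the gradients before every update.

def shard_gradient(w, shard):
    """Mean-squared-error gradient of y ~ w*x on one worker's shard."""
    return sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)

def allreduce_mean(grads):
    """Average gradients across workers (what Horovod's allreduce does)."""
    return sum(grads) / len(grads)

def train(shards, w=0.0, lr=0.01, steps=100):
    for _ in range(steps):
        grads = [shard_gradient(w, s) for s in shards]  # parallel in reality
        w -= lr * allreduce_mean(grads)
    return w

# Synthetic data y = 3x, split round-robin across 4 "workers".
data = [(float(x), 3.0 * x) for x in range(1, 9)]
shards = [data[i::4] for i in range(4)]
w = train(shards)  # converges toward 3.0
```

Because each worker sees only its shard, averaging gradients is what keeps all replicas' parameters identical after every step — the same invariant Horovod maintains at cluster scale.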
10. What is Databricks Runtime for ML?
The ML environment is set up on all cluster nodes with a single click.
11. 1. Prepare Data
Easily access, explore, and visualize data in collaborative notebooks
Prepare data sets at scale with:
o Scala / Python / R / SQL
o Optimized Apache Spark
o Structured Streaming
o Delta Lake
o Persisted data meta store
Quickly automate notebooks with jobs
12. 2. Build Models
Support for popular open-source ML frameworks:
• TensorFlow and Tensorboard
• PyTorch
• Keras
• Horovod for distributed DL
• XGBoost
• GraphFrames
• Popular single-node tools in Python and R
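As a hypothetical illustration of the "popular single-node tools" bullet, a scikit-learn model trains unchanged on an ML Runtime driver (dataset and parameters below are arbitrary choices, not from the talk):

```python
# Hypothetical single-node example: scikit-learn ships preinstalled in
# the ML Runtime, so ordinary sklearn code runs as-is on the driver.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)  # held-out accuracy
```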
13. 3. Productionize ML Models
Model Deployment
MLflow API for inference on third-party services such as Docker containers, AzureML on Azure, and SageMaker on AWS
Databricks Runtime for ML includes MLeap for model serialization.
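The serialize-then-serve round trip these tools enable can be sketched with the standard library as a stand-in (the toy model below is invented; real deployments would use MLflow model flavors or MLeap bundles rather than raw pickle):

```python
# Minimal stand-in for the model-serialization round trip: train, save
# an artifact, reload it in a serving process, and predict.
import pickle

class ThresholdModel:
    """Toy model: predicts 1 when the input exceeds a fixed threshold."""
    def __init__(self, threshold):
        self.threshold = threshold

    def predict(self, xs):
        return [int(x > self.threshold) for x in xs]

model = ThresholdModel(threshold=0.5)
blob = pickle.dumps(model)       # "log" the model artifact
served = pickle.loads(blob)      # load it inside the serving service
preds = served.predict([0.2, 0.9])
```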
15. Customer Case Study: Hotels.com (Vision)
Challenge
• 325,000 listed hotels and a massive volume of image files
• Apply ML to improve the match between travelers and hotels with a personalized viewing experience
Solution
• Leveraged Databricks to train DL models on 100% of image data
• Increased processing power by 20x and enabled real-time scoring
Result
• Hotels.com significantly improved customer engagement and conversions by improving personalization models
• Customer Case Study: databricks.com/customers/hotels-com
16. Customer Case Study: Riot Games (NLP)
Challenge
• More than 100 million gamers every month
• 2% of all games affected by serious toxicity
Solution
• Leveraged Databricks to apply NLP & ML to proactively identify abusive language
• Scaled training to a much larger dataset and hyperparameter tuning
Result
• Riot Games increased customer satisfaction, retention, and lifetime value by detecting abusive language in real time
• Customer Case Study: databricks.com/customers/riot-games
17. Customer Case Study: Nielsen (IoT)
Challenge
• Offer insights into what consumers buy and watch
• Scale from single-machine data science to large datasets to improve product offerings
Solution
• Leveraged Databricks to ensure collaboration across teams
• Reduced annual cost by 40% and improved model performance by one third
Result
• Nielsen improved its competitive offering by applying ML to batch & live-stream data from IoT devices
• Customer Case Study: databricks.com/customers/nielsen
19. High-level Engineering Goals
• Reproducible environments
– Package & dependency management
• Testability
– Testing & QA infrastructure and process
• Cross-compatibility
– Careful configuration of all packages to be compatible
• Performance optimization
– High-performance I/O
20. Package Management
• Package management
• Environment management
– Python 2.x & Python 3.x environments
• Environment is selected during cluster setup
• Latest stable versions from the Anaconda distribution
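As a rough illustration, the pinned environment resembles a Conda spec like the following (package names and versions here are illustrative, not the runtime's actual manifest):

```yaml
# Hypothetical sketch of a pinned Conda environment; the real ML
# Runtime manifest differs per release.
name: databricks-ml
dependencies:
  - python=3.7
  - numpy
  - pandas
  - scikit-learn
  - tensorflow
  - pip:
      - horovod
```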
21. Python Environments
• ML Runtime vs. Databricks Runtime
– Upgraded packages
– Conda vs. pip
– Additional ML packages
• MKL for CPU acceleration
• CUDA & cuDNN for GPU acceleration
22. Dependency Management
• Bazel as the build system
• Audit files for change detection
– Python: Conda
– JARs: Maven
– R: MRAN
– Native: Ubuntu APT and Docker
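The audit-file idea can be sketched as a checksum over the resolved dependency pins, so any drift in the environment fails a build-time check (a minimal sketch; Databricks' actual Bazel tooling and file formats are not shown, and the pins below are made up):

```python
# Sketch of an "audit file": a stable digest over the resolved set of
# name==version pins. If a rebuild resolves different versions, the
# digest changes and the build can flag the drift for review.
import hashlib

def audit_digest(pinned_deps):
    """Order-independent digest over name==version pins."""
    canonical = "\n".join(sorted(f"{n}=={v}" for n, v in pinned_deps.items()))
    return hashlib.sha256(canonical.encode()).hexdigest()

expected = audit_digest({"numpy": "1.16.2", "pandas": "0.24.2"})
current = audit_digest({"numpy": "1.16.3", "pandas": "0.24.2"})  # upgraded
changed = current != expected  # True: the environment drifted
```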
23. Docker Containers
• We internally use Docker to build Databricks Runtime images
– Full control over content
– Reproducible and automated
• Runtime for ML is a layer on top of DBR
– MLR benefits from all existing DBR tests and QA
– MLR gets every hotfix and patch that goes into DBR
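Conceptually, the layering looks like a Dockerfile that extends the base runtime image (image names and paths below are invented for illustration; the internal build is not public):

```dockerfile
# Hypothetical layering sketch: the ML Runtime image extends the base
# Databricks Runtime image, so every DBR fix flows into MLR on rebuild.
FROM databricks/runtime:5.3
COPY ml-environment.yml /tmp/ml-environment.yml
RUN conda env update -n base -f /tmp/ml-environment.yml
```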
24. Extensive Integration Testing
• Extensive tests for top-tier packages
• Each commit runs unit and integration tests
• Nightly tests on master and release branches
• All CPU and GPU instances on Azure & AWS
• Integration Tests:
– Launch a Docker container and run code
– Launch a cluster and execute notebooks
25. High Performance FUSE
• Why Filesystem in Userspace (FUSE)?
• We use high-throughput FUSE clients for ML/DL
– Azure Storage FUSE on Azure
– Goofys on AWS
• The mount points are pre-configured in the ML Runtime at dbfs:/ml
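Because the FUSE mount exposes cloud storage with local-file semantics, DL input pipelines can use ordinary file I/O. A minimal sketch, using a temporary directory to stand in for the dbfs:/ml mount (which appears as a local path, e.g. /dbfs/ml):

```python
# Plain-filesystem listing, as a DL data loader would do it against the
# FUSE mount. A tempfile directory stands in for /dbfs/ml here so the
# sketch is self-contained.
import os
import tempfile

def list_training_files(mount_dir, suffix=".jpg"):
    """Enumerate training files under the (FUSE-mounted) directory."""
    return sorted(
        os.path.join(mount_dir, f)
        for f in os.listdir(mount_dir)
        if f.endswith(suffix)
    )

with tempfile.TemporaryDirectory() as fake_mount:
    for name in ("a.jpg", "b.jpg", "notes.txt"):
        open(os.path.join(fake_mount, name), "w").close()
    files = list_training_files(fake_mount)
    names = [os.path.basename(p) for p in files]
```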
28. GA of Runtime for ML
• Release history:
– 4.1 Beta: June 2018
– …
– 5.3 GA: April 2019
– 5.4: May 2019
– 6.0: Second Half 2019
29. Roadmap for Environment
• DBR with Conda (Beta)
– Enables customizable environments
– Databricks Runtime & Databricks Runtime for ML will continue to be supported
• 6.0
– Unify all into single Runtime
– Considering removing Python 2.x