Inferno
Scalable Deep Learning on Spark
Matthias Langer
m.langer@latrobe.edu.au
Dr. Zhen He
z.he@latrobe.edu.au
Prof. Wenny Rahayu
w.rahayu@latrobe.edu.au
Department of Computer Science &
Computer Engineering
Topics
• Deep Learning – Introduction
• Spark & Deep Learning
• Our solution:
La Trobe University’s Deep Learning System
• Conclusion, Timeline, Q&A
Deep Learning
Introduction
Source: CerCo (Brain and Cognition Research Centre), Toulouse
Object/Action Recognition
• Automatic Captioning
• Navigating Artificial Agents
• Deep Learning performs
100% better than the best
non-deep learning algorithms
in many Computer Vision
tasks!
Source: Research @ Facebook (left), google.com/selfdrivingcar (right)
Voice Recognition
• Deep Learning performs 30%
better than the best non-deep
learning algorithms!
Natural Language Processing
• Translation
• Thought Vector Q&A
• …
• Deep Learning tends to perform
“better” than traditional machine
learning algorithms!
Source: Google Inc. / Google Translate
Source: GoogleBrain; Google, Inc.
Spark & DL
How they could be an ideal tandem, but there
are challenges…
Why do you want to use a cluster to
train Deep Neural Networks?
Deep Learning is SLOW
Two approaches to speed up DL
Scaling Up
• Superior scaling until fundamental limits of the hardware are reached
  - Max. number of PCIe lanes
  - Max. read speed of HDD
  - Costs scale non-linearly (DGX-1 = $129,000)
Source: https://developer.nvidia.com/devbox
Scaling Out
• Highly scalable
• No relevant hardware limits
• Extensible
More reasons why you would want to use Hadoop/Spark for DL?
• You already have all your valuable data in Spark/Hadoop
• DL (often) requires a lot of data to train
• Need a lot of memory
• Pre-processing has massive I/O requirements (disk & network)
How could you implement
DL on Spark?
[Diagram: a Spark RDD provides mini-batches of data to Worker 1, Worker 2 and Worker 3; the workers and the Master each hold a copy of the model]
• Draw mini-batch
• Map: Compute updated model in each worker
• Reduce: Assemble into “better” model via Master node
• Broadcast “better” model and repeat
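The draw / map / reduce / broadcast loop on this slide can be sketched in plain Scala without a Spark dependency. This is only an illustration under stated assumptions: the linear model, the `localUpdate` and `average` helpers and the choice of simple model averaging as the reduce step are hypothetical and are not taken from Inferno or Blaze.

```scala
object SyncAveragingSketch {
  // Hypothetical toy model y = w * x + b, stored as (w, b).
  type Model = (Double, Double)
  type Sample = (Double, Double) // (x, y)

  // Map step: one SGD pass over a worker's mini-batch,
  // starting from the model that was broadcast to it.
  def localUpdate(model: Model, batch: Seq[Sample], lr: Double): Model =
    batch.foldLeft(model) { case ((w, b), (x, y)) =>
      val err = (w * x + b) - y
      (w - lr * err * x, b - lr * err)
    }

  // Reduce step: the master averages the worker models (an assumption).
  def average(models: Seq[Model]): Model = {
    val n = models.size.toDouble
    (models.map(_._1).sum / n, models.map(_._2).sum / n)
  }

  def train(data: Seq[Sample], workers: Int, batchSize: Int,
            rounds: Int, lr: Double): Model = {
    val rng = new scala.util.Random(42)
    var model: Model = (0.0, 0.0)
    for (_ <- 0 until rounds) {
      // Draw one mini-batch per worker.
      val batches =
        Seq.fill(workers)(Seq.fill(batchSize)(data(rng.nextInt(data.size))))
      // Map: every worker updates its copy of the current model.
      val updated = batches.map(localUpdate(model, _, lr))
      // Reduce, then "broadcast" by overwriting the shared model.
      model = average(updated)
    }
    model
  }

  def main(args: Array[String]): Unit = {
    val data = (0 until 100).map { i =>
      val x = i / 100.0
      (x, 2.0 * x + 1.0)
    }
    val (w, b) = train(data, workers = 3, batchSize = 16, rounds = 500, lr = 0.1)
    println(f"w = $w%.3f, b = $b%.3f")
  }
}
```

Every round waits for all workers before the averaged model is redistributed, which is the synchronous pattern that the following slides identify as a problem.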
Problem 1:
Big Parameters = High shuffle cost!
[Pie chart: Compute 5%, Communication 95%]
[Diagram: Worker 1, Worker 2, Worker 3 and the Master each hold a 500 MB model]
• Compute updated models (typically 50 – 500 ms)
• Reduce models (at best 5 s over 1 GbE)
• Broadcast combined model (at best 5 s over 1 GbE)
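The numbers on this slide can be checked with a little arithmetic. The sketch below assumes decimal megabytes and an ideal 1 Gbit/s link; the helper names are hypothetical. The raw transfer time for 500 MB comes out at 4 s, so the slide's "at best 5 s" presumably includes protocol overhead (that reading is an assumption).

```scala
object ShuffleCostSketch {
  // Ideal time in seconds to move `megabytes` (10^6 bytes each)
  // over a link of `gigabitsPerSecond`, ignoring all overhead.
  def transferSeconds(megabytes: Double, gigabitsPerSecond: Double): Double =
    megabytes * 8.0 / (gigabitsPerSecond * 1000.0)

  // Fraction of one training iteration that is spent computing.
  def computeShare(computeSeconds: Double, commSeconds: Double): Double =
    computeSeconds / (computeSeconds + commSeconds)

  def main(args: Array[String]): Unit = {
    val ideal = transferSeconds(500.0, 1.0)
    // Slide figures: 500 ms compute, 5 s reduce plus 5 s broadcast.
    val share = computeShare(0.5, 5.0 + 5.0)
    println(f"ideal transfer: $ideal%.1f s, compute share: ${share * 100}%.1f %%")
  }
}
```

With the slowest compute time quoted (500 ms) and 10 s of communication per iteration, the compute share is just under 5%, in line with the 5% / 95% split shown in the pie chart.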
Problem 2:
Node communication is synchronous
[Diagram: Worker 1, Worker 2 and Worker 3 all communicate with the Master, which is marked "Bottleneck!"]
La Trobe University DL-System
Single Machine
• Blaze: Scala based standalone deep learning system
• CUBlaze: GPU acceleration for Blaze
Cluster
• Inferno: Coordinates distributed computation of Blaze models in a synchronous Spark environment
A (probably biased) comparison
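Systems compared: Inferno, SparkNet (Caffe), CaffeOnSpark, deeplearning4j, H2O
• ConvNets, AutoEncoders, etc. (one entry: planned)
• Communication protocol during training: Inferno: Spark MR; SparkNet (Caffe): Spark MR; CaffeOnSpark: MPI/RDMA; deeplearning4j: Spark MR among others; H2O: Grpc/MPI/RDMA
• Build complex models (e.g. ResNet) (one entry: some)
• Dynamic branching support (path altering / dropping)
• Pluggable preprocessing pipeline (one entry: partial)
• Pluggable update policies for hyper parameters
• Pluggable & visualizable online cross validation
• Entire execution path determined in single runtime environment
• Model description language: Inferno: JVM Code; SparkNet (Caffe): Config File; CaffeOnSpark: Config File; deeplearning4j: JVM Code; H2O: multiple
• GPU acceleration
[The yes/no marks of the original table are not preserved in this transcript.]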
Blaze
High-Performance Deep Learning Engine
Module Library
• Standard Modules
  - Add-Bias (C/U/S/B), Immediate-Filter (C/U/S/B)
  - Convolution, Convolution-Decoder, Linear, Linear-Decoder, Locally-Connected, Locally-Connected-Decoder
  - L2-Pooling, Max-Pooling, Mean-Pooling
  - Batch-Normalization, Dropout, LCN, LRN, Normalization (C/U/S/B), Reshape, Weight-Decay (L1/L2)
• Nonlinearities
  Abs, Add-Noise, ELU, Exp, Hard-Tanh, LeakyReLU, Ln, Pow, PReLU, ReLU, ReQU, (Log-)Sigmoid, SmoothAbs, (Log-)Softmax, SoftPlus, Sq, Sqrt, SReLU, Tanh
• Optimizers
  AdaDelta, AdaGrad, Adam, ConjugateGradientDescent, Rprop, RMSProp, SGD (traditional, local learning rates, momentum)
• Constraints (can be injected anywhere!)
  BCE, ClassLL, ClassNLL, KLDivergence, MSE
• Containers
  Sequence, Auto-Encoder, Branch (Parallel)
• Branching
  Alternate-Path, Drop-Path, Random-Path
• Tensor Table Operations
  Select, Concatenate (C/U/S/B), Merge (add/mean/lerp)
• Visualization & Benchmarking
  Benchmark-Wrapper, Visualize-Histogram, Visualize-MeanAndStdDev (C/U/S/B)
C/U/S/B = These operations can be applied at the [C]hannel, [U]nit, [S]ample or [B]atch level.
Performance – AlexNet OWT
All benchmarks done using NVIDIA TitanX GPUs on comparable setups; Source: https://github.com/soumith/convnet-benchmarks
System                      forward (ms)   backward (ms)
TORCH (CUDNN)               27             53
TENSORFLOW                  26             55
CUBLAZE (1 GB WS LIMIT)     37             56
TORCH (FBFFT)               31             72
CUDACONVNET2                42             135
CAFFE (NATIVE)              121            203
TORCH-7 (NATIVE)            132            210
Performance – VGG A
System                      forward (ms)   backward (ms)
TORCH (CUDNN)               162            331
TENSORFLOW                  158            382
CUBLAZE (1 GB WS LIMIT)     167            378
TORCH (FBFFT)               355            737
CUDACONVNET2                408            821
CAFFE (NATIVE)              323            745
TORCH-7 (NATIVE)            350            755
All benchmarks done using NVIDIA TitanX GPUs on comparable setups; Source: https://github.com/soumith/convnet-benchmarks
How Blaze works (example)
[Diagram components: Data Source (HDD, SparkRDD, HDFS); Prefetcher with Augmenter, Model (fprop only, fixed Weights) and Sample Merger; Cached Samples; Optimizer with Hyper Param., Objectives and Scope Delimiter; Model with tunable Weights and Hyper Param.; output to Terminal, File, Showoff, etc.]
Easy Setup: Model
• Blaze automatically infers most layer parameters based on the actual input
• Usually no need to specify input and output dimensions or whether to use CPU or GPU
val noClasses = 100
// Kernels
val kernelConv1 = Kernel2D(dims = (11, 11), stride = (4, 4), padding = (2, 2))
val kernelConv2 = Kernel2D.centered((3, 3))
val kernelPool = Kernel2D((3, 3), (2, 2))
// Layers
val bias = AddBiasBuilder()
val relu = ReLUBuilder()
val lrn = LateralResponseNormalizationBuilder(n = 5, k = 2, alpha = 1e-4f, beta = 0.75f)
val pool = MaxPoolingBuilder(kernelPool)
// Lego!
val mb = SequenceBuilder(
  ConvolutionFilterBuilder(kernelConv1, 48), bias, relu, pool, lrn,
  ConvolutionFilterBuilder(kernelConv2, 192), bias, relu,
  ConvolutionFilterBuilder(kernelConv2, 128), bias, relu, pool,
  ReshapeBuilder.collapseDimensions(),
  LinearBuilder(noClasses), bias,
  SoftmaxBuilder(), ClassLLConstraintBuilder()
)
Easy Setup: CPU and GPU
• Blaze maintains a variant table for each module type.
• When you “build” an instance of a module, all variants are scored and the “best” variant for the current situation is selected automatically.
  - You can configure what “best” means.
// Input data
val data = Array[Batch](...)
// Inspect batches
val hints = BuildHints.derive(data)
// Build compatible model
val m = mb.build(hints)
19:25:20 INFO Scoring ConvolutionFilter[Kernel2[(3, 3), (1, 1)] x 2, 0/1 = filter]:
19:25:20 DEBUG 0000800a => CUDA_CUDNN, preferred, input type matches
19:25:20 DEBUG 0000400a => JVM_BLAS_IMPLICITMM, preferred
19:25:20 DEBUG 00000004 => JVM_BLAS_MM
19:25:20 DEBUG 0000000a => JVM_BREEZE_MM, preferred
19:25:20 DEBUG 00000002 => JVM_BREEZE_SPARSEMM
19:25:20 INFO CUDA_CUDNN selected!
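The scoring step in the log above can be pictured as a lookup over a variant table. The sketch below is an assumption about the principle, not Blaze's actual implementation: `Variant` and `selectBest` are hypothetical names, and it simply assumes that the highest score wins, which matches the log (scores are printed in hex and CUDA_CUDNN has the largest).

```scala
object VariantSelectionSketch {
  // Hypothetical record: one implementation variant of a module
  // together with the score it received for the current situation.
  final case class Variant(name: String, score: Int)

  // Assumption: the variant with the highest score is "best".
  def selectBest(variants: Seq[Variant]): Option[Variant] =
    if (variants.isEmpty) None else Some(variants.maxBy(_.score))

  // Names and scores copied from the log output shown on the slide.
  val table: Seq[Variant] = Seq(
    Variant("CUDA_CUDNN", 0x0000800a),
    Variant("JVM_BLAS_IMPLICITMM", 0x0000400a),
    Variant("JVM_BLAS_MM", 0x00000004),
    Variant("JVM_BREEZE_MM", 0x0000000a),
    Variant("JVM_BREEZE_SPARSEMM", 0x00000002)
  )

  def main(args: Array[String]): Unit =
    selectBest(table).foreach(v => println(s"${v.name} selected!"))
}
```

Making "best" configurable would then amount to swapping the function that assigns the scores while leaving the selection itself unchanged.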
Working with large models!
val mb = SequenceBuilder(...)
val hints = ...
val g = mb.toGraph(hints)
SvgRenderer.render(g)
Visualizing pre-processing pipelines
val apb = AsynchronousPrefetcherBuilder(...)
val g = apb.toGraph()
SvgRenderer.render(g)
Easy Setup: Optimizer
val ob = MomentumBuilder()
// Configure Hyper-Parameters
ob.learningRate = DiscreteStepsBuilder(
  0 -> 1e-2f,
  40000 -> 1e-3f,
  80000 -> 1e-4f
)
// Setup Objectives
ob.objectives += IterationCountLimitBuilder(1000)
              += CrossValidationBuilder(dataSource, ... preprocessing pipeline ...)
              += PrintStatusBuilder()
                 >> FileSinkBuilder(HadoopFileHandle.userHome ++ "results/optimization.log")
              += objectives.Presets.visualizePerformance()
                 >> ShowoffSinkBuilder("Cross Validation Performance")
// Add more advanced stuff like Regularizers...
// Go!
val o = ob.build(m, dataSource)
o.run()
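The learning rate schedule configured above (1e-2 from iteration 0, 1e-3 from 40000, 1e-4 from 80000) is a piecewise-constant function of the iteration number. The sketch below shows one way such a lookup could work; `valueAt` is a hypothetical helper, and treating each boundary as inclusive is an assumption, not documented behaviour of `DiscreteStepsBuilder`.

```scala
object StepScheduleSketch {
  // Steps are (first iteration, value) pairs. The value of the latest
  // step that starts at or before `iteration` applies (assumption).
  def valueAt(steps: Seq[(Long, Float)], iteration: Long): Float = {
    require(steps.nonEmpty, "at least one step is required")
    val sorted = steps.sortBy(_._1)
    sorted.takeWhile(_._1 <= iteration).lastOption.getOrElse(sorted.head)._2
  }

  // The schedule from the slide.
  val learningRate: Seq[(Long, Float)] =
    Seq(0L -> 1e-2f, 40000L -> 1e-3f, 80000L -> 1e-4f)

  def main(args: Array[String]): Unit =
    Seq(0L, 39999L, 40000L, 100000L).foreach { i =>
      println(s"iteration $i -> ${valueAt(learningRate, i)}")
    }
}
```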
Other Features
• Tensor Memory Management
  - Automatically monitors the dependencies between all tensors
  - Reallocates space occupied by unneeded tensors on the fly
  - Will automatically toggle “inPlace” processing when it is safe
• Intermediate results are stored separately from the model
  - Forward passes yield backpropagation contexts that can be consumed or discarded at any time.
  - Very interesting property for:
    - Live Query/Training
    - Fancy Optimizers
    - Hyper Parameter Search
Saves up to 40% GPU memory during training!
Inferno
Training Deep Learning Models faster with Apache Spark
Starting an Inferno cluster
[Diagram components: Spark Conf; Spark Context; Cluster Coordinator (build(), run()) with the steps Claim, Assess and Tailor for the Cluster; Load Plugins (e.g. CUBlaze); FileRDD (Load hdfs://…); Create Samples; Sample Data RDD; three cache() calls]
Loading meta-data of HDFS files
                               Spark BinaryRDD       Inferno FileRDD
50,000 files / 50 dirs         689 s                 6 s
1,300,000 files / 1000 dirs    > 9999 s (gave up)    35 s
Distributed Optimizer
[Diagram components: Inferno Optimizer with Hyper Param., Objectives and Scope Delimiter; Blaze Optimizer with Hyper Param., Objectives and Scope Delimiter; Blaze Model with Weights and Hyper Param.; Pre-processing Pipeline; Sample Data RDD; Cluster Coordinator; Cluster]
[Diagram captions: “Applied with cluster wide focus.” and “Applied independently in each worker.”]
Performance
ResNet 34 on ImageNet
• Blaze (2 x 8 core Xeon CPU + 1 x NVIDIA TitanX): 2 hours, 42 minutes
• Inferno over 1 GbE (8 x 8 core Xeon CPU + 4 x NVIDIA TitanX): 57 minutes
Reached 20% Top1 accuracy 2.84 times faster!
Performance
PreAct ResNet 152 on ImageNet
[Plot: Top 1 and Top 5 accuracy (0% to 80%) over training time (0 h to 50 h) for 1x TitanX and for an Inferno Cluster (5x TitanX, 1 GbE)]
Reached 30% Top1 accuracy 4.81 times faster using 5 GPUs!*
* 6.8 h vs. 32.7 h
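The speedup figures on the two performance slides follow directly from the quoted times. The sketch below recomputes them and adds the per-GPU scaling efficiency, which the slides do not state; the helper names are hypothetical, and the efficiency is a derived illustration based only on the slide's numbers.

```scala
object SpeedupSketch {
  // How many times faster the cluster reached the target accuracy.
  def speedup(baselineTime: Double, clusterTime: Double): Double =
    baselineTime / clusterTime

  // Speedup divided by the number of GPUs (1.0 would be perfect scaling).
  def efficiency(s: Double, gpus: Int): Double = s / gpus

  def main(args: Array[String]): Unit = {
    // PreAct ResNet 152: 32.7 h on 1 GPU vs. 6.8 h on 5 GPUs.
    val resnet152 = speedup(32.7, 6.8)
    // ResNet 34: 2 h 42 min (162 min) on 1 GPU vs. 57 min on 4 GPUs.
    val resnet34 = speedup(162.0, 57.0)
    println(f"ResNet 152: $resnet152%.2fx, efficiency ${efficiency(resnet152, 5)}%.2f")
    println(f"ResNet 34:  $resnet34%.2fx, efficiency ${efficiency(resnet34, 4)}%.2f")
  }
}
```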
Conclusion
• Blaze & CUBlaze
  - Fast
  - Huge extensible module library
  - Easy to use
• Inferno
  - Allows you to accelerate Blaze DL tasks on Spark
  - Uses Spark MR methods for all data transmissions*:
    - Can run rather nicely alongside other Spark jobs.
    - Can be used without high-speed / low-latency equipment (usually required to make RDMA solutions perform well)
    - Plain old (and even slow) Ethernet is enough!
* Note that using “Showoff” to visualize progress may open separate HTTP connections to the Showoff-Server.
Where can I get it?
• Blaze & CUBlaze & Example Code
  Stable; we have been training models with it for months. A snapshot of the current stable release is available at:
  https://github.com/bashimao/ltudl (Apache License 2.0)
• Showoff
  Multi-purpose live visualization system developed by Aiden Nibali (La Trobe University):
  https://github.com/anibali/showoff
• Inferno
  - I am writing a paper about Inferno’s optimization system right now.
  - Once it has been accepted for publication, we will release the full source code on GitHub.
  - The best way to prepare for Inferno is to download Blaze now and get familiar with it.
Questions?
Matthias Langer, PhD cand.
m.langer@latrobe.edu.au
Supervisors:
Dr. Zhen He
z.he@latrobe.edu.au
Prof. Wenny Rahayu
w.rahayu@latrobe.edu.au
Deep Learning & Spark @ LaTrobe
Students
• Master of Data Science degree
  - http://tinyurl.com/hf4wmn2
  - Advanced data science lab established in 2016 with newest hardware.
  - CSE5BDC: Big Data Management on the Cloud (I tutor this!)
  - CSE5DEV: Data Exploration and Visualization (~50% lectures on deep learning)
  - CSE5WDC: Web Development on the Cloud
• Research
  - GPU research cluster capable of running distributed deep learning tasks.
  - In-house development of a distributed deep learning system.
  - Dedicated research group working with various Deep Learning systems.
  - CSE4DLJ: Weekly Deep Learning Journal Club
Industry
• If you have a data analytics problem:
  - … we have a dedicated deep learning research team!
  - … and probably also a deep learning solution for it!
• Spark & Deep Learning workshops for Torch available on demand.
• Past & current machine learning research collaborations
  - Alfred Hospital
  - ZenDesk
  - AIS (Australian Institute of Sport)
• Contact: z.he@latrobe.edu.au
The Fundamentals Guide to HDP and HDInsight
 
Speed up R with parallel programming in the Cloud
Speed up R with parallel programming in the CloudSpeed up R with parallel programming in the Cloud
Speed up R with parallel programming in the Cloud
 
Haskell for data science
Haskell for data scienceHaskell for data science
Haskell for data science
 
Deep Dive on Deep Learning (June 2018)
Deep Dive on Deep Learning (June 2018)Deep Dive on Deep Learning (June 2018)
Deep Dive on Deep Learning (June 2018)
 
Spark 4th Meetup Londond - Building a Product with Spark
Spark 4th Meetup Londond - Building a Product with SparkSpark 4th Meetup Londond - Building a Product with Spark
Spark 4th Meetup Londond - Building a Product with Spark
 
Project Tungsten Phase II: Joining a Billion Rows per Second on a Laptop
Project Tungsten Phase II: Joining a Billion Rows per Second on a LaptopProject Tungsten Phase II: Joining a Billion Rows per Second on a Laptop
Project Tungsten Phase II: Joining a Billion Rows per Second on a Laptop
 
How to win data science competitions with Deep Learning
How to win data science competitions with Deep LearningHow to win data science competitions with Deep Learning
How to win data science competitions with Deep Learning
 

Más de DataWorks Summit/Hadoop Summit

Unleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerUnleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerDataWorks Summit/Hadoop Summit
 
Enabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformEnabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformDataWorks Summit/Hadoop Summit
 
Double Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDouble Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDataWorks Summit/Hadoop Summit
 
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...DataWorks Summit/Hadoop Summit
 
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...DataWorks Summit/Hadoop Summit
 
Mool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLMool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLDataWorks Summit/Hadoop Summit
 
The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)DataWorks Summit/Hadoop Summit
 
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...DataWorks Summit/Hadoop Summit
 

Más de DataWorks Summit/Hadoop Summit (20)

Running Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in ProductionRunning Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in Production
 
State of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache ZeppelinState of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache Zeppelin
 
Unleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerUnleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache Ranger
 
Enabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformEnabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science Platform
 
Revolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and ZeppelinRevolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and Zeppelin
 
Double Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDouble Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSense
 
Hadoop Crash Course
Hadoop Crash CourseHadoop Crash Course
Hadoop Crash Course
 
Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Apache Spark Crash Course
Apache Spark Crash CourseApache Spark Crash Course
Apache Spark Crash Course
 
Dataflow with Apache NiFi
Dataflow with Apache NiFiDataflow with Apache NiFi
Dataflow with Apache NiFi
 
Schema Registry - Set you Data Free
Schema Registry - Set you Data FreeSchema Registry - Set you Data Free
Schema Registry - Set you Data Free
 
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
 
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
 
Mool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLMool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and ML
 
How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient
 
HBase in Practice
HBase in Practice HBase in Practice
HBase in Practice
 
The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)
 
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS HadoopBreaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
 
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
 
Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop
 

Último

CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 

Último (20)

CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 

Inferno Scalable Deep Learning on Spark

  • 1. Inferno Scalable Deep Learning on Spark Matthias Langer m.langer@latrobe.edu.au Dr. Zhen He z.he@latrobe.edu.au Prof. Wenny Rahayu w.rahayu@latrobe.edu.au Department of Computer Science & Computer Engineering
  • 2. Topics • Deep Learning – Introduction • Spark & Deep Learning • Our solution: La Trobe University’s Deep Learning System • Conclusion, Timeline, Q&A
  • 4. Source: CerCo (Brain and Cognition Research Centre), Toulouse
  • 5. Object/Action Recognition • Automatic Captioning • Navigating Artificial Agents • Deep Learning performs 100% better than the best non-deep learning algorithms in many Computer Vision tasks! Source: Research @ Facebook (left), google.com/selfdrivingcar (right)
  • 6. Voice Recognition • Deep Learning performs 30% better than the best non-deep learning algorithms!
  • 7. Natural Language Processing • Translation • Thought Vector Q&A • … • Deep Learning tends to perform “better” than traditional machine learning algorithms! Source: Google Inc. / Google Translate
  • 9. Spark & DL How they could be an ideal tandem, but there are challenges…
  • 10. Why do you want to use a cluster to train Deep Neural Networks? Deep Learning is SLOW
  • 11. Two approaches to speed up DL
      Scaling Up
      • Superior scaling until fundamental limits of the hardware are reached
        - Max. number of PCIe lanes
        - Max. read speed of HDD
        - Costs scale up non-linearly (DGX-1 = $129,000)
      Scaling Out
      • Highly scalable
      • No relevant hardware limits
      • Extensible
      Source: https://developer.nvidia.com/devbox
  • 12. More reasons why you would want to use Hadoop/Spark for DL?
      • You already have all your valuable data in Spark/Hadoop
      • DL (often) requires a lot of data to train
      • Need a lot of memory
      • Pre-processing has massive I/O requirements (disk & network)
  • 13. How could you implement DL on Spark?
      [Diagram: a Master and Workers 1 to 3, each holding a copy of the model (𝑏2𝑥2 + 𝑏3𝑥3 + ⋯); mini-batches of data are drawn from a Spark RDD]
      • Draw mini-batch
      • Map: Compute updated model in each worker
      • Reduce: Assemble into “better” model via Master node
      • Broadcast “better” model and repeat
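The map / reduce / broadcast loop on slide 13 can be sketched in a few lines. This is only an illustration under stated assumptions: it uses plain Scala collections instead of a Spark RDD, a linear model trained with plain SGD, and element-wise averaging as the reduce step. The names `MapReduceTraining`, `localUpdate`, `average` and `train` are hypothetical and are not part of Blaze or Inferno.

```scala
object MapReduceTraining {
  type Model  = Vector[Double]             // weights b_0, b_1, ...
  type Sample = (Vector[Double], Double)   // (features x, target y)

  // Map step: one worker refines its copy of the model on its own mini-batch (plain SGD).
  def localUpdate(model: Model, batch: Seq[Sample], lr: Double): Model =
    batch.foldLeft(model) { case (w, (x, y)) =>
      val err = w.zip(x).map { case (wi, xi) => wi * xi }.sum - y
      w.zip(x).map { case (wi, xi) => wi - lr * err * xi }
    }

  // Reduce step: the master averages the workers' models element-wise.
  def average(models: Seq[Model]): Model =
    models.transpose.map(ws => ws.sum / ws.size).toVector

  // One round = draw mini-batches, map, reduce; the result is "broadcast"
  // by feeding it into the next round as every worker's starting point.
  def train(initial: Model, partitions: Seq[Seq[Sample]], rounds: Int, lr: Double): Model =
    (1 to rounds).foldLeft(initial) { (model, _) =>
      average(partitions.map(batch => localUpdate(model, batch, lr)))
    }
}
```

In this sketch each element of `partitions` plays the role of one worker's mini-batch. On a real cluster every round would move one full model per worker over the network, which is the cost discussed on the next slide.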
  • 14. Problem 1: Big Parameters = High shuffle cost!
      [Diagram: Master and Workers 1 to 3, each exchanging a 500 MB model]
      • Compute updated models (typically 50 – 500 ms)
      • Reduce models (at best 5 s over 1 GbE)
      • Broadcast combined model (at best 5 s over 1 GbE)
      • Time split: Compute 5%, Communication 95%
  • 15. Problem 2: Node communication is synchronous
      [Diagram: Master and Workers 1 to 3] Bottleneck!
  • 16. La Trobe University DL-System
      Single Machine
      • Blaze: Scala based standalone deep learning system
      • CUBlaze: GPU acceleration for Blaze
      Cluster
      • Inferno: Coordinates distributed computation of Blaze models in synchronous Spark environment
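The 5% / 95% split on slide 14 follows from simple arithmetic, sketched below. Assumptions: 1 MB is taken as 10^6 bytes, and the link is used at its full line rate. At line rate 500 MB over 1 GbE takes 4 s; the slide's "at best 5 s" is presumably this figure plus protocol overhead. The object and method names are hypothetical.

```scala
object ShuffleCost {
  // Ideal transfer time in seconds for `megabytes` over a link of `gigabitPerSecond`.
  def transferSeconds(megabytes: Double, gigabitPerSecond: Double): Double =
    megabytes * 8.0 / (gigabitPerSecond * 1000.0)

  // Fraction of one training round that is spent on communication (reduce + broadcast).
  def communicationShare(computeSeconds: Double,
                         reduceSeconds: Double,
                         broadcastSeconds: Double): Double = {
    val comm = reduceSeconds + broadcastSeconds
    comm / (comm + computeSeconds)
  }
}
```

With the slide's own numbers (5 s reduce, 5 s broadcast, 0.5 s compute), `ShuffleCost.communicationShare(0.5, 5.0, 5.0)` is 10 / 10.5, or roughly 95%. With the faster 50 ms compute time the share is even higher.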
  • 17. A (probably biased) comparison of Inferno, SparkNet (Caffe), CaffeOnSpark, deeplearning4j and H2O
      • Communication protocol during training: Spark MR (Inferno), Spark MR (SparkNet), MPI/RDMA (CaffeOnSpark), Spark MR among others (deeplearning4j), Grpc/MPI/RDMA (H2O)
      • Model description language: JVM Code (Inferno), Config File (SparkNet), Config File (CaffeOnSpark), JVM Code (deeplearning4j), multiple (H2O)
      • Other compared features (per-system marks not shown): ConvNets, AutoEncoders, etc. (one entry “planned”); Build complex models (e.g. ResNet) (one entry “some”); Dynamic branching support (path altering / dropping); Pluggable preprocessing pipeline (one entry “partial”); Pluggable update policies for hyper parameters; Pluggable & visualizable online cross validation; Entire execution path determined in single runtime environment; GPU acceleration
  • 19. Module Library
      • Standard Modules
        - Add-Bias (C/U/S/B), Immediate-Filter (C/U/S/B)
        - Convolution, Convolution-Decoder, Linear, Linear-Decoder, Locally-Connected, Locally-Connected-Decoder
        - L2-Pooling, Max-Pooling, Mean-Pooling
        - Batch-Normalization, Dropout, LCN, LRN, Normalization (C/U/S/B), Reshape, Weight-Decay (L1/L2)
      • Nonlinearities: Abs, Add-Noise, ELU, Exp, Hard-Tanh, LeakyReLU, Ln, Pow, PReLU, ReLU, ReQU, (Log-)Sigmoid, SmoothAbs, (Log-)Softmax, SoftPlus, Sq, Sqrt, SReLU, Tanh
      • Optimizers: AdaDelta, AdaGrad, Adam, ConjugateGradientDescent, Rprop, RMSProp, SGD (traditional, local learning rates, momentum)
      • Constraints (can inject everywhere!): BCE, ClassLL, ClassNLL, KLDivergence, MSE
      • Containers: Sequence, Auto-Encoder, Branch (Parallel)
      • Branching: Alternate-Path, Drop-Path, Random-Path
      • Tensor Tables Operations: Select, Concatenate (C/U/S/B), Merge (add/mean/lerp)
      • Visualization & Benchmarking: Benchmark-Wrapper, Visualize-Histogram, Visualize-MeanAndStdDev (C/U/S/B)
      C/U/S/B = These operations can be applied either on [C]hannel, [U]nit, [S]ample or [B]atch level.
  • 20. Performance – AlexNet OWT (forward / backward, in ms)
      • TORCH (CUDNN): 27 / 53
      • TENSORFLOW: 26 / 55
      • CUBLAZE (1 GB WS LIMIT): 37 / 56
      • TORCH (FBFFT): 31 / 72
      • CUDACONVNET2: 42 / 135
      • CAFFE (NATIVE): 121 / 203
      • TORCH-7 (NATIVE): 132 / 210
      All benchmarks done using NVIDIA TitanX GPUs on comparable setups; Source: https://github.com/soumith/convnet-benchmarks
  • 21. Performance – VGG A (forward / backward, in ms)
      • TORCH (CUDNN): 162 / 331
      • TENSORFLOW: 158 / 382
      • CUBLAZE (1 GB WS LIMIT): 167 / 378
      • TORCH (FBFFT): 355 / 737
      • CUDACONVNET2: 408 / 821
      • CAFFE (NATIVE): 323 / 745
      • TORCH-7 (NATIVE): 350 / 755
      All benchmarks done using NVIDIA TitanX GPUs on comparable setups; Source: https://github.com/soumith/convnet-benchmarks
  • 22. How Blaze works (example)
      [Diagram labels: Data Source (HDD, SparkRDD, HDFS); Cached Samples; Augmenter; Sample Merger; Prefetcher; Model (fprop only) with Weights (fixed); Model with Weights (tunable); Optimizer with Hyper Parameters and Objectives; Scope Delimiter; Terminal, File, Showoff, etc.]
  • 23. Easy Setup: Model
      • Blaze automatically infers most layer parameters based on the actual input
      • Usually no need to specify input and output dimensions or whether to use CPU or GPU

      val noClasses = 100

      // Kernels
      val kernelConv1 = Kernel2D(dims = (11, 11), stride = (4, 4), padding = (2, 2))
      val kernelConv2 = Kernel2D.centered((3, 3))
      val kernelPool = Kernel2D((3, 3), (2, 2))

      // Layers
      val bias = AddBiasBuilder()
      val relu = ReLUBuilder()
      val lrn = LateralResponseNormalizationBuilder(n = 5, k = 2, alpha = 1e-4f, beta = 0.75f)
      val pool = MaxPoolingBuilder(kernelPool)

      // Lego!
      val mb = SequenceBuilder(
        ConvolutionFilterBuilder(kernelConv1, 48), bias, relu, pool, lrn,
        ConvolutionFilterBuilder(kernelConv2, 192), bias, relu,
        ConvolutionFilterBuilder(kernelConv2, 128), bias, relu, pool,
        ReshapeBuilder.collapseDimensions(),
        LinearBuilder(noClasses), bias,
        SoftmaxBuilder(),
        ClassLLConstraintBuilder()
      )
  • 24. Easy Setup: CPU and GPU
      • Blaze maintains a variant table for each module type.
      • When you “build” an instance of a module, all variants are scored and the “best” variant for the current situation is selected automatically.
        - You can configure what “best” means.

      // Input data
      val data = Array[Batch](...)
      // Inspect batches
      val hints = BuildHints.derive(data)
      // Build compatible model
      val m = mb.build(hints)

      19:25:20 INFO Scoring ConvolutionFilter[Kernel2[(3, 3), (1, 1)] x 2, 0/1 = filter]:
      19:25:20 DEBUG 0000800a => CUDA_CUDNN, preferred, input type matches
      19:25:20 DEBUG 0000400a => JVM_BLAS_IMPLICITMM, preferred
      19:25:20 DEBUG 00000004 => JVM_BLAS_MM
      19:25:20 DEBUG 0000000a => JVM_BREEZE_MM, preferred
      19:25:20 DEBUG 00000002 => JVM_BREEZE_SPARSEMM
      19:25:20 INFO CUDA_CUDNN selected!
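The log on slide 24 suggests that each variant receives a numeric score and that the variant with the highest score is selected. The following is a minimal sketch of that idea, not Blaze's actual implementation. The `Variant` class and `selectBest` are hypothetical names, and treating the hexadecimal values as plain integers that are compared by size is an assumption read off the log.

```scala
object VariantSelection {
  final case class Variant(name: String, score: Int)

  // Scores copied from the log on the slide (hexadecimal strings parsed as integers).
  val candidates: Seq[Variant] = Seq(
    Variant("CUDA_CUDNN",          Integer.parseInt("0000800a", 16)),
    Variant("JVM_BLAS_IMPLICITMM", Integer.parseInt("0000400a", 16)),
    Variant("JVM_BLAS_MM",         Integer.parseInt("00000004", 16)),
    Variant("JVM_BREEZE_MM",       Integer.parseInt("0000000a", 16)),
    Variant("JVM_BREEZE_SPARSEMM", Integer.parseInt("00000002", 16))
  )

  // Pick the highest-scoring variant; None if there is nothing to choose from.
  def selectBest(variants: Seq[Variant]): Option[Variant] =
    if (variants.isEmpty) None else Some(variants.maxBy(_.score))
}
```

Under these assumptions `VariantSelection.selectBest(VariantSelection.candidates)` picks the `CUDA_CUDNN` entry, which agrees with the last line of the log.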
  • 25. Working with large models!

      val mb = SequenceBuilder(...)
      val hints = ...
      val g = mb.toGraph(hints)
      SvgRenderer.render(g)

  • 26. Visualizing pre-processing pipelines

      val apb = AsynchronousPrefetcherBuilder(...)
      val g = apb.toGraph()
      SvgRenderer.render(g)
  • 27. Easy Setup: Optimizer

      val ob = MomentumBuilder()

      // Configure Hyper-Parameters
      ob.learningRate = DiscreteStepsBuilder(
        0 -> 1e-2f,
        40000 -> 1e-3f,
        80000 -> 1e-4f
      )

      // Setup Objectives
      ob.objectives
        += IterationCountLimitBuilder(1000)
        += CrossValidationBuilder(dataSource, ... preprocessing pipeline ...)
        += PrintStatusBuilder() >> FileSinkBuilder(HadoopFileHandle.userHome ++ "results/optimization.log")
        += objectives.Presets.visualizePerformance() >> ShowoffSinkBuilder("Cross Validation Performance")

      // Add more advanced stuff like Regularizers...

      // Go!
      val o = ob.build(m, dataSource)
      o.run()
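The learning-rate setting on slide 27 reads as a step schedule: 1e-2 from iteration 0, 1e-3 from iteration 40000 and 1e-4 from iteration 80000. A sketch of how such a schedule could be evaluated is shown below. It does not show how `DiscreteStepsBuilder` works internally; `StepSchedule` and `valueAt` are hypothetical names, and holding each value until the next step starts is an assumption.

```scala
object StepSchedule {
  // The steps from the slide: (first iteration, learning rate).
  val learningRate: Seq[(Long, Float)] =
    Seq(0L -> 1e-2f, 40000L -> 1e-3f, 80000L -> 1e-4f)

  // Returns the value of the last step whose start iteration is <= `iteration`.
  // Iterations before the first step fall back to the first step's value.
  def valueAt(steps: Seq[(Long, Float)], iteration: Long): Float = {
    require(steps.nonEmpty, "at least one step is needed")
    val sorted = steps.sortBy(_._1)
    sorted.takeWhile(_._1 <= iteration).lastOption.getOrElse(sorted.head)._2
  }
}
```

For example, `StepSchedule.valueAt(StepSchedule.learningRate, 39999L)` still yields the initial rate, and the value drops by a factor of ten at iteration 40000.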
  • 28. Other Features
      • Tensor Memory Management
        - Automatically monitors the dependencies between all tensors
        - Reallocates space occupied by unneeded tensors on the fly
        - Will automatically toggle “inPlace” processing when it is safe
      • Intermediate results are stored separately from the model
        - Forward passes yield backpropagation contexts that can be consumed or discarded at any time.
        - Very interesting property for: Live Query/Training, Fancy Optimizers, Hyper Parameter Search
      Saves up to 40% GPU memory during training!
  • 29. Inferno: Training Deep Learning Models faster with Apache Spark
      [Diagram: Blaze, CUBlaze, Inferno]
  • 30. Starting an Inferno cluster
      [Diagram labels: Spark Conf; Cluster Coordinator; Cluster; Claim; Assess; Tailor; Spark Context; FileRDD; Load hdfs://…; Create Samples; Sample Data RDD; Load Plugins (e.g. CUBlaze)]
      Loading meta-data of HDFS files (Spark BinaryRDD vs. Inferno FileRDD):
      • 50,000 files / 50 dirs: 689 s vs. 6 s
      • 1,300,000 files / 1000 dirs: > 9999 s (gave up) vs. 35 s
  • 32. Performance: ResNet 34 on ImageNet (time until 20% Top1 accuracy was reached)
      • Blaze, 2 x 8 core Xeon CPU + 1 x NVIDIA TitanX: 2 hours, 42 minutes
      • Inferno (over 1 GbE), 8 x 8 core Xeon CPU + 4 x NVIDIA TitanX: 57 minutes
      2.84 times faster!
  • 33. Performance: PreAct ResNet 152 on ImageNet
      [Chart: Top 1 and Top 5 accuracy (0% to 80%) over training time (0 h to 50 h) for 1x TitanX and for the Inferno Cluster (5x TitanX, 1 GbE)]
      Reached 30% Top1 accuracy 4.81 times faster using 5 GPUs!*
      * 6.8 h vs. 32.7 h
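The speedup figures on slides 32 and 33 can be checked with a short calculation. The sketch below also derives a parallel efficiency (speedup divided by the number of GPUs). That metric is not stated on the slides and is added here only as an illustration; the names are hypothetical.

```scala
object Speedup {
  // How many times faster the cluster reached the target accuracy.
  def speedup(baselineTime: Double, clusterTime: Double): Double =
    baselineTime / clusterTime

  // Fraction of ideal linear scaling that was achieved with `gpus` GPUs.
  def efficiency(speedupFactor: Double, gpus: Int): Double =
    speedupFactor / gpus
}
```

With the slide's numbers, `Speedup.speedup(32.7, 6.8)` rounds to 4.81 for PreAct ResNet 152 on 5 GPUs, and `Speedup.speedup(162.0, 57.0)` (2 hours 42 minutes vs. 57 minutes) rounds to 2.84 for ResNet 34 on 4 GPUs. Both inputs must use the same unit.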
  • 34. Conclusion
      • Blaze & CUBlaze
        - Fast
        - Huge extensible module library
        - Easy to use
      • Inferno
        - Allows you to accelerate Blaze DL tasks on Spark
        - Uses Spark MR methods for all data transmissions*:
          · Can run rather nicely along with other Spark jobs.
          · Can be used without high-speed / low latency equipment (usually required to make RDMA solutions perform well)
          · Plain old (and even slow) Ethernet is enough!
      * Note that using “Showoff” to visualize progress may open separate HTTP connections to the Showoff-Server.
  • 35. Where can I get it?
      • Blaze & CUBlaze & Example Code: Stable; we have been training models with it for months. A snapshot of the current stable release is available at: https://github.com/bashimao/ltudl (Apache License 2.0)
      • Showoff: Multi-purpose live visualization system developed by Aiden Nibali (La Trobe University): https://github.com/anibali/showoff
      • Inferno
        - I am writing a paper about Inferno’s optimization system right now.
        - Once it has been accepted for publication, we will release the full source code on GitHub.
        - The best way to prepare for Inferno is to download Blaze now and get familiar with it.
  • 36. Questions? Matthias Langer, PhD cand. m.langer@latrobe.edu.au Supervisors: Dr. Zhen He z.he@latrobe.edu.au Prof. Wenny Rahayu w.rahayu@latrobe.edu.au
  • 37. Deep Learning & Spark @ LaTrobe
      Students
      • Master of Data Science degree
        - http://tinyurl.com/hf4wmn2
        - Advanced data science lab established in 2016 with newest hardware.
        - CSE5BDC Big Data Management on the Cloud (I tutor this!)
        - CSE5DEV Data Exploration and Visualization (~50% lectures on deep learning)
        - CSE5WDC Web Development on the Cloud
      • Research
        - GPU research cluster capable of running distributed deep learning tasks.
        - In-house development of a distributed deep learning system.
        - Dedicated research group working with various Deep Learning systems.
        - CSE4DLJ Weekly Deep Learning Journal Club
      Industry
      • If you have a data analytics problem:
        - … we have a dedicated deep learning research team!
        - … and probably also a deep learning solution for it!
      • Spark & Deep Learning workshops for Torch available on demand.
      • Past & current machine learning research collaborations
        - Alfred Hospital
        - ZenDesk
        - AIS (Australian Institute of Sport)
      • Contact: z.he@latrobe.edu.au

Editor's notes

  1. Time Budget: 30 seconds Hi, my name is Matthias Langer. I am currently a PhD student at La Trobe University. Today I would like to present to you Inferno, a deep learning system that we develop here in Melbourne and that can run on top of Spark.
  2. Time Budget: 30 seconds My talk will be structured as follows: I will talk with you a little bit about DL. … then about DL and Spark… … our own DL system …. … and then we will conclude, and I will also tell you where you can download our stuff.
  3. Time Budget: 30 seconds Talking Points: So without further ado, let’s start…
  4. Time Budget: 1 minute So, what is deep learning? Deep learning is a machine learning approach that tries to extract hierarchical features from input data. In itself that is kind of similar to how the brain does it, as shown in this slide. So how does that work? Let’s say a stimulus (or input) comes from the eye and eventually ends up in region V1. There, primitive features like edges are extracted. Then in V2 these features are combined into more complex features. This is done many times to grasp very complex features.
  5. Time Budget: 30 seconds Talking Points: Now, where can DL be used? For example, in computer vision. In this area, DL has completely reshaped the landscape.
  6. Time Budget: 30 seconds Talking Points: But also in voice recognition DL is now used a lot!
  7. Time Budget: 30 seconds The same goes for natural language processing. I could now go on with examples, but… (next slide)
  8. Time Budget: 30 seconds … I think this slide from GoogleBrain sums it up pretty well. This is the number of projects at Google that take advantage of DL to achieve their functionality. You can draw your own conclusions. But.. Well.. I would say this is an exponential development.
  9. Time Budget: 30 seconds So the first question that arises is probably... (next slide)
  10. Time Budget: 1 minute “Why do you want to use cluster resources to train DNNs?” When you dive into the literature available about DL, you will often see comments like this: (click) “This model took about 22 days to train.” (wait 5) Or another frequent comment could be: (click) “I trained 50x from scratch…” (wait 5) So, let me sum this up in one short sentence: (click!) DEEP LEARNING IS SLOW!
  11. Time Budget: 1.5 minutes Scaling Up: Scaling up works super-well until a certain point. And then it becomes either fundamentally hardware limited and/or expensive! Also consider that you have a box that can do ML very well but might not be a good host for your data. Scaling Out: (click) On the other hand we have the scaling out approach, using a cluster of computers and clever software like Hadoop & Spark. Here you have no hardware limits. And even better, it is extensible: you can gradually buy more resources for DL as you run more DL jobs.
  12. Time Budget: 1 minute Here are a few more reasons why you might want to try running DL on Spark: If you are here at this conference today, chances are that you already have all your valuable data in Hadoop and use Spark to process it. DL requires a lot of data. Your HDFS holds a lot of data. DL needs a lot of memory; your Spark cluster probably has a lot of memory. Preprocessing data requires lots of memory and I/O. Spark and Hadoop are masters at doing this.
  13. Time Budget: 1.5 minutes OK, done deal! Let’s implement DL on Spark. As always, we first put all our data into a Spark RDD. (click) Now we start a bunch of workers and give them our model. (click) Each worker then pulls one batch from the RDD and updates the model. This is a map-job in Spark. (click) Then we combine the changes from all workers into a joint model. This would be a reduce-job in Spark. (click) And finally, we take this model and pass it back to the workers for the next optimization round. You could do this with a broadcast-job in Spark.
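The map / reduce / broadcast round described in note 13 can be sketched without Spark at all. The snippet below is only an illustration under stated assumptions: the workers are simulated as list shards, the model is a single weight for y = w * x, and the "combine" step is taken to be a plain average of the workers' gradients (the notes only say the changes are combined). All function names are hypothetical and are not part of Blaze, Inferno or Spark.

```python
import random

def gradient(w, batch):
    """Mean squared-error gradient for the 1-D model y = w * x."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

def train_round(w, shards, batch_size, lr, rng):
    # Map: every simulated worker draws a mini-batch from its shard
    # and computes a gradient against the current weight.
    grads = [gradient(w, rng.sample(shard, batch_size)) for shard in shards]
    # Reduce: the master combines the results (assumed here: an average).
    mean_grad = sum(grads) / len(grads)
    # Broadcast: the returned weight is what all workers use next round.
    return w - lr * mean_grad

def train(shards, rounds=200, batch_size=8, lr=0.05, seed=0):
    rng = random.Random(seed)
    w = 0.0
    for _ in range(rounds):
        w = train_round(w, shards, batch_size, lr, rng)
    return w

# Toy data for y = 3x, split across three simulated workers.
data = [(x / 10, 3 * x / 10) for x in range(1, 31)]
shards = [data[i::3] for i in range(3)]
w = train(shards)
```

Every round waits for all workers before the weight changes, which is exactly the synchronous behaviour the following notes examine.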
  14. Time Budget: 1.5 minutes The aforementioned approach looks theoretically sound. But let’s take a closer look. (click) Typical DL models need 50-500 ms to compute on a modern GPU. (click) But presuming the model is large (e.g. 500 MB), the reduction will take at least 5 seconds, because that is the minimum flight time of a single instance of such a model over 1 GbE. (click) And then we also need at least another 5 seconds for rebroadcasting the model. In this scenario we spend about 95% of the time on communication. Now you could say: But I have 10 GbE. 10 GbE is of course faster. But even then you still spend at least 66% of the time budget on communication.
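The percentages in note 14 can be reproduced with a few lines of arithmetic. This is a back-of-the-envelope sketch, assuming the slow end of the quoted compute range (500 ms per round), one model transfer for the reduce and one for the rebroadcast, and 10 GbE being exactly ten times faster than 1 GbE. The function name is hypothetical.

```python
def comm_fraction(compute_s, transfer_s):
    """Share of one round spent on reduce + rebroadcast (two transfers)."""
    comm = 2 * transfer_s
    return comm / (comm + compute_s)

# Raw wire time of a 500 MB model over a 1 Gbit/s link, ignoring overhead.
wire_time_s = 500e6 * 8 / 1e9

one_gbe = comm_fraction(compute_s=0.5, transfer_s=5.0)   # 5 s per transfer
ten_gbe = comm_fraction(compute_s=0.5, transfer_s=0.5)   # ten times faster

print(f"raw wire time: {wire_time_s:.1f} s")
print(f"1 GbE:  {one_gbe:.0%} of the round is communication")
print(f"10 GbE: {ten_gbe:.0%} of the round is communication")
```

The raw wire time comes out at 4 seconds; the 5 seconds used in the notes presumably allows for protocol overhead. With faster models (closer to 50 ms) the communication share only gets worse.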
  15. Time Budget: 1 minute Another thing to consider in Spark is that map/reduce is synchronous. Only after the slowest worker has responded can the master finish the reduction process. The master itself and its network connections can quickly become the bottleneck that slows down the entire system. So synchronous is kind of problematic.
  16. Time Budget: 1.5 minutes So let’s talk about what we have to offer. The LTU DL system consists of 3 major components. Blaze: a standalone deep learning system that can train DL models on a single node… Now you might want to ask, why did you have to create a new DL system? Blaze was designed from the ground up for use in a distributed MapReduce environment. So it is highly portable and scalable. CUBlaze: a plugin for Blaze that adds support for NVIDIA GPUs. Inferno: a coordinator service and a set of advanced optimizers for Blaze that leverage cluster resources to accelerate training of DNNs.
  17. Time Budget: 1.5 minutes There are already solutions for DL on Spark. Now why Inferno? If you type DL + Spark into Google you end up with a couple of systems. And they are all very different. So I will just pick a few things here. This presentation will be available later for downloading, so you can compare more thoroughly. (click) Our system is not only a deep learning system but covers the entire pipeline, including preprocessing. So it is an all-in-one solution. (click) We also have pluggable online cross validation support. So you can see live how well your model generalizes right now. (click) Last but not least, this is the primary communication protocol used. As you can see, while some systems say they are Apache Spark based, they do not use Spark for communication. Actually, some of them just kick off the learning task using Spark and then open other communication channels. Hence, they are actually not really Spark DL systems. This is quite important, because if you do something like that, Spark’s resource management is completely thrown out of the window.
  18. Time Budget: 30 seconds So, let’s dig into our DL system… And start with Blaze.
  19. Time Budget: 1 minute Blaze is not only a Deep Learning Engine. It also comes with built-in support for a vast array of DL modules and optimizers. This is an incomplete list, but note that you see Convolution only once in this list and not things like Spatial, Volumetric, etc. Keep that in mind; it will come back in a minute.
  20. Time Budget: 45 seconds Going distributed is useless if your base performance is horrible. Here is a benchmark that pits CUBlaze against other famous DL engines on AlexNet. As you can see, our single GPU performance is comparable with TensorFlow. (lower is better)
  21. Time Budget: 30 seconds But not only for AlexNet. We score similarly well for other network architectures.
  22. Time Budget: 2.5 min Talking Points: Next I want to show you how Blaze fundamentally works. As for all data science tasks, everything starts with the data itself. (click) Blaze gives you two options: lazily cached and uncached data loading. In this case we went for cached data loading. This is only interesting if you have very slow network connections and/or use a regular access pattern. (click) Anyway, data is pulled from the data source by the first preprocessing stage. In this presentation these stages are always depicted as yellow hexagons. In this case it is a merger that merges multiple samples together to form a mini-batch. (click) It then hands it over to the next processing stage. In this example it is an augmenter. Augmenters allow you to add a wide array of modules (including entire NNs) to mangle the data in order to make it consumable for the model under test. (click) So the augmenter hands the data over to the underlying model. The model then consumes the batch and produces a new batch, (click) which it returns to the augmenter, which (click) in turn hands it over to the next processing stage. Here it is a prefetcher. Prefetchers mitigate performance drops caused by I/O bottlenecks by pulling in batches ahead of time. (click) However, regardless of what the last preprocessing stage is, the batch in its current form is now handed over to the optimizer. (click) The optimizer will consult the scope delimiter to decide to what degree the model should be modified next. This is a pretty unique feature of Blaze and gives us very interesting properties for special-purpose networks, for using different optimization strategies for different parts of the model, and for the distributed optimization itself. (click) Then it reads the current hyper parameters and (click) begins running the batch through the model. (click) No surprises here. The model uses its current weights and hyper parameters to compute a cost and returns it to the optimizer. 
(click) The optimizer will then process its current objectives and take action (depending on what the objective is about). (click) Objectives can, for example, result in an output to a file or a Showoff server. They could also result in a yield signal to the optimizer. In that case the optimization would be finished. (click) If it is not finished, Blaze will now use the gradients returned by the model to improve the current weights. It will also trigger update procedures in all hyper parameters. As you can see there are a few technicalities. But no fancy surprises or magic here. Arguments: Remember that caching is not useful if you can afford a prefetcher.
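The merger, augmenter and prefetcher chain from note 22 maps naturally onto chained generators. The following is a minimal sketch of that idea in plain Python, not Blaze's actual API: the stage names mirror the notes, but every function and parameter here is hypothetical, and the augmenter's "model" is reduced to a simple per-sample transform.

```python
import queue
import threading

def merger(samples, batch_size):
    """Group consecutive samples into mini-batches."""
    batch = []
    for sample in samples:
        batch.append(sample)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly smaller batch

def augmenter(batches, transform):
    """Run every sample of every batch through a transform."""
    for batch in batches:
        yield [transform(sample) for sample in batch]

def prefetcher(batches, depth=2):
    """Pull up to `depth` batches ahead of time on a background thread."""
    buffer = queue.Queue(maxsize=depth)
    done = object()  # sentinel marking the end of the stream

    def fill():
        for batch in batches:
            buffer.put(batch)
        buffer.put(done)

    threading.Thread(target=fill, daemon=True).start()
    while True:
        item = buffer.get()
        if item is done:
            return
        yield item

# Data source -> merger -> augmenter -> prefetcher -> (optimizer would read here)
pipeline = prefetcher(augmenter(merger(range(10), batch_size=4), lambda x: x * 2))
batches = list(pipeline)
```

Because the prefetcher fills its buffer while the consumer is busy, slow I/O in the earlier stages overlaps with the optimizer's compute time, which is the effect the notes attribute to prefetching.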
  23. Time Budget: 1.5 min REMEMBER: Mention the ConvLayer again, as mentioned in previous slides. So, what does working with Blaze actually look like? Here we go… (click) For defining a convolutional NN, we best start off with defining kernels. (Kernels represent the size of the feature maps.) We could do that later, but it is cleaner like this. There are many ways to initialize a 2D kernel. (click) In most NNs there are layers that we frequently use. So let’s just define those upfront. And now it is Lego time. Here we define a sequence and add convolution layers that use the previously defined kernels. The first one will create 48 feature maps using the kernelConv1 we defined above. As you can see, you can simply mix defining individual layers on the fly and using the layers that we just defined above. Note that we are creating here a network that is 17 modules high. And it is still pretty readable. And the reason why it is still quite readable is that every piece of information here has to do with what we want to do. Not how. (click) As you can see we do not define the actual input and output dimensions of the layers. This is inferred automatically.
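The automatic inference of input and output dimensions mentioned at the end of note 23 can be illustrated by propagating a shape through a list of layer descriptions. This is a generic sketch, not Blaze code: the layer functions, the unpadded-convolution formula and the 3 x 227 x 227 input are assumptions chosen for illustration; only the 48 feature maps of the first convolution come from the notes.

```python
def conv2d_shape(shape, kernel, maps, stride=1):
    """Output (channels, height, width) of an unpadded square convolution."""
    _, height, width = shape
    return (maps, (height - kernel) // stride + 1, (width - kernel) // stride + 1)

def pool_shape(shape, size):
    """Output shape of non-overlapping pooling with a square window."""
    channels, height, width = shape
    return (channels, height // size, width // size)

def linear_shape(shape, units):
    """A fully connected layer flattens its input, whatever its shape."""
    return (units,)

def infer_shapes(input_shape, layers):
    """Walk the layer list once; each layer sees its predecessor's output."""
    shapes = [input_shape]
    for shape_fn, params in layers:
        shapes.append(shape_fn(shapes[-1], **params))
    return shapes

# The user states only *what* each layer should do, never its input size.
layers = [
    (conv2d_shape, {"kernel": 11, "maps": 48, "stride": 4}),
    (pool_shape, {"size": 2}),
    (conv2d_shape, {"kernel": 5, "maps": 128}),
    (linear_shape, {"units": 10}),
]
shapes = infer_shapes((3, 227, 227), layers)
```

Only the first shape is supplied by the data; everything after it follows from the layer parameters, which is why the network definition can stay free of dimension bookkeeping.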
  24. Time Budget: 1 minute Here is why we do not have to specify CPU or GPU. Blaze will automatically pick the best available implementation depending on many factors, especially the runtime type of the tensor of the previous module. However, you have the option to set preferences to override our built-in mechanics if you want. Blaze supplies fallback implementations for everything. If something is not supported in the desired implementation, Blaze will temporarily switch to a fallback solution. So if you give a model to a friend, he will always be able to compute it. TODO: Give examples for how it can be configured!
  25. Time Budget: 1 minute With many things being done automatically, you sometimes want to know how Blaze will actually process the data. To do this, you can transform any NN into a graph. Just call the toGraph method. You can also render it for on-screen display. Then Blaze will show you what will happen. … there are two branches coming from above. … after the branches join, a table is formed containing those two tensors … this table is then collapsed down into a single CUDA tensor by a Merge operation that adds the tensors on top of each other.
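The selection behaviour in note 24 (best available implementation, user preference as an override, a fallback that always works) can be sketched as an ordered lookup. This is a guess at the general pattern, not Blaze's mechanism: the implementation names, the tensor kinds and the `preferred` parameter are all hypothetical.

```python
def pick_implementation(candidates, tensor_kind, preferred=None):
    """Return the name of the first implementation that accepts tensor_kind.

    candidates: list of (name, supported_kinds), ordered best first.
    preferred:  optional name that is tried before all others.
    """
    # Stable sort: the preferred entry (if any) moves to the front,
    # all other entries keep their best-first order.
    ordered = sorted(candidates, key=lambda c: c[0] != preferred)
    for name, supported_kinds in ordered:
        if tensor_kind in supported_kinds:
            return name
    raise LookupError(f"no implementation for {tensor_kind!r}")

CANDIDATES = [
    ("cuda", {"cuda"}),
    ("cpu_native", {"host"}),
    # Slow but universal; would copy a CUDA tensor to the host first.
    ("cpu_fallback", {"host", "cuda"}),
]

on_gpu = pick_implementation(CANDIDATES, "cuda")
on_cpu = pick_implementation(CANDIDATES, "host")
forced = pick_implementation(CANDIDATES, "host", preferred="cpu_fallback")
# A build without the GPU plugin still runs a model that produces CUDA tensors.
no_gpu_build = pick_implementation(CANDIDATES[1:], "cuda")
```

Keeping a universal entry at the end of the list is what guarantees that a model handed to someone with different hardware can still be computed, only more slowly.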
  26. Time Budget: 30 seconds Of course, visualization is not limited to the model. You can visualize other things as well. Here is a preprocessing pipeline for ImageNet. (wait 10 seconds)
  27. Time Budget: 2 minutes So last but not least, an example of how you set up an optimization job in Blaze. First you create an optimizer builder. (click) Then you could set hyper parameters. In this case we set up a learningRate schedule with discrete steps. You can extend the functionality of optimizers with so-called objectives. (click) Objectives include stop conditions like this one, where we simply say that we want to stop after 1000 iterations. But you can also execute complex functions. (click) Let’s add an “Online Cross Validation” module. (click) Now let’s print the status again. That would now print the cost and other figures regarding the learning to the command line. (click) Let’s say we do not want this information to end up on the command line but in a file. For example in a Hadoop file. Then we would just add two arrows and let them point to a sink. (click) That was nice. But how about more advanced visualization? You can build it yourself or use presets that we frequently use. (click) Well.. Where to visualize? We could write an image file. Or we could also send it to our Showoff visualization system. Like this. (click) This will automatically render the image on the Showoff server in a frame titled “Cross Validation Performance” (click) and produce a graphic like this on the Showoff server. (click) You can also use logical operations to combine objectives. Here is a periodic trigger that we set to 3600. That means this objective evaluates to true once every hour. (click) We combine this using an &&-operator with a dump command. Now every hour the weights of the model will be dumped to stdout. (click) But that is not very useful. So, let’s add a directory sink. This will redirect the output of dump to files in the directory “/tmp”. (click) There are lots of other things you can do. But eventually you want to build the optimizer by providing a model and a datasource. And then “run()” it.
  28. Time Budget: 1 minute Talking Points: Blaze has many other features. This here is merely a selection. Blaze has automatic tensor memory management. It will automatically monitor the relationships between tensors in your network to utilize the available memory as efficiently as possible. The tensor management system will also automatically toggle inPlace processing if it is deemed safe. (click) This can save up to 40% of GPU memory during training. Also note that in Blaze, intermediate results are always stored separately from the model. So you can forward propagate multiple times without losing the ability to backprop separately for the previous mini-batch. This is a nice property to have if you are optimizing hyper parameters .. Or .. You want to write fancy optimizers that explore the hyperplane of the cost function. As you have already seen in the previous slides, we have the ability to visualize lots of things. Right now we support only our own visualization system Showoff. But the system is extensible.
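The way note 27 combines objectives with logical operators can be mimicked in a few lines. The sketch below is an illustration of the pattern only; it is not the Blaze builder API. Python cannot overload `&&`, so `&` stands in for it, and the `Objective` class, the state dictionary and the simulated clock are all hypothetical.

```python
class Objective:
    """Wraps a function state -> bool; True asks the optimizer to stop."""
    def __init__(self, fn):
        self.fn = fn

    def __call__(self, state):
        return self.fn(state)

    def __and__(self, other):
        # Short-circuits: `other` only runs when `self` evaluated to True.
        return Objective(lambda state: self(state) and other(state))

def iteration_limit(n):
    return Objective(lambda state: state["iteration"] >= n)

def periodic(seconds):
    last = {"t": 0.0}
    def fire(state):
        if state["time"] - last["t"] >= seconds:
            last["t"] = state["time"]
            return True
        return False
    return Objective(fire)

def dump(sink):
    def write(state):
        sink.append(state["weights"])
        return False  # side effect only; never stops the optimization
    return Objective(write)

def run(objectives, seconds_per_step):
    state = {"iteration": 0, "time": 0.0, "weights": 0}
    while True:
        state["iteration"] += 1
        state["time"] += seconds_per_step      # simulated wall clock
        state["weights"] = state["iteration"]  # stand-in for a real update
        if any([objective(state) for objective in objectives]):
            return state

sink = []  # stand-in for the directory sink from the notes
final = run([iteration_limit(1000), periodic(3600) & dump(sink)], seconds_per_step=10)
```

With 10 simulated seconds per step, the periodic trigger fires at iterations 360 and 720, so two weight snapshots are dumped before the iteration limit ends the run.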
  29. Time Budget: 30 seconds Talking Points: Now, for the last part of our deep learning system. Inferno itself. (click)
  30. Time Budget: 1.5 minutes Talking Points: To be able to utilize a cluster for training your models, you have to use Inferno. In Inferno everything always starts with the Cluster Coordinator. First, you will have to provide a SparkConf so that we know what Spark master you want to connect to. (click) The Cluster Coordinator creates and takes over control of the Spark Context. Now the automatic initialization procedure starts. (click) First the coordinator will briefly claim all cluster resources. (click) It then probes each executor and checks for specific settings in their local configuration. (click) Then it frees all cluster resources that cannot be used for one reason or another, to make them available to the Spark Scheduler again. (click) Now special plugins, for example CUBlaze, can be loaded. Now the system is initialized. (click) Typically you would now somehow load your dataset. (click) Here we used the Inferno-FileRDD, which is a special RDD that can handle huge amounts of files much faster than the built-in Spark RDDs. This way we can for instance just dump the entire ImageNet dataset into our HDFS filesystem and have it accessible from the entire cluster. (click) Anyway. Sooner or later you want to create samples that you can use for learning. Notice that sample creation is lazy. So once we have the meta-data for the HDFS files in the FileRDD, we do not need to access a file again until we really need it for learning.
  31. Time Budget: 1.5 minutes Talking Points: Presuming that you already have your Blaze optimizer, the Inferno optimizer is easy to use. As always, everything starts with the (click) Just provide the Blaze model you want to tune and cache it. (click) Provide the description of the Blaze optimizer and cache it. (click) Provide the description of the preprocessing pipeline and cache it. (click) Now come the Inferno optimizer’s objectives, hyper parameters and scope delimiter. (click) Now I know that looks similar. But in fact both sets of parameters are different. The Blaze optimizer will be distributed to the workers and so are its objectives. And they are only evaluated there. The Inferno parameters are evaluated in the master. (click) Anyway, finally you will have to provide your sample data and call the build() function. Now you have your Inferno optimizer. The only thing left is to call the “run()” function.
  32. Time Budget: 1 minute Now for the performance. Here we have trained a ResNet 34 on a single GPU. I deliberately took this picture. This is how Blaze would visualize its progress to you. With 1 GPU, we reached 20% top1 accuracy after 2 hours, 42 minutes. Now the same network training on 4 machines with the same specs on Inferno: we reached the same result in 57 minutes. So 4x the hardware gives a 2.8x speed improvement….
  33. Time Budget: 1 minute Well, nice, but not impressive. But ResNet 34 is very small and still doable on a single GPU. Let’s take on something larger: ResNet 152, with pre-activation units. As you can see from the horizontal axis, it takes incredibly long to train this on a single GPU (blue line). I basically gave up after about 44 hours. The distributed version (green) uses, in this case, 5 TitanX cards in an Inferno cluster with a poor 1 GbE link speed. We can reach a top1 accuracy with similar hyper parameters in less than 7 hours. That is about 4.8 times faster than 33 hours using a single GPU.
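The speedups quoted in notes 32 and 33 follow directly from the training times. The short calculation below uses only the figures given in the notes; the helper names are hypothetical, and the ResNet 152 cluster time is taken as exactly 7 hours, which is the stated upper bound, so the result is a lower bound on the speedup.

```python
def speedup(single_minutes, cluster_minutes):
    """How many times faster the cluster reached the same accuracy."""
    return single_minutes / cluster_minutes

def scaling_efficiency(single_minutes, cluster_minutes, workers):
    """Fraction of ideal linear scaling achieved (1.0 is perfect)."""
    return speedup(single_minutes, cluster_minutes) / workers

# ResNet 34: 2 h 42 min on one GPU versus 57 min on 4 machines.
resnet34 = speedup(2 * 60 + 42, 57)
resnet34_eff = scaling_efficiency(2 * 60 + 42, 57, workers=4)

# ResNet 152: 33 h on one GPU versus less than 7 h on 5 GPUs.
resnet152_min = speedup(33 * 60, 7 * 60)
resnet152_eff_min = scaling_efficiency(33 * 60, 7 * 60, workers=5)

print(f"ResNet 34:  {resnet34:.2f}x, efficiency {resnet34_eff:.0%}")
print(f"ResNet 152: at least {resnet152_min:.2f}x, efficiency at least {resnet152_eff_min:.0%}")
```

The larger model scales noticeably better, which fits the earlier communication argument: the more compute time each round needs, the smaller the share lost to moving the model across the network.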
  34. Time Budget: 1.5 minutes So, to sum up: Blaze & CUBlaze & Inferno are fast, have a huge module library and are still extensible. And quite easy to use. Inferno allows you to accelerate DL tasks. And it is completely Spark MR. So no shady network connections that punch holes into your security. And of course we can also achieve decent results using cheap Ethernet hardware where others can’t.
  35. Time Budget: 1.5 minutes So now the big question that remains is: How can you obtain this software to start playing around? For Blaze and CUBlaze, I have published snapshots of the current stable release on GitHub. There is example code. So just grab them and follow the instructions. Our visualization system Showoff can be found at Aiden Nibali’s GitHub repo as a docker image. For Inferno things are more complicated. I am in fact writing a paper about our optimizer right now. Unfortunately, I have to wait until that paper has been accepted before I can release the code. However, as soon as that happens, you will find it next to Blaze & CUBlaze in the above-mentioned repository. The best way to prepare for Inferno is to get familiar with Blaze now.
  36. Time Budget: 5 minutes Talking points: So, I don’t have many slides left. Any questions? (if people stand up, switch to the next slide.)
  37. Time Budget: - At LaTrobe we do quite a lot with deep learning. If you are interested, regardless of whether you are a student or an industry representative, you can contact us here.