Deep Learning Frameworks Using Spark on YARN
Vartika Singh
Field Data Science Architect
©2014 Cloudera, Inc. All rights reserved.
"Would you tell me, please, which road do I take?"
"That depends a good deal on where you want to get to."
"I don't much care where –"
"Then it doesn't matter which way you go."
Big Data and DNN
An overview of the ML pipeline
Data and artifacts, from source to consumer:
• Raw data: many sources, many formats, varying validity
• Well-formatted data
• Training, validation, and test data
• Validated ML models
• End user: consumption for analysis
Work done along the way:
• Data Engineering: cleaning, merging, filtering
• Data Science: model building, model training, hyper-parameter tuning
• Data Engineering: pipeline execution, production operation
• Ongoing data ingestion
©2017 Cloudera, Inc. All rights reserved.
Deep Learning in Big Data
• A major source of difficulty in many real-world
artificial intelligence applications is that many of the
factors of variation influence every single piece of
data we can observe.
• Deep learning solves this central problem via
representation learning by introducing representations
that are expressed in terms of other, simpler
representations.
Deep Learning in Hadoop
• http://blog.cloudera.com/blog/2017/04/deep-learning-frameworks-on-cdh-and-cloudera-data-science-workbench/
• http://blog.cloudera.com/blog/2017/04/bigdl-on-cdh-and-cloudera-data-science-workbench/
Analysis Pipeline
• Data Engineering: metadata, feature extraction, filter
• Raw binary data in S3
• Processed data in S3: training, validation, test
• Data Science and Exploration: model, train, tune
• Search and SQL
• Data Engineering
• Validated model: model, parameters
• Live data ingest
• End results UI: insights, predictions, results
• Data Lake Cluster: S3/HDFS/Kudu
• Formats: Parquet, PMML
• Data Engineering: processing, execution
• Need archived data
• Hyperparameters/Code
• CDSW
DL at Scale
Deep Learning at scale
• A significant amount of effort has been put into developing deep learning systems that can scale to very large
models and large training sets.
• Large models in the literature are now top performers in supervised visual recognition tasks.
• They can even learn to detect objects when trained on unlabeled images alone.
• The very largest of these systems can train neural networks with over 1 billion trainable parameters.
• While such extremely large networks are potentially valuable objects of AI research, the expense to train them is
overwhelming: the distributed computing infrastructure (known as “DistBelief”) manages to train a neural network
When to do it?
Distributed training isn’t free.
Setup time.
Continue to train your networks on a single machine, until the training time becomes
prohibitive.
Scaling Out and Up
• Using multiple machines in a large cluster
• Leveraging graphics processing units (GPUs).
GPUs
• The use of GPUs is a significant advance in recent years that
makes the training of modestly sized deep networks practical.
• A known limitation of the GPU approach is that the training
speed-up is small when the model does not fit in GPU memory
(typically less than 6 gigabytes).
• To use a GPU effectively, researchers often reduce the size of
the data or parameters so that CPU-to-GPU transfers are not a
significant bottleneck.
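As a rough illustration of the memory limit above, the sketch below estimates how much memory a model's parameters occupy and compares that with a GPU memory budget. The function names, the 4 bytes per parameter (float32), and the 6 GiB budget are assumptions made for illustration; real training also needs room for activations, gradients, and optimizer state, which this ignores.

```python
def parameter_bytes(num_parameters, bytes_per_parameter=4):
    """Memory taken by the parameters alone (float32 assumed by default)."""
    return num_parameters * bytes_per_parameter


def fits_in_gpu(num_parameters, gpu_memory_bytes=6 * 1024**3, bytes_per_parameter=4):
    """True if the parameters alone fit in the given GPU memory budget."""
    return parameter_bytes(num_parameters, bytes_per_parameter) <= gpu_memory_bytes


# Hypothetical model sizes, chosen only to show both outcomes.
for count in (60_000_000, 1_000_000_000, 2_000_000_000):
    gib = parameter_bytes(count) / 1024**3
    print(f"{count:>13,} parameters: {gib:6.2f} GiB, fits: {fits_in_gpu(count)}")
```

A model that fails this check would need a smaller representation, more GPU memory, or its parameters split across devices.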
Parallelism
• Within machines
• Multithreading
• Across machines
• Message passing
Model parallelism
In model parallelism, different machines in the distributed system are responsible for the computat
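A minimal sketch of the idea, assuming the common reading in which each machine stores a different part of one network and activations are passed between them. The `Partition` class and the two "machines" are hypothetical and run in a single process here; in a real system each hand-off would be a network transfer.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def relu(vector):
    return [max(0.0, v) for v in vector]


class Partition:
    """One machine's share of the model: a dense layer that only it stores."""

    def __init__(self, weights, activation=None):
        self.weights = weights
        self.activation = activation

    def forward(self, inputs):
        outputs = matvec(self.weights, inputs)
        return self.activation(outputs) if self.activation else outputs


def forward_model_parallel(partitions, inputs):
    """Run the forward pass by handing activations from one partition to the next."""
    activations = inputs
    for partition in partitions:
        activations = partition.forward(activations)  # stands in for a network hop
    return activations


machine_a = Partition([[1.0, -1.0], [0.5, 0.5]], activation=relu)  # first layer
machine_b = Partition([[2.0, 1.0]])                                # second layer
output = forward_model_parallel([machine_a, machine_b], [3.0, 1.0])
print(output)
```

No machine ever holds the whole model, which is what lets the approach handle networks too large for one device.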
Data parallelism
In data parallelism, different machines have a complete copy of the model; each machine simply ge
Typical considerations in Cloud
What we see out there
Driver Libraries
• cuDNN and Intel’s MKL
• One of the primary goals of driver libraries is to enable the community of neural
network frameworks to benefit equally from its APIs.
• The library exposes a host-callable C language API, but requires that input and
output data be resident on the GPU
• The library is thread-safe and its routines can be called from different host
threads.
• The convolution routines in cuDNN provide competitive performance with zero
auxiliary memory required.
CPUs and GPUs
• The most important feature for deep learning performance is memory bandwidth.
• GPUs are optimized for memory bandwidth at the expense of memory access time
(latency).
• Batch methods, such as limited-memory BFGS (L-BFGS) or Conjugate Gradient (CG), which
include a line search procedure, are usually much more stable to train and easier to
check for convergence.
• These methods, conventionally considered slow, can be fast thanks to the availability
of large amounts of RAM, multicore CPUs, GPUs, and computer clusters with fast network
hardware.
• Balance the number of CPUs and GPUs
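To make the "batch method with a line search" idea concrete, here is a small sketch. It is not L-BFGS or CG: it uses plain full-batch gradient descent with a backtracking (Armijo) line search, which is the simplest method sharing the two properties named above, a step size chosen by line search and an easy convergence check on the gradient norm. The objective function and all parameter values are assumptions made for illustration.

```python
def backtracking_line_search(f, x, grad, direction, step=1.0, shrink=0.5, c=1e-4):
    """Shrink the step until the Armijo sufficient-decrease condition holds."""
    fx = f(x)
    slope = sum(g * d for g, d in zip(grad, direction))
    for _ in range(60):  # bounded so the loop always terminates
        candidate = [xi + step * di for xi, di in zip(x, direction)]
        if f(candidate) <= fx + c * step * slope:
            break
        step *= shrink
    return step


def minimize(f, grad_f, x0, tolerance=1e-8, max_iterations=1000):
    """Full-batch gradient descent; stops when the gradient norm is small."""
    x = list(x0)
    for iteration in range(max_iterations):
        grad = grad_f(x)
        if sum(g * g for g in grad) ** 0.5 < tolerance:
            return x, iteration  # converged
        direction = [-g for g in grad]
        step = backtracking_line_search(f, x, grad, direction)
        x = [xi + step * di for xi, di in zip(x, direction)]
    return x, max_iterations


# Hypothetical objective: a poorly scaled quadratic with its minimum at (3, -1).
def objective(x):
    return (x[0] - 3.0) ** 2 + 10.0 * (x[1] + 1.0) ** 2


def objective_gradient(x):
    return [2.0 * (x[0] - 3.0), 20.0 * (x[1] + 1.0)]


solution, iterations = minimize(objective, objective_gradient, [0.0, 0.0])
print(solution, iterations)
```

Because every step uses the whole batch, the objective decreases at each iteration, which is what makes convergence easy to check compared with noisy mini-batch updates.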
Communication costs
AWS EBS
• Use volumes that are attached to an EBS-optimized instance
or an instance with 10 Gigabit network connectivity.
• EC2 instances that do not meet these criteria offer no
guarantee of network resources.
• On an instance that is not EBS-optimized, you can use all of
the network bandwidth for Amazon EBS traffic, provided your
application isn't pushing other network traffic that contends
with it.
Optimized EBS
• EBS-optimized connections are full-duplex, and can drive
more throughput and IOPS in a 50/50 read/write workload
where both communication lanes are used.
• In some cases, network, file system, and Amazon EBS
encryption overhead can reduce the maximum throughput
and IOPS available.
AWS EC2
• Physical proximity of EC2 instances
• EC2 instance maximum transmission unit (MTU)
• The size of your EC2 instance.
• EC2 enhanced networking support for Linux
• Placement groups
Security
The security mechanism in cloud technology is generally weak. Hence tampering with data in the
public cloud is inevitable and a big concern, and using the public cloud calls for a robust
security mechanism. In addition to the firewalls, VPNs, and encryption usually provided by
cloud service providers, CDH provides:
Authentication
Authorization
Encryption
Elasticity
Managed Service for elastic data pipelines
No data silos
Backward compatibility and platform portability
Built-in workload management
Data Governance
Exploration and development
Fast and interactive data analysis
Isolated filesystem
Custom environment
Deep Learning Frameworks on Hadoop
What we see out there
AWS
• Impala, Search, Spark, ML/DL
• All independent
• Managing upgrades is on the user
• Debugging is tricky
• Easy snapshot
• Configurable
• Scalable
AWS and CDH
• Scale with AWS
• Manage with CDH
• Manage from CDH
• Upgrade from CDH
• Debug using CDH
• Configurable
• Easy snapshot
• Scalable
Caffe2 - Synch SGD
• Data parallel
• Using 8 GPUs to run a batch of 32 each is equivalent to
one GPU running a mini-batch of 256.
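The equivalence can be checked numerically. The sketch below is not Caffe2 code: it uses a hypothetical one-parameter linear model and synthetic data, and shows that averaging the mean gradients of 8 equal shards of 32 examples gives the same gradient as one batch of 256, which is why the synchronous update is the same in both cases.

```python
import math
import random


def mean_gradient(w, batch):
    """Gradient of mean squared error for the model y = w * x over one batch."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


# Synthetic data for illustration only: y is roughly 3 * x plus noise.
rng = random.Random(0)
data = []
for _ in range(256):
    x = rng.uniform(-1.0, 1.0)
    data.append((x, 3.0 * x + rng.gauss(0.0, 0.1)))

w = 0.5
shards = [data[i * 32:(i + 1) * 32] for i in range(8)]  # one shard per GPU

per_gpu = [mean_gradient(w, shard) for shard in shards]
synchronized = sum(per_gpu) / len(per_gpu)   # what the 8 GPUs agree on
single_gpu = mean_gradient(w, data)          # one GPU, mini-batch of 256

print(math.isclose(synchronized, single_gpu, rel_tol=1e-9))
```

The match depends on the shards being equal in size; with unequal shards the per-GPU gradients would need to be weighted by shard size.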
TensorFlow - Synch and Asynch SGD
• Data parallel
• Synchronous SGD
• Asynchronous SGD
• Model parallel
• Concurrent Steps for Model computation in a pipeline
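The difference between the synchronous and asynchronous variants listed above can be sketched without TensorFlow. In this toy simulation each worker holds one hypothetical target value and the loss for a worker is 0.5 * (w - target)^2. The synchronous version averages all gradients computed at the same parameter value; the asynchronous version applies one worker's gradient at a time, computed from a parameter value that is a few updates old. The staleness model, learning rates, and targets are all assumptions.

```python
def gradient(w, target):
    """Gradient of 0.5 * (w - target) ** 2 for one worker's data."""
    return w - target


def synchronous_sgd(targets, w=0.0, learning_rate=0.05, rounds=500):
    for _ in range(rounds):
        grads = [gradient(w, t) for t in targets]     # every worker sees the same w
        w -= learning_rate * sum(grads) / len(grads)  # one averaged update per round
    return w


def asynchronous_sgd(targets, w=0.0, learning_rate=0.05, updates=2000, staleness=3):
    history = [w]  # parameter value after each update
    for step in range(updates):
        worker = step % len(targets)
        # The worker computed its gradient from parameters read a few updates ago.
        stale_w = history[max(0, len(history) - 1 - staleness)]
        w -= learning_rate * gradient(stale_w, targets[worker])
        history.append(w)
    return w


targets = [1.0, 2.0, 3.0, 4.0]  # the best shared value is their mean, 2.5
sync_w = synchronous_sgd(targets)
async_w = asynchronous_sgd(targets)
print(sync_w, async_w)
```

In this setup the synchronous run settles on the mean, while the asynchronous run keeps hovering near it because each update reflects only one worker and slightly out-of-date parameters. That is the usual trade: no waiting for slow workers, at the cost of noisier updates.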
CaffeOnSpark
• Caffe is a deep learning framework from the Berkeley Vision and Learning Center (BVLC),
implemented in C++, where models and optimizations are defined as plaintext schemas instead
of code. It has a command line as well as a Python interface and has been widely adopted,
especially for vision-related tasks.
• Yahoo released a Spark interface for Caffe which gives you the ability to run the DNN
model within the same cluster where your ingested data and other analytical
frameworks reside, conforming to the company-wide security and governance
policies.
TensorFlowOnSpark
• Google released TensorFlow, then enhanced its distributed deep learning
capabilities, and then added support for HDFS.
• Supports direct tensor communication between processes.
• Scales easily by adding more machines.
• TensorFlow ingests data using QueueRunners or feed_dict; it does not leverage Spark
for data ingestion.
DL4J
• Supports Apache Spark (1 and 2) for distributed training on a cluster.
• Supports data parallel synchronous parameter averaging.
• Recently added support for asynchronous gradient descent.
• Uses Aeron for message passing.
Data parallelism in Spark
Data parallel approaches to distributed training keep a copy of the entire model on each
worker machine, processing different subsets of the training data set on each. Data
parallel training approaches all require some method of combining results and
synchronizing the model parameters between each worker. A number of different
approaches have been discussed in the literature, and the primary differences between
approaches are
• Parameter averaging vs. update (gradient)-based approaches
• Synchronous vs. asynchronous methods
• Centralized vs. distributed synchronization
Deeplearning4j’s current Spark implementation is a synchronous parameter averaging
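Synchronous parameter averaging, as described above, can be sketched in a few lines. This is not Deeplearning4j or Spark code: the workers are simulated with a plain loop, and the linear model, synthetic data, and hyperparameters are assumptions made for illustration. Each round, every worker trains a private copy of the parameters on its own shard, and the copies are then averaged into the next starting point.

```python
import random


def local_sgd(params, shard, learning_rate=0.1):
    """Train a private copy of (w, b) on one worker's shard with plain SGD."""
    w, b = params
    for x, y in shard:
        error = (w * x + b) - y
        w -= learning_rate * error * x
        b -= learning_rate * error
    return w, b


def average(parameter_sets):
    """Element-wise mean of the workers' parameter tuples."""
    count = len(parameter_sets)
    return tuple(sum(values) / count for values in zip(*parameter_sets))


def train_with_parameter_averaging(shards, rounds=50):
    params = (0.0, 0.0)
    for _ in range(rounds):
        # On a cluster this would run once per partition; here it is a loop.
        worker_params = [local_sgd(params, shard) for shard in shards]
        params = average(worker_params)  # synchronous: waits for every worker
    return params


# Synthetic data for illustration only: y is roughly 2 * x - 1 plus noise.
rng = random.Random(42)
data = []
for _ in range(400):
    x = rng.uniform(-1.0, 1.0)
    data.append((x, 2.0 * x - 1.0 + rng.gauss(0.0, 0.05)))

shards = [data[i::4] for i in range(4)]  # four simulated workers
w, b = train_with_parameter_averaging(shards)
print(w, b)
```

How many local steps run between averaging rounds is the main tuning knob: averaging more often costs more communication, averaging less often lets the worker copies drift apart.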
What is BigDL?
GitHub: github.com/intel-analytics/BigDL
http://software.intel.com/ai
• Open source deep learning framework for Apache Spark
• High performance and efficient scale-out leveraging the Spark architecture
• Feature parity with Caffe, Torch, etc.
• Efficient implementations of synchronous stochastic gradient descent (SGD) and
all-reduce communications in Spark.
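For readers unfamiliar with all-reduce, the sketch below shows the general pattern in plain Python. It is a generic illustration, not BigDL's implementation: the function name and the two-phase layout (reduce-scatter, then all-gather) are assumptions about one common way to build it. After the call, every worker holds the element-wise sum of all workers' gradients.

```python
def all_reduce_sum(worker_gradients):
    """Return, for every worker, the element-wise sum of all workers' gradients.

    Phase 1 (reduce-scatter): worker i sums slice i across all workers.
    Phase 2 (all-gather): every worker collects all of the summed slices.
    """
    num_workers = len(worker_gradients)
    length = len(worker_gradients[0])
    # Split the index range into one contiguous slice per worker.
    bounds = [(i * length) // num_workers for i in range(num_workers + 1)]

    reduced_slices = []
    for owner in range(num_workers):
        start, stop = bounds[owner], bounds[owner + 1]
        reduced_slices.append(
            [sum(grads[k] for grads in worker_gradients) for k in range(start, stop)]
        )

    # Every worker ends up with its own copy of the concatenated slices.
    full = [value for piece in reduced_slices for value in piece]
    return [list(full) for _ in range(num_workers)]


gradients = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [10.0, 20.0, 30.0, 40.0, 50.0],
    [100.0, 200.0, 300.0, 400.0, 500.0],
]
result = all_reduce_sum(gradients)
print(result[0])
```

Splitting the work by slice means no single node has to receive and sum every gradient, which is the reason this pattern scales better than sending everything to one driver.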
Thank you

Más contenido relacionado

La actualidad más candente

On-device machine learning: TensorFlow on Android
On-device machine learning: TensorFlow on AndroidOn-device machine learning: TensorFlow on Android
On-device machine learning: TensorFlow on AndroidYufeng Guo
 
Deep Learning on Qubole Data Platform
Deep Learning on Qubole Data PlatformDeep Learning on Qubole Data Platform
Deep Learning on Qubole Data PlatformShivaji Dutta
 
Faster deep learning solutions from training to inference - Michele Tameni - ...
Faster deep learning solutions from training to inference - Michele Tameni - ...Faster deep learning solutions from training to inference - Michele Tameni - ...
Faster deep learning solutions from training to inference - Michele Tameni - ...Codemotion
 
Introduction to Deep Learning and neon at Galvanize
Introduction to Deep Learning and neon at GalvanizeIntroduction to Deep Learning and neon at Galvanize
Introduction to Deep Learning and neon at GalvanizeIntel Nervana
 
Deep Dive on Deep Learning (June 2018)
Deep Dive on Deep Learning (June 2018)Deep Dive on Deep Learning (June 2018)
Deep Dive on Deep Learning (June 2018)Julien SIMON
 
"New Dataflow Architecture for Machine Learning," a Presentation from Wave Co...
"New Dataflow Architecture for Machine Learning," a Presentation from Wave Co..."New Dataflow Architecture for Machine Learning," a Presentation from Wave Co...
"New Dataflow Architecture for Machine Learning," a Presentation from Wave Co...Edge AI and Vision Alliance
 
Nervana and the Future of Computing
Nervana and the Future of ComputingNervana and the Future of Computing
Nervana and the Future of ComputingIntel Nervana
 
An Introduction to Deep Learning (May 2018)
An Introduction to Deep Learning (May 2018)An Introduction to Deep Learning (May 2018)
An Introduction to Deep Learning (May 2018)Julien SIMON
 
Rethinking computation: A processor architecture for machine intelligence
Rethinking computation: A processor architecture for machine intelligenceRethinking computation: A processor architecture for machine intelligence
Rethinking computation: A processor architecture for machine intelligenceIntel Nervana
 
ECS for Amazon Deep Learning and Amazon Machine Learning
ECS for Amazon Deep Learning and Amazon Machine LearningECS for Amazon Deep Learning and Amazon Machine Learning
ECS for Amazon Deep Learning and Amazon Machine LearningAmanda Mackay (she/her)
 
Introduction to Keras
Introduction to KerasIntroduction to Keras
Introduction to KerasJohn Ramey
 
Deep Learning for Robotics
Deep Learning for RoboticsDeep Learning for Robotics
Deep Learning for RoboticsIntel Nervana
 
Deep Learning at Scale
Deep Learning at ScaleDeep Learning at Scale
Deep Learning at ScaleIntel Nervana
 
NVIDIA 深度學習教育機構 (DLI): Approaches to object detection
NVIDIA 深度學習教育機構 (DLI): Approaches to object detectionNVIDIA 深度學習教育機構 (DLI): Approaches to object detection
NVIDIA 深度學習教育機構 (DLI): Approaches to object detectionNVIDIA Taiwan
 
Deep learning on spark
Deep learning on sparkDeep learning on spark
Deep learning on sparkSatyendra Rana
 
Urs Köster Presenting at RE-Work DL Summit in Boston
Urs Köster Presenting at RE-Work DL Summit in BostonUrs Köster Presenting at RE-Work DL Summit in Boston
Urs Köster Presenting at RE-Work DL Summit in BostonIntel Nervana
 
Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016
Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016
Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016MLconf
 

La actualidad más candente (20)

On-device machine learning: TensorFlow on Android
On-device machine learning: TensorFlow on AndroidOn-device machine learning: TensorFlow on Android
On-device machine learning: TensorFlow on Android
 
Deep Learning on Qubole Data Platform
Deep Learning on Qubole Data PlatformDeep Learning on Qubole Data Platform
Deep Learning on Qubole Data Platform
 
Faster deep learning solutions from training to inference - Michele Tameni - ...
Faster deep learning solutions from training to inference - Michele Tameni - ...Faster deep learning solutions from training to inference - Michele Tameni - ...
Faster deep learning solutions from training to inference - Michele Tameni - ...
 
Deep Learning
Deep LearningDeep Learning
Deep Learning
 
Introduction to Deep Learning and neon at Galvanize
Introduction to Deep Learning and neon at GalvanizeIntroduction to Deep Learning and neon at Galvanize
Introduction to Deep Learning and neon at Galvanize
 
Amazon Deep Learning
Amazon Deep LearningAmazon Deep Learning
Amazon Deep Learning
 
Deep Dive on Deep Learning (June 2018)
Deep Dive on Deep Learning (June 2018)Deep Dive on Deep Learning (June 2018)
Deep Dive on Deep Learning (June 2018)
 
Tensorflow vs MxNet
Tensorflow vs MxNetTensorflow vs MxNet
Tensorflow vs MxNet
 
"New Dataflow Architecture for Machine Learning," a Presentation from Wave Co...
"New Dataflow Architecture for Machine Learning," a Presentation from Wave Co..."New Dataflow Architecture for Machine Learning," a Presentation from Wave Co...
"New Dataflow Architecture for Machine Learning," a Presentation from Wave Co...
 
Nervana and the Future of Computing
Nervana and the Future of ComputingNervana and the Future of Computing
Nervana and the Future of Computing
 
An Introduction to Deep Learning (May 2018)
An Introduction to Deep Learning (May 2018)An Introduction to Deep Learning (May 2018)
An Introduction to Deep Learning (May 2018)
 
Rethinking computation: A processor architecture for machine intelligence
Rethinking computation: A processor architecture for machine intelligenceRethinking computation: A processor architecture for machine intelligence
Rethinking computation: A processor architecture for machine intelligence
 
ECS for Amazon Deep Learning and Amazon Machine Learning
ECS for Amazon Deep Learning and Amazon Machine LearningECS for Amazon Deep Learning and Amazon Machine Learning
ECS for Amazon Deep Learning and Amazon Machine Learning
 
Introduction to Keras
Introduction to KerasIntroduction to Keras
Introduction to Keras
 
Deep Learning for Robotics
Deep Learning for RoboticsDeep Learning for Robotics
Deep Learning for Robotics
 
Deep Learning at Scale
Deep Learning at ScaleDeep Learning at Scale
Deep Learning at Scale
 
NVIDIA 深度學習教育機構 (DLI): Approaches to object detection
NVIDIA 深度學習教育機構 (DLI): Approaches to object detectionNVIDIA 深度學習教育機構 (DLI): Approaches to object detection
NVIDIA 深度學習教育機構 (DLI): Approaches to object detection
 
Deep learning on spark
Deep learning on sparkDeep learning on spark
Deep learning on spark
 
Urs Köster Presenting at RE-Work DL Summit in Boston
Urs Köster Presenting at RE-Work DL Summit in BostonUrs Köster Presenting at RE-Work DL Summit in Boston
Urs Köster Presenting at RE-Work DL Summit in Boston
 
Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016
Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016
Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016
 

Similar a Deep Learning Frameworks Using Spark on YARN by Vartika Singh

Big data journey to the cloud 5.30.18 asher bartch
Big data journey to the cloud 5.30.18   asher bartchBig data journey to the cloud 5.30.18   asher bartch
Big data journey to the cloud 5.30.18 asher bartchCloudera, Inc.
 
HPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journeyHPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journeyPeter Clapham
 
Part 2: A Visual Dive into Machine Learning and Deep Learning 

Part 2: A Visual Dive into Machine Learning and Deep Learning 
Part 2: A Visual Dive into Machine Learning and Deep Learning 

Part 2: A Visual Dive into Machine Learning and Deep Learning 
Cloudera, Inc.
 
Cloud nativecomputingtechnologysupportinghpc cognitiveworkflows
Cloud nativecomputingtechnologysupportinghpc cognitiveworkflowsCloud nativecomputingtechnologysupportinghpc cognitiveworkflows
Cloud nativecomputingtechnologysupportinghpc cognitiveworkflowsYong Feng
 
Parallel & Distributed Deep Learning - Dataworks Summit
Parallel & Distributed Deep Learning - Dataworks SummitParallel & Distributed Deep Learning - Dataworks Summit
Parallel & Distributed Deep Learning - Dataworks SummitRafael Arana
 
Deep Learning Using Caffe2 on AWS - MCL313 - re:Invent 2017
Deep Learning Using Caffe2 on AWS - MCL313 - re:Invent 2017Deep Learning Using Caffe2 on AWS - MCL313 - re:Invent 2017
Deep Learning Using Caffe2 on AWS - MCL313 - re:Invent 2017Amazon Web Services
 
Parallel/Distributed Deep Learning and CDSW
Parallel/Distributed Deep Learning and CDSWParallel/Distributed Deep Learning and CDSW
Parallel/Distributed Deep Learning and CDSWDataWorks Summit
 
CLOUD ENABLING TECHNOLOGIES.pptx
 CLOUD ENABLING TECHNOLOGIES.pptx CLOUD ENABLING TECHNOLOGIES.pptx
CLOUD ENABLING TECHNOLOGIES.pptxDr Geetha Mohan
 
AWS re:Invent 2016: Bringing Deep Learning to the Cloud with Amazon EC2 (CMP314)
AWS re:Invent 2016: Bringing Deep Learning to the Cloud with Amazon EC2 (CMP314)AWS re:Invent 2016: Bringing Deep Learning to the Cloud with Amazon EC2 (CMP314)
AWS re:Invent 2016: Bringing Deep Learning to the Cloud with Amazon EC2 (CMP314)Amazon Web Services
 
Machine Learning Inference at the Edge
Machine Learning Inference at the EdgeMachine Learning Inference at the Edge
Machine Learning Inference at the EdgeAmazon Web Services
 
How to Optimize Hortonworks Apache Spark ML Workloads on Modern Processors
How to Optimize Hortonworks Apache Spark ML Workloads on Modern Processors How to Optimize Hortonworks Apache Spark ML Workloads on Modern Processors
How to Optimize Hortonworks Apache Spark ML Workloads on Modern Processors DataWorks Summit/Hadoop Summit
 
Optimizing Hortonworks Apache Spark machine learning workloads for contempora...
Optimizing Hortonworks Apache Spark machine learning workloads for contempora...Optimizing Hortonworks Apache Spark machine learning workloads for contempora...
Optimizing Hortonworks Apache Spark machine learning workloads for contempora...Indrajit Poddar
 
Hpc lunch and learn
Hpc lunch and learnHpc lunch and learn
Hpc lunch and learnJohn D Almon
 
Machine Learning Inference at the Edge
Machine Learning Inference at the EdgeMachine Learning Inference at the Edge
Machine Learning Inference at the EdgeJulien SIMON
 
Big Data on Cloud Native Platform
Big Data on Cloud Native PlatformBig Data on Cloud Native Platform
Big Data on Cloud Native PlatformSunil Govindan
 
Big Data on Cloud Native Platform
Big Data on Cloud Native PlatformBig Data on Cloud Native Platform
Big Data on Cloud Native PlatformSunil Govindan
 
Five Tips for Running Cloudera on AWS
Five Tips for Running Cloudera on AWSFive Tips for Running Cloudera on AWS
Five Tips for Running Cloudera on AWSCloudera, Inc.
 
High Performance Computing Pitch Deck
High Performance Computing Pitch DeckHigh Performance Computing Pitch Deck
High Performance Computing Pitch DeckNicholas Vossburg
 
Webinar: OpenEBS - Still Free and now FASTEST Kubernetes storage
Webinar: OpenEBS - Still Free and now FASTEST Kubernetes storageWebinar: OpenEBS - Still Free and now FASTEST Kubernetes storage
Webinar: OpenEBS - Still Free and now FASTEST Kubernetes storageMayaData Inc
 

Similar a Deep Learning Frameworks Using Spark on YARN by Vartika Singh (20)

Big data journey to the cloud 5.30.18 asher bartch
Big data journey to the cloud 5.30.18   asher bartchBig data journey to the cloud 5.30.18   asher bartch
Big data journey to the cloud 5.30.18 asher bartch
 
HPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journeyHPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journey
 
Part 2: A Visual Dive into Machine Learning and Deep Learning 

Part 2: A Visual Dive into Machine Learning and Deep Learning 
Part 2: A Visual Dive into Machine Learning and Deep Learning 

Part 2: A Visual Dive into Machine Learning and Deep Learning 

 
Cloud nativecomputingtechnologysupportinghpc cognitiveworkflows
Cloud nativecomputingtechnologysupportinghpc cognitiveworkflowsCloud nativecomputingtechnologysupportinghpc cognitiveworkflows
Cloud nativecomputingtechnologysupportinghpc cognitiveworkflows
 
Parallel & Distributed Deep Learning - Dataworks Summit
Parallel & Distributed Deep Learning - Dataworks SummitParallel & Distributed Deep Learning - Dataworks Summit
Parallel & Distributed Deep Learning - Dataworks Summit
 
Deep Learning Using Caffe2 on AWS - MCL313 - re:Invent 2017
Deep Learning Using Caffe2 on AWS - MCL313 - re:Invent 2017Deep Learning Using Caffe2 on AWS - MCL313 - re:Invent 2017
Deep Learning Using Caffe2 on AWS - MCL313 - re:Invent 2017
 
Parallel/Distributed Deep Learning and CDSW
Parallel/Distributed Deep Learning and CDSWParallel/Distributed Deep Learning and CDSW
Parallel/Distributed Deep Learning and CDSW
 
CLOUD ENABLING TECHNOLOGIES.pptx
 CLOUD ENABLING TECHNOLOGIES.pptx CLOUD ENABLING TECHNOLOGIES.pptx
CLOUD ENABLING TECHNOLOGIES.pptx
 
AWS re:Invent 2016: Bringing Deep Learning to the Cloud with Amazon EC2 (CMP314)
AWS re:Invent 2016: Bringing Deep Learning to the Cloud with Amazon EC2 (CMP314)AWS re:Invent 2016: Bringing Deep Learning to the Cloud with Amazon EC2 (CMP314)
AWS re:Invent 2016: Bringing Deep Learning to the Cloud with Amazon EC2 (CMP314)
 
BSC LMS DDL
BSC LMS DDL BSC LMS DDL
BSC LMS DDL
 
Machine Learning Inference at the Edge
Machine Learning Inference at the EdgeMachine Learning Inference at the Edge
Machine Learning Inference at the Edge
 
How to Optimize Hortonworks Apache Spark ML Workloads on Modern Processors
How to Optimize Hortonworks Apache Spark ML Workloads on Modern Processors How to Optimize Hortonworks Apache Spark ML Workloads on Modern Processors
How to Optimize Hortonworks Apache Spark ML Workloads on Modern Processors
 
Optimizing Hortonworks Apache Spark machine learning workloads for contempora...
Optimizing Hortonworks Apache Spark machine learning workloads for contempora...Optimizing Hortonworks Apache Spark machine learning workloads for contempora...
Optimizing Hortonworks Apache Spark machine learning workloads for contempora...
 
Hpc lunch and learn
Hpc lunch and learnHpc lunch and learn
Hpc lunch and learn
 
Machine Learning Inference at the Edge
Machine Learning Inference at the EdgeMachine Learning Inference at the Edge
Machine Learning Inference at the Edge
 
Big Data on Cloud Native Platform
Big Data on Cloud Native PlatformBig Data on Cloud Native Platform
Big Data on Cloud Native Platform
 
Big Data on Cloud Native Platform
Big Data on Cloud Native PlatformBig Data on Cloud Native Platform
Big Data on Cloud Native Platform
 
Five Tips for Running Cloudera on AWS
Five Tips for Running Cloudera on AWSFive Tips for Running Cloudera on AWS
Five Tips for Running Cloudera on AWS
 
High Performance Computing Pitch Deck
High Performance Computing Pitch DeckHigh Performance Computing Pitch Deck
High Performance Computing Pitch Deck
 
Webinar: OpenEBS - Still Free and now FASTEST Kubernetes storage
Webinar: OpenEBS - Still Free and now FASTEST Kubernetes storageWebinar: OpenEBS - Still Free and now FASTEST Kubernetes storage
Webinar: OpenEBS - Still Free and now FASTEST Kubernetes storage
 

Más de Data Con LA

Data Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA
 
Data Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA
 
Data Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA
 
Data Con LA 2022 - Startup Showcase
Data Con LA 2022 - Startup ShowcaseData Con LA 2022 - Startup Showcase
Data Con LA 2022 - Startup ShowcaseData Con LA
 
Data Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA
 
Data Con LA 2022 - Using Google trends data to build product recommendations
Data Con LA 2022 - Using Google trends data to build product recommendationsData Con LA 2022 - Using Google trends data to build product recommendations
Data Con LA 2022 - Using Google trends data to build product recommendationsData Con LA
 
Data Con LA 2022 - AI Ethics
Data Con LA 2022 - AI EthicsData Con LA 2022 - AI Ethics
Data Con LA 2022 - AI EthicsData Con LA
 
Data Con LA 2022 - Improving disaster response with machine learning
Data Con LA 2022 - Improving disaster response with machine learningData Con LA 2022 - Improving disaster response with machine learning
Data Con LA 2022 - Improving disaster response with machine learningData Con LA
 
Data Con LA 2022 - What's new with MongoDB 6.0 and Atlas
Data Con LA 2022 - What's new with MongoDB 6.0 and AtlasData Con LA 2022 - What's new with MongoDB 6.0 and Atlas
Data Con LA 2022 - What's new with MongoDB 6.0 and AtlasData Con LA
 
Data Con LA 2022 - Real world consumer segmentation
Data Con LA 2022 - Real world consumer segmentationData Con LA 2022 - Real world consumer segmentation
Data Con LA 2022 - Real world consumer segmentationData Con LA
 
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...Data Con LA
 
Data Con LA 2022 - Moving Data at Scale to AWS
Data Con LA 2022 - Moving Data at Scale to AWSData Con LA 2022 - Moving Data at Scale to AWS
Data Con LA 2022 - Moving Data at Scale to AWSData Con LA
 
Data Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA 2022 - Collaborative Data Exploration using Conversational AIData Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA 2022 - Collaborative Data Exploration using Conversational AIData Con LA
 
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...Data Con LA
 
Data Con LA 2022 - Intro to Data Science
Data Con LA 2022 - Intro to Data ScienceData Con LA 2022 - Intro to Data Science
Data Con LA 2022 - Intro to Data ScienceData Con LA
 
Data Con LA 2022 - How are NFTs and DeFi Changing Entertainment
Data Con LA 2022 - How are NFTs and DeFi Changing EntertainmentData Con LA 2022 - How are NFTs and DeFi Changing Entertainment
Data Con LA 2022 - How are NFTs and DeFi Changing EntertainmentData Con LA
 
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...Data Con LA
 
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...Data Con LA
 
Data Con LA 2022- Embedding medical journeys with machine learning to improve...
Data Con LA 2022- Embedding medical journeys with machine learning to improve...Data Con LA 2022- Embedding medical journeys with machine learning to improve...
Data Con LA 2022- Embedding medical journeys with machine learning to improve...Data Con LA
 
Data Con LA 2022 - Data Streaming with Kafka
Data Con LA 2022 - Data Streaming with KafkaData Con LA 2022 - Data Streaming with Kafka
Data Con LA 2022 - Data Streaming with KafkaData Con LA
 


Deep Learning Frameworks Using Spark on YARN by Vartika Singh

  • 1. Deep Learning Frameworks Using Spark on YARN. Vartika Singh, Field Data Science Architect
  • 2. ©2014 Cloudera, Inc. All rights reserved. “Would you tell me, please, which road do I take?” “That depends a good deal on where you want to get to.” “I don't much care where –” “Then it doesn't matter which way you go.”
  • 4. An overview of the ML pipeline. Raw Data (many sources, many formats, varying validity); Data Engineering (cleaning, merging, filtering); Well-formatted data; Data Science (model building, model training, hyper-param tuning); Training, validation, and test data; Validated ML Models; Data Engineering (pipeline execution, production operation, ongoing data ingestion); Consumption for analysis; End User.
  • 5. Deep Learning in Big Data • A major source of difficulty in many real-world artificial intelligence applications is that many of the factors of variation influence every single piece of data we can observe. • Deep learning solves this central problem via representation learning by introducing representations that are expressed in terms of other, simpler representations.
  • 6. Deep Learning in Hadoop • http://blog.cloudera.com/blog/2017/04/deep-learning-frameworks-on-cdh-and-cloudera-data-science-workbench/ • http://blog.cloudera.com/blog/2017/04/bigdl-on-cdh-and-cloudera-data-science-workbench/
  • 7. Analysis Pipeline. Data Engineering: metadata, feature extraction, filter. Raw binary data in S3. Processed data in S3: training, validation, test. Data Science and Exploration: model, train, tune. Search and SQL. Data Engineering: validated model (model, parameters). Live data ingest. End results UI: insights, predictions, results. Data Lake Cluster: S3/HDFS/Kudu. Parquet. PMML. Data Engineering: processing, execution. Need archived data. Hyperparameters/Code. CDSW.
  • 9. Deep Learning at scale • A significant amount of effort has been put into developing deep learning systems that can scale to very large models and large training sets. • Large models in the literature are now top performers in supervised visual recognition tasks. • Can even learn to detect objects when trained from unlabeled images alone. • The very largest of these systems has been constructed, which is able to train neural networks with over 1 billion trainable parameters. • While such extremely large networks are potentially valuable objects of AI research, the expense to train them is overwhelming: the distributed computing infrastructure (known as “DistBelief”) manages to train a neural network
  • 10. When to do it? Distributed training isn’t free. Setup time. Continue to train your networks on a single machine, until the training time becomes prohibitive.
  • 11. Scaling Out and Up • Using multiple machines in a large cluster • Leveraging graphics processing units (GPUs).
  • 12. GPUs • The use of GPUs is a significant advance in recent years that makes the training of modestly sized deep networks practical. • A known limitation of the GPU approach is that the training speed-up is small when the model does not fit in GPU memory (typically less than 6 gigabytes). • To use a GPU effectively, researchers often reduce the size of the data or parameters so that CPU-to-GPU transfers are not a significant bottleneck.
  • 13. Parallelism • Within machines: multithreading • Across machines: message passing
  • 14. Model parallelism. In model parallelism, different machines in the distributed system are responsible for the computat
  • 15. Data parallelism. In data parallelism, different machines have a complete copy of the model; each machine simply ge
  • 16. Typical considerations in Cloud: What we see out there
  • 17. Driver Libraries • cuDNN and Intel’s MKL • One of the primary goals of driver libraries is to enable the community of neural network frameworks to benefit equally from their APIs. • The library exposes a host-callable C language API, but requires that input and output data be resident on the GPU. • The library is thread-safe and its routines can be called from different host threads. • The convolution routines in cuDNN provide competitive performance with zero auxiliary memory required.
  • 18. CPUs and GPUs • The most important feature for deep learning performance is memory bandwidth. • GPUs are optimized for memory bandwidth at the expense of memory access time (latency). • Batch methods, such as limited-memory BFGS (L-BFGS) or Conjugate Gradient (CG), with the presence of a line search procedure, are usually much more stable to train and easier to check for convergence. • These methods, conventionally considered to be slow, can be fast thanks to the availability of large amounts of RAM, multicore CPUs, GPUs and computer clusters with fast network hardware. • Balance the number of CPUs and GPUs.
  • 19. Communication costs
  • 20. AWS EBS • Use volumes that are attached to an EBS-optimized instance or an instance with 10 Gigabit network connectivity. • EC2 instances that do not meet these criteria offer no guarantee of network resources. • If the instance is not EBS-optimized, you can use all of the network bandwidth for traffic to Amazon EBS only if your application isn’t pushing other network traffic that contends with Amazon EBS.
  • 21. Optimized EBS • EBS-optimized connections are full-duplex, and can drive more throughput and IOPS in a 50/50 read/write workload where both communication lanes are used. • In some cases, network, file system, and Amazon EBS encryption overhead can reduce the maximum throughput and IOPS available.
  • 22. AWS EC2 • Physical proximity of EC2 instances • EC2 instance maximum transmission unit (MTU) • The size of your EC2 instance • EC2 enhanced networking support for Linux • Placement groups
  • 23. Security. The security mechanism in cloud technology is generally weak. Hence tampering with data in the public cloud is inevitable and a big concern, and using the public cloud requires finding a robust security mechanism. Usually, in addition to the firewalls, VPNs, and encryption provided by cloud service providers, CDH provides: authentication, authorization, encryption.
  • 24. Elasticity. Managed service for elastic data pipelines. No data silos. Backward compatibility and platform portability. Built-in workload management. Data governance.
  • 25. Exploration and development. Fast and interactive data analysis. Isolated filesystem. Custom environment.
  • 28. Deep Learning Frameworks on Hadoop: What we see out there
  • 29. AWS: Impala, Search, Spark. Manage upgrades: on user. Debugging tricky. All independent. Easy snapshot. Configurable. Scalable. ML/DL.
  • 30. AWS and CDH. CDH: scale with AWS, manage with CDH. Configurable. Easy snapshot. Manage from CDH. Upgrade from CDH. Debug using CDH. Scalable.
  • 31. Caffe2 - Synchronous SGD • Data parallel • Using 8 GPUs to run a batch of 32 each is equivalent to one GPU running a mini-batch of 256.
  • 32. TensorFlow - Synchronous and Asynchronous SGD • Data parallel • Synchronous SGD • Asynchronous SGD • Model parallel • Concurrent steps for model computation in a pipeline
  • 33. CaffeOnSpark • Caffe is a deep learning framework from the Berkeley Vision Lab, implemented in C++, where models and optimizations are defined as plaintext schemas instead of code. It has a command-line as well as a Python interface and has been widely adopted, especially for vision-related tasks. • Yahoo released a Spark interface for Caffe, which gives you the ability to run the DNN model within the same cluster where your ingested data and other analytical frameworks reside, conforming to company-wide security and governance policies.
  • 34. TensorFlowOnSpark • In sequence, Google released TensorFlow, then enhanced distributed deep learning capabilities in TensorFlow, and then support for HDFS. • Supports direct tensor communication between processes. • Scales easily by adding more machines. • TensorFlow ingests data using QueueRunners or feed_dict. It does not leverage Spark for data ingestion.
  • 35. DL4J • Supports Apache Spark (1 and 2) for distributed training on a cluster. • Supports data parallel synchronous parameter averaging. • Recently added support for asynchronous gradient descent. • Uses Aeron for message passing.
  • 36. Data parallelism in Spark. Data parallel approaches to distributed training keep a copy of the entire model on each worker machine, processing different subsets of the training data set on each. Data parallel training approaches all require some method of combining results and synchronizing the model parameters between each worker. A number of different approaches have been discussed in the literature, and the primary differences between approaches are • Parameter averaging vs. update (gradient)-based approaches • Synchronous vs. asynchronous methods • Centralized vs. distributed synchronization. Deeplearning4j’s current Spark implementation is a synchronous parameter averaging
  • 37. What is BigDL? GitHub: github.com/intel-analytics/BigDL http://software.intel.com/ai • Open source deep learning framework for Apache Spark* • High performance and efficient scale-out leveraging Spark architecture • Feature parity with Caffe, Torch etc. • Efficient implementations of synchronous stochastic gradient descent (SGD) and all-reduce communications in Spark.
  • 38. Thank you
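
The claim on slide 31, that 8 GPUs each running a batch of 32 is equivalent to one GPU running a mini-batch of 256, can be checked with a small sketch. The code below is an illustration in plain Python, not Caffe2 code: the GPUs are simulated as list slices, and the one-parameter model `y = w * x`, the function names `mse_gradient` and `sync_sgd_gradient`, and the toy data are all assumptions made for the example. It relies on the shards being equal in size, so that the mean of the per-shard gradients equals the gradient over the full batch.

```python
import random


def mse_gradient(w, batch):
    """Gradient of the mean squared error for the 1-D model y = w * x."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def sync_sgd_gradient(w, shards):
    """Each simulated GPU computes a gradient on its shard; the results are averaged."""
    per_gpu = [mse_gradient(w, shard) for shard in shards]
    return sum(per_gpu) / len(per_gpu)


# Hypothetical toy data: 256 noisy samples of y = 3x.
random.seed(0)
xs = [random.uniform(-1.0, 1.0) for _ in range(256)]
data = [(x, 3.0 * x + random.gauss(0.0, 0.1)) for x in xs]

# Split the mini-batch of 256 into 8 equal shards of 32, one per simulated GPU.
shards = [data[i * 32:(i + 1) * 32] for i in range(8)]

w = 0.5
full_batch = mse_gradient(w, data)
averaged = sync_sgd_gradient(w, shards)

# The two gradients agree up to floating-point rounding.
print(abs(full_batch - averaged) < 1e-9)
```

The equivalence holds for the gradient of a single synchronous step. If the shards were of unequal size, the per-shard gradients would need to be weighted by shard size before averaging.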

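
Slide 36 contrasts parameter averaging with gradient-based updates. A minimal sketch of synchronous parameter averaging follows, assuming a one-parameter model and workers simulated as list slices. It is not Deeplearning4j's implementation; the names `local_sgd` and `parameter_averaging_round`, the learning rate, and the number of local steps are hypothetical choices for illustration. In each round, every worker starts from the shared parameter, runs a few SGD steps on its own shard, and the resulting parameters are averaged.

```python
def local_sgd(w, shard, lr, steps):
    """Run a few SGD steps on one worker's shard, starting from the shared parameter."""
    for _ in range(steps):
        grad = sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)
        w -= lr * grad
    return w


def parameter_averaging_round(w, shards, lr=0.5, steps=5):
    """One synchronous round: all workers train locally, then their parameters are averaged."""
    local_params = [local_sgd(w, shard, lr, steps) for shard in shards]
    return sum(local_params) / len(local_params)


# Hypothetical toy data: y = 2x for x in (0, 1], split across 4 simulated workers.
xs = [i / 40.0 for i in range(1, 41)]
data = [(x, 2.0 * x) for x in xs]
shards = [data[i::4] for i in range(4)]

w = 0.0
for _ in range(20):
    w = parameter_averaging_round(w, shards)

print(round(w, 6))
```

Compared with averaging gradients on every step, this scheme synchronizes only once per round, which trades communication for some drift between the workers' local copies of the model.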
Editor's notes

  1. Apache Hadoop and related ecosystems have come to play a significant role in “Big Data Analytics”. They provide a rich and wide set of choices for handling format and source variation, fast-moving and evolving streaming data, security and trust, distributed and noisy sources, supported algorithms, high dimensions, and cluster scalability. It goes without saying that colocating a data processing pipeline with a deep learning framework makes data exploration and algorithm and model evolution much simpler, and at the same time makes data governance and lineage tracking a simpler effort.
  2. Let’s do a segue
  3. BigDL is a distributed deep learning library for Apache Spark*. Using BigDL, you can write deep learning applications as Scala or Python* programs and take advantage of the power of scalable Spark clusters.  • You want to use existing Hadoop/Spark clusters to run your deep learning applications, which you can then easily share with other workloads (e.g., extract-transform-load, data warehouse, feature engineering, classical machine learning, graph analytics). An undesirable alternative to using BigDL is to introduce yet another distributed framework alongside Spark just to implement deep learning algorithms.