SlideShare una empresa de Scribd logo
1 de 48
Challenges on Data Parallelism
and Model Parallelism
Presenter:
Jie Cao
Petuum: A New Platform for Distributed Machine Learning on Big Data
CMU Eric Xing
Outline
• Background
• Data Parallelism & Model Parallelism(3 Properties)
• Error Tolerance
• Stale Synchronous Parallel
• Dynamic Structure Dependence
• Dynamic Schedule
• Non-uniform Convergence
• Priority-based
• Blocked-based
2
Current Solution
1. Implementation of Specific ML algorithm
YahooLDA
Vowelpal Rabbit, Fast Learning Algorithm Collections, Yahoo -> Microsoft
Caffe, Deep Learning Framework, http://caffe.berkeleyvision.org/
2. Platforms for General Purpose ML
Hadoop
Spark spark+parameterserver, spark+caffe
GraphLab
3. Other Systems
Parameter Server
Petuum
4. Specified Hardware Accelerating
GPU,FPGA-based accelerators, “DianNao” for large NN.
Petuum
1. Systematically analyze ML models and tools
2. Find Common Properties
3. Build “workhorse” engine to solve entire models
The Key Statement!!!
many probabilistic model or algorithms are also based on an
iterative convergence
Data Parallelism & Model
Parallelism
Data Partition, I.I.D
Parallel Stochastic Gradient
Descent
Parallel SGD: Partition data to different workers; all workers
update full parameter vector
Parallel SGD [Zinkevich et al., 2010]
PSGD runs SGD on local copy of params in each machine
Input
Data
Input
Data
Input
Data
split Update local copy
of ALL params
Update local copy
of ALL params
aggregate
Update ALL
params
Input
Data
Input
Data
Input
Data
Challenges in Data
Parallelism
Existing ways are either safe/slow (BSP), or fast/risky (Async)
Need “Partial” synchronicity: Bounded Async Parallelism (BAP)
— Spread network comms evenly (don’t sync unless needed)

— Threads usually shouldn’t wait – but mustn’t drift too far apart!
Need straggler tolerance

— Slow threads must somehow catch up
????
Error Tolerance
Challenges 1 in Model Parallelism
Model Dependence
• Only effective if independent(or
weakly-correlated)
• Need carefully-chosen
parameters for updating
• Dependency-aware
On model parallelism and scheduling strategies for distributed machine learning
(NIPS’2014)
Parallel coordinate descent for l1-regularized loss minimization (ICML’2011)
Feature clustering for accelerating parallel coordinate descent (NIPS’2012)
A Frame- work for Machine Learning and Data Mining in the Cloud (PVLDB’2012)
Challenges 2 in Model Parallelism
Noneuniform Convergence
Petuum
Bösen
Strads
Parameter Server
memcached
YahooLDA
— Besteffort
Pettum:
— Bösen
Parameter Server (OSDI’ 2014 ):
— flexiable consitency models,customized, developer decide.
Desirable Consistency Model
1) Correctness of the distributed algorithm can be theoretically proven
2) Computing power of the system is fully utilized
Consistency Models• Classic Consistency in Database
• BSP (Bulk Synchronous Parallel,Valiant(PCA), 1990,
• Correct, but slow
• Hadoop[MR],Spark [RDD]
• GraphLab (Careful colored graph or by locking)
• Best-effort
• fast but no theoretic guarantee
• YahooLDA
• Async
• Hogwild!(NIPS 2011)
• A Lock-Free Approach to PSGD.
• Condition: optimization problem is sparse, meaning most gradient updates only modify
small parts of the decision variable. cannot grantee correctness in other case.
More Effective Distributed ML via a Stale Synchronous Parallel Parameter Server (NIPS.2013)
Bounded Updates
Analysis of High-Performance Distributed ML at Scale through Parameter Server Consistency Models (AAAI’2015)
Distributed_delayed_stochastic_optimization(NIPS2011)
Slow Learners are Fast (NIPS’2009)
3 Conceptual Actions
• Schedule specifies the next subset of model variables to be updated in parallel
• e.g: fixed sequence,random
• Improved:
• 1. fastest converging variable,avoiding already-converged variables
• 2. avoid inter-dependence.
• Push specifies how individual workers compute partial results on those
variables
• Pull specifies how those partial results are aggregated to perform the full
variable update.
• Sync: Builtin primitives, BSP,SSP,AP
Dynamic Structural
Dependency
Not Automatically
Structure Aware
Examples on following 3 algorithms based on STRADS Framework

schedule,push,pull
by manually analysis and customized implement
word rotation scheduling
• 1.V dictionary words into U disjoint subsets V1, . . . , VU
(where U is the number of workers
• 2. Subsequent invocations of schedule will “rotate” subsets
amongst workers, so that every worker touches all U subsets
every U invocations
• For data partitioning, we divide the document tokens W
evenly across workers, and denote worker p’s set of tokens
by Wqp
• all zij will be sampled exactly once after U invocations of
schedule
Worker 1 Worker 2 Worker N
Data not moved
Words in docs1:
A,B,C,E,F,G,H,Z
Data not moved
Words in docs2:
A,B,C,E,F,G,H,Z
Data not moved
Words in docs3:
A,B,C,E,F,G,H,Z
V1. dictionary:
A,B,C
V2. dictionary:
A,B,C
Vu. dictionary:
A,B,C
…
…
…
STRADS Performance
Non-uniform convergence
Prioritization has not received as much attention in ML
Mainly proposal 2 methods:
1. Priority-based
2. Block-based with load balancing (Fugue)


Shotgun-Lasso Random-Robbin selection
rapidly changing parameters are 

more frequently updated than others
Priority-Based
Priority-based Schedule
1. Rewrite objective function 

by duplicating original features with opposite sign
Here, X contains 2J features,all
2.
It works for latent space model, but does not apply to all possible ML models:
1. graphical models and deep networks can have arbitrary structure between parameters and variables,
2. problems on time-series data will have sequential or autoregressive dependencies between datapoints.
Fugue:Slow-Worker-Agnostic Distributed Learning for Big Models on Big Data
Thanks

Más contenido relacionado

La actualidad más candente

Online learning with structured streaming, spark summit brussels 2016
Online learning with structured streaming, spark summit brussels 2016Online learning with structured streaming, spark summit brussels 2016
Online learning with structured streaming, spark summit brussels 2016Ram Sriharsha
 
Distributed machine learning 101 using apache spark from a browser devoxx.b...
Distributed machine learning 101 using apache spark from a browser   devoxx.b...Distributed machine learning 101 using apache spark from a browser   devoxx.b...
Distributed machine learning 101 using apache spark from a browser devoxx.b...Andy Petrella
 
Braxton McKee, Founder & CEO, Ufora at MLconf SF - 11/13/15
Braxton McKee, Founder & CEO, Ufora at MLconf SF - 11/13/15Braxton McKee, Founder & CEO, Ufora at MLconf SF - 11/13/15
Braxton McKee, Founder & CEO, Ufora at MLconf SF - 11/13/15MLconf
 
Machine Learning Pipelines
Machine Learning PipelinesMachine Learning Pipelines
Machine Learning Pipelinesjeykottalam
 
Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...
Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...
Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...MLconf
 
Funda Gunes, Senior Research Statistician Developer & Patrick Koch, Principal...
Funda Gunes, Senior Research Statistician Developer & Patrick Koch, Principal...Funda Gunes, Senior Research Statistician Developer & Patrick Koch, Principal...
Funda Gunes, Senior Research Statistician Developer & Patrick Koch, Principal...MLconf
 
Dr. Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf SEA - 5/20/16
Dr. Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf SEA - 5/20/16Dr. Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf SEA - 5/20/16
Dr. Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf SEA - 5/20/16MLconf
 
Large-Scale Lasso and Elastic-Net Regularized Generalized Linear Models (DB T...
Large-Scale Lasso and Elastic-Net Regularized Generalized Linear Models (DB T...Large-Scale Lasso and Elastic-Net Regularized Generalized Linear Models (DB T...
Large-Scale Lasso and Elastic-Net Regularized Generalized Linear Models (DB T...Spark Summit
 
A Scaleable Implementation of Deep Learning on Spark -Alexander Ulanov
A Scaleable Implementation of Deep Learning on Spark -Alexander UlanovA Scaleable Implementation of Deep Learning on Spark -Alexander Ulanov
A Scaleable Implementation of Deep Learning on Spark -Alexander UlanovSpark Summit
 
Misha Bilenko, Principal Researcher, Microsoft at MLconf SEA - 5/01/15
Misha Bilenko, Principal Researcher, Microsoft at MLconf SEA - 5/01/15Misha Bilenko, Principal Researcher, Microsoft at MLconf SEA - 5/01/15
Misha Bilenko, Principal Researcher, Microsoft at MLconf SEA - 5/01/15MLconf
 
Avi Pfeffer, Principal Scientist, Charles River Analytics at MLconf SEA - 5/2...
Avi Pfeffer, Principal Scientist, Charles River Analytics at MLconf SEA - 5/2...Avi Pfeffer, Principal Scientist, Charles River Analytics at MLconf SEA - 5/2...
Avi Pfeffer, Principal Scientist, Charles River Analytics at MLconf SEA - 5/2...MLconf
 
Best Practices for Hyperparameter Tuning with MLflow
Best Practices for Hyperparameter Tuning with MLflowBest Practices for Hyperparameter Tuning with MLflow
Best Practices for Hyperparameter Tuning with MLflowDatabricks
 
Mathias Brandewinder, Software Engineer & Data Scientist, Clear Lines Consult...
Mathias Brandewinder, Software Engineer & Data Scientist, Clear Lines Consult...Mathias Brandewinder, Software Engineer & Data Scientist, Clear Lines Consult...
Mathias Brandewinder, Software Engineer & Data Scientist, Clear Lines Consult...MLconf
 
Brief introduction to Distributed Deep Learning
Brief introduction to Distributed Deep LearningBrief introduction to Distributed Deep Learning
Brief introduction to Distributed Deep LearningAdam Gibson
 
AutoML Toolkit – Deep Dive
AutoML Toolkit – Deep DiveAutoML Toolkit – Deep Dive
AutoML Toolkit – Deep DiveDatabricks
 
Spark Meetup @ Netflix, 05/19/2015
Spark Meetup @ Netflix, 05/19/2015Spark Meetup @ Netflix, 05/19/2015
Spark Meetup @ Netflix, 05/19/2015Yves Raimond
 
MLlib: Spark's Machine Learning Library
MLlib: Spark's Machine Learning LibraryMLlib: Spark's Machine Learning Library
MLlib: Spark's Machine Learning Libraryjeykottalam
 
Deep Learning with MXNet - Dmitry Larko
Deep Learning with MXNet - Dmitry LarkoDeep Learning with MXNet - Dmitry Larko
Deep Learning with MXNet - Dmitry LarkoSri Ambati
 
Experimental Design for Distributed Machine Learning with Myles Baker
Experimental Design for Distributed Machine Learning with Myles BakerExperimental Design for Distributed Machine Learning with Myles Baker
Experimental Design for Distributed Machine Learning with Myles BakerDatabricks
 

La actualidad más candente (20)

How To Do A Project
How To Do A ProjectHow To Do A Project
How To Do A Project
 
Online learning with structured streaming, spark summit brussels 2016
Online learning with structured streaming, spark summit brussels 2016Online learning with structured streaming, spark summit brussels 2016
Online learning with structured streaming, spark summit brussels 2016
 
Distributed machine learning 101 using apache spark from a browser devoxx.b...
Distributed machine learning 101 using apache spark from a browser   devoxx.b...Distributed machine learning 101 using apache spark from a browser   devoxx.b...
Distributed machine learning 101 using apache spark from a browser devoxx.b...
 
Braxton McKee, Founder & CEO, Ufora at MLconf SF - 11/13/15
Braxton McKee, Founder & CEO, Ufora at MLconf SF - 11/13/15Braxton McKee, Founder & CEO, Ufora at MLconf SF - 11/13/15
Braxton McKee, Founder & CEO, Ufora at MLconf SF - 11/13/15
 
Machine Learning Pipelines
Machine Learning PipelinesMachine Learning Pipelines
Machine Learning Pipelines
 
Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...
Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...
Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...
 
Funda Gunes, Senior Research Statistician Developer & Patrick Koch, Principal...
Funda Gunes, Senior Research Statistician Developer & Patrick Koch, Principal...Funda Gunes, Senior Research Statistician Developer & Patrick Koch, Principal...
Funda Gunes, Senior Research Statistician Developer & Patrick Koch, Principal...
 
Dr. Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf SEA - 5/20/16
Dr. Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf SEA - 5/20/16Dr. Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf SEA - 5/20/16
Dr. Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf SEA - 5/20/16
 
Large-Scale Lasso and Elastic-Net Regularized Generalized Linear Models (DB T...
Large-Scale Lasso and Elastic-Net Regularized Generalized Linear Models (DB T...Large-Scale Lasso and Elastic-Net Regularized Generalized Linear Models (DB T...
Large-Scale Lasso and Elastic-Net Regularized Generalized Linear Models (DB T...
 
A Scaleable Implementation of Deep Learning on Spark -Alexander Ulanov
A Scaleable Implementation of Deep Learning on Spark -Alexander UlanovA Scaleable Implementation of Deep Learning on Spark -Alexander Ulanov
A Scaleable Implementation of Deep Learning on Spark -Alexander Ulanov
 
Misha Bilenko, Principal Researcher, Microsoft at MLconf SEA - 5/01/15
Misha Bilenko, Principal Researcher, Microsoft at MLconf SEA - 5/01/15Misha Bilenko, Principal Researcher, Microsoft at MLconf SEA - 5/01/15
Misha Bilenko, Principal Researcher, Microsoft at MLconf SEA - 5/01/15
 
Avi Pfeffer, Principal Scientist, Charles River Analytics at MLconf SEA - 5/2...
Avi Pfeffer, Principal Scientist, Charles River Analytics at MLconf SEA - 5/2...Avi Pfeffer, Principal Scientist, Charles River Analytics at MLconf SEA - 5/2...
Avi Pfeffer, Principal Scientist, Charles River Analytics at MLconf SEA - 5/2...
 
Best Practices for Hyperparameter Tuning with MLflow
Best Practices for Hyperparameter Tuning with MLflowBest Practices for Hyperparameter Tuning with MLflow
Best Practices for Hyperparameter Tuning with MLflow
 
Mathias Brandewinder, Software Engineer & Data Scientist, Clear Lines Consult...
Mathias Brandewinder, Software Engineer & Data Scientist, Clear Lines Consult...Mathias Brandewinder, Software Engineer & Data Scientist, Clear Lines Consult...
Mathias Brandewinder, Software Engineer & Data Scientist, Clear Lines Consult...
 
Brief introduction to Distributed Deep Learning
Brief introduction to Distributed Deep LearningBrief introduction to Distributed Deep Learning
Brief introduction to Distributed Deep Learning
 
AutoML Toolkit – Deep Dive
AutoML Toolkit – Deep DiveAutoML Toolkit – Deep Dive
AutoML Toolkit – Deep Dive
 
Spark Meetup @ Netflix, 05/19/2015
Spark Meetup @ Netflix, 05/19/2015Spark Meetup @ Netflix, 05/19/2015
Spark Meetup @ Netflix, 05/19/2015
 
MLlib: Spark's Machine Learning Library
MLlib: Spark's Machine Learning LibraryMLlib: Spark's Machine Learning Library
MLlib: Spark's Machine Learning Library
 
Deep Learning with MXNet - Dmitry Larko
Deep Learning with MXNet - Dmitry LarkoDeep Learning with MXNet - Dmitry Larko
Deep Learning with MXNet - Dmitry Larko
 
Experimental Design for Distributed Machine Learning with Myles Baker
Experimental Design for Distributed Machine Learning with Myles BakerExperimental Design for Distributed Machine Learning with Myles Baker
Experimental Design for Distributed Machine Learning with Myles Baker
 

Destacado

Machine Learning Methods for Parameter Acquisition in a Human ...
Machine Learning Methods for Parameter Acquisition in a Human ...Machine Learning Methods for Parameter Acquisition in a Human ...
Machine Learning Methods for Parameter Acquisition in a Human ...butest
 
Spark Summit EU talk by Rolf Jagerman
Spark Summit EU talk by Rolf JagermanSpark Summit EU talk by Rolf Jagerman
Spark Summit EU talk by Rolf JagermanSpark Summit
 
Scalable Deep Learning Platform On Spark In Baidu
Scalable Deep Learning Platform On Spark In BaiduScalable Deep Learning Platform On Spark In Baidu
Scalable Deep Learning Platform On Spark In BaiduJen Aman
 
FlinkML: Large Scale Machine Learning with Apache Flink
FlinkML: Large Scale Machine Learning with Apache FlinkFlinkML: Large Scale Machine Learning with Apache Flink
FlinkML: Large Scale Machine Learning with Apache FlinkTheodoros Vasiloudis
 
Machine Learning on Big Data
Machine Learning on Big DataMachine Learning on Big Data
Machine Learning on Big DataMax Lin
 
04 2 machine evolution
04 2 machine evolution04 2 machine evolution
04 2 machine evolutionTianlu Wang
 
Cloudera Federal Forum 2014: The Evolution of Machine Learning from Science t...
Cloudera Federal Forum 2014: The Evolution of Machine Learning from Science t...Cloudera Federal Forum 2014: The Evolution of Machine Learning from Science t...
Cloudera Federal Forum 2014: The Evolution of Machine Learning from Science t...Cloudera, Inc.
 
GASGD: Stochastic Gradient Descent for Distributed Asynchronous Matrix Comple...
GASGD: Stochastic Gradient Descent for Distributed Asynchronous Matrix Comple...GASGD: Stochastic Gradient Descent for Distributed Asynchronous Matrix Comple...
GASGD: Stochastic Gradient Descent for Distributed Asynchronous Matrix Comple...Fabio Petroni, PhD
 
A few questions about large scale machine learning
A few questions about large scale machine learningA few questions about large scale machine learning
A few questions about large scale machine learningTheodoros Vasiloudis
 
Machine Learning - Challenges, Learnings & Opportunities
Machine Learning - Challenges, Learnings & OpportunitiesMachine Learning - Challenges, Learnings & Opportunities
Machine Learning - Challenges, Learnings & OpportunitiesCodePolitan
 
Spark Summit EU talk by Nick Pentreath
Spark Summit EU talk by Nick PentreathSpark Summit EU talk by Nick Pentreath
Spark Summit EU talk by Nick PentreathSpark Summit
 
Introduction to Data Science with H2O- Mountain View
Introduction to Data Science with H2O- Mountain ViewIntroduction to Data Science with H2O- Mountain View
Introduction to Data Science with H2O- Mountain ViewSri Ambati
 
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...MLconf
 
Distributed Deep Learning on Spark
Distributed Deep Learning on SparkDistributed Deep Learning on Spark
Distributed Deep Learning on SparkMathieu Dumoulin
 
Inspection of CloudML Hyper Parameter Tuning
Inspection of CloudML Hyper Parameter TuningInspection of CloudML Hyper Parameter Tuning
Inspection of CloudML Hyper Parameter Tuningnagachika t
 
A Comparative Analysis of Data Privacy and Utility Parameter Adjustment, Usin...
A Comparative Analysis of Data Privacy and Utility Parameter Adjustment, Usin...A Comparative Analysis of Data Privacy and Utility Parameter Adjustment, Usin...
A Comparative Analysis of Data Privacy and Utility Parameter Adjustment, Usin...Kato Mivule
 

Destacado (20)

Machine Learning Methods for Parameter Acquisition in a Human ...
Machine Learning Methods for Parameter Acquisition in a Human ...Machine Learning Methods for Parameter Acquisition in a Human ...
Machine Learning Methods for Parameter Acquisition in a Human ...
 
Spark Summit EU talk by Rolf Jagerman
Spark Summit EU talk by Rolf JagermanSpark Summit EU talk by Rolf Jagerman
Spark Summit EU talk by Rolf Jagerman
 
Scalable Deep Learning Platform On Spark In Baidu
Scalable Deep Learning Platform On Spark In BaiduScalable Deep Learning Platform On Spark In Baidu
Scalable Deep Learning Platform On Spark In Baidu
 
FlinkML: Large Scale Machine Learning with Apache Flink
FlinkML: Large Scale Machine Learning with Apache FlinkFlinkML: Large Scale Machine Learning with Apache Flink
FlinkML: Large Scale Machine Learning with Apache Flink
 
Machine Learning on Big Data
Machine Learning on Big DataMachine Learning on Big Data
Machine Learning on Big Data
 
CCG
CCGCCG
CCG
 
04 2 machine evolution
04 2 machine evolution04 2 machine evolution
04 2 machine evolution
 
Cloudera Federal Forum 2014: The Evolution of Machine Learning from Science t...
Cloudera Federal Forum 2014: The Evolution of Machine Learning from Science t...Cloudera Federal Forum 2014: The Evolution of Machine Learning from Science t...
Cloudera Federal Forum 2014: The Evolution of Machine Learning from Science t...
 
GASGD: Stochastic Gradient Descent for Distributed Asynchronous Matrix Comple...
GASGD: Stochastic Gradient Descent for Distributed Asynchronous Matrix Comple...GASGD: Stochastic Gradient Descent for Distributed Asynchronous Matrix Comple...
GASGD: Stochastic Gradient Descent for Distributed Asynchronous Matrix Comple...
 
A few questions about large scale machine learning
A few questions about large scale machine learningA few questions about large scale machine learning
A few questions about large scale machine learning
 
Machine Learning - Challenges, Learnings & Opportunities
Machine Learning - Challenges, Learnings & OpportunitiesMachine Learning - Challenges, Learnings & Opportunities
Machine Learning - Challenges, Learnings & Opportunities
 
Large Scale Distributed Deep Networks
Large Scale Distributed Deep NetworksLarge Scale Distributed Deep Networks
Large Scale Distributed Deep Networks
 
Spark Summit EU talk by Nick Pentreath
Spark Summit EU talk by Nick PentreathSpark Summit EU talk by Nick Pentreath
Spark Summit EU talk by Nick Pentreath
 
Introduction to Data Science with H2O- Mountain View
Introduction to Data Science with H2O- Mountain ViewIntroduction to Data Science with H2O- Mountain View
Introduction to Data Science with H2O- Mountain View
 
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...
 
H20: A platform for big math
H20: A platform for big math H20: A platform for big math
H20: A platform for big math
 
Distributed Deep Learning on Spark
Distributed Deep Learning on SparkDistributed Deep Learning on Spark
Distributed Deep Learning on Spark
 
Deep Learning at Scale
Deep Learning at ScaleDeep Learning at Scale
Deep Learning at Scale
 
Inspection of CloudML Hyper Parameter Tuning
Inspection of CloudML Hyper Parameter TuningInspection of CloudML Hyper Parameter Tuning
Inspection of CloudML Hyper Parameter Tuning
 
A Comparative Analysis of Data Privacy and Utility Parameter Adjustment, Usin...
A Comparative Analysis of Data Privacy and Utility Parameter Adjustment, Usin...A Comparative Analysis of Data Privacy and Utility Parameter Adjustment, Usin...
A Comparative Analysis of Data Privacy and Utility Parameter Adjustment, Usin...
 

Similar a Challenges on Distributed Machine Learning

Data Parallel and Object Oriented Model
Data Parallel and Object Oriented ModelData Parallel and Object Oriented Model
Data Parallel and Object Oriented ModelNikhil Sharma
 
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...Jose Quesada (hiring)
 
Combining Machine Learning frameworks with Apache Spark
Combining Machine Learning frameworks with Apache SparkCombining Machine Learning frameworks with Apache Spark
Combining Machine Learning frameworks with Apache SparkDataWorks Summit/Hadoop Summit
 
Scaling Machine Learning to Billions of Parameters - Spark Summit 2016
Scaling Machine Learning to Billions of Parameters - Spark Summit 2016Scaling Machine Learning to Billions of Parameters - Spark Summit 2016
Scaling Machine Learning to Billions of Parameters - Spark Summit 2016Badri Narayan Bhaskar
 
Scaling Machine Learning To Billions Of Parameters
Scaling Machine Learning To Billions Of ParametersScaling Machine Learning To Billions Of Parameters
Scaling Machine Learning To Billions Of ParametersJen Aman
 
Big learning 1.2
Big learning   1.2Big learning   1.2
Big learning 1.2Mohit Garg
 
Ling liu part 02:big graph processing
Ling liu part 02:big graph processingLing liu part 02:big graph processing
Ling liu part 02:big graph processingjins0618
 
Mining Big Data Streams with APACHE SAMOA
Mining Big Data Streams with APACHE SAMOAMining Big Data Streams with APACHE SAMOA
Mining Big Data Streams with APACHE SAMOAAlbert Bifet
 
2018 03 25 system ml ai and openpower meetup
2018 03 25 system ml ai and openpower meetup2018 03 25 system ml ai and openpower meetup
2018 03 25 system ml ai and openpower meetupGanesan Narayanasamy
 
AI on Greenplum Using
 Apache MADlib and MADlib Flow - Greenplum Summit 2019
AI on Greenplum Using
 Apache MADlib and MADlib Flow - Greenplum Summit 2019AI on Greenplum Using
 Apache MADlib and MADlib Flow - Greenplum Summit 2019
AI on Greenplum Using
 Apache MADlib and MADlib Flow - Greenplum Summit 2019VMware Tanzu
 
Automating materials science workflows with pymatgen, FireWorks, and atomate
Automating materials science workflows with pymatgen, FireWorks, and atomateAutomating materials science workflows with pymatgen, FireWorks, and atomate
Automating materials science workflows with pymatgen, FireWorks, and atomateAnubhav Jain
 
Mining big data streams with APACHE SAMOA by Albert Bifet
Mining big data streams with APACHE SAMOA by Albert BifetMining big data streams with APACHE SAMOA by Albert Bifet
Mining big data streams with APACHE SAMOA by Albert BifetJ On The Beach
 
DeepLearning001&ApacheMXNetWithSparkForInference-ACNA2018
DeepLearning001&ApacheMXNetWithSparkForInference-ACNA2018DeepLearning001&ApacheMXNetWithSparkForInference-ACNA2018
DeepLearning001&ApacheMXNetWithSparkForInference-ACNA2018Apache MXNet
 
Designing Distributed Machine Learning on Apache Spark
Designing Distributed Machine Learning on Apache SparkDesigning Distributed Machine Learning on Apache Spark
Designing Distributed Machine Learning on Apache SparkDatabricks
 
Combining Machine Learning Frameworks with Apache Spark
Combining Machine Learning Frameworks with Apache SparkCombining Machine Learning Frameworks with Apache Spark
Combining Machine Learning Frameworks with Apache SparkDatabricks
 
PEARC 17: Spark On the ARC
PEARC 17: Spark On the ARCPEARC 17: Spark On the ARC
PEARC 17: Spark On the ARCHimanshu Bedi
 
Concurrency Programming in Java - 01 - Introduction to Concurrency Programming
Concurrency Programming in Java - 01 - Introduction to Concurrency ProgrammingConcurrency Programming in Java - 01 - Introduction to Concurrency Programming
Concurrency Programming in Java - 01 - Introduction to Concurrency ProgrammingSachintha Gunasena
 
The Analytics Frontier of the Hadoop Eco-System
The Analytics Frontier of the Hadoop Eco-SystemThe Analytics Frontier of the Hadoop Eco-System
The Analytics Frontier of the Hadoop Eco-Systeminside-BigData.com
 
Big Data Day LA 2015 - Lessons Learned Designing Data Ingest Systems
Big Data Day LA 2015 - Lessons Learned Designing Data Ingest SystemsBig Data Day LA 2015 - Lessons Learned Designing Data Ingest Systems
Big Data Day LA 2015 - Lessons Learned Designing Data Ingest Systemsaaamase
 
Matching Data Intensive Applications and Hardware/Software Architectures
Matching Data Intensive Applications and Hardware/Software ArchitecturesMatching Data Intensive Applications and Hardware/Software Architectures
Matching Data Intensive Applications and Hardware/Software ArchitecturesGeoffrey Fox
 

Similar a Challenges on Distributed Machine Learning (20)

Data Parallel and Object Oriented Model
Data Parallel and Object Oriented ModelData Parallel and Object Oriented Model
Data Parallel and Object Oriented Model
 
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
 
Combining Machine Learning frameworks with Apache Spark
Combining Machine Learning frameworks with Apache SparkCombining Machine Learning frameworks with Apache Spark
Combining Machine Learning frameworks with Apache Spark
 
Scaling Machine Learning to Billions of Parameters - Spark Summit 2016
Scaling Machine Learning to Billions of Parameters - Spark Summit 2016Scaling Machine Learning to Billions of Parameters - Spark Summit 2016
Scaling Machine Learning to Billions of Parameters - Spark Summit 2016
 
Scaling Machine Learning To Billions Of Parameters
Scaling Machine Learning To Billions Of ParametersScaling Machine Learning To Billions Of Parameters
Scaling Machine Learning To Billions Of Parameters
 
Big learning 1.2
Big learning   1.2Big learning   1.2
Big learning 1.2
 
Ling liu part 02:big graph processing
Ling liu part 02:big graph processingLing liu part 02:big graph processing
Ling liu part 02:big graph processing
 
Mining Big Data Streams with APACHE SAMOA
Mining Big Data Streams with APACHE SAMOAMining Big Data Streams with APACHE SAMOA
Mining Big Data Streams with APACHE SAMOA
 
2018 03 25 system ml ai and openpower meetup
2018 03 25 system ml ai and openpower meetup2018 03 25 system ml ai and openpower meetup
2018 03 25 system ml ai and openpower meetup
 
AI on Greenplum Using
 Apache MADlib and MADlib Flow - Greenplum Summit 2019
AI on Greenplum Using
 Apache MADlib and MADlib Flow - Greenplum Summit 2019AI on Greenplum Using
 Apache MADlib and MADlib Flow - Greenplum Summit 2019
AI on Greenplum Using
 Apache MADlib and MADlib Flow - Greenplum Summit 2019
 
Automating materials science workflows with pymatgen, FireWorks, and atomate
Automating materials science workflows with pymatgen, FireWorks, and atomateAutomating materials science workflows with pymatgen, FireWorks, and atomate
Automating materials science workflows with pymatgen, FireWorks, and atomate
 
Mining big data streams with APACHE SAMOA by Albert Bifet
Mining big data streams with APACHE SAMOA by Albert BifetMining big data streams with APACHE SAMOA by Albert Bifet
Mining big data streams with APACHE SAMOA by Albert Bifet
 
DeepLearning001&ApacheMXNetWithSparkForInference-ACNA2018
DeepLearning001&ApacheMXNetWithSparkForInference-ACNA2018DeepLearning001&ApacheMXNetWithSparkForInference-ACNA2018
DeepLearning001&ApacheMXNetWithSparkForInference-ACNA2018
 
Designing Distributed Machine Learning on Apache Spark
Designing Distributed Machine Learning on Apache SparkDesigning Distributed Machine Learning on Apache Spark
Designing Distributed Machine Learning on Apache Spark
 
Combining Machine Learning Frameworks with Apache Spark
Combining Machine Learning Frameworks with Apache SparkCombining Machine Learning Frameworks with Apache Spark
Combining Machine Learning Frameworks with Apache Spark
 
PEARC 17: Spark On the ARC
PEARC 17: Spark On the ARCPEARC 17: Spark On the ARC
PEARC 17: Spark On the ARC
 
Concurrency Programming in Java - 01 - Introduction to Concurrency Programming
Concurrency Programming in Java - 01 - Introduction to Concurrency ProgrammingConcurrency Programming in Java - 01 - Introduction to Concurrency Programming
Concurrency Programming in Java - 01 - Introduction to Concurrency Programming
 
The Analytics Frontier of the Hadoop Eco-System
The Analytics Frontier of the Hadoop Eco-SystemThe Analytics Frontier of the Hadoop Eco-System
The Analytics Frontier of the Hadoop Eco-System
 
Big Data Day LA 2015 - Lessons Learned Designing Data Ingest Systems
Big Data Day LA 2015 - Lessons Learned Designing Data Ingest SystemsBig Data Day LA 2015 - Lessons Learned Designing Data Ingest Systems
Big Data Day LA 2015 - Lessons Learned Designing Data Ingest Systems
 
Matching Data Intensive Applications and Hardware/Software Architectures
Matching Data Intensive Applications and Hardware/Software ArchitecturesMatching Data Intensive Applications and Hardware/Software Architectures
Matching Data Intensive Applications and Hardware/Software Architectures
 

Más de jie cao

Observing Dialogue in Therapy: Categorizing and Forecasting Behavioral Codes
Observing Dialogue in Therapy: Categorizing and Forecasting Behavioral CodesObserving Dialogue in Therapy: Categorizing and Forecasting Behavioral Codes
Observing Dialogue in Therapy: Categorizing and Forecasting Behavioral Codesjie cao
 
A Comparative Study on Schema-guided Dialog State Tracking
A Comparative Study on Schema-guided Dialog State TrackingA Comparative Study on Schema-guided Dialog State Tracking
A Comparative Study on Schema-guided Dialog State Trackingjie cao
 
Task-oriented Conversational semantic parsing
Task-oriented Conversational semantic parsingTask-oriented Conversational semantic parsing
Task-oriented Conversational semantic parsingjie cao
 
Talking Geckos (Question and Answering)
Talking Geckos (Question and Answering)Talking Geckos (Question and Answering)
Talking Geckos (Question and Answering)jie cao
 
Spark调研串讲
Spark调研串讲Spark调研串讲
Spark调研串讲jie cao
 
Parsing Natural Scenes and Natural Language with Recursive Neural Networks
Parsing Natural Scenes and Natural Language with Recursive Neural NetworksParsing Natural Scenes and Natural Language with Recursive Neural Networks
Parsing Natural Scenes and Natural Language with Recursive Neural Networksjie cao
 

Más de jie cao (6)

Observing Dialogue in Therapy: Categorizing and Forecasting Behavioral Codes
Observing Dialogue in Therapy: Categorizing and Forecasting Behavioral CodesObserving Dialogue in Therapy: Categorizing and Forecasting Behavioral Codes
Observing Dialogue in Therapy: Categorizing and Forecasting Behavioral Codes
 
A Comparative Study on Schema-guided Dialog State Tracking
A Comparative Study on Schema-guided Dialog State TrackingA Comparative Study on Schema-guided Dialog State Tracking
A Comparative Study on Schema-guided Dialog State Tracking
 
Task-oriented Conversational semantic parsing
Task-oriented Conversational semantic parsingTask-oriented Conversational semantic parsing
Task-oriented Conversational semantic parsing
 
Talking Geckos (Question and Answering)
Talking Geckos (Question and Answering)Talking Geckos (Question and Answering)
Talking Geckos (Question and Answering)
 
Spark调研串讲
Spark调研串讲Spark调研串讲
Spark调研串讲
 
Parsing Natural Scenes and Natural Language with Recursive Neural Networks
Parsing Natural Scenes and Natural Language with Recursive Neural NetworksParsing Natural Scenes and Natural Language with Recursive Neural Networks
Parsing Natural Scenes and Natural Language with Recursive Neural Networks
 

Último

Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxolyaivanovalion
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxolyaivanovalion
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceDelhi Call girls
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightDelhi Call girls
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfadriantubila
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Delhi Call girls
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023ymrp368
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...SUHANI PANDEY
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionfulawalesam
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Valters Lauzums
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...amitlee9823
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxfirstjob4
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...amitlee9823
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...shivangimorya083
 

Último (20)

Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptx
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
 

Challenges on Distributed Machine Learning

  • 1. Challenges on Data Parallelism and Model Parallelism Presenter: Jie Cao Petuum: A New Platform for Distributed Machine Learning on Big Data CMU Eric Xing
  • 2. Outline • Background • Data Parallelism & Model Parallelism(3 Properties) • Error Tolerance • Stale Synchronous Parallel • Dynamic Structure Dependence • Dynamic Schedule • Non-uniform Convergence • Priority-based • Blocked-based 2
  • 3.
  • 4. Current Solution 1. Implementation of Specific ML algorithm YahooLDA Vowelpal Rabbit, Fast Learning Algorithm Collections, Yahoo -> Microsoft Caffe, Deep Learning Framework, http://caffe.berkeleyvision.org/ 2. Platforms for General Purpose ML Hadoop Spark spark+parameterserver, spark+caffe GraphLab 3. Other Systems Parameter Server Petuum 4. Specified Hardware Accelerating GPU,FPGA-based accelerators, “DianNao” for large NN. Petuum 1. Systematically analyze ML models and tools 2. Find Common Properties 3. Build “workhorse” engine to solve entire models
  • 5.
  • 7. many probabilistic model or algorithms are also based on an iterative convergence
  • 8.
  • 9.
  • 10. Data Parallelism & Model Parallelism
  • 12. Parallel Stochastic Gradient Descent Parallel SGD: Partition data to different workers; all workers update full parameter vector Parallel SGD [Zinkevich et al., 2010] PSGD runs SGD on local copy of params in each machine Input Data Input Data Input Data split Update local copy of ALL params Update local copy of ALL params aggregate Update ALL params Input Data Input Data Input Data
  • 13.
  • 14.
  • 15.
  • 16.
  • 17.
  • 18.
  • 19. Challenges in Data Parallelism Existing ways are either safe/slow (BSP), or fast/risky (Async) Need “Partial” synchronicity: Bounded Async Parallelism (BAP) — Spread network comms evenly (don’t sync unless needed)
 — Threads usually shouldn’t wait – but mustn’t drift too far apart! Need straggler tolerance
 — Slow threads must somehow catch up ???? Error Tolerance
  • 20. Challenges 1 in Model Parallelism Model Dependence • Only effective if independent(or weakly-correlated) • Need carefully-chosen parameters for updating • Dependency-aware On model parallelism and scheduling strategies for distributed machine learning (NIPS’2014) Parallel coordinate descent for l1-regularized loss minimization (ICML’2011) Feature clustering for accelerating parallel coordinate descent (NIPS’2012) A Frame- work for Machine Learning and Data Mining in the Cloud (PVLDB’2012)
  • 21. Challenges 2 in Model Parallelism Noneuniform Convergence
  • 23. Parameter Server memcached YahooLDA — Besteffort Pettum: — Bösen Parameter Server (OSDI’ 2014 ): — flexiable consitency models,customized, developer decide. Desirable Consistency Model 1) Correctness of the distributed algorithm can be theoretically proven 2) Computing power of the system is fully utilized
  • 24. Consistency Models• Classic Consistency in Database • BSP (Bulk Synchronous Parallel,Valiant(PCA), 1990, • Correct, but slow • Hadoop[MR],Spark [RDD] • GraphLab (Careful colored graph or by locking) • Best-effort • fast but no theoretic guarantee • YahooLDA • Async • Hogwild!(NIPS 2011) • A Lock-Free Approach to PSGD. • Condition: optimization problem is sparse, meaning most gradient updates only modify small parts of the decision variable. cannot grantee correctness in other case.
  • 25. More Effective Distributed ML via a Stale Synchronous Parallel Parameter Server (NIPS.2013)
  • 27.
  • 28. Analysis of High-Performance Distributed ML at Scale through Parameter Server Consistency Models (AAAI’2015) Distributed_delayed_stochastic_optimization(NIPS2011) Slow Learners are Fast (NIPS’2009)
  • 29.
  • 30.
  • 31.
  • 32.
  • 33. 3 Conceptual Actions • Schedule specifies the next subset of model variables to be updated in parallel • e.g: fixed sequence,random • Improved: • 1. fastest converging variable,avoiding already-converged variables • 2. avoid inter-dependence. • Push specifies how individual workers compute partial results on those variables • Pull specifies how those partial results are aggregated to perform the full variable update. • Sync: Builtin primitives, BSP,SSP,AP Dynamic Structural Dependency
  • 34.
  • 35.
  • 36. Not Automatically Structure Aware Examples on following 3 algorithms based on STRADS Framework
 schedule,push,pull by manually analysis and customized implement
  • 37. word rotation scheduling • 1.V dictionary words into U disjoint subsets V1, . . . , VU (where U is the number of workers • 2. Subsequent invocations of schedule will “rotate” subsets amongst workers, so that every worker touches all U subsets every U invocations • For data partitioning, we divide the document tokens W evenly across workers, and denote worker p’s set of tokens by Wqp • all zij will be sampled exactly once after U invocations of schedule
  • 38. Worker 1 Worker 2 Worker N Data not moved Words in docs1: A,B,C,E,F,G,H,Z Data not moved Words in docs2: A,B,C,E,F,G,H,Z Data not moved Words in docs3: A,B,C,E,F,G,H,Z V1. dictionary: A,B,C V2. dictionary: A,B,C Vu. dictionary: A,B,C … … …
  • 40. Non-uniform convergence Prioritization has not received as much attention in ML Mainly proposal 2 methods: 1. Priority-based 2. Block-based with load balancing (Fugue) 

  • 41. Shotgun-Lasso Random-Robbin selection rapidly changing parameters are 
 more frequently updated than others
  • 43. Priority-based Schedule 1. Rewrite objective function 
 by duplicating original features with opposite sign Here, X contains 2J features,all 2.
  • 44. It works for latent space model, but does not apply to all possible ML models: 1. graphical models and deep networks can have arbitrary structure between parameters and variables, 2. problems on time-series data will have sequential or autoregressive dependencies between datapoints.
  • 46.
  • 47.