SlideShare a Scribd company logo
1 of 30
About
@MLnick on Twitter & Github
Principal Engineer, IBM
CODAIT - Center for Open-Source Data
& AI Technologies
Machine Learning & AI
Apache Spark committer & PMC
Author of Machine Learning with Spark
Various conferences & meetups
Center for Open Source Data
and AI Technologies
CODAIT
codait.org
DBG / Oct 4, 2018 / © 2018 IBM Corporation
CODAIT aims to make AI solutions
dramatically easier to create, deploy,
and manage in the enterprise
Relaunch of the Spark Technology
Center (STC) to reflect expanded
mission
Improving Enterprise AI Lifecycle in Open Source
Gather
Data
Analyze
Data
Machine
Learning
Deep
Learning
Deploy
Model
Maintain
Model
Python
Data Science
Stack
Fabric for
Deep Learning
(FfDL)
Mleap +
PFA
Scikit-LearnPandas
Apache
Spark
Apache
Spark
Jupyter
Model
Asset
eXchange
Keras +
Tensorflow
The Machine Learning
Workflow
Applying Machine Learning: Perception
In reality the workflow spans teams …
… and tools
Spark provides a unified platform
Machine Learning
Pipelines
What is a “model”?
Pipelines in Spark ML
Example – Text Classifier Pipeline
Example – Text Classifier PipelineModel
Spark ML Components
14
Source: http://spark.apache.org/docs/latest
Deep Learning
DBG / June 6, 2018 / © 2018 IBM Corporation
Deep Learning Overview
• Original theory from 1940s; computer models
originated around 1960s; fell out of favor in
1980s/90s
• Recent resurgence due to
• Bigger (and better) data; standard datasets (e.g.
ImageNet)
• Better hardware (GPUs)
• Improvements to algorithms, architectures and
optimization
• Leading to new state-of-the-art results in
computer vision (images and video);
speech/text; language translation and more
Source: Wikipedia
Modern Neural Networks
• Deep (multi-layer) networks
• Computer vision
• Convolution neural networks (CNNs)
• Image classification, object detection, segmentation
• Sequences and time-series
• Recurrent neural networks (RNNs)
• Machine translation, text generation
• Embeddings
• Text, categorical features
• Deep learning frameworks
• Flexibility, computation graphs, auto-differentiation,
GPUs
Source: Stanford CS231n
Deep Learning Frameworks
* Logos trademarks of their respective projects
Computation Graphs
Source: Google AI Blog
*MnasNet Network
*Inception V3
DL Frameworks on Spark
Major Frameworks
21
• Deeplearning4J
• BigDL
• Deep Learning Pipelines
• TensorFlowOnSpark
• Microsoft Machine Learning on Spark
(MMLSpark)
Deeplearning4J
22
• Distributed GPU support for all major deep
learning architectures
• CPU / Distributed CPU / Single GPU options exist
• Supports Convolutional Nets, LSTMs / RNNs,
Feedforward Nets, Word2Vec, custom layers
• Supported by startup Skymind.io
• Backed by its own linear algebra library –
ND4J
• APIs in Scala, Java, Python
• Newer Scala API, Keras-like
• Keras import / export for Python API
• Production serving is through proprietary
layer
• DataVec for ETL
BigDL
23
• Distributed CPU with Intel MKL
• No GPU support
• Most DL models – CNN, RNN
• Backed by Intel
• Natively integrated with Spark
• Scala, Python API
• Support for Spark ML pipelines
• Uses private internal Spark components for
distributed training
• Load Keras, Caffe, Torch models
• New Keras-style API
Deep Learning Pipelines
24
• Created by Databricks
• Focus on scoring models (TensorFlow / Keras) and
basic transfer learning
• No support for training the DL model
• Focus on image data & use cases
• Natively integrated with Spark
• Scala, Python API
• Support for Spark ML pipelines
• Support for scoring models as a SQL UDF
• Largely dormant currently
TensorFlowOnSpark
25
• Created by Yahoo
• Scale out TF on Spark clusters
• Use Spark executors to launch TF processes
• Supports distributed training through TF parameter
servers
• RDMA / Infiniband improvement to TF to speed up
distributed training
• Good support for TensorBoard
• Good integration with Spark
• But only Python API
• Some support for Spark ML pipelines
• Relatively inactive recently
MMLSpark
26
• Created by Microsoft
• Supports training using CNTK including distributed
• Image, text data
• Good integration with Spark
• Scala, Python, R API
• Support for Spark ML pipelines
• Varied deployment options
• Relatively active, seems quite well supported
Other Frameworks
27
• H20 AI / DeepWater
• Apache MXNet Spark integration
• TensorFrames
• CaffeOnSpark
• scalable-deep-learning on Github
• MLlib – MLPClassifier only
• Sparknet (abandoned)
Integration Challenges
28
• Moving data from Spark to DL framework (and
back)
• Serialization overhead – especially Python
• Managing DL computation graphs from Spark
executors means fault tolerance is difficult to
achieve
• GPU awareness
• Optimize and standardize data exchange -
SPARK-24579
• Apache Arrow
• Barrier Execution Mode - SPARK-24374
• Accelerator-aware scheduling - SPARK-
24615
29
Thank you!
codait.org
twitter.com/MLnick
github.com/MLnick
developer.ibm.com
FfDL
Sign up for IBM Cloud and try Watson Studio!
https://ibm.biz/BdYhXz
https://datascience.ibm.com/
MAX
Brought to you by community.ibm.com/icpfordata Catch the replay at ibmaicommunity.bemyapp.com

More Related Content

What's hot

Deep Learning for Natural Language Processing Using Apache Spark and TensorFl...
Deep Learning for Natural Language Processing Using Apache Spark and TensorFl...Deep Learning for Natural Language Processing Using Apache Spark and TensorFl...
Deep Learning for Natural Language Processing Using Apache Spark and TensorFl...Databricks
 
ML at the Edge: Building Your Production Pipeline with Apache Spark and Tens...
 ML at the Edge: Building Your Production Pipeline with Apache Spark and Tens... ML at the Edge: Building Your Production Pipeline with Apache Spark and Tens...
ML at the Edge: Building Your Production Pipeline with Apache Spark and Tens...Databricks
 
MLOps - Build pipelines with Tensor Flow Extended & Kubeflow
MLOps - Build pipelines with Tensor Flow Extended & KubeflowMLOps - Build pipelines with Tensor Flow Extended & Kubeflow
MLOps - Build pipelines with Tensor Flow Extended & KubeflowJan Kirenz
 
Serverless machine learning operations
Serverless machine learning operationsServerless machine learning operations
Serverless machine learning operationsStepan Pushkarev
 
ER/Studio 2016: Build a Business-Driven Data Architecture
ER/Studio 2016: Build a Business-Driven Data ArchitectureER/Studio 2016: Build a Business-Driven Data Architecture
ER/Studio 2016: Build a Business-Driven Data ArchitectureEmbarcadero Technologies
 
Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf AT...
Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf AT...Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf AT...
Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf AT...MLconf
 
Managing and Versioning Machine Learning Models in Python
Managing and Versioning Machine Learning Models in PythonManaging and Versioning Machine Learning Models in Python
Managing and Versioning Machine Learning Models in PythonSimon Frid
 
RAPID - Building a highly usable API Design language with XText
RAPID - Building a highly usable API Design language with XTextRAPID - Building a highly usable API Design language with XText
RAPID - Building a highly usable API Design language with XTextTed Epstein
 
RepreZen DSL: Pushing the limits of language usability with XText
RepreZen DSL: Pushing the limits of language usability with XTextRepreZen DSL: Pushing the limits of language usability with XText
RepreZen DSL: Pushing the limits of language usability with XTextTatiana Tanya Fesenko
 
AISF19 - Building Scalable, Kubernetes-Native ML/AI Pipelines with TFX, KubeF...
AISF19 - Building Scalable, Kubernetes-Native ML/AI Pipelines with TFX, KubeF...AISF19 - Building Scalable, Kubernetes-Native ML/AI Pipelines with TFX, KubeF...
AISF19 - Building Scalable, Kubernetes-Native ML/AI Pipelines with TFX, KubeF...Bill Liu
 
Omni-Path Status, Upstreaming and Ongoing Work
Omni-Path Status, Upstreaming and Ongoing WorkOmni-Path Status, Upstreaming and Ongoing Work
Omni-Path Status, Upstreaming and Ongoing Workinside-BigData.com
 
Sanjeev Satheesj, Research Scientist, Baidu at The AI Conference 2017
Sanjeev Satheesj, Research Scientist, Baidu at The AI Conference 2017Sanjeev Satheesj, Research Scientist, Baidu at The AI Conference 2017
Sanjeev Satheesj, Research Scientist, Baidu at The AI Conference 2017MLconf
 
Freelancing on FPGA Design [How & Where to get Freelancing on FPGA]
Freelancing on FPGA Design [How & Where to get Freelancing on FPGA]Freelancing on FPGA Design [How & Where to get Freelancing on FPGA]
Freelancing on FPGA Design [How & Where to get Freelancing on FPGA]Krishna Gaihre
 
DIY Analytics with Apache Spark
DIY Analytics with Apache SparkDIY Analytics with Apache Spark
DIY Analytics with Apache SparkAdam Roberts
 
Spark NLP: State of the Art Natural Language Processing at Scale
Spark NLP: State of the Art Natural Language Processing at ScaleSpark NLP: State of the Art Natural Language Processing at Scale
Spark NLP: State of the Art Natural Language Processing at ScaleDatabricks
 
Rhapsody Software
Rhapsody SoftwareRhapsody Software
Rhapsody SoftwareBill Duncan
 
Machine Learning Teams - Full Stack Deep Learning
Machine Learning Teams - Full Stack Deep LearningMachine Learning Teams - Full Stack Deep Learning
Machine Learning Teams - Full Stack Deep LearningSergey Karayev
 

What's hot (20)

Deep Learning for Natural Language Processing Using Apache Spark and TensorFl...
Deep Learning for Natural Language Processing Using Apache Spark and TensorFl...Deep Learning for Natural Language Processing Using Apache Spark and TensorFl...
Deep Learning for Natural Language Processing Using Apache Spark and TensorFl...
 
ML at the Edge: Building Your Production Pipeline with Apache Spark and Tens...
 ML at the Edge: Building Your Production Pipeline with Apache Spark and Tens... ML at the Edge: Building Your Production Pipeline with Apache Spark and Tens...
ML at the Edge: Building Your Production Pipeline with Apache Spark and Tens...
 
MLOps - Build pipelines with Tensor Flow Extended & Kubeflow
MLOps - Build pipelines with Tensor Flow Extended & KubeflowMLOps - Build pipelines with Tensor Flow Extended & Kubeflow
MLOps - Build pipelines with Tensor Flow Extended & Kubeflow
 
Serverless machine learning operations
Serverless machine learning operationsServerless machine learning operations
Serverless machine learning operations
 
ER/Studio 2016: Build a Business-Driven Data Architecture
ER/Studio 2016: Build a Business-Driven Data ArchitectureER/Studio 2016: Build a Business-Driven Data Architecture
ER/Studio 2016: Build a Business-Driven Data Architecture
 
Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf AT...
Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf AT...Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf AT...
Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf AT...
 
Managing and Versioning Machine Learning Models in Python
Managing and Versioning Machine Learning Models in PythonManaging and Versioning Machine Learning Models in Python
Managing and Versioning Machine Learning Models in Python
 
RAPID - Building a highly usable API Design language with XText
RAPID - Building a highly usable API Design language with XTextRAPID - Building a highly usable API Design language with XText
RAPID - Building a highly usable API Design language with XText
 
RepreZen DSL: Pushing the limits of language usability with XText
RepreZen DSL: Pushing the limits of language usability with XTextRepreZen DSL: Pushing the limits of language usability with XText
RepreZen DSL: Pushing the limits of language usability with XText
 
AISF19 - Building Scalable, Kubernetes-Native ML/AI Pipelines with TFX, KubeF...
AISF19 - Building Scalable, Kubernetes-Native ML/AI Pipelines with TFX, KubeF...AISF19 - Building Scalable, Kubernetes-Native ML/AI Pipelines with TFX, KubeF...
AISF19 - Building Scalable, Kubernetes-Native ML/AI Pipelines with TFX, KubeF...
 
Omni-Path Status, Upstreaming and Ongoing Work
Omni-Path Status, Upstreaming and Ongoing WorkOmni-Path Status, Upstreaming and Ongoing Work
Omni-Path Status, Upstreaming and Ongoing Work
 
Sanjeev Satheesj, Research Scientist, Baidu at The AI Conference 2017
Sanjeev Satheesj, Research Scientist, Baidu at The AI Conference 2017Sanjeev Satheesj, Research Scientist, Baidu at The AI Conference 2017
Sanjeev Satheesj, Research Scientist, Baidu at The AI Conference 2017
 
Freelancing on FPGA Design [How & Where to get Freelancing on FPGA]
Freelancing on FPGA Design [How & Where to get Freelancing on FPGA]Freelancing on FPGA Design [How & Where to get Freelancing on FPGA]
Freelancing on FPGA Design [How & Where to get Freelancing on FPGA]
 
DIY Analytics with Apache Spark
DIY Analytics with Apache SparkDIY Analytics with Apache Spark
DIY Analytics with Apache Spark
 
Spark NLP: State of the Art Natural Language Processing at Scale
Spark NLP: State of the Art Natural Language Processing at ScaleSpark NLP: State of the Art Natural Language Processing at Scale
Spark NLP: State of the Art Natural Language Processing at Scale
 
Advanced python
Advanced pythonAdvanced python
Advanced python
 
Satish A (1)
Satish A (1)Satish A (1)
Satish A (1)
 
Rhapsody Software
Rhapsody SoftwareRhapsody Software
Rhapsody Software
 
Ai use cases
Ai use casesAi use cases
Ai use cases
 
Machine Learning Teams - Full Stack Deep Learning
Machine Learning Teams - Full Stack Deep LearningMachine Learning Teams - Full Stack Deep Learning
Machine Learning Teams - Full Stack Deep Learning
 

Similar to AI and Spark - IBM Community AI Day

Integrating Deep Learning Libraries with Apache Spark
Integrating Deep Learning Libraries with Apache SparkIntegrating Deep Learning Libraries with Apache Spark
Integrating Deep Learning Libraries with Apache SparkDatabricks
 
Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)
Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)
Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)Jason Dai
 
Deep learning and Apache Spark
Deep learning and Apache SparkDeep learning and Apache Spark
Deep learning and Apache SparkQuantUniversity
 
Index conf sparkai-feb20-n-pentreath
Index conf sparkai-feb20-n-pentreathIndex conf sparkai-feb20-n-pentreath
Index conf sparkai-feb20-n-pentreathChester Chen
 
.NET per la Data Science e oltre
.NET per la Data Science e oltre.NET per la Data Science e oltre
.NET per la Data Science e oltreMarco Parenzan
 
Build, Scale, and Deploy Deep Learning Pipelines with Ease Using Apache Spark
Build, Scale, and Deploy Deep Learning Pipelines with Ease Using Apache SparkBuild, Scale, and Deploy Deep Learning Pipelines with Ease Using Apache Spark
Build, Scale, and Deploy Deep Learning Pipelines with Ease Using Apache SparkDatabricks
 
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...Michael Rys
 
Apache Spark's MLlib's Past Trajectory and new Directions
Apache Spark's MLlib's Past Trajectory and new DirectionsApache Spark's MLlib's Past Trajectory and new Directions
Apache Spark's MLlib's Past Trajectory and new DirectionsDatabricks
 
Deep Learning on Apache® Spark™ : Workflows and Best Practices
Deep Learning on Apache® Spark™ : Workflows and Best PracticesDeep Learning on Apache® Spark™ : Workflows and Best Practices
Deep Learning on Apache® Spark™ : Workflows and Best PracticesJen Aman
 
Deep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best PracticesDeep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best PracticesDatabricks
 
Deep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best PracticesDeep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best PracticesJen Aman
 
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, and Deep Learnin...
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, and Deep Learnin...A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, and Deep Learnin...
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, and Deep Learnin...Databricks
 
Data Con LA 2018 - A Tale of DL Frameworks: TensorFlow, Keras, & Deep Learnin...
Data Con LA 2018 - A Tale of DL Frameworks: TensorFlow, Keras, & Deep Learnin...Data Con LA 2018 - A Tale of DL Frameworks: TensorFlow, Keras, & Deep Learnin...
Data Con LA 2018 - A Tale of DL Frameworks: TensorFlow, Keras, & Deep Learnin...Data Con LA
 
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & Deep Learning ...
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & Deep Learning ...A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & Deep Learning ...
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & Deep Learning ...Databricks
 
2024 Feb AI Meetup NYC GenAI_LLMs_ML_Data Codeless Generative AI Pipelines
2024 Feb AI Meetup NYC GenAI_LLMs_ML_Data Codeless Generative AI Pipelines2024 Feb AI Meetup NYC GenAI_LLMs_ML_Data Codeless Generative AI Pipelines
2024 Feb AI Meetup NYC GenAI_LLMs_ML_Data Codeless Generative AI PipelinesTimothy Spann
 
MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ...
 MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ... MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ...
MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ...Databricks
 
Tuning and Monitoring Deep Learning on Apache Spark
Tuning and Monitoring Deep Learning on Apache SparkTuning and Monitoring Deep Learning on Apache Spark
Tuning and Monitoring Deep Learning on Apache SparkDatabricks
 
.NET for Azure Synapse (and viceversa)
.NET for Azure Synapse (and viceversa).NET for Azure Synapse (and viceversa)
.NET for Azure Synapse (and viceversa)Marco Parenzan
 
Scala and Spark are Ideal for Big Data - Data Science Pop-up Seattle
Scala and Spark are Ideal for Big Data - Data Science Pop-up SeattleScala and Spark are Ideal for Big Data - Data Science Pop-up Seattle
Scala and Spark are Ideal for Big Data - Data Science Pop-up SeattleDomino Data Lab
 

Similar to AI and Spark - IBM Community AI Day (20)

Integrating Deep Learning Libraries with Apache Spark
Integrating Deep Learning Libraries with Apache SparkIntegrating Deep Learning Libraries with Apache Spark
Integrating Deep Learning Libraries with Apache Spark
 
Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)
Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)
Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)
 
Deep learning and Apache Spark
Deep learning and Apache SparkDeep learning and Apache Spark
Deep learning and Apache Spark
 
Index conf sparkai-feb20-n-pentreath
Index conf sparkai-feb20-n-pentreathIndex conf sparkai-feb20-n-pentreath
Index conf sparkai-feb20-n-pentreath
 
.NET per la Data Science e oltre
.NET per la Data Science e oltre.NET per la Data Science e oltre
.NET per la Data Science e oltre
 
Build, Scale, and Deploy Deep Learning Pipelines with Ease Using Apache Spark
Build, Scale, and Deploy Deep Learning Pipelines with Ease Using Apache SparkBuild, Scale, and Deploy Deep Learning Pipelines with Ease Using Apache Spark
Build, Scale, and Deploy Deep Learning Pipelines with Ease Using Apache Spark
 
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
 
Apache Spark's MLlib's Past Trajectory and new Directions
Apache Spark's MLlib's Past Trajectory and new DirectionsApache Spark's MLlib's Past Trajectory and new Directions
Apache Spark's MLlib's Past Trajectory and new Directions
 
Deep Learning on Apache® Spark™ : Workflows and Best Practices
Deep Learning on Apache® Spark™ : Workflows and Best PracticesDeep Learning on Apache® Spark™ : Workflows and Best Practices
Deep Learning on Apache® Spark™ : Workflows and Best Practices
 
Deep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best PracticesDeep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best Practices
 
Deep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best PracticesDeep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best Practices
 
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, and Deep Learnin...
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, and Deep Learnin...A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, and Deep Learnin...
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, and Deep Learnin...
 
Data Con LA 2018 - A Tale of DL Frameworks: TensorFlow, Keras, & Deep Learnin...
Data Con LA 2018 - A Tale of DL Frameworks: TensorFlow, Keras, & Deep Learnin...Data Con LA 2018 - A Tale of DL Frameworks: TensorFlow, Keras, & Deep Learnin...
Data Con LA 2018 - A Tale of DL Frameworks: TensorFlow, Keras, & Deep Learnin...
 
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & Deep Learning ...
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & Deep Learning ...A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & Deep Learning ...
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & Deep Learning ...
 
2024 Feb AI Meetup NYC GenAI_LLMs_ML_Data Codeless Generative AI Pipelines
2024 Feb AI Meetup NYC GenAI_LLMs_ML_Data Codeless Generative AI Pipelines2024 Feb AI Meetup NYC GenAI_LLMs_ML_Data Codeless Generative AI Pipelines
2024 Feb AI Meetup NYC GenAI_LLMs_ML_Data Codeless Generative AI Pipelines
 
MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ...
 MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ... MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ...
MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ...
 
Tuning and Monitoring Deep Learning on Apache Spark
Tuning and Monitoring Deep Learning on Apache SparkTuning and Monitoring Deep Learning on Apache Spark
Tuning and Monitoring Deep Learning on Apache Spark
 
.NET for Azure Synapse (and viceversa)
.NET for Azure Synapse (and viceversa).NET for Azure Synapse (and viceversa)
.NET for Azure Synapse (and viceversa)
 
Amazon Deep Learning
Amazon Deep LearningAmazon Deep Learning
Amazon Deep Learning
 
Scala and Spark are Ideal for Big Data - Data Science Pop-up Seattle
Scala and Spark are Ideal for Big Data - Data Science Pop-up SeattleScala and Spark are Ideal for Big Data - Data Science Pop-up Seattle
Scala and Spark are Ideal for Big Data - Data Science Pop-up Seattle
 

More from Nick Pentreath

Notebook-based AI Pipelines with Elyra and Kubeflow
Notebook-based AI Pipelines with Elyra and KubeflowNotebook-based AI Pipelines with Elyra and Kubeflow
Notebook-based AI Pipelines with Elyra and KubeflowNick Pentreath
 
Scaling up deep learning by scaling down
Scaling up deep learning by scaling downScaling up deep learning by scaling down
Scaling up deep learning by scaling downNick Pentreath
 
IBM Developer Model Asset eXchange - Deep Learning for Everyone
IBM Developer Model Asset eXchange - Deep Learning for EveryoneIBM Developer Model Asset eXchange - Deep Learning for Everyone
IBM Developer Model Asset eXchange - Deep Learning for EveryoneNick Pentreath
 
Search and Recommendations: 3 Sides of the Same Coin
Search and Recommendations: 3 Sides of the Same CoinSearch and Recommendations: 3 Sides of the Same Coin
Search and Recommendations: 3 Sides of the Same CoinNick Pentreath
 
Deep Learning for Recommender Systems
Deep Learning for Recommender SystemsDeep Learning for Recommender Systems
Deep Learning for Recommender SystemsNick Pentreath
 
Productionizing Spark ML Pipelines with the Portable Format for Analytics
Productionizing Spark ML Pipelines with the Portable Format for AnalyticsProductionizing Spark ML Pipelines with the Portable Format for Analytics
Productionizing Spark ML Pipelines with the Portable Format for AnalyticsNick Pentreath
 
RNNs for Recommendations and Personalization
RNNs for Recommendations and PersonalizationRNNs for Recommendations and Personalization
RNNs for Recommendations and PersonalizationNick Pentreath
 

More from Nick Pentreath (7)

Notebook-based AI Pipelines with Elyra and Kubeflow
Notebook-based AI Pipelines with Elyra and KubeflowNotebook-based AI Pipelines with Elyra and Kubeflow
Notebook-based AI Pipelines with Elyra and Kubeflow
 
Scaling up deep learning by scaling down
Scaling up deep learning by scaling downScaling up deep learning by scaling down
Scaling up deep learning by scaling down
 
IBM Developer Model Asset eXchange - Deep Learning for Everyone
IBM Developer Model Asset eXchange - Deep Learning for EveryoneIBM Developer Model Asset eXchange - Deep Learning for Everyone
IBM Developer Model Asset eXchange - Deep Learning for Everyone
 
Search and Recommendations: 3 Sides of the Same Coin
Search and Recommendations: 3 Sides of the Same CoinSearch and Recommendations: 3 Sides of the Same Coin
Search and Recommendations: 3 Sides of the Same Coin
 
Deep Learning for Recommender Systems
Deep Learning for Recommender SystemsDeep Learning for Recommender Systems
Deep Learning for Recommender Systems
 
Productionizing Spark ML Pipelines with the Portable Format for Analytics
Productionizing Spark ML Pipelines with the Portable Format for AnalyticsProductionizing Spark ML Pipelines with the Portable Format for Analytics
Productionizing Spark ML Pipelines with the Portable Format for Analytics
 
RNNs for Recommendations and Personalization
RNNs for Recommendations and PersonalizationRNNs for Recommendations and Personalization
RNNs for Recommendations and Personalization
 

Recently uploaded

Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Colleen Farrelly
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort servicejennyeacort
 
Machine learning classification ppt.ppt
Machine learning classification  ppt.pptMachine learning classification  ppt.ppt
Machine learning classification ppt.pptamreenkhanum0307
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样vhwb25kk
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfchwongval
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]📊 Markus Baersch
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Cathrine Wilhelmsen
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Cantervoginip
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryJeremy Anderson
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queensdataanalyticsqueen03
 
While-For-loop in python used in college
While-For-loop in python used in collegeWhile-For-loop in python used in college
While-For-loop in python used in collegessuser7a7cd61
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一fhwihughh
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceSapana Sha
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdfHuman37
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPTBoston Institute of Analytics
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectBoston Institute of Analytics
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...Amil Baba Dawood bangali
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfgstagge
 

Recently uploaded (20)

Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
Machine learning classification ppt.ppt
Machine learning classification  ppt.pptMachine learning classification  ppt.ppt
Machine learning classification ppt.ppt
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdf
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Canter
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queens
 
While-For-loop in python used in college
While-For-loop in python used in collegeWhile-For-loop in python used in college
While-For-loop in python used in college
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts Service
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis Project
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 

AI and Spark - IBM Community AI Day

  • 1.
  • 2. About @MLnick on Twitter & Github Principal Engineer, IBM CODAIT - Center for Open-Source Data & AI Technologies Machine Learning & AI Apache Spark committer & PMC Author of Machine Learning with Spark Various conferences & meetups
  • 3. Center for Open Source Data and AI Technologies CODAIT codait.org DBG / Oct 4, 2018 / © 2018 IBM Corporation CODAIT aims to make AI solutions dramatically easier to create, deploy, and manage in the enterprise Relaunch of the Spark Technology Center (STC) to reflect expanded mission Improving Enterprise AI Lifecycle in Open Source Gather Data Analyze Data Machine Learning Deep Learning Deploy Model Maintain Model Python Data Science Stack Fabric for Deep Learning (FfDL) Mleap + PFA Scikit-LearnPandas Apache Spark Apache Spark Jupyter Model Asset eXchange Keras + Tensorflow
  • 6. In reality the workflow spans teams …
  • 8. Spark provides a unified platform
  • 10. What is a “model”?
  • 12. Example – Text Classifier Pipeline
  • 13. Example – Text Classifier PipelineModel
  • 14. Spark ML Components 14 Source: http://spark.apache.org/docs/latest
  • 15. Deep Learning DBG / June 6, 2018 / © 2018 IBM Corporation
  • 16. Deep Learning Overview • Original theory from 1940s; computer models originated around 1960s; fell out of favor in 1980s/90s • Recent resurgence due to • Bigger (and better) data; standard datasets (e.g. ImageNet) • Better hardware (GPUs) • Improvements to algorithms, architectures and optimization • Leading to new state-of-the-art results in computer vision (images and video); speech/text; language translation and more Source: Wikipedia
  • 17. Modern Neural Networks • Deep (multi-layer) networks • Computer vision • Convolution neural networks (CNNs) • Image classification, object detection, segmentation • Sequences and time-series • Recurrent neural networks (RNNs) • Machine translation, text generation • Embeddings • Text, categorical features • Deep learning frameworks • Flexibility, computation graphs, auto-differentiation, GPUs Source: Stanford CS231n
  • 18. Deep Learning Frameworks * Logos trademarks of their respective projects
  • 19. Computation Graphs Source: Google AI Blog *MnasNet Network *Inception V3
  • 21. Major Frameworks 21 • Deeplearning4J • BigDL • Deep Learning Pipelines • TensorFlowOnSpark • Microsoft Machine Learning on Spark (MMLSpark)
  • 22. Deeplearning4J 22 • Distributed GPU support for all major deep learning architectures • CPU / Distributed CPU / Single GPU options exist • Supports Convolutional Nets, LSTMs / RNNs, Feedforward Nets, Word2Vec, custom layers • Supported by startup Skymind.io • Backed by its own linear algebra library – ND4J • APIs in Scala, Java, Python • Newer Scala API, Keras-like • Keras import / export for Python API • Production serving is through proprietary layer • DataVec for ETL
  • 23. BigDL 23 • Distributed CPU with Intel MKL • No GPU support • Most DL models – CNN, RNN • Backed by Intel • Natively integrated with Spark • Scala, Python API • Support for Spark ML pipelines • Uses private internal Spark components for distributed training • Load Keras, Caffe, Torch models • New Keras-style API
  • 24. Deep Learning Pipelines 24 • Created by Databricks • Focus on scoring models (TensorFlow / Keras) and basic transfer learning • No support for training the DL model • Focus on image data & use cases • Natively integrated with Spark • Scala, Python API • Support for Spark ML pipelines • Support for scoring models as a SQL UDF • Largely dormant currently
  • 25. TensorFlowOnSpark 25 • Created by Yahoo • Scale out TF on Spark clusters • Use Spark executors to launch TF processes • Supports distributed training through TF parameter servers • RDMA / Infiniband improvement to TF to speed up distributed training • Good support for TensorBoard • Good integration with Spark • But only Python API • Some support for Spark ML pipelines • Relatively inactive recently
  • 26. MMLSpark 26 • Created by Microsoft • Supports training using CNTK including distributed • Image, text data • Good integration with Spark • Scala, Python, R API • Support for Spark ML pipelines • Varied deployment options • Relatively active, seems quite well supported
  • 27. Other Frameworks 27 • H20 AI / DeepWater • Apache MXNet Spark integration • TensorFrames • CaffeOnSpark • scalable-deep-learning on Github • MLlib – MLPClassifier only • Sparknet (abandoned)
  • 28. Integration Challenges 28 • Moving data from Spark to DL framework (and back) • Serialization overhead – especially Python • Managing DL computation graphs from Spark executors means fault tolerance is difficult to achieve • GPU awareness • Optimize and standardize data exchange - SPARK-24579 • Apache Arrow • Barrier Execution Mode - SPARK-24374 • Accelerator-aware scheduling - SPARK- 24615
  • 29. 29 Thank you! codait.org twitter.com/MLnick github.com/MLnick developer.ibm.com FfDL Sign up for IBM Cloud and try Watson Studio! https://ibm.biz/BdYhXz https://datascience.ibm.com/ MAX
  • 30. Brought to you by community.ibm.com/icpfordata Catch the replay at ibmaicommunity.bemyapp.com