Deep Learning On Spark
Using BigDL on Qubole
Dash Desai
Technology Evangelist
@iamontheinet
Some Basic Concepts
Copyright 2017 © Qubole
What is Machine Learning?
Gives ‘computers the ability to learn without
being explicitly programmed’ - Wikipedia
What is Deep Learning?
A form of ML that uses a model of computing
inspired by the structure of the brain
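To make the "model of computing" concrete, here is a minimal sketch of an artificial neuron and a small stack of layers, in plain Python. The weights in NETWORK and all function names are made up for illustration; a real deep learning library learns the weights from data and uses optimized tensor math.

```python
import math


def sigmoid(x):
    """Squash a real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def neuron(inputs, weights, bias):
    """One artificial neuron: weighted sum of the inputs, then a nonlinearity."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(total)


def layer(inputs, weight_rows, biases):
    """A layer is several neurons that all read the same inputs."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]


def forward(inputs, layers):
    """Stack layers: the outputs of one layer feed the next one."""
    activations = inputs
    for weight_rows, biases in layers:
        activations = layer(activations, weight_rows, biases)
    return activations


# Hypothetical weights for a network with 2 inputs, 2 hidden units, 1 output.
NETWORK = [
    ([[0.5, -0.5], [1.0, 1.0]], [0.0, -1.0]),
    ([[1.0, -1.0]], [0.0]),
]
output = forward([1.0, 0.0], NETWORK)
```

"Deep" refers to stacking many such layers; the structure of forward() stays the same however many entries NETWORK has.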
Deep Learning Applications
Computer Vision / Image Recognition / Object Detection
Speech Recognition / Natural Language Processing (NLP)
Recommendation Systems (Products, Matchmaking, etc.)
Prediction (Stock Market, Healthcare, etc.)
Anomaly Detection (Cybersecurity, etc.)
What is Apache Spark?
A fast and general-purpose engine
for large-scale, distributed data
processing
MLlib
Spark’s scalable machine learning library
High-quality algorithms; up to 100x faster than MapReduce
Usable in Java, Scala, Python, and R
Deep Learning: On Apache Spark
Deep Learning: Other Popular Non-Spark Options
TensorFlow* (Google)
• Natively distributed out-of-the-box
Keras
• Naturally runs on distributed frameworks/back-ends
• Theano, MXNet (CMU, MIT, NYU), TensorFlow, CNTK (Microsoft)
*Not to be confused with TensorFlow On Spark (TFOS) by Yahoo
BigDL
What is BigDL?
Distributed deep learning library
for Apache Spark
Open sourced by Intel (in Dec 2016)
Feature parity with DL frameworks such as Caffe, Torch
Integrates with Spark ML pipeline and Spark Streaming
Supports Model snapshots
Intel MKL (Math Kernel Library); multi-threading within
each Spark task
What is BigDL? (Cont…)
Includes 100+ Layers (highest level building block in DL)
Includes 20+ Loss functions (help with model fitting)
Optimization methods include SGD, Adagrad, LBFGS
Numeric computing via Tensor & high-level neural networks
Scaling: synchronous mini-batch SGD and all-reduce
communication on Spark
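The scaling bullet can be illustrated with a small simulation of synchronous mini-batch SGD with an all-reduce step. This is a sketch of the general technique in plain Python, not BigDL's implementation: list elements stand in for Spark tasks, and the names local_gradient, all_reduce_mean and train are hypothetical.

```python
def local_gradient(weights, batch):
    """Gradient of mean squared error for y ~ w*x + b on one worker's batch."""
    w, b = weights
    grad_w = grad_b = 0.0
    for x, y in batch:
        error = (w * x + b) - y
        grad_w += 2.0 * error * x
        grad_b += 2.0 * error
    n = len(batch)
    return [grad_w / n, grad_b / n]


def all_reduce_mean(vectors):
    """Simulated all-reduce: every worker receives the same averaged vector."""
    count = len(vectors)
    mean = [sum(values) / count for values in zip(*vectors)]
    return [list(mean) for _ in vectors]


def train(partitions, steps=1000, learning_rate=0.05):
    """Synchronous data-parallel SGD with one weight replica per worker."""
    replicas = [[0.0, 0.0] for _ in partitions]
    for _ in range(steps):
        # 1. Each worker computes a gradient on its own shard of the data.
        grads = [local_gradient(r, p) for r, p in zip(replicas, partitions)]
        # 2. Workers synchronize: the all-reduce averages the gradients.
        reduced = all_reduce_mean(grads)
        # 3. Every worker applies the identical update, so replicas stay equal.
        for replica, grad in zip(replicas, reduced):
            for i in range(len(replica)):
                replica[i] -= learning_rate * grad[i]
    return replicas


# Toy data for y = 3x + 1, split evenly across two simulated workers.
data = [(k / 10.0, 3.0 * (k / 10.0) + 1.0) for k in range(20)]
partitions = [data[0::2], data[1::2]]
replicas = train(partitions)
```

"Synchronous" means no worker starts step t+1 until the all-reduce for step t has finished. That is why the replicas never drift apart, at the cost of waiting for the slowest worker at every step.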
BigDL vs TensorFrames
TensorFrames can call TF from individual
partitions of a DataFrame or an RDD (in PySpark)…
However, since TF is not natively integrated
with Spark, it does not support distributed deep
learning, such as model training or fine-tuning.
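The distinction can be sketched in plain Python. Applying an already-trained model to each partition needs no communication between partitions, which is the per-partition pattern described above. Distributed training is different: it has to combine information across partitions on every step. The helper names below are hypothetical stand-ins and do not reflect the TensorFrames API.

```python
def make_model(weight, bias):
    """Stand-in for a pre-trained model: a fixed function of its input."""
    def predict(x):
        return weight * x + bias
    return predict


def score_partition(rows, model):
    """Apply the model to one partition; no other partition is consulted."""
    for row in rows:
        yield (row, model(row))


def map_partitions(partitions, func):
    """Hypothetical stand-in for running a function once per partition."""
    return [list(func(iter(rows))) for rows in partitions]


model = make_model(weight=2.0, bias=1.0)
partitions = [[0.0, 1.0], [2.0], [3.0, 4.0]]
scored = map_partitions(partitions, lambda rows: score_partition(rows, model))
```

Each call to score_partition sees only its own rows, so the partitions can be processed in any order or in parallel.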
BigDL vs TensorFlow on Spark (TFOS), Caffe on Spark
TensorFlow on Spark* (TFOS) and Caffe on Spark
use Spark executors to launch TF or Caffe instances
on the cluster…
However, model training, predictions, etc. are
performed outside of Spark across multiple TF or
Caffe instances…
• Run as standalone jobs outside of the pipeline
• Very fine-grained/limited interaction with
analytics pipeline
*Not to be confused with natively distributed TensorFlow by Google
How Does BigDL Work?
Distributed Deep Learning: Two Methods
Distributed Deep Learning: BigDL
<Insert Demo Here >> YAY!/>
Demo: Recognize Handwritten Digits
Train on dataset …
Use model … to recognize handwritten digits …
… with everything running on …
Data Science on Qubole
Qubole
Qubole automates, controls and orchestrates all big data workloads, including Data
Science, so that you can optimize for performance, cost and scale.
Built for Anyone Who Uses Data
• Analysts
• Data Scientists
• Data Engineers
• Data Admins
A Single Platform for Any Use Case
• ETL & Reporting
• Ad Hoc Queries
• Machine Learning
• Streaming
• Vertical Apps
Open Source Engines, Optimized for the Cloud
• Cloud-Native
• Cloud-Optimized
• Cloud-Agnostic
Data Science on Qubole
Cluster LifeCycle Management on Qubole
Note: Available on Apache Spark, Hadoop, and Presto as a service on Qubole
Auto-scaling Clusters
• Policy-driven
• One-time setup; Runtime modifications
• Workload-aware upscaling and downscaling
• No wasted resources, resulting in lower TCO
Heterogeneous Clusters
• Mix-and-match instance types
• On-Demand and Spot instances (on AWS)
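A policy-driven, workload-aware scaling decision could look roughly like the sketch below. It is only an illustration of the idea: the ScalingPolicy fields and the sizing rule (pending tasks divided by tasks per node) are assumptions, not Qubole's actual configuration schema or algorithm.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingPolicy:
    """Hypothetical policy fields; not Qubole's real configuration schema."""
    min_nodes: int
    max_nodes: int
    tasks_per_node: int


def target_nodes(policy, pending_tasks):
    """Nodes needed for the current workload, clamped to the policy bounds."""
    needed = math.ceil(pending_tasks / policy.tasks_per_node)
    return max(policy.min_nodes, min(policy.max_nodes, needed))


def scaling_decision(policy, current_nodes, pending_tasks):
    """Return an (action, node_delta) pair for one evaluation of the policy."""
    target = target_nodes(policy, pending_tasks)
    if target > current_nodes:
        return ("upscale", target - current_nodes)
    if target < current_nodes:
        return ("downscale", current_nodes - target)
    return ("hold", 0)


policy = ScalingPolicy(min_nodes=2, max_nodes=10, tasks_per_node=4)
busy = scaling_decision(policy, current_nodes=3, pending_tasks=30)
idle = scaling_decision(policy, current_nodes=8, pending_tasks=5)
```

The policy is set once and evaluated repeatedly at runtime. Downscaling when the workload shrinks is what removes idle nodes and lowers cost.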
Qubole: High-level View
[Architecture diagram. Columns: User Access, Qubole Tier, Customer's Azure Account]
Components shown:
• Qubole UI via browser, SDK, ODBC, REST API (HTTPS)
• Ephemeral web tier: web servers
• Default Hive Metastore
• RDS (Qubole): user and account configurations (encrypted credentials)
• Encrypted result cache
• (Optional) Custom Hive Metastore
• (Optional) Other RDS
• Ephemeral cluster, managed by Qubole: master, plus slaves with encrypted HDFS
• Data flow within customer's cloud
Thank you!
Dash Desai
Technology Evangelist
@iamontheinet
Getting Started
Install BigDL on Qubole + Demo App: http://bit.ly/deep_learning_bigdl_qubole
BigDL: https://github.com/intel-analytics/BigDL
