SlideShare una empresa de Scribd logo
1 de 23
Descargar para leer sin conexión
Mar 13, 2020
Automated Time Series Analysis using
Deep Learning, Ray and Analytics Zoo
Shan Yu, Shengsheng Huang, Jason Dai
AI
Mar 13, 2020
• Background
• Introduction of Analytics Zoo
• Background about Time Series Analysis
• Background about AutoML and Ray
• Time Series Analysis using AutoML and Ray on Analytics Zoo
• Use Case Sharing
Agenda
Mar 13, 2020
Background
Mar 13, 2020
What is Analytics Zoo
Accelerating Data Analytics + AI Solutions At Scale
Distributed, High-Performance
Deep Learning Framework
for Apache Spark
https://github.com/intel-analytics/bigdl
Unified Analytics + AI Platform
Distributed TensorFlow, Keras, PyTorch
and BigDL on Apache Spark
https://github.com/intel-analytics/analytics-zoo
Mar 13, 2020
Unified Big Data Analytics and AI Platform
Production
Data pipeline
Prototype on laptop
using sample data
Experiment on clusters
with history data
Production deployment w/
distributed data pipeline
• Easily prototype the integrated data analytics & AI solution
• “Zero” code change from laptop to distributed cluster
• Directly access production data (Hadoop/Hive/HBase) without data copy
• Seamlessly deployed on production big data clusters
Seamless Scaling from Laptop to Production
Mar 13, 2020
Analytics Zoo
Unified Big Data Analytics and AI Platform
https://github.com/intel-analytics/analytics-zoo
Recommendation
Distributed TensorFlow & PyTorch on Spark
Spark Dataframes & ML Pipelines for DL
RayOnSpark
Model Serving
Models &
Algorithms
Integrated
Analytics & AI
Pipelines
Library &
Framework
Time Series Computer Vision NLP
ML Workflow AutoML for Time Series Automatic Cluster Serving
Python Libraries
(Numpy/Pandas/…)
DL Frameworks
(TF/PyTorch/…)
Distributed Analytics
(Spark/Flink/Ray/…)
Distributions
(Cloudera/Databricks/….)
Mar 13, 2020
Time Series Analysis
• Time Series data
• A series of data that is observed
sequentially in time.
• Numerical & unstructured
• Stock prices, sales volume, CPU/IO
monitoring data, etc.
• Example of time series analysis
• Product demand prediction
• Network quality analysis
• Predictive maintenance for high-
value equipment
Total volume of taxi passengers in NYC from 2014/07-2015/02 ( source :
https://github.com/intel-analytics/analytics-zoo/blob/master/apps/anomaly-
detection/anomaly-detection-nyc-taxi.ipynb)
Mar 13, 2020
AutoML Overview
Taking the Human out of Learning Applications : A Survey on Automated
Machine Learning. Yao, Q., Wang, et. al
Mar 13, 2020
Ray and Ray On Spark
https://medium.com/riselab/rayonspark-running-emerging-ai-applications-on-big-data-clusters-with-ray-and-analytics-zoo-923e0136ed6a
• Ray
• A distributed framework for
emerging AI applications
• RayOnSpark
• Directly run Ray programs on big
data cluster
• Seamlessly integrate ray into spark
data processing pipeline
Mar 13, 2020
Time Series Analysis using AutoML and
Ray on Analytics Zoo
Mar 13, 2020
Laptop Spark ClusterYARN ClusterK8s Cluster
Distributed TensorFlow & PyTorch on Spark
Spark Dataframes & ML Pipelines for DL
RayOnSpark
Model Inference
Recommendation
Computer
Vision
NLP
Cluster Serving
AutoML
Framework
Integrated
Analytics and
AI Pipelines
TimeSeries
Algorithms
Hyper-Parameter
Tuning
Feature Generation Model Selection
AnalyticsZoo
Trend Prediction
……
Anomaly
Detection
ML
Workflow
Built-in
Algorithms
and Models
Time Series Solution
…
User Models
Time Series Solution In Analytics Zoo
• Time series Applications
• Time series forecasting
• Anomaly detection
• Time Series Clustering
• etc
• AutoML
• Seamless scaling
• Full-stack Intel SW+HW
Optimization w/ Analytics
Zoo
Mar 13, 2020
• AutoML Framework
• FeatureTransformer
• Model
• SearchEngine
• Pipeline
• Time Series Prediction w/
AutoML
• TimeSequencePredictor
• TimeSequencePipeline
AutoML + Time Series Analysis Framework
In Analytics Zoo
*Other names and brands may be claimed as the property of others.
https://medium.com/riselab/scalable-automl-for-time-series-prediction-using-ray-and-
analytics-zoo-b79a6fd08139
Mar 13, 2020
Typical Workflow of Training w/ AutoML
FeatureTransformer
Model
SearchEngine
Search presets
Workflow implemented in TimeSequencePredictor
trial
trial
trial
trial
…best model
/parameters
trail jobs
Pipeline
with tunable parameters
with tunable parameters
configured with best parameters/model
Each trial runs a different
combination of hyper parameters
Ray Tune
Mar 13, 2020
• Training a Predictor
• fit (w/ automl)
• recipe
• distributed
• Using a Pipeline
• save/load
• evaluate/predict
• fit (incremental)
General API Usage
Mar 13, 2020Intel Confidential
Application: Time Series Forecasting
Mar 13, 2020Intel Confidential
Application: Anomaly Detection
Mar 13, 2020
Use case sharing
Mar 13, 2020
Spark-SQL
Data Loading
Data
Loader
Data Source APIs
File, HTTP, Kafka
forked.
DRAM
Store
customized.
Flash
Store
tiering
Preprocess RDD of Tensor Model Code of TF
DL Training & Inferencing
Data Model
SIMD Acceleration
Time Series Based Network Quality Prediction in SK Telecom
https://databricks.com/session_eu19/apache-spark-ai-use-case-in-telco-network-quality-analysis-and-prediction-with-geospatial-visualization
Mar 13, 2020
Unsupervised Time Series Anomaly Detection for Baosight
https://software.intel.com/en-us/articles/lstm-based-time-
series-anomaly-detection-using-analytics-zoo-for-apache-
spark-and-bigdl
Mar 13, 2020
https://www.intel.cn/content/www/cn/zh/analytics/artificial-intelligence/yunda-brings-quality-change-to-the-express-delivery-
industry.html
Yunda: Anomaly Detection for AIOps
• AIOps
• Monitoring log/metrics
analysis for data center
operations
• AIOps helps cost saving and
MTTR (mean-time-to-repair)
Mar 13, 2020
More Information about
AutoML + Time Series in Analytics Zoo
• Scalable AutoML for Time Series Analysis
• Source code as a branch of analytics-zoo repo @ https://github.com/intel-analytics/analytics-zoo/tree/automl
• README @ https://github.com/intel-analytics/analytics-zoo/blob/automl/pyzoo/zoo/automl/README.md
• Blog https://medium.com/riselab/scalable-automl-for-time-series-prediction-using-ray-and-analytics-zoo-b79a6fd08139
• Anomaly Detection Reference Examples
• Time Series Forecast w/ AutoML https://github.com/intel-analytics/analytics-zoo/blob/automl/apps/automl
• Anomaly Detection based on Forecast https://github.com/intel-analytics/analytics-zoo/tree/master/apps/anomaly-
detection
• Anomaly Detection based on AutoEncoder https://github.com/intel-analytics/analytics-
zoo/tree/master/apps/anomaly-detection-hd
• Real-world Customer Applications
• Baosight’s anomaly detection for intelligent equipment management. Details refer to http://software.intel.com/en-
us/articles/lstm-based-time-series-anomaly-detection-using-analytics-zoo-for-apache-spark-and-bigdl
• Yunda anomaly detection for AIOps https://www.intel.cn/content/www/cn/zh/analytics/artificial-intelligence/yunda-
brings-quality-change-to-the-express-delivery-industry.html
Mar 13, 2020
• Project website
• https://github.com/intel-analytics/analytics-zoo
• https://github.com/intel-analytics/bigdl
• Tutorials
• CVPR 2018: https://jason-dai.github.io/cvpr2018/
• AAAI 2019: https://jason-dai.github.io/aaai2019/
• “BigDL: A Distributed Deep Learning Framework for Big Data”
• In proceedings of ACM Symposium on Cloud Computing 2019 (SOCC’19)
• Use cases
• Azure, CERN, MasterCard, Office Depot, Tencent, Midea, etc.
• https://analytics-zoo.github.io/master/#powered-by/
More Information on Analytics Zoo
Automated Time Series Analysis using Deep Learning, Ray and Analytics Zoo

Más contenido relacionado

La actualidad más candente

cyREST: Cytoscape as a Service
cyREST: Cytoscape as a ServicecyREST: Cytoscape as a Service
cyREST: Cytoscape as a ServiceKeiichiro Ono
 
Bridging the Gap Between Data Scientists and Software Engineers – Deploying L...
Bridging the Gap Between Data Scientists and Software Engineers – Deploying L...Bridging the Gap Between Data Scientists and Software Engineers – Deploying L...
Bridging the Gap Between Data Scientists and Software Engineers – Deploying L...Databricks
 
Realtime Data Analysis Patterns
Realtime Data Analysis PatternsRealtime Data Analysis Patterns
Realtime Data Analysis PatternsMikio L. Braun
 
END-TO-END MACHINE LEARNING STACK
END-TO-END MACHINE LEARNING STACKEND-TO-END MACHINE LEARNING STACK
END-TO-END MACHINE LEARNING STACKJan Wiegelmann
 
Tiny Batches, in the wine: Shiny New Bits in Spark Streaming
Tiny Batches, in the wine: Shiny New Bits in Spark StreamingTiny Batches, in the wine: Shiny New Bits in Spark Streaming
Tiny Batches, in the wine: Shiny New Bits in Spark StreamingPaco Nathan
 
SDCSB CYTOSCAPE AND NETWORK ANALYSIS WORKSHOP at Sanford Consortium
SDCSB CYTOSCAPE AND NETWORK ANALYSIS WORKSHOP at Sanford ConsortiumSDCSB CYTOSCAPE AND NETWORK ANALYSIS WORKSHOP at Sanford Consortium
SDCSB CYTOSCAPE AND NETWORK ANALYSIS WORKSHOP at Sanford ConsortiumKeiichiro Ono
 
GalvanizeU Seattle: Eleven Almost-Truisms About Data
GalvanizeU Seattle: Eleven Almost-Truisms About DataGalvanizeU Seattle: Eleven Almost-Truisms About Data
GalvanizeU Seattle: Eleven Almost-Truisms About DataPaco Nathan
 
Real-Time Fraud Detection at Scale—Integrating Real-Time Deep-Link Graph Anal...
Real-Time Fraud Detection at Scale—Integrating Real-Time Deep-Link Graph Anal...Real-Time Fraud Detection at Scale—Integrating Real-Time Deep-Link Graph Anal...
Real-Time Fraud Detection at Scale—Integrating Real-Time Deep-Link Graph Anal...Databricks
 
Data Warehousing with Spark Streaming at Zalando
Data Warehousing with Spark Streaming at ZalandoData Warehousing with Spark Streaming at Zalando
Data Warehousing with Spark Streaming at ZalandoDatabricks
 
Spark Summit EU talk by Miha Pelko and Til Piffl
Spark Summit EU talk by Miha Pelko and Til PifflSpark Summit EU talk by Miha Pelko and Til Piffl
Spark Summit EU talk by Miha Pelko and Til PifflSpark Summit
 
Big Data is changing abruptly, and where it is likely heading
Big Data is changing abruptly, and where it is likely headingBig Data is changing abruptly, and where it is likely heading
Big Data is changing abruptly, and where it is likely headingPaco Nathan
 
Auto-Pilot for Apache Spark Using Machine Learning
Auto-Pilot for Apache Spark Using Machine LearningAuto-Pilot for Apache Spark Using Machine Learning
Auto-Pilot for Apache Spark Using Machine LearningDatabricks
 
Distributed Deep Learning At Scale On Apache Spark With BigDL
Distributed Deep Learning At Scale On Apache Spark With BigDLDistributed Deep Learning At Scale On Apache Spark With BigDL
Distributed Deep Learning At Scale On Apache Spark With BigDLYulia Tell
 
Applied Machine Learning for Ranking Products in an Ecommerce Setting
Applied Machine Learning for Ranking Products in an Ecommerce SettingApplied Machine Learning for Ranking Products in an Ecommerce Setting
Applied Machine Learning for Ranking Products in an Ecommerce SettingDatabricks
 
Monitoring environment based on satellite data with Python and PySpark - Albe...
Monitoring environment based on satellite data with Python and PySpark - Albe...Monitoring environment based on satellite data with Python and PySpark - Albe...
Monitoring environment based on satellite data with Python and PySpark - Albe...GetInData
 
Microsoft R Server for Data Sciencea
Microsoft R Server for Data ScienceaMicrosoft R Server for Data Sciencea
Microsoft R Server for Data ScienceaData Science Thailand
 
Apache Spark and the Emerging Technology Landscape for Big Data
Apache Spark and the Emerging Technology Landscape for Big DataApache Spark and the Emerging Technology Landscape for Big Data
Apache Spark and the Emerging Technology Landscape for Big DataPaco Nathan
 
Automated Production Ready ML at Scale
Automated Production Ready ML at ScaleAutomated Production Ready ML at Scale
Automated Production Ready ML at ScaleDatabricks
 
Building Reproducible Network Data Analysis / Visualization Workflows
Building Reproducible Network Data Analysis / Visualization WorkflowsBuilding Reproducible Network Data Analysis / Visualization Workflows
Building Reproducible Network Data Analysis / Visualization WorkflowsKeiichiro Ono
 
Pinterest’s Story of Streaming Hundreds of Terabytes of Pins from MySQL to S3...
Pinterest’s Story of Streaming Hundreds of Terabytes of Pins from MySQL to S3...Pinterest’s Story of Streaming Hundreds of Terabytes of Pins from MySQL to S3...
Pinterest’s Story of Streaming Hundreds of Terabytes of Pins from MySQL to S3...confluent
 

La actualidad más candente (20)

cyREST: Cytoscape as a Service
cyREST: Cytoscape as a ServicecyREST: Cytoscape as a Service
cyREST: Cytoscape as a Service
 
Bridging the Gap Between Data Scientists and Software Engineers – Deploying L...
Bridging the Gap Between Data Scientists and Software Engineers – Deploying L...Bridging the Gap Between Data Scientists and Software Engineers – Deploying L...
Bridging the Gap Between Data Scientists and Software Engineers – Deploying L...
 
Realtime Data Analysis Patterns
Realtime Data Analysis PatternsRealtime Data Analysis Patterns
Realtime Data Analysis Patterns
 
END-TO-END MACHINE LEARNING STACK
END-TO-END MACHINE LEARNING STACKEND-TO-END MACHINE LEARNING STACK
END-TO-END MACHINE LEARNING STACK
 
Tiny Batches, in the wine: Shiny New Bits in Spark Streaming
Tiny Batches, in the wine: Shiny New Bits in Spark StreamingTiny Batches, in the wine: Shiny New Bits in Spark Streaming
Tiny Batches, in the wine: Shiny New Bits in Spark Streaming
 
SDCSB CYTOSCAPE AND NETWORK ANALYSIS WORKSHOP at Sanford Consortium
SDCSB CYTOSCAPE AND NETWORK ANALYSIS WORKSHOP at Sanford ConsortiumSDCSB CYTOSCAPE AND NETWORK ANALYSIS WORKSHOP at Sanford Consortium
SDCSB CYTOSCAPE AND NETWORK ANALYSIS WORKSHOP at Sanford Consortium
 
GalvanizeU Seattle: Eleven Almost-Truisms About Data
GalvanizeU Seattle: Eleven Almost-Truisms About DataGalvanizeU Seattle: Eleven Almost-Truisms About Data
GalvanizeU Seattle: Eleven Almost-Truisms About Data
 
Real-Time Fraud Detection at Scale—Integrating Real-Time Deep-Link Graph Anal...
Real-Time Fraud Detection at Scale—Integrating Real-Time Deep-Link Graph Anal...Real-Time Fraud Detection at Scale—Integrating Real-Time Deep-Link Graph Anal...
Real-Time Fraud Detection at Scale—Integrating Real-Time Deep-Link Graph Anal...
 
Data Warehousing with Spark Streaming at Zalando
Data Warehousing with Spark Streaming at ZalandoData Warehousing with Spark Streaming at Zalando
Data Warehousing with Spark Streaming at Zalando
 
Spark Summit EU talk by Miha Pelko and Til Piffl
Spark Summit EU talk by Miha Pelko and Til PifflSpark Summit EU talk by Miha Pelko and Til Piffl
Spark Summit EU talk by Miha Pelko and Til Piffl
 
Big Data is changing abruptly, and where it is likely heading
Big Data is changing abruptly, and where it is likely headingBig Data is changing abruptly, and where it is likely heading
Big Data is changing abruptly, and where it is likely heading
 
Auto-Pilot for Apache Spark Using Machine Learning
Auto-Pilot for Apache Spark Using Machine LearningAuto-Pilot for Apache Spark Using Machine Learning
Auto-Pilot for Apache Spark Using Machine Learning
 
Distributed Deep Learning At Scale On Apache Spark With BigDL
Distributed Deep Learning At Scale On Apache Spark With BigDLDistributed Deep Learning At Scale On Apache Spark With BigDL
Distributed Deep Learning At Scale On Apache Spark With BigDL
 
Applied Machine Learning for Ranking Products in an Ecommerce Setting
Applied Machine Learning for Ranking Products in an Ecommerce SettingApplied Machine Learning for Ranking Products in an Ecommerce Setting
Applied Machine Learning for Ranking Products in an Ecommerce Setting
 
Monitoring environment based on satellite data with Python and PySpark - Albe...
Monitoring environment based on satellite data with Python and PySpark - Albe...Monitoring environment based on satellite data with Python and PySpark - Albe...
Monitoring environment based on satellite data with Python and PySpark - Albe...
 
Microsoft R Server for Data Sciencea
Microsoft R Server for Data ScienceaMicrosoft R Server for Data Sciencea
Microsoft R Server for Data Sciencea
 
Apache Spark and the Emerging Technology Landscape for Big Data
Apache Spark and the Emerging Technology Landscape for Big DataApache Spark and the Emerging Technology Landscape for Big Data
Apache Spark and the Emerging Technology Landscape for Big Data
 
Automated Production Ready ML at Scale
Automated Production Ready ML at ScaleAutomated Production Ready ML at Scale
Automated Production Ready ML at Scale
 
Building Reproducible Network Data Analysis / Visualization Workflows
Building Reproducible Network Data Analysis / Visualization WorkflowsBuilding Reproducible Network Data Analysis / Visualization Workflows
Building Reproducible Network Data Analysis / Visualization Workflows
 
Pinterest’s Story of Streaming Hundreds of Terabytes of Pins from MySQL to S3...
Pinterest’s Story of Streaming Hundreds of Terabytes of Pins from MySQL to S3...Pinterest’s Story of Streaming Hundreds of Terabytes of Pins from MySQL to S3...
Pinterest’s Story of Streaming Hundreds of Terabytes of Pins from MySQL to S3...
 

Similar a Automated Time Series Analysis using Deep Learning, Ray and Analytics Zoo

Scalable AutoML for Time Series Forecasting using Ray
Scalable AutoML for Time Series Forecasting using RayScalable AutoML for Time Series Forecasting using Ray
Scalable AutoML for Time Series Forecasting using RayDatabricks
 
Automated ML Workflow for Distributed Big Data Using Analytics Zoo (CVPR2020 ...
Automated ML Workflow for Distributed Big Data Using Analytics Zoo (CVPR2020 ...Automated ML Workflow for Distributed Big Data Using Analytics Zoo (CVPR2020 ...
Automated ML Workflow for Distributed Big Data Using Analytics Zoo (CVPR2020 ...Jason Dai
 
Apache Spark and future of advanced analytics
Apache Spark and future of advanced analyticsApache Spark and future of advanced analytics
Apache Spark and future of advanced analyticsMuralidhar Somisetty
 
Introduction to pyspark new
Introduction to pyspark newIntroduction to pyspark new
Introduction to pyspark newAnam Mahmood
 
Strata EU 2014: Spark Streaming Case Studies
Strata EU 2014: Spark Streaming Case StudiesStrata EU 2014: Spark Streaming Case Studies
Strata EU 2014: Spark Streaming Case StudiesPaco Nathan
 
Hadoop/Spark Non-Technical Basics
Hadoop/Spark Non-Technical BasicsHadoop/Spark Non-Technical Basics
Hadoop/Spark Non-Technical BasicsZitao Liu
 
Ultra Fast Deep Learning in Hybrid Cloud using Intel Analytics Zoo & Alluxio
Ultra Fast Deep Learning in Hybrid Cloud using Intel Analytics Zoo & AlluxioUltra Fast Deep Learning in Hybrid Cloud using Intel Analytics Zoo & Alluxio
Ultra Fast Deep Learning in Hybrid Cloud using Intel Analytics Zoo & AlluxioAlluxio, Inc.
 
Databricks Meetup @ Los Angeles Apache Spark User Group
Databricks Meetup @ Los Angeles Apache Spark User GroupDatabricks Meetup @ Los Angeles Apache Spark User Group
Databricks Meetup @ Los Angeles Apache Spark User GroupPaco Nathan
 
apidays LIVE Helsinki & North 2022_Apps without APIs
apidays LIVE Helsinki & North 2022_Apps without APIsapidays LIVE Helsinki & North 2022_Apps without APIs
apidays LIVE Helsinki & North 2022_Apps without APIsapidays
 
QCon São Paulo: Real-Time Analytics with Spark Streaming
QCon São Paulo: Real-Time Analytics with Spark StreamingQCon São Paulo: Real-Time Analytics with Spark Streaming
QCon São Paulo: Real-Time Analytics with Spark StreamingPaco Nathan
 
Deep Learning for Autonomous Driving
Deep Learning for Autonomous DrivingDeep Learning for Autonomous Driving
Deep Learning for Autonomous DrivingJan Wiegelmann
 
Scaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data ScienceScaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data ScienceeRic Choo
 
Ronak Agrawal 2018 Computer Science
Ronak Agrawal 2018 Computer Science Ronak Agrawal 2018 Computer Science
Ronak Agrawal 2018 Computer Science Ronak Agrawal
 
Using the FLaNK Stack for edge ai (apache mxnet, apache flink, apache nifi, a...
Using the FLaNK Stack for edge ai (apache mxnet, apache flink, apache nifi, a...Using the FLaNK Stack for edge ai (apache mxnet, apache flink, apache nifi, a...
Using the FLaNK Stack for edge ai (apache mxnet, apache flink, apache nifi, a...Timothy Spann
 
Running Emerging AI Applications on Big Data Platforms with Ray On Apache Spark
Running Emerging AI Applications on Big Data Platforms with Ray On Apache SparkRunning Emerging AI Applications on Big Data Platforms with Ray On Apache Spark
Running Emerging AI Applications on Big Data Platforms with Ray On Apache SparkDatabricks
 
Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)
Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)
Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)Jason Dai
 
Samsung SDS OpeniT - The possibility of Python
Samsung SDS OpeniT - The possibility of PythonSamsung SDS OpeniT - The possibility of Python
Samsung SDS OpeniT - The possibility of PythonInsuk (Chris) Cho
 

Similar a Automated Time Series Analysis using Deep Learning, Ray and Analytics Zoo (20)

Scalable AutoML for Time Series Forecasting using Ray
Scalable AutoML for Time Series Forecasting using RayScalable AutoML for Time Series Forecasting using Ray
Scalable AutoML for Time Series Forecasting using Ray
 
Automated ML Workflow for Distributed Big Data Using Analytics Zoo (CVPR2020 ...
Automated ML Workflow for Distributed Big Data Using Analytics Zoo (CVPR2020 ...Automated ML Workflow for Distributed Big Data Using Analytics Zoo (CVPR2020 ...
Automated ML Workflow for Distributed Big Data Using Analytics Zoo (CVPR2020 ...
 
Apache Spark and future of advanced analytics
Apache Spark and future of advanced analyticsApache Spark and future of advanced analytics
Apache Spark and future of advanced analytics
 
Introduction to pyspark new
Introduction to pyspark newIntroduction to pyspark new
Introduction to pyspark new
 
Strata EU 2014: Spark Streaming Case Studies
Strata EU 2014: Spark Streaming Case StudiesStrata EU 2014: Spark Streaming Case Studies
Strata EU 2014: Spark Streaming Case Studies
 
Hadoop/Spark Non-Technical Basics
Hadoop/Spark Non-Technical BasicsHadoop/Spark Non-Technical Basics
Hadoop/Spark Non-Technical Basics
 
Webinar Data Mesh - Part 3
Webinar Data Mesh - Part 3Webinar Data Mesh - Part 3
Webinar Data Mesh - Part 3
 
Ultra Fast Deep Learning in Hybrid Cloud using Intel Analytics Zoo & Alluxio
Ultra Fast Deep Learning in Hybrid Cloud using Intel Analytics Zoo & AlluxioUltra Fast Deep Learning in Hybrid Cloud using Intel Analytics Zoo & Alluxio
Ultra Fast Deep Learning in Hybrid Cloud using Intel Analytics Zoo & Alluxio
 
Databricks Meetup @ Los Angeles Apache Spark User Group
Databricks Meetup @ Los Angeles Apache Spark User GroupDatabricks Meetup @ Los Angeles Apache Spark User Group
Databricks Meetup @ Los Angeles Apache Spark User Group
 
apidays LIVE Helsinki & North 2022_Apps without APIs
apidays LIVE Helsinki & North 2022_Apps without APIsapidays LIVE Helsinki & North 2022_Apps without APIs
apidays LIVE Helsinki & North 2022_Apps without APIs
 
QCon São Paulo: Real-Time Analytics with Spark Streaming
QCon São Paulo: Real-Time Analytics with Spark StreamingQCon São Paulo: Real-Time Analytics with Spark Streaming
QCon São Paulo: Real-Time Analytics with Spark Streaming
 
Deep Learning for Autonomous Driving
Deep Learning for Autonomous DrivingDeep Learning for Autonomous Driving
Deep Learning for Autonomous Driving
 
Knowledge Discovery in Production
Knowledge Discovery in ProductionKnowledge Discovery in Production
Knowledge Discovery in Production
 
Scaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data ScienceScaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data Science
 
Ronak Agrawal 2018 Computer Science
Ronak Agrawal 2018 Computer Science Ronak Agrawal 2018 Computer Science
Ronak Agrawal 2018 Computer Science
 
Using the FLaNK Stack for edge ai (apache mxnet, apache flink, apache nifi, a...
Using the FLaNK Stack for edge ai (apache mxnet, apache flink, apache nifi, a...Using the FLaNK Stack for edge ai (apache mxnet, apache flink, apache nifi, a...
Using the FLaNK Stack for edge ai (apache mxnet, apache flink, apache nifi, a...
 
NYC_2016_slides
NYC_2016_slidesNYC_2016_slides
NYC_2016_slides
 
Running Emerging AI Applications on Big Data Platforms with Ray On Apache Spark
Running Emerging AI Applications on Big Data Platforms with Ray On Apache SparkRunning Emerging AI Applications on Big Data Platforms with Ray On Apache Spark
Running Emerging AI Applications on Big Data Platforms with Ray On Apache Spark
 
Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)
Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)
Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)
 
Samsung SDS OpeniT - The possibility of Python
Samsung SDS OpeniT - The possibility of PythonSamsung SDS OpeniT - The possibility of Python
Samsung SDS OpeniT - The possibility of Python
 

Último

TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 

Último (20)

TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 

Automated Time Series Analysis using Deep Learning, Ray and Analytics Zoo

  • 1. Mar 13, 2020 Automated Time Series Analysis using Deep Learning, Ray and Analytics Zoo Shan Yu, Shengsheng Huang, Jason Dai AI
  • 2. Mar 13, 2020 • Background • Introduction of Analytics Zoo • Background about Time Series Analysis • Background about AutoML and Ray • Time Series Analysis using AutoML and Ray on Analytics Zoo • Use Case Sharing Agenda
  • 4. Mar 13, 2020 What is Analytics Zoo Accelerating Data Analytics + AI Solutions At Scale Distributed, High-Performance Deep Learning Framework for Apache Spark https://github.com/intel-analytics/bigdl Unified Analytics + AI Platform Distributed TensorFlow, Keras, PyTorch and BigDL on Apache Spark https://github.com/intel-analytics/analytics-zoo
  • 5. Mar 13, 2020 Unified Big Data Analytics and AI Platform Production Data pipeline Prototype on laptop using sample data Experiment on clusters with history data Production deployment w/ distributed data pipeline • Easily prototype the integrated data analytics & AI solution • “Zero” code change from laptop to distributed cluster • Directly access production data (Hadoop/Hive/HBase) without data copy • Seamlessly deployed on production big data clusters Seamless Scaling from Laptop to Production
  • 6. Mar 13, 2020 Analytics Zoo Unified Big Data Analytics and AI Platform https://github.com/intel-analytics/analytics-zoo Recommendation Distributed TensorFlow & PyTorch on Spark Spark Dataframes & ML Pipelines for DL RayOnSpark Model Serving Models & Algorithms Integrated Analytics & AI Pipelines Library & Framework Time Series Computer Vision NLP ML Workflow AutoML for Time Series Automatic Cluster Serving Python Libraries (Numpy/Pandas/…) DL Frameworks (TF/PyTorch/…) Distributed Analytics (Spark/Flink/Ray/…) Distributions (Cloudera/Databricks/….)
  • 7. Mar 13, 2020 Time Series Analysis • Time Series data • A series of data that is observed sequentially in time. • Numerical & unstructured • Stock prices, sales volume, CPU/IO monitoring data, etc. • Example of time series analysis • Product demand prediction • Network quality analysis • Predictive maintenance for high- value equipment Total volume of taxi passengers in NYC from 2014/07-2015/02 ( source : https://github.com/intel-analytics/analytics-zoo/blob/master/apps/anomaly- detection/anomaly-detection-nyc-taxi.ipynb)
  • 8. Mar 13, 2020 AutoML Overview Taking the Human out of Learning Applications : A Survey on Automated Machine Learning. Yao, Q., Wang, et. al
  • 9. Mar 13, 2020 Ray and Ray On Spark https://medium.com/riselab/rayonspark-running-emerging-ai-applications-on-big-data-clusters-with-ray-and-analytics-zoo-923e0136ed6a • Ray • A distributed framework for emerging AI applications • RayOnSpark • Directly run Ray programs on big data cluster • Seamlessly integrate ray into spark data processing pipeline
  • 10. Mar 13, 2020 Time Series Analysis using AutoML and Ray on Analytics Zoo
  • 11. Mar 13, 2020 Laptop Spark ClusterYARN ClusterK8s Cluster Distributed TensorFlow & PyTorch on Spark Spark Dataframes & ML Pipelines for DL RayOnSpark Model Inference Recommendation Computer Vision NLP Cluster Serving AutoML Framework Integrated Analytics and AI Pipelines TimeSeries Algorithms Hyper-Parameter Tuning Feature Generation Model Selection AnalyticsZoo Trend Prediction …… Anomaly Detection ML Workflow Built-in Algorithms and Models Time Series Solution … User Models Time Series Solution In Analytics Zoo • Time series Applications • Time series forecasting • Anomaly detection • Time Series Clustering • etc • AutoML • Seamless scaling • Full-stack Intel SW+HW Optimization w/ Analytics Zoo
  • 12. Mar 13, 2020 • AutoML Framework • FeatureTransformer • Model • SearchEngine • Pipeline • Time Series Prediction w/ AutoML • TimeSequencePredictor • TimeSequencePipeline AutoML + Time Series Analysis Framework In Analytics Zoo *Other names and brands may be claimed as the property of others. https://medium.com/riselab/scalable-automl-for-time-series-prediction-using-ray-and- analytics-zoo-b79a6fd08139
  • 13. Mar 13, 2020 Typical Workflow of Training w/ AutoML FeatureTransformer Model SearchEngine Search presets Workflow implemented in TimeSequencePredictor trial trial trial trial …best model /parameters trail jobs Pipeline with tunable parameters with tunable parameters configured with best parameters/model Each trial runs a different combination of hyper parameters Ray Tune
  • 14. Mar 13, 2020 • Training a Predictor • fit (w/ automl) • recipe • distributed • Using a Pipeline • save/load • evaluate/predict • fit (incremental) General API Usage
  • 15. Mar 13, 2020Intel Confidential Application: Time Series Forecasting
  • 16. Mar 13, 2020Intel Confidential Application: Anomaly Detection
  • 17. Mar 13, 2020 Use case sharing
  • 18. Mar 13, 2020 Spark-SQL Data Loading Data Loader Data Source APIs File, HTTP, Kafka forked. DRAM Store customized. Flash Store tiering Preprocess RDD of Tensor Model Code of TF DL Training & Inferencing Data Model SIMD Acceleration Time Series Based Network Quality Prediction in SK Telecom https://databricks.com/session_eu19/apache-spark-ai-use-case-in-telco-network-quality-analysis-and-prediction-with-geospatial-visualization
  • 19. Mar 13, 2020 Unsupervised Time Series Anomaly Detection for Baosight https://software.intel.com/en-us/articles/lstm-based-time- series-anomaly-detection-using-analytics-zoo-for-apache- spark-and-bigdl
  • 20. Mar 13, 2020 https://www.intel.cn/content/www/cn/zh/analytics/artificial-intelligence/yunda-brings-quality-change-to-the-express-delivery- industry.html Yunda: Anomaly Detection for AIOps • AIOps • Monitoring log/metrics analysis for data center operations • AIOps helps cost saving and MTTR (mean-time-to-repair)
  • 21. Mar 13, 2020 More Information about AutoML + Time Series in Analytics Zoo • Scalable AutoML for Time Series Analysis • Source code as a branch of analytics-zoo repo @ https://github.com/intel-analytics/analytics-zoo/tree/automl • README @ https://github.com/intel-analytics/analytics-zoo/blob/automl/pyzoo/zoo/automl/README.md • Blog https://medium.com/riselab/scalable-automl-for-time-series-prediction-using-ray-and-analytics-zoo-b79a6fd08139 • Anomaly Detection Reference Examples • Time Series Forecast w/ AutoML https://github.com/intel-analytics/analytics-zoo/blob/automl/apps/automl • Anomaly Detection based on Forecast https://github.com/intel-analytics/analytics-zoo/tree/master/apps/anomaly- detection • Anomaly Detection based on AutoEncoder https://github.com/intel-analytics/analytics- zoo/tree/master/apps/anomaly-detection-hd • Real-world Customer Applications • Baosight’s anomaly detection for intelligent equipment management. Details refer to http://software.intel.com/en- us/articles/lstm-based-time-series-anomaly-detection-using-analytics-zoo-for-apache-spark-and-bigdl • Yunda anomaly detection for AIOps https://www.intel.cn/content/www/cn/zh/analytics/artificial-intelligence/yunda- brings-quality-change-to-the-express-delivery-industry.html
  • 22. Mar 13, 2020 • Project website • https://github.com/intel-analytics/analytics-zoo • https://github.com/intel-analytics/bigdl • Tutorials • CVPR 2018: https://jason-dai.github.io/cvpr2018/ • AAAI 2019: https://jason-dai.github.io/aaai2019/ • “BigDL: A Distributed Deep Learning Framework for Big Data” • In proceedings of ACM Symposium on Cloud Computing 2019 (SOCC’19) • Use cases • Azure, CERN, MasterCard, Office Depot, Tencent, Midea, etc. • https://analytics-zoo.github.io/master/#powered-by/ More Information on Analytics Zoo