Classifying Multivariate Time Series Scalably
Ashfaq Munshi, Saeed Bidhendi, Faramarz Munshi
November 10, 2017
Overview
• Background and Motivation
• Univariate Time Series (UTS)
• Multivariate Time Series (MTS)
• Conclusion
© Pepperdata, Inc.
Background
Pepperdata Telemetry Data Scale
Example production deployment:
• 570 nodes
• 20 tasks / node
• 300 metrics / task
• 5-second sampling
• 41 million points / minute
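The per-minute figure follows from the other four numbers on the slide. A quick check of the arithmetic (the variable names are illustrative, not from the deck):

```python
# Figures taken from the slide; names are illustrative.
nodes = 570
tasks_per_node = 20
metrics_per_task = 300
sampling_interval_s = 5

# One sample every 5 seconds gives 12 samples per minute.
samples_per_minute = 60 // sampling_interval_s
points_per_sample = nodes * tasks_per_node * metrics_per_task
points_per_minute = points_per_sample * samples_per_minute

print(f"{points_per_minute:,} points / minute")
```

This gives 41,040,000, which the slide rounds to 41 million.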
Our Big Data About Production Big Data
• 300 trillion performance data points collected
• 22 thousand production nodes
• 50 million jobs / year
Example Time Series
Characteristics of our TS
• Highly variable in length: 10 data points to 10K+ data points
• Missing data
• Extremely noisy
Problem
Classify this collection of time series to give operators a better understanding of resource utilization on their clusters and to enable a scheduler to better optimize cluster resources.
Univariate Time Series
Approaches and Data Set
• Two recent approaches from the literature
  • Transform the TS into an image, then use a tiled CNN [Wang & Oates 2015]
  • Transform the TS into a bag of patterns [Schäfer & Leser 2017]
• Dataset is the UCR data set
  • 82 time series data sets
  • Number of series < 10K
  • Data points per series < 2K
Time Series and Images [Wang & Oates, 2015]
• Map the time series into
  • Gramian Angular Summation Fields
  • Gramian Angular Difference Fields
  • Markov Transition Fields
• Feed images into a tiled CNN for classification
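The slide does not spell out how a Markov Transition Field is built. As a rough sketch of the usual construction: bin the values by quantile, estimate first-order transition probabilities between bins, then spread those probabilities over every pair of time steps. The function name, the bin count, and the quantile rule below are assumptions chosen for illustration.

```python
from bisect import bisect_right

def markov_transition_field(series, n_bins=4):
    """Return an n x n image where pixel (i, j) holds the estimated
    probability of moving from the bin of series[i] to the bin of series[j]."""
    n = len(series)
    ordered = sorted(series)
    # Interior quantile edges (a simple rule, assumed for this sketch).
    edges = [ordered[(k * n) // n_bins] for k in range(1, n_bins)]
    bins = [bisect_right(edges, x) for x in series]

    # Count transitions between the bins of consecutive points.
    counts = [[0] * n_bins for _ in range(n_bins)]
    for a, b in zip(bins, bins[1:]):
        counts[a][b] += 1

    # Row-normalize the counts into transition probabilities.
    trans = []
    for row in counts:
        total = sum(row)
        trans.append([c / total if total else 0.0 for c in row])

    return [[trans[bins[i]][bins[j]] for j in range(n)] for i in range(n)]

mtf = markov_transition_field([0.0, 1.0, 2.0, 3.0, 0.5, 1.5, 2.5, 3.5])
```

The result is square with side equal to the series length, so long series are normally downsampled before being fed to a CNN; that step is omitted here.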
Gramian Angular Fields [Wang & Oates, 2015]
• Normalize the time series into [-1,1]
• Transform to Polar Coordinates
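The two steps can be sketched as follows. After rescaling to [-1, 1], each value x becomes an angle φ = arccos(x); the summation field is cos(φ_i + φ_j) and the difference field is sin(φ_i − φ_j). The polar radius, which encodes the time stamp, plays no part in the fields and is left out. The function names are illustrative, and the handling of a constant series is an assumption.

```python
import math

def rescale(series):
    """Min-max rescale a series into [-1, 1]."""
    lo, hi = min(series), max(series)
    if hi == lo:
        # Constant series: map every point to 0 (assumption for this sketch).
        return [0.0] * len(series)
    return [(2 * x - hi - lo) / (hi - lo) for x in series]

def gramian_angular_fields(series):
    """Return (GASF, GADF) as nested lists."""
    scaled = rescale(series)
    # Clamp to guard against round-off before taking arccos.
    phi = [math.acos(max(-1.0, min(1.0, x))) for x in scaled]
    gasf = [[math.cos(a + b) for b in phi] for a in phi]
    gadf = [[math.sin(a - b) for b in phi] for a in phi]
    return gasf, gadf

gasf, gadf = gramian_angular_fields([1.0, 2.0, 3.0])
```

The GADF is antisymmetric with a zero diagonal, while the GASF is symmetric.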
Example GADF Image [Wang & Oates, 2015]
Time Series and Bag of Patterns [Schäfer & Leser 2017]
• Divide TS into windows
• Fourier Transform TS in window
• Apply low-pass filter
• Quantize the Fourier coefficients
• Map window to words
• Extract features from sentences
• Use Logistic Regression classifier
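A minimal sketch of the first five steps, under simplifying assumptions: the low-pass filter keeps only the first few Fourier coefficients, and the quantization breakpoints are fixed constants. Published methods typically learn the breakpoints from training data, so treat the values below as placeholders. All function names and parameters are illustrative.

```python
import cmath
from collections import Counter

def dft_low_pass(window, n_coeffs):
    """Real and imaginary parts of the first n_coeffs DFT coefficients."""
    n = len(window)
    feats = []
    for k in range(n_coeffs):
        c = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(window))
        feats.extend([c.real / n, c.imag / n])
    return feats

def to_word(values, breakpoints, alphabet="abcd"):
    """Quantize each value into a letter according to the breakpoints."""
    return "".join(alphabet[sum(v > b for b in breakpoints)] for v in values)

def bag_of_patterns(series, window=8, n_coeffs=2,
                    breakpoints=(-0.5, 0.0, 0.5)):
    """Slide a window over the series and count the resulting words."""
    words = []
    for start in range(len(series) - window + 1):
        feats = dft_low_pass(series[start:start + window], n_coeffs)
        words.append(to_word(feats, breakpoints))
    return Counter(words)

bag = bag_of_patterns([float(t % 5) for t in range(32)])
```

The word counts would then serve as the feature vector passed to a logistic regression classifier, which is not shown here.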
Our “Off the Shelf” Approach (PD)
• Convert TS into image (GADF)
• Use Google’s pre-trained CNN (Inception v3)
• Embed into 2,048-dimensional vector space
• Train MLP
  • 2 hidden layers (50 nodes each)
  • ReLU activation
  • Dropout for regularization (0.1, 0.2)
  • Softmax final layer
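To make the classifier head concrete, here is a forward-pass sketch of an MLP with the shape listed on the slide: a 2,048-dimensional input, two hidden layers of 50 ReLU units, and a softmax output. Several things are assumptions: the weights are random and untrained, the input is a random stand-in for a CNN embedding, the class count of 5 is arbitrary, and the slide does not say which dropout rate goes with which layer (0.1 is applied after the first hidden layer and 0.2 after the second).

```python
import math
import random

def dense(inputs, weights, biases):
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def relu(values):
    return [max(0.0, v) for v in values]

def dropout(values, rate, rng):
    # Inverted dropout, used at training time only (assumed variant).
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]

def softmax(values):
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def init_layer(n_in, n_out, rng):
    scale = math.sqrt(2.0 / n_in)
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out

def mlp_forward(embedding, layers, rng=None, rates=(0.1, 0.2)):
    """Pass rng to enable dropout (training); omit it for inference."""
    (w1, b1), (w2, b2), (w3, b3) = layers
    h = relu(dense(embedding, w1, b1))
    if rng is not None:
        h = dropout(h, rates[0], rng)
    h = relu(dense(h, w2, b2))
    if rng is not None:
        h = dropout(h, rates[1], rng)
    return softmax(dense(h, w3, b3))

rng = random.Random(0)
layers = [init_layer(2048, 50, rng), init_layer(50, 50, rng),
          init_layer(50, 5, rng)]
embedding = [rng.random() for _ in range(2048)]  # stand-in for a CNN embedding
probs = mlp_forward(embedding, layers)
```

Training (loss, optimizer, epochs) is not described on the slide and is not shown.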
Accuracies for a subset of UCR
[Chart: accuracy (0% to 100%) on a subset of UCR. Legend: BOSS (91.1), PD (89.8), GADF+GASF+MTF (86.4)]
Accuracy on a subset of UCR
[Chart: accuracy (68% to 86%) for WEASEL, 1-NN DTW CV, 1-NN DTW, BOSS, Learning Shapelet (LS), TSBF, ST, EE (PROP), COTE (ensemble), and PD]
Training Time Comparison
[Chart: training time comparison including PD]
Multivariate Time Series
Approaches and Data Set
• Two recent approaches from the literature
  • Use an ESN (“Echo State Network”) to map MTS into state clouds [Wang, Wang, Liu 2015]
  • Use Dynamic Time Warping with Mahalanobis distance metric [Mei, Liu, Wang, Gao 2016]
• Dataset is from UCI, a small subset of UCR and others
  • Number of series ~ 10K
  • Data points per series ~ 200
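For readers unfamiliar with the second approach, the sketch below shows plain dynamic time warping over multivariate points, with a squared Mahalanobis distance as the local cost. The matrix `m` is passed in as given; how the cited work obtains it is not covered on the slide, and the identity matrix used in the example is only a placeholder. Function names are illustrative.

```python
def mahalanobis_sq(x, y, m):
    """Squared Mahalanobis distance (x - y)^T M (x - y)."""
    d = [a - b for a, b in zip(x, y)]
    size = len(d)
    return sum(d[i] * m[i][j] * d[j]
               for i in range(size) for j in range(size))

def dtw(series_a, series_b, m):
    """Classic O(len_a * len_b) DTW over sequences of vectors."""
    inf = float("inf")
    na, nb = len(series_a), len(series_b)
    cost = [[inf] * (nb + 1) for _ in range(na + 1)]
    cost[0][0] = 0.0
    for i in range(1, na + 1):
        for j in range(1, nb + 1):
            local = mahalanobis_sq(series_a[i - 1], series_b[j - 1], m)
            cost[i][j] = local + min(cost[i - 1][j],      # repeat point of b
                                     cost[i][j - 1],      # repeat point of a
                                     cost[i - 1][j - 1])  # advance both
    return cost[na][nb]

identity = [[1.0, 0.0], [0.0, 1.0]]  # placeholder metric
a = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
b = [(0.0, 0.0), (0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
same_shape = dtw(a, b, identity)
shifted_end = dtw(a, [(0.0, 0.0), (1.0, 1.0), (2.0, 3.0)], identity)
```

Because warping absorbs the repeated first point, `a` and `b` are at distance zero even though their lengths differ.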
Our “Off the Shelf” Approach (PD)
• Make TS for each variable the same length by zero padding
• Convert each TS into a GADF image
• Interpolate any missing data points in the image using linear interpolation on the image
• Stack the images for the five variables
• Use the same process as before for univariate time series
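A sketch of the padding, conversion, and stacking steps, with assumptions stated up front: padding is appended at the end of each series, "stacking" is read as one image per variable (like image channels), and the interpolation of missing points is omitted. The slide does not specify any of these details, and the function names are illustrative.

```python
import math

def zero_pad(series, length):
    """Append zeros so the series reaches the target length."""
    return list(series) + [0.0] * (length - len(series))

def gadf(series):
    """Gramian Angular Difference Field of one series."""
    lo, hi = min(series), max(series)
    scaled = [0.0 if hi == lo else (2 * x - hi - lo) / (hi - lo)
              for x in series]
    # Clamp to guard against round-off before taking arccos.
    phi = [math.acos(max(-1.0, min(1.0, x))) for x in scaled]
    return [[math.sin(a - b) for b in phi] for a in phi]

def stack_gadf(variables):
    """Pad every variable to the longest length, then build one GADF each."""
    length = max(len(s) for s in variables)
    return [gadf(zero_pad(s, length)) for s in variables]

stacked = stack_gadf([[1.0, 2.0, 3.0], [5.0, 4.0], [0.0, 1.0, 0.0, 1.0]])
```

Each entry of `stacked` is a square image of the same size, so the result can be treated as a multi-channel image.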
5-Fold Cross Validation Error
[Chart: error (axis 0 to 30) on Robot failure LP1 to LP5, comparing MDDTW Best and PD 5-fold]
10-Fold Cross Validation Error
[Chart: error (axis 0 to 30) on Robot failure LP1 to LP5, comparing Echo Network Best and PD 10-fold]
PD Data
• Four variables: CPU, Virtual Memory, HDFS reads, Network Ops
• Each time series collected over one week
• 10 data points to 10K+ data points
• Missing data
• Extremely noisy
• For periods longer than a week, data is much larger
• Sampling rate is the same for all TS
Accuracy per Label on PD Dataset G
Number of TS = 3092
Lengths per TS = 5 to 8500
Average Accuracy = 78.14%
[Chart: accuracy (0 to 100) per label, labels 1 to 17]
Accuracy per Label on PD Dataset R
Number of TS = 6715
Lengths per TS = 5 to 9400
Average Accuracy = 75.95%
[Chart: accuracy per label, labels 1 to 48]
Summary
Our “Off the Shelf” approach is as good as the best approaches for both UTS and MTS. The methodology is also the same for both types of TS.
Thank You

Ashfaq Munshi, ML7 Fellow, Pepperdata

  • 1. Classifying Multivariate Time Series Scalably Ashfaq Munshi, Saeed Bidhendi, Faramarz Munshi November 10, 2017
  • 2. Overview • Background and Motivation • Univariate Time Series (UTS) • Multivariate Time Series (MTS) • Conclusion © Pepperdata, Inc.
  • 5. Pepperdata Telemetry Data Scale • Example production deployment: 570 nodes • 20 tasks / node • 300 metrics / task • 5-second sampling • 41 million points / minute
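
The 41 million points per minute on slide 5 follows from the other four numbers. A quick check, assuming every task on every node reports every metric at each 5-second sample (the function and constant names are illustrative, not from the slides):

```python
# Back-of-the-envelope check of the slide 5 numbers.
# Assumption: every task on every node emits all metrics at every sample.
NODES = 570
TASKS_PER_NODE = 20
METRICS_PER_TASK = 300
SAMPLE_PERIOD_S = 5


def points_per_minute(nodes, tasks_per_node, metrics_per_task, period_s):
    """Number of data points collected per minute across the cluster."""
    samples_per_minute = 60 // period_s
    return nodes * tasks_per_node * metrics_per_task * samples_per_minute


total = points_per_minute(NODES, TASKS_PER_NODE, METRICS_PER_TASK, SAMPLE_PERIOD_S)
print(f"{total:,} points per minute")  # 41,040,000
```
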
  • 6. Our Big Data About Production Big Data • 300 trillion performance data points collected • 22 thousand production nodes • 50 million jobs / year
  • 7. Example Time Series
  • 8. Characteristics of our TS • Highly variable in length • 10 data points to 10K+ data points • Missing data • Extremely noisy
  • 9. Problem • Classify this collection of time series to give operators a better understanding of resource utilization on their clusters and to enable a scheduler to better optimize cluster resources
  • 11. Approaches and Data Set • Two recent approaches from the literature • Transform the TS into an image, then use a tiled CNN [Wang & Oates 2015] • Transform the TS into a bag of patterns [Schäfer & Leser 2017] • Dataset is the UCR data set • 82 time series data sets • Number of series < 10K • Data points per series < 2K
  • 12. Time Series and Images [Wang & Oates, 2015] • Map the time series into • Gramian Angular Summation Fields (GASF) • Gramian Angular Difference Fields (GADF) • Markov Transition Fields (MTF) • Feed images into a tiled CNN for classification
  • 13. Gramian Angular Fields [Wang & Oates, 2015] • Normalize the time series into [-1,1] • Transform to polar coordinates
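
The two steps on slide 13 can be sketched in a few lines. This is a minimal illustration, not the authors' code: it rescales a series into [-1, 1], takes the angle φ = arccos(x) as the polar encoding, and builds the summation field cos(φᵢ + φⱼ) and the difference field sin(φᵢ − φⱼ) as defined by Wang & Oates. The function names and the handling of a constant series are assumptions made here.

```python
import math


def rescale(series):
    """Min-max rescale a series into [-1, 1]."""
    lo, hi = min(series), max(series)
    if hi == lo:
        # Constant series: mapping to 0 is a choice made for this sketch.
        return [0.0 for _ in series]
    return [2.0 * (x - lo) / (hi - lo) - 1.0 for x in series]


def gramian_angular_fields(series):
    """Return (GASF, GADF) as nested lists of size n x n."""
    x = rescale(series)
    # Polar encoding: the angle is arccos of the rescaled value.
    # The clamp guards against values slightly outside [-1, 1] from rounding.
    phi = [math.acos(max(-1.0, min(1.0, v))) for v in x]
    gasf = [[math.cos(a + b) for b in phi] for a in phi]
    gadf = [[math.sin(a - b) for b in phi] for a in phi]
    return gasf, gadf


gasf, gadf = gramian_angular_fields([0.0, 1.0, 2.0])
```

A series of length n yields an n × n image, so long series are usually shortened first (for example by piecewise averaging) before the image is passed to a CNN.
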
  • 14. Example GADF Image [Wang & Oates, 2015]
  • 15. Time Series and Bag of Patterns [Schäfer & Leser 2017] • Divide TS into windows • Fourier transform the TS in each window • Apply low-pass filter • Quantize the Fourier coefficients • Map window to words • Extract features from sentences • Use Logistic Regression classifier
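
The first five steps on slide 15 can be sketched as follows. This is a simplified illustration under stated assumptions, not the published algorithm: the breakpoints and the four-letter alphabet are fixed by hand here, whereas the methods in the literature learn the quantization bins from training data, and the feature selection and logistic regression steps are left out. All names are hypothetical.

```python
import cmath
from collections import Counter


def dft_lowpass(window, n_coeffs):
    """Low-pass filter: keep only the first n_coeffs DFT coefficients.

    The mean term (k = 0) is skipped, which is a choice made for this sketch.
    Returns the real and imaginary parts as a flat list.
    """
    n = len(window)
    values = []
    for k in range(1, n_coeffs + 1):
        c = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(window)) / n
        values.extend([c.real, c.imag])
    return values


def quantize(value, breakpoints, alphabet="abcd"):
    """Map a real value to a letter using fixed, ascending breakpoints."""
    for letter, edge in zip(alphabet, breakpoints):
        if value < edge:
            return letter
    return alphabet[len(breakpoints)]


def bag_of_patterns(series, window_len, n_coeffs=2, breakpoints=(-0.25, 0.0, 0.25)):
    """Slide a window over the series and count the resulting words."""
    bag = Counter()
    for start in range(len(series) - window_len + 1):
        window = series[start:start + window_len]
        word = "".join(quantize(v, breakpoints)
                       for v in dft_lowpass(window, n_coeffs))
        bag[word] += 1
    return bag


bag = bag_of_patterns([0, 1, 0, -1, 0, 1, 0, -1], window_len=4)
```

The word counts in `bag` play the role of the feature vector that a linear classifier would be trained on.
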
  • 16. Our “Off the Shelf” Approach (PD) • Convert TS into image (GADF) • Use Google’s pre-trained Inception v3 CNN • Embed into 2,048-dimensional vector space • Train MLP • 2 hidden layers (50 nodes each) • ReLU activation • Dropout for regularization (.1, .2) • Softmax final layer
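
The shape of the MLP on slide 16 can be sketched as a forward pass in plain Python. This shows the architecture only: the weights are random and untrained, the 2,048-dimensional input is a random stand-in for a CNN embedding, and the number of classes is arbitrary. Reading "(.1, .2)" as the dropout rates of the first and second hidden layers is an assumption, as are all function names; the slides do not say which framework was used.

```python
import math
import random


def dense(inputs, weights, biases):
    """Fully connected layer: one weight row per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    return [max(0.0, v) for v in values]


def dropout(values, rate, rng):
    """Inverted dropout: zero units with probability `rate`, rescale the rest."""
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


def softmax(values):
    m = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(n_in, n_out, rng):
    """Random (untrained) weights and zero biases."""
    scale = math.sqrt(2.0 / n_in)
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(embedding, layers, rng=None, rates=(0.1, 0.2)):
    """Two ReLU hidden layers, then softmax. Dropout is applied only if rng is given."""
    (w1, b1), (w2, b2), (w3, b3) = layers
    h = relu(dense(embedding, w1, b1))
    if rng is not None:
        h = dropout(h, rates[0], rng)
    h = relu(dense(h, w2, b2))
    if rng is not None:
        h = dropout(h, rates[1], rng)
    return softmax(dense(h, w3, b3))


rng = random.Random(0)
n_classes = 5  # arbitrary for this sketch
layers = [init_layer(2048, 50, rng), init_layer(50, 50, rng), init_layer(50, n_classes, rng)]
embedding = [rng.random() for _ in range(2048)]  # stand-in for a CNN embedding
probs = forward(embedding, layers)
```

Passing `rng=rng` to `forward` turns dropout on, as it would be during training; at inference time it is left off.
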
  • 17. Accuracies for a subset of UCR [bar chart, accuracy in %] • BOSS (91.1) • PD (89.8) • GADF+GASF+MTF (86.4)
  • 18. Accuracy on a subset of UCR [bar chart, accuracy axis from 68% to 86%] • Methods compared: WEASEL, 1-NN DTW CV, 1-NN DTW, BOSS, Learning Shapelet (LS), TSBF, ST, EE (PROP), COTE (ensemble), PD
  • 19. Training Time Comparison [chart, PD highlighted]
  • 21. Approaches and Data Set • Two recent approaches from the literature • Use an ESN (“Echo State Network”) to map MTS into state clouds [Wang, Wang, Liu 2015] • Use Dynamic Time Warping with Mahalanobis distance metric [Mei, Liu, Wang, Gao 2016] • Dataset is from UCI, a small subset of UCR and others • Number of series ~ 10K • Data points per series ~ 200
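
For readers unfamiliar with the second approach on slide 21, the sketch below shows plain dynamic time warping over multivariate points, using a squared Mahalanobis distance as the local cost. It is an illustration only: the cited work learns the Mahalanobis matrix from data, whereas here the matrix `m_inv` is simply passed in, and all names are hypothetical.

```python
import math


def mahalanobis_sq(u, v, m_inv):
    """Squared Mahalanobis distance (u - v)^T M^{-1} (u - v)."""
    d = [a - b for a, b in zip(u, v)]
    size = len(d)
    return sum(d[i] * m_inv[i][j] * d[j] for i in range(size) for j in range(size))


def dtw(a, b, m_inv):
    """DTW cost between two sequences of vectors, with no warping window."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = mahalanobis_sq(a[i - 1], b[j - 1], m_inv)
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


identity = [[1.0, 0.0], [0.0, 1.0]]  # with the identity this is squared Euclidean
a = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
b = [(0.0, 0.0), (1.0, 1.0), (1.0, 1.0), (2.0, 2.0)]  # same shape, one point repeated
c = [(0.0, 0.0), (1.0, 1.0), (2.0, 3.0)]
print(dtw(a, b, identity), dtw(a, c, identity))
```

The table has one cell per pair of points, so the cost grows with the product of the two series lengths, which matters for series of 10K+ points.
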
  • 22. Our “Off the Shelf” Approach (PD) • Make TS for each variable the same length by zero padding • Convert each TS into a GADF image • Interpolate any missing data points in the image using linear interpolation on the image • Stack the images for the five variables • Use the same process as before for univariate time series
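
The padding and stacking steps on slide 22 can be sketched as below. This is a minimal illustration under stated assumptions: zeros are appended at the end of the shorter series (the slide does not say where the padding goes), the interpolation of missing points is left out, and the function names are hypothetical.

```python
import math


def zero_pad(series, length):
    """Append zeros so the series has the requested length (padding position assumed)."""
    return list(series) + [0.0] * (length - len(series))


def gadf(series):
    """Gramian Angular Difference Field: sin(phi_i - phi_j) after rescaling to [-1, 1]."""
    lo, hi = min(series), max(series)
    span = (hi - lo) or 1.0  # avoid division by zero for a constant series
    phi = [math.acos(max(-1.0, min(1.0, 2.0 * (x - lo) / span - 1.0))) for x in series]
    return [[math.sin(a - b) for b in phi] for a in phi]


def stack_gadf(variables):
    """One GADF image per variable, all of the same size, stacked like image channels."""
    length = max(len(s) for s in variables)
    return [gadf(zero_pad(s, length)) for s in variables]


images = stack_gadf([
    [1.0, 2.0, 3.0, 4.0],  # variable 1
    [5.0, 6.0],            # variable 2, shorter, so it is zero padded
    [0.5, 0.1, 0.9],       # variable 3
])
```

Each variable becomes one channel of a multi-channel image, which is what lets the univariate process be reused unchanged.
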
  • 23. 5-Fold Cross Validation Error [bar chart, error axis from 0 to 30] • Datasets: Robot failure LP1, LP2, LP3, LP4, LP5 • Series: MDDTW Best, PD 5-fold
  • 24. 10-Fold Cross Validation Error [bar chart, error axis from 0 to 30] • Datasets: Robot failure LP1, LP2, LP3, LP4, LP5 • Series: Echo Network Best, PD 10-fold
  • 25. PD Data • Four variables: CPU, Virtual Memory, HDFS reads, Network Ops • Each time series collected over one week • 10 data points to 10K+ data points • Missing data • Extremely noisy • For periods longer than a week, data is much larger • Sampling rate is the same for all TS
  • 26. Accuracy per Label on PD Dataset G [bar chart, accuracy (0 to 100) for labels 1 to 17] • Number of TS = 3092 • Lengths per TS = 5 to 8500 • Average Accuracy = 78.14%
  • 27. Accuracy per Label on PD Dataset R [bar chart, accuracy for labels 1 to 48] • Number of TS = 6715 • Lengths per TS = 5 to 9400 • Average Accuracy = 75.95%
  • 28. Summary • Our “Off the Shelf” approach is as good as the best approaches for both UTS and MTS, and the methodology is the same for both types of TS.

Editor's notes

  1. Expedia cluster, 3/21-3/24. https://beta-dashboard.pepperdata.com/expedia-chandler-prod/charts#s=2017/03/21-13:07&e=2017/03/24-13:07&tzo=-7&m=basic