© Copyright 2015 Simularity. All Rights Reserved
Ray Richardson, Founder & CTO | ray@simularity.com
Practical Predictive Analytics on Time Series Data using SAX
MLConf Seattle, May 1, 2015
Anomaly Detection
• A time series anomaly is simply an unusual subsequence of the series
• “Unusual” will be taken to mean “improbable”
  - The degree of anomaly corresponds directly to the improbability of the subsequence
  - Probability is not directly defined for a time series
  - Probability can be defined for symbols
• Mapping a time series to a symbol may allow us to assign a probability to the time series subsequence
• This involves mapping the time series subsequence to a symbol in some Symbol Space
Symbolic Representation
• All data in a modern computer is in a Symbolic Representation
  - Integers, floating point numbers and strings are all symbols, and are all composed of bytes
• Anomaly detection requires a special kind of symbol: one from a Finite Symbol Space
  - This means there are a finite number of symbols available
Finite Symbol Spaces
• For our purposes, a Finite Symbol Space is defined by 2 attributes
  - An Alphabet, from which components are drawn
  - A Symbol Length, defining the fixed number of components of the symbol
• Thus, if we define the alphabet as a..d and a length of 4, a legitimate symbol might be abcd
• Another legitimate symbol might be 10:15, where 10 is the row of a matrix and 15 is the column
  - The size of the matrix must be constant
• Fixed point numbers are drawn from a Finite Symbol Space if there is a lower and upper bound
Why Finite Symbol Spaces?
• A Finite Symbol Space allows us to compute a (perhaps naïve) probability of seeing a particular symbol
  - The number of possible symbols is a^l, where a is the cardinality of the alphabet and l is the length of the symbol
  - Perhaps naïve because some symbols may never appear
    - In some symbolic representations of time series, aaaa and dddd represent the same series
• If symbols occur at random with equal likelihood, the probability of seeing a given symbol is the reciprocal of the size of the symbol space
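As a small worked example of the count above, the sketch below computes the size of a Finite Symbol Space and the naïve uniform probability of one symbol. The function names are illustrative and not taken from the talk.

```python
def symbol_space_size(alphabet_size: int, symbol_length: int) -> int:
    """Number of distinct symbols: alphabet_size raised to symbol_length."""
    if alphabet_size < 1 or symbol_length < 1:
        raise ValueError("alphabet_size and symbol_length must be positive")
    return alphabet_size ** symbol_length


def uniform_symbol_probability(alphabet_size: int, symbol_length: int) -> float:
    """Naive probability of one symbol if all symbols were equally likely."""
    return 1.0 / symbol_space_size(alphabet_size, symbol_length)


# Alphabet a..d (4 letters) with symbol length 4, as in the "abcd" example.
size = symbol_space_size(4, 4)
probability = uniform_symbol_probability(4, 4)
```

With a 4-letter alphabet and length 4 there are 256 possible symbols, so the naïve probability of any one of them is 1/256.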
Time Series
• A time series is a sequence of pairs
  - Each pair consists of a Time Index and a Value
  - The Time Index may be implied if there is a constant difference between successive Time Indices
• The time series can be segmented into “Windows”, each of which represents the time series between 2 Time Indices
• Symbols can represent Windows!
  - Because symbols in a Finite Symbol Space have a probability, we can think of the probability of a time series
  - Symbols are easy to store and manipulate: each symbol can be represented as an integer
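A minimal sketch of segmenting a series into Windows follows. It assumes an implied Time Index (evenly spaced readings) and consecutive, non-overlapping windows of equal size. The talk does not say whether windows overlap, so that choice is an assumption, as is the function name.

```python
from typing import List, Sequence


def fixed_windows(values: Sequence[float], window_size: int) -> List[List[float]]:
    """Split evenly spaced readings into consecutive, non-overlapping windows.

    A trailing partial window is dropped so every window has the same length.
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    full_windows = len(values) // window_size
    return [
        list(values[i * window_size:(i + 1) * window_size])
        for i in range(full_windows)
    ]


windows = fixed_windows([1.0, 2.0, 3.0, 4.0, 5.0], window_size=2)
```

Here the fifth reading is left out until enough data arrives to complete a third window.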
Normalizing Time Series
• A time series window can be put into a “normal form” called PAA (Piecewise Aggregate Approximation)
• The PAA consists of K floating point values which represent the aggregate value of the time series over fixed time spans
• Each value is the average of the readings that fall into each “box”
  - Each box is a time window with a start and end derived by segmenting the time series window into K windows
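The PAA step can be sketched as below, assuming evenly spaced readings and box boundaries chosen by integer division so that every reading falls into exactly one box. The function name `paa` and the boundary rule are illustrative choices, not necessarily those of the Simularity implementation.

```python
from typing import List, Sequence


def paa(window: Sequence[float], k: int) -> List[float]:
    """Piecewise Aggregate Approximation: the mean of each of k boxes."""
    n = len(window)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the window length")
    result = []
    for i in range(k):
        # Box i covers readings [start, end); since k <= n, no box is empty.
        start = i * n // k
        end = (i + 1) * n // k
        box = window[start:end]
        result.append(sum(box) / len(box))
    return result


paa_values = paa([1, 2, 3, 4, 5, 6, 7, 8], k=4)
```

For the eight readings above and K = 4, each box holds two readings and the PAA is their four averages.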
The Symbolic Representation Of Time Series
• A number of algorithms exist to represent time series as symbols in a Finite Symbol Space
  - These algorithms are often thought of as “Feature Reducers”
• Self Organizing Maps are a traditional form of Feature Reducer
• SAX (Symbolic Aggregate approXimation) is another, designed specifically for time series
• There are many other ways to reduce a time series to a symbol
  - As long as the symbol is drawn from a Finite Symbol Space, the technique described here will work
Example SAX word: baabccbc
What is SAX?
• SAX is a methodology for reducing a time series window to a symbol
• The technique was developed by Dr. Eamonn Keogh et al. at the University of California at Riverside in the early 2000s
• It has since drawn a great deal of attention in the world of time series analysis
What’s a SAX Word?
• A SAX word is the symbol generated by the SAX algorithm
• It is defined by a SAX Alphabet and a length
  - The SAX Alphabet is traditionally represented by letters, and its components are referred to as “SAX Letters”
  - The size of the alphabet is typically small, which is particularly important for anomaly detection
• When we write out a description of a SAX word, we typically use a string-like representation, such as “abcdefg”
  - SAX letters don’t have to be letters: implementations often use numbers starting at zero, although we often display them as letters
Building A SAX Word
• Convert the Time Series Window to a PAA of the length of the SAX word, and Z-normalize the PAA
  - The choice of mean and standard deviation used for normalization will affect the outcome
• Compute each SAX letter by dividing the Standard Normal Distribution into a regions of equal area under the curve, where a is the size of the SAX Alphabet, and assigning each component of the PAA the letter corresponding to the region its value falls into
• Repeating for each value of the PAA yields a SAX word of the same length as the PAA
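The letter assignment can be sketched with the Python standard library as below. This is an illustration under stated assumptions rather than the Simularity implementation: by default it normalizes with the mean and population standard deviation of the PAA values themselves (one of several possible choices), maps a flat window to z-scores of zero, and represents letters as zero-based integers. All function names are hypothetical.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev
from typing import List, Optional, Sequence


def sax_breakpoints(alphabet_size: int) -> List[float]:
    """Cut points splitting the standard normal into equal-area regions."""
    normal = NormalDist()
    return [normal.inv_cdf(i / alphabet_size) for i in range(1, alphabet_size)]


def sax_word(paa_values: Sequence[float], alphabet_size: int,
             mu: Optional[float] = None,
             sigma: Optional[float] = None) -> List[int]:
    """Map each PAA value to a zero-based SAX letter."""
    # Assumption: normalize with the window's own statistics unless given.
    mu = mean(paa_values) if mu is None else mu
    sigma = pstdev(paa_values) if sigma is None else sigma
    breakpoints = sax_breakpoints(alphabet_size)
    letters = []
    for value in paa_values:
        z = 0.0 if sigma == 0 else (value - mu) / sigma
        # The region index is the number of breakpoints at or below z.
        letters.append(bisect_right(breakpoints, z))
    return letters


def as_string(letters: Sequence[int]) -> str:
    """Display zero-based letters as 'a', 'b', 'c', ..."""
    return "".join(chr(ord("a") + letter) for letter in letters)


word = as_string(sax_word([1.5, 3.5, 5.5, 7.5], alphabet_size=4))
```

Passing `mu` and `sigma` explicitly allows normalizing against the whole series instead of a single window, which changes the resulting word as the slide notes.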
How do we obtain SAX?
First convert the time series to PAA representation, then convert the PAA to symbols
It takes linear time
[Figure: a time series C plotted over time indices of roughly 0 to 120, with its PAA approximation mapped to the letters a, b and c, giving the SAX word baabccbc]
Slide by Eamonn Keogh and Jessica Lin. Used with permission.
Encoding Magnitude And Slope
• The magnitude and slope can be encoded in a SAX word
• The magnitude (mean) can be Z-normalized over the entire space of the time series, and divided into SAX letters
  - These letters need not be from the same alphabet as the SAX word which represents the shape; we just need to consider the alphabet size when computing the size of the Finite Symbol Space
• Slope can be encoded by dividing 180º into equal spaces, and assigning each space to a letter
  - The slope can be determined by a number of methodologies
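One possible way to encode slope is sketched below. It assumes the slope is estimated by an ordinary least-squares line over evenly spaced readings (only one of the "number of methodologies" mentioned) and that the line's angle, which lies between -90º and 90º, is binned into equal slices of the 180º range. The function names are hypothetical, and the angle depends on the units chosen for the time and value axes.

```python
import math
from typing import Sequence


def least_squares_slope(values: Sequence[float]) -> float:
    """Slope of the least-squares line through (0, v0), (1, v1), ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two readings")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    numerator = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(values))
    denominator = sum((i - x_mean) ** 2 for i in range(n))
    return numerator / denominator


def slope_letter(slope: float, alphabet_size: int) -> int:
    """Zero-based letter for the slope angle, binned over 180 degrees."""
    angle = math.degrees(math.atan(slope))  # strictly between -90 and 90
    index = int((angle + 90.0) / 180.0 * alphabet_size)
    return min(index, alphabet_size - 1)  # guard against rounding at the top


flat_letter = slope_letter(least_squares_slope([5.0, 5.0, 5.0]), alphabet_size=4)
```

The slope alphabet size multiplies into the size of the Finite Symbol Space, just as the magnitude alphabet does.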
Computing The Anomaly
• We need a data structure that uses SAX words as an index and stores the number of times we have seen each SAX word, as well as the total number of windows we’ve seen
• Because our SAX words have a fixed length and alphabet, we know the total number of possible SAX words
• Tries are one choice of data structure
  - Allow for quick access
• Converting the SAX word to a number, which is used as an array index, is another
  - Requires exponentiation
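The array-index option can be sketched as follows, treating a SAX word as a number written in base a (the alphabet size) with zero-based letters as digits. This version accumulates the index with Horner's rule, so the exponentiation is only needed once, to size the count table. The names are illustrative.

```python
from typing import List, Sequence


def sax_word_to_index(letters: Sequence[int], alphabet_size: int) -> int:
    """Interpret zero-based SAX letters as digits of a base-alphabet_size number."""
    index = 0
    for letter in letters:
        if not 0 <= letter < alphabet_size:
            raise ValueError("letter outside the alphabet")
        index = index * alphabet_size + letter
    return index


alphabet_size, word_length = 4, 4
counts: List[int] = [0] * (alphabet_size ** word_length)  # one slot per possible word
total_windows = 0

word = [0, 1, 2, 3]  # "abcd"
counts[sax_word_to_index(word, alphabet_size)] += 1
total_windows += 1
```

A flat array is compact when the symbol space is small. For large alphabets or long words, most slots stay empty and a trie or hash map may be the better fit.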
Computing The Anomaly
• The procedure for examining a window
  - Convert the window into a SAX word
  - Look up the current count for that SAX word and increment it
  - Compute a metric which determines how anomalous the window is, using 3 values: the total number of windows, the number of instances of this SAX word, and the size of the Finite Symbol Space of SAX words
  - Compare the result of the metric with a predetermined threshold to decide whether or not this window is anomalous
• This procedure is repeated for constantly incoming Time Series Windows
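The loop above might be sketched as the small class below. It is an illustration only: it uses a `Counter` in place of a trie or array, takes the metric as a pluggable function of the 3 values, and assumes that larger scores mean "more anomalous". The class name, the example metric `inverse_rate_ratio`, and the threshold are all hypothetical.

```python
from collections import Counter
from typing import Callable, Hashable, Tuple

# A metric takes (total windows, count for this symbol, symbol space size).
Metric = Callable[[int, int, int], float]


def inverse_rate_ratio(total: int, count: int, space_size: int) -> float:
    """Example metric: chance probability divided by observed probability."""
    return total / (count * space_size)


class SymbolAnomalyDetector:
    def __init__(self, space_size: int, metric: Metric, threshold: float) -> None:
        self.space_size = space_size
        self.metric = metric
        self.threshold = threshold
        self.counts: Counter = Counter()
        self.total = 0

    def observe(self, symbol: Hashable) -> Tuple[float, bool]:
        """Count one window's symbol and report (score, is_anomalous)."""
        self.counts[symbol] += 1
        self.total += 1
        score = self.metric(self.total, self.counts[symbol], self.space_size)
        return score, score > self.threshold


detector = SymbolAnomalyDetector(space_size=4, metric=inverse_rate_ratio, threshold=1.0)
for _ in range(99):
    detector.observe("aa")
score, is_anomalous = detector.observe("ab")
```

After 99 windows of "aa", the first "ab" has been seen once in 100 windows against a chance rate of 1 in 4, so it scores well above the threshold.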
The Metric
• Once we have determined the three values (total windows, count for this SAX word, and symbol space size), we need to turn them into a metric which tells us how anomalous a window is
• The metric should discriminate
  - We should be able to distinguish between multiple levels of anomaly
• The metric should be easy to compute
  - Embedded applications may not have math libraries that support complicated computation
• The metric should reflect the real world
The Metric – P-Values
• P-Values seem like a good metric
  - Expressed as a probability, they have a connection to the real world
• Unfortunately, P-Values come very close to zero or one once the number of samples becomes large
  - This makes it difficult to set an “anomaly threshold”
  - This sets a hard criterion for an anomaly, rather than a graded one
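The saturation described above can be illustrated with a binomial tail probability. The sketch assumes one plausible reading of "P-Value" here: the probability of seeing a symbol at most `count` times in `total` windows if every symbol had the uniform chance probability. The talk does not spell out its exact definition, so this is an assumption, as are the function names. Log-space arithmetic keeps the terms from overflowing.

```python
import math


def binomial_log_pmf(k: int, n: int, p: float) -> float:
    """Natural log of the binomial probability of k successes in n trials."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def lower_tail_p_value(count: int, total: int, p: float) -> float:
    """P(X <= count) for X ~ Binomial(total, p), with 0 < p < 1."""
    tail = sum(math.exp(binomial_log_pmf(i, total, p)) for i in range(count + 1))
    return min(1.0, tail)


# A symbol seen once in 100,000 windows with a symbol space of 256.
rare = lower_tail_p_value(1, 100_000, 1 / 256)
# The same symbol seen 1,000 times: far more often than the ~390 expected.
common = lower_tail_p_value(1_000, 100_000, 1 / 256)
```

With this many windows the two values sit at essentially 0 and 1, which leaves little room for a graded threshold between them.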
The Metric – Log-Likelihood Ratio
• The Log-Likelihood ratio is perhaps a better choice of metric
  - Scaling the ratio to lie between -1.0 and 1.0 gives a manageable value
  - Even extremely unlikely events can be discriminated
• Reversing the sign of the scaled log-likelihood ratio gives values that are easier to understand
• Use the likelihood function for a binomial distribution
  - The number of trials is the Total Windows
  - The number of successes is the number of occurrences of this Window’s symbol
  - The Probability is the Symbol Probability
• The log-likelihood is particularly useful as it accounts for the significance of the data, i.e. the number of samples
• Like P-Values, it requires a floating point library
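A binomial log-likelihood ratio built from the 3 values might look like the sketch below. It compares the likelihood of the data under the observed rate with the likelihood under the uniform Symbol Probability, and gives the result a negative sign when the symbol is rarer than chance. The slides do not say how the ratio is scaled to the range -1.0 to 1.0, so that step is left out. The sign convention and function names are assumptions, and the symbol space is assumed to hold at least 2 symbols.

```python
import math


def _x_log_y(x: float, y: float) -> float:
    """x * ln(y), taking 0 * ln(0) as 0."""
    return 0.0 if x == 0 else x * math.log(y)


def binomial_log_likelihood(count: int, total: int, p: float) -> float:
    """Log-likelihood of count successes in total trials, up to a constant.

    The binomial coefficient is omitted because it cancels in the ratio.
    """
    return _x_log_y(count, p) + _x_log_y(total - count, 1.0 - p)


def signed_log_likelihood_ratio(count: int, total: int, space_size: int) -> float:
    """Negative when the symbol is rarer than chance, positive when more common."""
    chance = 1.0 / space_size
    observed = count / total
    ratio = 2.0 * (binomial_log_likelihood(count, total, observed)
                   - binomial_log_likelihood(count, total, chance))
    ratio = max(ratio, 0.0)  # clamp tiny negative rounding errors
    return -ratio if observed < chance else ratio


def anomaly_score(count: int, total: int, space_size: int) -> float:
    """Sign reversed so that rarer-than-chance symbols score higher."""
    return -signed_log_likelihood_ratio(count, total, space_size)


small_sample = anomaly_score(1, 100, 4)
large_sample = anomaly_score(10, 1000, 4)
```

Both samples show the symbol in 1% of windows, but the larger sample scores ten times higher, which is the sense in which the ratio accounts for significance.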
The Metric – Rate Ratio
• The rate ratio is the number of times more likely the event is observed to have occurred than would be predicted by random chance
  - Smaller values mean more anomalous: less than 1 implies less likely than chance
  - The reciprocal of the rate ratio gives an anomaly score which increases with the degree of anomaly
  - Uses observed probabilities
• Doesn’t require math harder than division
• Doesn’t account for significance, which has to be accounted for by some other means
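In terms of the 3 values, the rate ratio is the observed probability (count / total) divided by the chance probability (1 / symbol space size). A minimal sketch, with illustrative function names:

```python
def rate_ratio(count: int, total: int, space_size: int) -> float:
    """Observed probability divided by the uniform chance probability."""
    return count * space_size / total


def rate_ratio_anomaly_score(count: int, total: int, space_size: int) -> float:
    """Reciprocal of the rate ratio: larger means more anomalous."""
    return total / (count * space_size)


# A symbol seen once in 100 windows, with 4 possible symbols.
ratio = rate_ratio(1, 100, 4)
score = rate_ratio_anomaly_score(1, 100, 4)
```

Seeing the symbol 10 times in 1,000 windows gives exactly the same ratio, which is why significance has to be handled separately.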
Other Means Of Symbolizing
• SAX may not always be the best way to reduce a window to a symbol
  - SAX reduces resolution equally across all its members
  - Tiny, but important variations will be lost
• Self Organizing Maps can also be used
  - They require more computation, but don’t reduce resolution
  - Self Organizing Maps can encode magnitude directly
Using Self Organizing Maps
• Self Organizing Maps (SOMs) are (typically) a grid of vectors, which can be thought of as weights or prototypes
  - The SOM algorithm adjusts the prototypes based on training data
• To operate the SOM, a Window vector is compared to each of the prototypes: the best matching one “wins”, and the symbol associated with the window is the row:column of the matching grid cell
• The row:column is then used to index the count of how many times that prototype has been seen
• We now have the 3 values for computing the metric
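Operating an already trained SOM can be sketched as a nearest-prototype search. The example assumes squared Euclidean distance as the matching criterion (a common choice, not stated in the talk) and leaves training out entirely. The grid values and function names are made up for illustration.

```python
from typing import List, Sequence, Tuple

Grid = List[List[Sequence[float]]]  # grid[row][column] is one prototype vector


def best_matching_unit(grid: Grid, window: Sequence[float]) -> Tuple[int, int]:
    """Return (row, column) of the prototype closest to the window."""
    best_cell = (0, 0)
    best_distance = float("inf")
    for row_index, row in enumerate(grid):
        for col_index, prototype in enumerate(row):
            if len(prototype) != len(window):
                raise ValueError("prototype and window lengths differ")
            distance = sum((p - w) ** 2 for p, w in zip(prototype, window))
            if distance < best_distance:
                best_cell = (row_index, col_index)
                best_distance = distance
    return best_cell


def som_symbol(grid: Grid, window: Sequence[float]) -> str:
    """The window's symbol is the row:column of the winning cell."""
    row, column = best_matching_unit(grid, window)
    return f"{row}:{column}"


grid = [
    [[0.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.0], [1.0, 1.0]],
]
symbol = som_symbol(grid, [0.9, 0.2])   # closest prototype is [1.0, 0.0]
space_size = len(grid) * len(grid[0])   # size of the Finite Symbol Space
```

Because the grid size is fixed, the number of cells plays the same role as a^l does for SAX words.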
Predicting Events
• A set of time series may be used to predict events
  - We look for the correlation between the symbols representing the time series windows and Events which happen in the future
• This can be used to categorize Events according to an Event Signature
  - Event signatures imply outcomes at a particular time index
A Concrete Example
• The SMART data on hard drives can be used to predict failures
  - Simularity used 53 of the sensors to test for anomalies and predict failures
• Information from nearly 400 hard drives was used to “train” the anomaly detector
• Once trained, the system was used to identify Event Signatures which indicated failure
• The time series in the system were reduced to SAX words, and correlated with a single event, failure (all that was known)
• This can then be used to predict failure
Event Signatures For Failure Prediction
Notice there are two different event signatures for these failing drives
Credit
• This technique is similar, although not identical, to the TARZAN methodology outlined by Eamonn Keogh and Jessica Lin
  - It and other work pertaining to SAX is available here: http://www.cs.ucr.edu/~eamonn/SAX.htm
• Self Organizing Maps were invented by Teuvo Kohonen: http://www.cis.hut.fi/research/som-research/teuvo.html
Source Code
• Simularity maintains a GitHub repository of open-source software, including an implementation of SAX suitable for use with the techniques described here: www.github.com/simularity/SAX
1160, Brickyard Cove Road, Suite 200
Point Richmond, CA 94801
United States
+ 1 678-488-8857
ray@simularity.com
THANK YOU
@rayrichardson

Más contenido relacionado

Destacado

Stockflare 强大的股票筛选工具嘉维证券合作伙伴独享
Stockflare 强大的股票筛选工具嘉维证券合作伙伴独享Stockflare 强大的股票筛选工具嘉维证券合作伙伴独享
Stockflare 强大的股票筛选工具嘉维证券合作伙伴独享Shane Leonard, CFA
 
From start-up to strategic growth business: How A Suit That Fits is accelerat...
From start-up to strategic growth business: How A Suit That Fits is accelerat...From start-up to strategic growth business: How A Suit That Fits is accelerat...
From start-up to strategic growth business: How A Suit That Fits is accelerat...The Nurture Network
 
Five Reasons to Visit the Health & Wellness Hub During Social Media Week NYC
Five Reasons to Visit the Health & Wellness Hub During Social Media Week NYCFive Reasons to Visit the Health & Wellness Hub During Social Media Week NYC
Five Reasons to Visit the Health & Wellness Hub During Social Media Week NYCLuminary Labs
 
【未來學堂】實驗班計畫說明
【未來學堂】實驗班計畫說明【未來學堂】實驗班計畫說明
【未來學堂】實驗班計畫說明Yu-cheng Liu
 
Alex Smola, Director of Machine Learning, AWS/Amazon, at MLconf SF 2016
Alex Smola, Director of Machine Learning, AWS/Amazon, at MLconf SF 2016Alex Smola, Director of Machine Learning, AWS/Amazon, at MLconf SF 2016
Alex Smola, Director of Machine Learning, AWS/Amazon, at MLconf SF 2016MLconf
 
唯品会大数据实践 Sacc pub
唯品会大数据实践 Sacc pub唯品会大数据实践 Sacc pub
唯品会大数据实践 Sacc pubChao Zhu
 
豆瓣数据架构实践
豆瓣数据架构实践豆瓣数据架构实践
豆瓣数据架构实践Xupeng Yun
 
What is the maker movement?
What is the maker movement?What is the maker movement?
What is the maker movement?Luminary Labs
 
Transfer of Learning
Transfer of LearningTransfer of Learning
Transfer of LearningAbby Rondilla
 
Women in Tech: How to Build A Human Company
Women in Tech: How to Build A Human CompanyWomen in Tech: How to Build A Human Company
Women in Tech: How to Build A Human CompanyLuminary Labs
 
Building Healthier Communities: TEDMED 2016
Building Healthier Communities: TEDMED 2016Building Healthier Communities: TEDMED 2016
Building Healthier Communities: TEDMED 2016Luminary Labs
 
How to TEDx [Presentation Design Tips] - #TED #TEDX
How to TEDx [Presentation Design Tips] - #TED #TEDXHow to TEDx [Presentation Design Tips] - #TED #TEDX
How to TEDx [Presentation Design Tips] - #TED #TEDXEmpowered Presentations
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine LearningLior Rokach
 
The Human Company Playbook, Version 1.0
The Human Company Playbook, Version 1.0The Human Company Playbook, Version 1.0
The Human Company Playbook, Version 1.0Luminary Labs
 
Lab session #2: The Human Company
Lab session #2: The Human CompanyLab session #2: The Human Company
Lab session #2: The Human CompanyLuminary Labs
 
The Business of Social Media
The Business of Social Media The Business of Social Media
The Business of Social Media Dave Kerpen
 
Transfer Of Learning
Transfer Of LearningTransfer Of Learning
Transfer Of Learningajones1
 
The hottest analysis tools for startups
The hottest analysis tools for startupsThe hottest analysis tools for startups
The hottest analysis tools for startupsLiane Siebenhaar
 
10 Steps of Project Management in Digital Agencies
10 Steps of Project Management in Digital Agencies 10 Steps of Project Management in Digital Agencies
10 Steps of Project Management in Digital Agencies Alemsah Ozturk
 
Lost in Cultural Translation
Lost in Cultural TranslationLost in Cultural Translation
Lost in Cultural TranslationVanessa Vela
 

Destacado (20)

Stockflare 强大的股票筛选工具嘉维证券合作伙伴独享
Stockflare 强大的股票筛选工具嘉维证券合作伙伴独享Stockflare 强大的股票筛选工具嘉维证券合作伙伴独享
Stockflare 强大的股票筛选工具嘉维证券合作伙伴独享
 
From start-up to strategic growth business: How A Suit That Fits is accelerat...
From start-up to strategic growth business: How A Suit That Fits is accelerat...From start-up to strategic growth business: How A Suit That Fits is accelerat...
From start-up to strategic growth business: How A Suit That Fits is accelerat...
 
Five Reasons to Visit the Health & Wellness Hub During Social Media Week NYC
Five Reasons to Visit the Health & Wellness Hub During Social Media Week NYCFive Reasons to Visit the Health & Wellness Hub During Social Media Week NYC
Five Reasons to Visit the Health & Wellness Hub During Social Media Week NYC
 
【未來學堂】實驗班計畫說明
【未來學堂】實驗班計畫說明【未來學堂】實驗班計畫說明
【未來學堂】實驗班計畫說明
 
Alex Smola, Director of Machine Learning, AWS/Amazon, at MLconf SF 2016
Alex Smola, Director of Machine Learning, AWS/Amazon, at MLconf SF 2016Alex Smola, Director of Machine Learning, AWS/Amazon, at MLconf SF 2016
Alex Smola, Director of Machine Learning, AWS/Amazon, at MLconf SF 2016
 
唯品会大数据实践 Sacc pub
唯品会大数据实践 Sacc pub唯品会大数据实践 Sacc pub
唯品会大数据实践 Sacc pub
 
豆瓣数据架构实践
豆瓣数据架构实践豆瓣数据架构实践
豆瓣数据架构实践
 
What is the maker movement?
What is the maker movement?What is the maker movement?
What is the maker movement?
 
Transfer of Learning
Transfer of LearningTransfer of Learning
Transfer of Learning
 
Women in Tech: How to Build A Human Company
Women in Tech: How to Build A Human CompanyWomen in Tech: How to Build A Human Company
Women in Tech: How to Build A Human Company
 
Building Healthier Communities: TEDMED 2016
Building Healthier Communities: TEDMED 2016Building Healthier Communities: TEDMED 2016
Building Healthier Communities: TEDMED 2016
 
How to TEDx [Presentation Design Tips] - #TED #TEDX
How to TEDx [Presentation Design Tips] - #TED #TEDXHow to TEDx [Presentation Design Tips] - #TED #TEDX
How to TEDx [Presentation Design Tips] - #TED #TEDX
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
 
The Human Company Playbook, Version 1.0
The Human Company Playbook, Version 1.0The Human Company Playbook, Version 1.0
The Human Company Playbook, Version 1.0
 
Lab session #2: The Human Company
Lab session #2: The Human CompanyLab session #2: The Human Company
Lab session #2: The Human Company
 
The Business of Social Media
The Business of Social Media The Business of Social Media
The Business of Social Media
 
Transfer Of Learning
Transfer Of LearningTransfer Of Learning
Transfer Of Learning
 
The hottest analysis tools for startups
The hottest analysis tools for startupsThe hottest analysis tools for startups
The hottest analysis tools for startups
 
10 Steps of Project Management in Digital Agencies
10 Steps of Project Management in Digital Agencies 10 Steps of Project Management in Digital Agencies
10 Steps of Project Management in Digital Agencies
 
Lost in Cultural Translation
Lost in Cultural TranslationLost in Cultural Translation
Lost in Cultural Translation
 

Similar a Ray Richardson, Chief Technology Officer at Simularity at MLconf SEA - 5/01/15

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Amazon Web Services
 
Time Series Anomaly Detection with .net and Azure
Time Series Anomaly Detection with .net and AzureTime Series Anomaly Detection with .net and Azure
Time Series Anomaly Detection with .net and AzureMarco Parenzan
 
Time Series Anomaly Detection for .net and Azure
Time Series Anomaly Detection for .net and AzureTime Series Anomaly Detection for .net and Azure
Time Series Anomaly Detection for .net and AzureMarco Parenzan
 
Deep dive time series anomaly detection with different Azure Data Services
Deep dive time series anomaly detection with different Azure Data ServicesDeep dive time series anomaly detection with different Azure Data Services
Deep dive time series anomaly detection with different Azure Data ServicesMarco Parenzan
 
Semplificare l'observability per progetti Serverless
Semplificare l'observability per progetti ServerlessSemplificare l'observability per progetti Serverless
Semplificare l'observability per progetti ServerlessLuciano Mammino
 
Time Series Analysis - 2 | Time Series in R | ARIMA Model Forecasting | Data ...
Time Series Analysis - 2 | Time Series in R | ARIMA Model Forecasting | Data ...Time Series Analysis - 2 | Time Series in R | ARIMA Model Forecasting | Data ...
Time Series Analysis - 2 | Time Series in R | ARIMA Model Forecasting | Data ...Simplilearn
 
Air Passenger Prediction Using ARIMA Model
Air Passenger Prediction Using ARIMA Model Air Passenger Prediction Using ARIMA Model
Air Passenger Prediction Using ARIMA Model AkarshAvinash
 
Scalable Time Series Forecasting and Monitoring using Apache Spark and Elasti...
Scalable Time Series Forecasting and Monitoring using Apache Spark and Elasti...Scalable Time Series Forecasting and Monitoring using Apache Spark and Elasti...
Scalable Time Series Forecasting and Monitoring using Apache Spark and Elasti...Databricks
 
Scalable Time Series Forecasting and Monitoring using Apache Spark and Elasti...
Scalable Time Series Forecasting and Monitoring using Apache Spark and Elasti...Scalable Time Series Forecasting and Monitoring using Apache Spark and Elasti...
Scalable Time Series Forecasting and Monitoring using Apache Spark and Elasti...Fred Madrid
 
SignalFx Elasticsearch Metrics Monitoring and Alerting
SignalFx Elasticsearch Metrics Monitoring and AlertingSignalFx Elasticsearch Metrics Monitoring and Alerting
SignalFx Elasticsearch Metrics Monitoring and AlertingSignalFx
 
5 Years Of Building SaaS On AWS
5 Years Of Building SaaS On AWS5 Years Of Building SaaS On AWS
5 Years Of Building SaaS On AWSChristian Beedgen
 
Introduction to scala for a c programmer
Introduction to scala for a c programmerIntroduction to scala for a c programmer
Introduction to scala for a c programmerGirish Kumar A L
 

Similar a Ray Richardson, Chief Technology Officer at Simularity at MLconf SEA - 5/01/15 (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
SAX-TimeSeries
SAX-TimeSeriesSAX-TimeSeries
SAX-TimeSeries
 
Time Series Anomaly Detection with .net and Azure
Time Series Anomaly Detection with .net and AzureTime Series Anomaly Detection with .net and Azure
Time Series Anomaly Detection with .net and Azure
 
Time Series Anomaly Detection for .net and Azure
Time Series Anomaly Detection for .net and AzureTime Series Anomaly Detection for .net and Azure
Time Series Anomaly Detection for .net and Azure
 
Rust presentation convergeconf
Rust presentation convergeconfRust presentation convergeconf
Rust presentation convergeconf
 
DAA Unit 1.pdf
DAA Unit 1.pdfDAA Unit 1.pdf
DAA Unit 1.pdf
 
APN Live - Technical Track
APN Live - Technical TrackAPN Live - Technical Track
APN Live - Technical Track
 
Deep dive time series anomaly detection with different Azure Data Services
Deep dive time series anomaly detection with different Azure Data ServicesDeep dive time series anomaly detection with different Azure Data Services
Deep dive time series anomaly detection with different Azure Data Services
 
Semplificare l'observability per progetti Serverless
Semplificare l'observability per progetti ServerlessSemplificare l'observability per progetti Serverless
Semplificare l'observability per progetti Serverless
 
Time Series Analysis - 2 | Time Series in R | ARIMA Model Forecasting | Data ...
Time Series Analysis - 2 | Time Series in R | ARIMA Model Forecasting | Data ...Time Series Analysis - 2 | Time Series in R | ARIMA Model Forecasting | Data ...
Time Series Analysis - 2 | Time Series in R | ARIMA Model Forecasting | Data ...
 
Air Passenger Prediction Using ARIMA Model
Air Passenger Prediction Using ARIMA Model Air Passenger Prediction Using ARIMA Model
Air Passenger Prediction Using ARIMA Model
 
Log Analytics with AWS
Log Analytics with AWSLog Analytics with AWS
Log Analytics with AWS
 
Start with swift
Start with swiftStart with swift
Start with swift
 
Scalable Time Series Forecasting and Monitoring using Apache Spark and Elasti...
Scalable Time Series Forecasting and Monitoring using Apache Spark and Elasti...Scalable Time Series Forecasting and Monitoring using Apache Spark and Elasti...
Scalable Time Series Forecasting and Monitoring using Apache Spark and Elasti...
 
Scalable Time Series Forecasting and Monitoring using Apache Spark and Elasti...
Scalable Time Series Forecasting and Monitoring using Apache Spark and Elasti...Scalable Time Series Forecasting and Monitoring using Apache Spark and Elasti...
Scalable Time Series Forecasting and Monitoring using Apache Spark and Elasti...
 
What`s New in Java 8
What`s New in Java 8What`s New in Java 8
What`s New in Java 8
 
SignalFx Elasticsearch Metrics Monitoring and Alerting
SignalFx Elasticsearch Metrics Monitoring and AlertingSignalFx Elasticsearch Metrics Monitoring and Alerting
SignalFx Elasticsearch Metrics Monitoring and Alerting
 
5 Years Of Building SaaS On AWS
5 Years Of Building SaaS On AWS5 Years Of Building SaaS On AWS
5 Years Of Building SaaS On AWS
 
Tom Kyte at Hotsos 2015
Tom Kyte at Hotsos 2015Tom Kyte at Hotsos 2015
Tom Kyte at Hotsos 2015
 
Introduction to scala for a c programmer
Introduction to scala for a c programmerIntroduction to scala for a c programmer
Introduction to scala for a c programmer
 

Más de MLconf

Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...MLconf
 
Ted Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language UnderstandingTed Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language UnderstandingMLconf
 
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...MLconf
 
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold RushIgor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold RushMLconf
 
Josh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious ExperienceJosh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious ExperienceMLconf
 
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...MLconf
 
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...MLconf
 
Meghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the CheapMeghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the CheapMLconf
 
Noam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data CollectionNoam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data CollectionMLconf
 
June Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of MLJune Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of MLMLconf
 
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection TasksSneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection TasksMLconf
 
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...MLconf
 
Vito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI WorldVito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI WorldMLconf
 
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...MLconf
 
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...MLconf
 
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...MLconf
 
Neel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to codeNeel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to codeMLconf
 
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...MLconf
 
Soumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better SoftwareSoumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better SoftwareMLconf
 
Roy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime ChangesRoy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime ChangesMLconf
 

Ray Richardson, Chief Technology Officer at Simularity at MLconf SEA - 5/01/15

  • 1. © Copyright 2015 Simularity. All Rights Reserved Ray Richardson, Founder & CTO | ray@simularity.com Practical Predictive Analytics on Time Series Data using SAX MLConf Seattle, May 1, 2015
  • 2. Anomaly Detection
      - A time series anomaly is simply an unusual subsequence of the series
      - “Unusual” will be taken to mean “improbable”
          - The degree of anomaly is isomorphic with the improbability of the subsequence
          - Probability is not defined for Time Series
          - Probability can be defined for Symbols
      - Mapping a time series to a symbol may allow us to assign a probability to the time series subsequence
      - This involves mapping the time series subsequence to a symbol in some Symbol Space
  • 3. Symbolic Representation
      - All data in a modern computer is in a Symbolic Representation
          - Integers, floating point numbers and strings are all symbols, and are all composed of bytes
      - Anomaly detection requires a special kind of symbol: one from a Finite Symbol Space
          - This means there are a finite number of symbols available
  • 4. Finite Symbol Spaces
      - For our purposes, a Finite Symbol Space is defined by 2 attributes
          - An Alphabet, from which components are drawn
          - A Symbol Length, defining the fixed number of components of the symbol
      - Thus, if we define the alphabet as a..d and a length of 4, a legitimate symbol might be abcd
      - Another legitimate symbol might be 10:15, where 10 is the row of a matrix and 15 is the column
          - The size of the matrix must be constant
      - Fixed point numbers are drawn from a Finite Symbol Space if there is a lower and upper bound
  • 5. Why Finite Symbol Spaces?
      - A Finite Symbol Space allows us to compute a (perhaps naïve) probability of seeing a particular symbol
          - The number of possible symbols is a^l, where a is the cardinality of the alphabet and l is the length of the symbol
          - Perhaps naïve due to the fact that some symbols may never appear
              - In some symbolic representations of time series, aaaa and dddd represent the same series
      - We can compute a probability of seeing a symbol if symbols are random: it is the reciprocal of the size of the symbol space
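The arithmetic on this slide can be sketched in a few lines of Python. The function names are illustrative and do not come from the talk; the example values (alphabet a..d, length 4) are the ones used on the Finite Symbol Spaces slide.

```python
def symbol_space_size(alphabet_size: int, symbol_length: int) -> int:
    """Number of distinct symbols: a^l."""
    return alphabet_size ** symbol_length


def naive_symbol_probability(alphabet_size: int, symbol_length: int) -> float:
    """Probability of one particular symbol if all symbols were equally likely."""
    return 1.0 / symbol_space_size(alphabet_size, symbol_length)


# Alphabet a..d (cardinality 4) and symbol length 4.
size = symbol_space_size(4, 4)
probability = naive_symbol_probability(4, 4)
```

The space grows exponentially with the symbol length, so the naïve probability of any single symbol shrinks quickly as the length or the alphabet grows.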
  • 6. Time Series
      - A time series is a sequence of pairs
          - Each pair consists of a Time Index and a Value
          - The Time Index may be implied if there is a constant interval between successive values
      - The time series can be segmented into “Windows” which represent the time series between 2 Time Indices
      - Symbols can represent Windows!
          - Because symbols in a Finite Symbol Space have a probability, we can think of the probability of a time series
          - Symbols are easy to store and manipulate: each symbol can be represented as an integer
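As a minimal sketch of the windowing step, assuming an evenly sampled series (so the Time Index is implied by position) and non-overlapping windows of a fixed number of readings. The talk does not say whether windows overlap, so that choice and the function name are assumptions.

```python
def fixed_windows(values, window_size):
    """Split a series into consecutive, non-overlapping windows.

    A trailing partial window is dropped so every window has the same length.
    """
    if window_size < 1:
        raise ValueError("window_size must be at least 1")
    last_start = len(values) - window_size
    return [values[i:i + window_size] for i in range(0, last_start + 1, window_size)]


windows = fixed_windows([1, 2, 3, 4, 5, 6, 7], 3)
```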
  • 7. Normalizing Time Series
      - A time series window can be put into a “normal form” called PAA (Piecewise Aggregate Approximation)
      - The PAA consists of K floating point values which represent the aggregate value of the time series over fixed time spans
      - Each value is the average of the readings that fall into each “box”
          - Each box is a time window with a start and end derived by segmenting the time series window into K windows
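A minimal PAA sketch in plain Python follows. It assumes an evenly sampled window and splits it by position into K boxes; when the window length is not a multiple of K, box sizes differ by at most one reading. The function name is illustrative.

```python
def paa(window, k):
    """Piecewise Aggregate Approximation: the mean of each of k consecutive boxes."""
    n = len(window)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the window length")
    means = []
    for i in range(k):
        start = i * n // k          # first reading in box i
        end = (i + 1) * n // k      # one past the last reading in box i
        box = window[start:end]
        means.append(sum(box) / len(box))
    return means


approximation = paa([1, 2, 3, 4, 5, 6, 7, 8], 4)
```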
  • 8. The Symbolic Representation Of Time Series
      - A number of algorithms exist to represent time series as symbols in a Finite Symbol Space
          - These algorithms are often thought of as “Feature Reducers”
      - Self Organizing Maps are a traditional form of Feature Reducer
      - SAX (Symbolic Aggregate approXimation) is another, designed specifically for time series
      - There are many other ways to reduce a time series to a symbol
          - As long as the symbol is drawn from a Finite Symbol Space, the technique described here will work
      - [Figure: a time series window labeled with the SAX word baabccbc]
  • 9. What is SAX?
      - SAX is a methodology for reducing a time series window to a symbol
      - The technique was developed by Dr. Eamonn Keogh et al. at the University of California, Riverside in the early 2000s
      - It has since drawn a great deal of attention in the world of time series analysis
  • 10. What’s a SAX Word?
      - A SAX word is the symbol generated by the SAX algorithm
      - It is defined by a SAX Alphabet and a length
          - The SAX Alphabet is traditionally represented by letters, and its components are referred to as “SAX Letters”
          - The size of the alphabet is typically small, which is particularly important for anomaly detection
      - When we write out a description of a SAX word, we typically use a string-like representation, such as “abcdefg”
          - SAX letters don’t have to be letters: implementations often use zero-based numbers, but we usually display them as letters
  • 11. Building A SAX Word
      - Convert the Time Series Window to a PAA of the length of the SAX word, and Z-normalize the PAA
          - Which mean and standard deviation are used for normalization will affect the outcome
      - Compute the SAX letter by dividing the Standard Normal Distribution into K regions of equal area under the curve (here K is the size of the SAX Alphabet) and assigning each component of the PAA the letter corresponding to the region that the PAA value falls into
      - Repeating for each value of the PAA yields a SAX word of equivalent length to the PAA
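The steps on this slide can be sketched with the Python standard library. This is an illustration under stated assumptions, not Simularity's implementation: the function names are hypothetical, the PAA is normalized with its own mean and population standard deviation unless `mu` and `sigma` are passed in (the slide notes that this choice affects the outcome), and a constant PAA is mapped to all-zero z-scores.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev


def sax_breakpoints(alphabet_size):
    """Cut points that split the standard normal curve into equal-area regions."""
    standard_normal = NormalDist()
    return [standard_normal.inv_cdf(i / alphabet_size) for i in range(1, alphabet_size)]


def sax_word(paa_values, alphabet_size, mu=None, sigma=None):
    """Map a PAA vector to a SAX word, one letter per PAA value."""
    if not 2 <= alphabet_size <= 26:
        raise ValueError("alphabet_size must be between 2 and 26 for letter output")
    mu = mean(paa_values) if mu is None else mu
    sigma = pstdev(paa_values) if sigma is None else sigma
    if sigma == 0:
        z_scores = [0.0] * len(paa_values)   # assumption: flat window maps to the middle
    else:
        z_scores = [(v - mu) / sigma for v in paa_values]
    cuts = sax_breakpoints(alphabet_size)
    # bisect_right counts how many cut points lie at or below the z-score,
    # which is the zero-based index of the region the value falls into.
    return "".join(chr(ord("a") + bisect_right(cuts, z)) for z in z_scores)


word = sax_word([1.5, 3.5, 5.5, 7.5], 4)
```

Passing a global `mu` and `sigma` (for example, computed over the whole series) preserves magnitude differences between windows, while the per-window default captures shape only.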
  • 12. How do we obtain SAX?
      - First convert the time series to PAA representation, then convert the PAA to symbols
      - It takes linear time
      - [Figure: a time series C plotted over time indices 0 to 120, its PAA approximation, and the resulting SAX word baabccbc]
      - Slide by Eamonn Keogh and Jessica Lin. Used with permission.
  • 13. Encoding Magnitude And Slope
      - The magnitude and slope can be encoded in a SAX word
      - The magnitude (mean) can be Z-normalized over the entire space of the time series, and divided into SAX letters
          - These letters need not be from the same alphabet as the SAX word which represents the shape; we just need to consider the alphabet size when computing the size of the Finite Symbol Space
      - Slope can be encoded by dividing 180° into equal spaces, and assigning each space to a letter
          - The slope can be determined by a number of methodologies
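One possible way to encode slope is sketched below. The talk leaves the slope estimate open, so the least-squares fit against the sample position, the function names, and the bin layout from -90° to +90° are all assumptions made for illustration. The resulting angle depends on the units of both axes, so in practice the values would likely be normalized first.

```python
import math


def least_squares_slope(values):
    """Slope of the least-squares line through (0, v0), (1, v1), ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two readings to estimate a slope")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    numerator = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    denominator = sum((i - x_mean) ** 2 for i in range(n))
    return numerator / denominator


def slope_letter(values, alphabet_size):
    """Assign a letter by splitting the 180 degree range of angles into equal bins."""
    angle = math.degrees(math.atan(least_squares_slope(values)))  # in (-90, 90)
    bin_width = 180.0 / alphabet_size
    index = min(int((angle + 90.0) / bin_width), alphabet_size - 1)
    return chr(ord("a") + index)


rising = slope_letter([0, 10, 20, 30], 4)
```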
  • 14. Computing The Anomaly
      - We need a data structure which uses SAX words as an index, and stores the number of times we have seen each SAX word, as well as the total number of windows we’ve seen
      - Because our SAX words are of a fixed length and alphabet, we know the total number of possible SAX words
      - Tries are one choice of data structure
          - Allow for quick access
      - Converting the SAX word to a number, which is used as an array index, is another
          - Requires exponentiation
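The array-index option amounts to reading the SAX word as a number in base a, where a is the alphabet size. A minimal sketch, assuming lowercase letters starting at "a" and a hypothetical function name; it uses Horner's rule, which yields the same value as summing digit × a^position without calling a power function.

```python
def sax_word_to_index(word, alphabet_size):
    """Interpret a SAX word as a base-`alphabet_size` integer (first letter most significant)."""
    index = 0
    for letter in word:
        digit = ord(letter) - ord("a")
        if not 0 <= digit < alphabet_size:
            raise ValueError(f"letter {letter!r} is outside the alphabet")
        index = index * alphabet_size + digit
    return index


# With alphabet a..d and length 4, indices range over 0..255.
counts = [0] * (4 ** 4)
counts[sax_word_to_index("abcd", 4)] += 1
```

A flat array indexed this way is practical only while a^l stays small; for larger symbol spaces a trie or hash map stores only the words actually seen.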
  • 15. Computing The Anomaly
      - The procedure for examining a window
          - Convert the window into a SAX word
          - Look up the current count for that SAX word and increment it
          - Compute a metric which determines how anomalous the window is using 3 values: the total number of windows, the number of instances of this SAX word, and the size of the Finite Symbol Space of SAX words
          - Compare the result of the metric with a predetermined threshold to decide whether or not this window is anomalous
      - This procedure is repeated for constantly incoming Time Series Windows
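The procedure above can be sketched as a small counting class. This is an illustration only: it stores counts in a `collections.Counter` (a hash map, swapped in for the trie or array discussed in the talk), the class and function names are hypothetical, and the metric is passed in as a function because the talk discusses several candidates.

```python
from collections import Counter


class SaxAnomalyCounter:
    """Tracks how often each SAX word has been seen."""

    def __init__(self, alphabet_size, word_length):
        self.space_size = alphabet_size ** word_length  # size of the Finite Symbol Space
        self.counts = Counter()
        self.total = 0

    def observe(self, word):
        """Record one window and return the 3 values the metric needs."""
        self.counts[word] += 1
        self.total += 1
        return self.total, self.counts[word], self.space_size


def is_anomalous(total, count, space_size, metric, threshold):
    """Compare a metric (higher means more anomalous) against a fixed threshold."""
    return metric(total, count, space_size) > threshold


counter = SaxAnomalyCounter(alphabet_size=4, word_length=4)
for seen_word in ["abcd", "abcd", "abcd"]:
    counter.observe(seen_word)
values = counter.observe("dddd")
```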
  • 16. The Metric
      - Once we have determined the values, we need to turn them into a metric which tells us how anomalous a window is
      - The metric should discriminate
          - We should be able to discriminate between multiple levels of anomaly values
      - The metric should be easy to compute
          - Embedded applications may not have complex math libraries which allow for complicated computation
      - The metric should reflect the real world
  • 17. The Metric – P-Values
      - P-Values seem like a good metric
          - Expressed as a probability, they have a connection to the real world
      - Unfortunately, P-Values closely approach zero and one once the number of samples gets significant
          - This makes it difficult to set an “anomaly threshold”
          - This sets a hard criterion for an anomaly
  • 18. The Metric – Log-Likelihood Ratio
      - The Log-Likelihood ratio is perhaps a better choice of metric
          - Scaling the ratio between -1.0 and 1.0 gives a manageable value
          - Even extremely unlikely events can be discriminated
      - Reversing the sign of the scaled log-likelihood ratio gives values that are easier to understand
      - Use the likelihood function for a binomial distribution
          - The number of trials is the Total Windows
          - The number of successes is the number of occurrences of this Window
          - The Probability is the Symbol Probability
      - The log likelihood is particularly useful as it accounts for the significance of the data, i.e. the number of samples
      - Like P-Values, it requires a floating point library
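One common form of a binomial log-likelihood ratio is sketched below. It compares the likelihood of the observed count under the observed rate with its likelihood under the Symbol Probability, and signs the result by direction. The exact formula and the scaling to the range -1.0 to 1.0 are not given in the talk, so this form is an assumption, no scaling is applied, and the function names are hypothetical.

```python
import math


def _x_log_y(x, y):
    """x * log(y), with the convention that 0 * log(anything) is 0."""
    return 0.0 if x == 0 else x * math.log(y)


def signed_binomial_llr(total, count, symbol_probability):
    """Log-likelihood ratio of the observed rate versus the chance rate.

    Positive when the word is seen more often than chance predicts,
    negative when it is seen less often. Magnitude grows with sample size.
    """
    if total <= 0 or not 0 <= count <= total:
        raise ValueError("need 0 <= count <= total and total > 0")
    if not 0.0 < symbol_probability < 1.0:
        raise ValueError("symbol_probability must be strictly between 0 and 1")
    observed = count / total
    llr = (_x_log_y(count, observed / symbol_probability)
           + _x_log_y(total - count, (1.0 - observed) / (1.0 - symbol_probability)))
    return llr if observed >= symbol_probability else -llr


def anomaly_score(total, count, symbol_probability):
    """Sign reversed, so rarer-than-chance words score higher."""
    return -signed_binomial_llr(total, count, symbol_probability)


score = anomaly_score(1024, 1, 1.0 / 256)
```

Because the magnitude grows with the number of windows, the same observed rate becomes more decisive as evidence accumulates, which is the sense in which this metric accounts for significance.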
  • 19. The Metric – Rate Ratio
      - The rate ratio is the number of times more likely the event is observed to have occurred than would be predicted by random chance
          - Smaller values mean more anomalous: less than 1 implies less likely than chance
          - The reciprocal of the rate ratio gives an anomaly score which increases as the window becomes more anomalous
          - Uses observed probabilities
      - Doesn’t require math harder than division
      - Doesn’t account for significance, which has to be accounted for by some other means
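A minimal sketch of the rate ratio using the 3 values named earlier in the talk (total windows, count for this SAX word, size of the symbol space). The function names are illustrative, and the count is assumed to be at least 1 because it is incremented before the metric is computed.

```python
def rate_ratio(total, count, space_size):
    """Observed rate (count / total) divided by the chance rate (1 / space_size)."""
    return count * space_size / total


def rate_ratio_anomaly_score(total, count, space_size):
    """Reciprocal of the rate ratio: larger means more anomalous."""
    return total / (count * space_size)


# 1024 windows seen, symbol space of 256 words, so chance predicts 4 sightings per word.
at_chance = rate_ratio(1024, 4, 256)
rare = rate_ratio(1024, 1, 256)
rare_score = rate_ratio_anomaly_score(1024, 1, 256)
```

Note that a single sighting after 1,024 windows and ten sightings after 10,240 windows give the same ratio, which illustrates why significance has to be handled separately.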
  • 20. Other Means Of Symbolizing
      - SAX may not always be the best way to reduce a window to a symbol
          - SAX reduces resolution equally across all its members
          - Tiny, but important variations will be lost
      - Self Organizing Maps can also be used
          - They require more computation, but don’t reduce resolution
          - Self Organizing Maps can encode magnitude directly
  • 21. Using Self Organizing Maps
      - Self Organizing Maps (SOMs) are (typically) a grid of vectors, which can be thought of as weights or prototypes
          - The SOM algorithm adjusts the prototypes based on training data
      - To operate the SOM, a Window vector is compared to each of the prototypes: the best matching one “wins” and the symbol associated with the window is the row:column of the matching grid cell
      - The row:column is then used to index the count of how many times that prototype has been seen
      - We now have the 3 values for computing the metric
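Operating an already-trained SOM as a symbolizer can be sketched as a nearest-prototype lookup. Training is not shown, squared Euclidean distance is assumed as the matching criterion (a common choice, not stated in the talk), and the function names and the tiny 2×2 grid are made up for illustration.

```python
def best_matching_unit(grid, window):
    """Return (row, column) of the prototype closest to the window vector."""
    best_cell = None
    best_distance = float("inf")
    for r, row in enumerate(grid):
        for c, prototype in enumerate(row):
            distance = sum((p - w) ** 2 for p, w in zip(prototype, window))
            if distance < best_distance:
                best_cell, best_distance = (r, c), distance
    return best_cell


def som_symbol(grid, window):
    """Symbol for a window, written as row:column as on the slide."""
    row, column = best_matching_unit(grid, window)
    return f"{row}:{column}"


# Hypothetical trained 2x2 grid of 3-component prototypes.
trained_grid = [
    [[0.0, 0.0, 0.0], [0.0, 1.0, 2.0]],
    [[2.0, 1.0, 0.0], [5.0, 5.0, 5.0]],
]
symbol = som_symbol(trained_grid, [0.1, 0.9, 2.2])
```

The size of the Finite Symbol Space here is the number of grid cells (rows × columns), which is why the grid dimensions must stay constant.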
  • 22. Predicting Events
      - A set of time series may be used to predict events
          - We look for the correlation between the symbols representing the time series windows and Events which happen in the future
      - This can be used to categorize Events according to an Event Signature
          - Event signatures imply outcomes at a particular time index
  • 23. A Concrete Example
      - The SMART data on hard drives can be used to predict failures
          - Simularity used 53 of the sensors to test for anomalies and predict failures
      - Information from nearly 400 hard drives was used to “train” the anomaly detector
      - Once trained, the system was used to identify Event Signatures which indicated failure
      - The time series in the system were reduced to SAX words, and correlated with a single event, failure (all that was known)
      - This can then be used to predict failure
  • 24. Event Signatures For Failure Prediction
      - Notice there are two different event signatures for these failing drives
  • 25. Credit
      - This technique is similar, although not identical, to the TARZAN methodology outlined by Eamonn Keogh and Jessica Lin
          - It and other work pertaining to SAX is available here: http://www.cs.ucr.edu/~eamonn/SAX.htm
      - Self Organizing Maps were invented by Teuvo Kohonen: http://www.cis.hut.fi/research/som-research/teuvo.html
  • 26. Source Code
      - Simularity maintains a GitHub repository of open-source software, including an implementation of SAX suitable for use with the techniques described here: www.github.com/simularity/SAX
  • 27. Thank You
      - 1160 Brickyard Cove Road, Suite 200, Point Richmond, CA 94801, United States
      - +1 678-488-8857
      - ray@simularity.com
      - @rayrichardson