Here is the anomalow-down!
Sevvandi Kandanaarachchi
RMIT University
Joint work with Rob Hyndman
1
Why anomalies?
• They tell a different story
• Fraudulent credit card transactions amongst billions of legitimate transactions
• Computer network intrusions
• Astronomical anomalies – solar flares
• Weather anomalies – tsunamis
• Stock market anomalies – heralding a crash?
2
Anomaly detection – why?
• Take fraud and network intrusions, for example
• Training a model on known kinds of fraud, intrusions or cyber attacks is not optimal, because new types of fraud and attacks keep appearing
• You want to be alerted when weird things happen.
• Anomaly detection is used in these applications.
3
Is everything rosy?
4
Some Current Challenges
High dimensionality of data
• Finding anomalies in high dimensional data is hard
• Anomalies and normal points look similar
High false positives
• Do not want an “alarm factory” – confidence in the system goes down
Parameters need to be defined by the user
• But choosing them well requires expert knowledge
5
Overview
lookout – an anomaly detection method
Low false positives
User does not need to specify parameters
lookout – on CRAN
dobin – a dimension reduction method for anomaly detection
Addresses the high dimensionality challenge
dobin – on CRAN
6
dobin – dimension reduction for outlier detection
Sevvandi Kandanaarachchi, Rob Hyndman
Journal of Computational and Graphical Statistics (JCGS), 2021, 30(1), 204-219
7
What is it?
Original anomalies are still anomalies in the reduced-dimensional space
It is a preprocessing technique
Not an anomaly detection method
8
What does it do?
Find a set of new axes (basis vectors) that preserves anomalies
The first basis vector points in the direction of greatest anomalousness (largest kNN distances); the second basis vector points in the direction of the second-largest kNN distances
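The k-nearest-neighbour (kNN) distance used as the measure of anomalousness can be illustrated with a small sketch. This is not the dobin algorithm itself, which builds basis vectors from these distances as described in the paper. The toy data, the function name `knn_distance` and the choice `k=3` are assumptions made only for this illustration.

```python
import math

def knn_distance(points, k):
    """Return, for each point, the distance to its k-th nearest neighbour."""
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores

# Hypothetical toy data: a 5 x 5 grid of "normal" points ...
points = [(x * 0.1, y * 0.1) for x in range(5) for y in range(5)]
# ... plus one point far from the rest (index 25).
points.append((2.0, 2.0))

scores = knn_distance(points, k=3)
most_anomalous = max(range(len(points)), key=scores.__getitem__)
print(most_anomalous, round(scores[most_anomalous], 3))
```

The isolated point has by far the largest kNN distance, which is the sense in which a direction of "largest kNN distances" points towards anomalies.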
9
Example
• Uniform distribution in 20 dimensions, with one point at (0.9, 0.9, 0.9, ...)
• This point is the outlier
• In R
• > dobin(X)
10
lookout – leave-one-out KDE for outlier detection
Sevvandi Kandanaarachchi, Rob Hyndman
Preprint - https://bit.ly/lookoutliers
11
lookout
Outlier detection method
Low false positives
• Because of Extreme Value Theory (EVT)
• EVT is used to model 100-year floods
• Uses a Generalized Pareto Distribution
Not an “alarm factory”
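How a Generalized Pareto Distribution (GPD) yields a tail probability can be sketched with a peaks-over-threshold fit. The sketch below is an assumption-laden illustration, not lookout's implementation: the method-of-moments fit, the 90% threshold and the synthetic exponential scores are all choices made here to keep the example dependency-free and deterministic.

```python
import math
import statistics

def fit_gpd_moments(exceedances):
    """Method-of-moments estimates (shape xi, scale sigma) for a GPD.

    Uses mean = sigma / (1 - xi) and var = sigma^2 / ((1 - xi)^2 (1 - 2 xi)),
    which is only valid when xi < 1/2.
    """
    m = statistics.mean(exceedances)
    v = statistics.variance(exceedances)
    ratio = m * m / v
    xi = 0.5 * (1.0 - ratio)
    sigma = 0.5 * m * (ratio + 1.0)
    return xi, sigma

def gpd_survival(y, xi, sigma):
    """P(Y > y) for a GPD with shape xi and scale sigma, for y >= 0."""
    if abs(xi) < 1e-9:
        return math.exp(-y / sigma)
    base = 1.0 + xi * y / sigma
    if base <= 0.0:  # beyond the upper endpoint, possible only when xi < 0
        return 0.0
    return base ** (-1.0 / xi)

# Hypothetical anomaly scores: 200 exponential quantiles (deterministic).
n = 200
ordered = sorted(-math.log(1.0 - (i - 0.5) / n) for i in range(1, n + 1))

# Peaks over threshold: keep only the part of each score above the threshold.
threshold = ordered[n * 9 // 10]
exceedances = [x - threshold for x in ordered if x > threshold]
xi, sigma = fit_gpd_moments(exceedances)

# Tail probability of a new score lying 10 units above the threshold.
p_extreme = gpd_survival(10.0, xi, sigma)
print(round(xi, 3), round(sigma, 3), p_extreme)
```

A score is declared an outlier only when its fitted tail probability is very small, which is what keeps the number of false alarms low.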
12
lookout
User does not need to specify parameters
• Uses kernel density estimates (KDE), which need a bandwidth parameter
• But a general-purpose bandwidth is not appropriate for anomaly detection
• Select the bandwidth using topological data analysis (TDA)
• bw(TDA) → KDE → EVT → outliers
Anomaly persistence
• Which anomalies are consistently identified as the bandwidth changes?
• Visual representation of anomaly persistence
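The idea of anomaly persistence can be sketched by computing leave-one-out kernel density estimates over several bandwidths and counting how often each point is flagged. This is a simplified illustration under stated assumptions: the bandwidth grid is hand-picked rather than chosen by TDA, and the flagging rule (density below 5% of the median) is a stand-in for the EVT step. The names `loo_kde` and `persistence` are hypothetical.

```python
import math
import statistics

def loo_kde(xs, h):
    """Leave-one-out Gaussian KDE: density at each x_i estimated without x_i."""
    n = len(xs)
    norm = (n - 1) * h * math.sqrt(2.0 * math.pi)
    return [
        sum(math.exp(-0.5 * ((xi - xj) / h) ** 2)
            for j, xj in enumerate(xs) if j != i) / norm
        for i, xi in enumerate(xs)
    ]

# Hypothetical 1D data: 20 evenly spaced points plus one far away (index 20).
xs = [0.1 * k for k in range(20)] + [6.0]

# Assumed bandwidth grid; lookout selects its bandwidth with TDA instead.
bandwidths = [0.2, 0.5, 1.0]

# persistence[i] counts the bandwidths at which point i is flagged.
persistence = [0] * len(xs)
for h in bandwidths:
    dens = loo_kde(xs, h)
    cutoff = 0.05 * statistics.median(dens)  # stand-in rule, not the EVT step
    for i, d in enumerate(dens):
        if d < cutoff:
            persistence[i] += 1

print(persistence)
```

Points that stay flagged across the whole bandwidth range are the ones that would appear as long bars in a persistence diagram.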
13
Example 1
2D normal distribution, with outliers at the far end.
The outlying indices are 501 - 505.
The persistence diagram. The outliers get identified for a large range of bandwidth values.
14
Example 2
2D bimodal distribution, with outliers in the trough.
The outliers have indices 1001 - 1005.
The persistence diagram. Again, the outliers get identified for a large range of bandwidth values.
15
Example 3
Points in 3 normally distributed clusters, with anomalies away from them.
Anomalies have indices 701 - 703.
The persistence diagram. Anomalies get identified for a broad range of bandwidth values.
16
Example 4
Points in an annulus with anomalies in the middle.
Anomalies have indices 1001 - 1010.
The persistence diagram.
17
Summary
• dobin - a dimension reduction method for anomaly detection
• lookout - an EVT-based method to find anomalies
• Both paper/preprint available
• https://doi.org/10.1080/10618600.2020.1807353
• https://bit.ly/lookoutliers
• Both packages on CRAN
18
Thank you!
19
