Chapter 10: Anomaly Detection (10.1 ~ 10.3)
Khalid Elshafie, abolkog@dblab.cbnu.ac.kr
Database / Bioinformatics Lab., Chungbuk National University

Contents
1. Introduction
2. Statistical Approach
3. Proximity-based Approach

Introduction (1/4)
Anomaly Detection
• Find objects that are different from most other objects.
• Anomalous objects are often known as outliers: on a scatter plot of the data, they lie far away from the other data points.
Also known as
• Deviation detection: anomalous objects have attribute values that deviate significantly from the expected or typical attribute values.
• Exception mining: anomalies are exceptional in some sense.
[Figure: scatter plot with a point labeled "outlier"]

Introduction (2/4)
Applications
• Fraud detection. The purchasing behavior of someone who steals a credit card is probably different from that of the original owner.
• Intrusion detection. Attacks on computer systems and computer networks.
• Ecosystem disturbances. Hurricanes, floods, heat waves, etc.
• Medicine. Unusual symptoms or test results may indicate potential health problems.
• …

Introduction (3/4)
What causes anomalies?
• Data from Different Sources
  Someone committing credit card fraud belongs to a different class than people who use credit cards legitimately. Such anomalies are often of considerable interest and are the focus of anomaly detection in the field of data mining.
  Hawkins' definition of an outlier: an outlier is an observation that differs so much from other observations as to arouse suspicion that it was generated by a different mechanism.
• Natural Variation
  Many data sets can be modeled by statistical distributions in which the probability of a data object decreases rapidly as the distance of the object from the center of the distribution increases. Most objects are near a center (an average object), and the likelihood that an object differs significantly from this average is small. Anomalies that represent extreme or unlikely variations are often interesting.
• Data Measurement and Collection Errors
  Errors in the data collection or measurement process are another source of anomalies. The goal is to eliminate such anomalies, since they provide no interesting information and only reduce the quality of the data and of the subsequent data analysis.

Introduction (4/4)
Approaches to Anomaly Detection
• Model-based techniques. Build a model of the data. Anomalies are objects that do not fit the model very well.
• Proximity-based techniques. Many of the techniques in this area are based on distances and are referred to as distance-based outlier detection techniques. Anomalous objects are those that are distant from most of the other objects.
• Density-based techniques. Objects that are in regions of low density are relatively distant from their neighbors and can be considered anomalous.

Statistical Approach (1/2)
Statistical approaches are model-based approaches
• A model is created for the data, and objects are evaluated with respect to how well they fit the model.
• Most statistical approaches to outlier detection are based on building a probability distribution model and considering how likely objects are under that model.
• Probabilistic definition of an outlier: an outlier is an object that has a low probability with respect to a probability distribution model of the data.

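The probabilistic definition can be illustrated with a minimal sketch. It assumes a single numeric attribute and a Gaussian model fitted to the data itself; the function names and the density cutoff `min_density` are illustrative choices, not part of the slides.

```python
import math
import statistics


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))


def low_probability_objects(values, min_density):
    """Return the values whose density under the fitted Gaussian is below min_density.

    min_density is a hypothetical, user-chosen cutoff.
    """
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return []  # all values identical: nothing deviates from the model
    return [x for x in values if gaussian_pdf(x, mean, std) < min_density]


# Made-up example data: nine readings near 10 and one extreme value.
values = [10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 10.3, 9.7, 10.1, 25.0]
flagged = low_probability_objects(values, min_density=0.01)
print(flagged)
```

Note that the extreme value also inflates the estimated mean and standard deviation, so the cutoff has to be chosen with the fitted model in mind.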
Statistical Approach (2/2)
Strengths and weaknesses
• Statistical approaches have a firm foundation and build on standard statistical techniques. When there is sufficient knowledge of the data and of the type of test that should be applied, these tests can be very effective.
• There is a wide variety of statistical outlier tests for single attributes, but fewer options are available for multivariate data.
• These tests can perform poorly for high-dimensional data.

Proximity-based Approach (1/3)
• The basic notion of this approach is straightforward: an object is an anomaly if it is distant from most points.
• It is more general and more easily applied than statistical approaches, because it is easier to determine a meaningful proximity measure for a data set than to determine its statistical distribution.
• One of the simplest ways to measure whether an object is distant from most points is to use the distance to its k-th nearest neighbor.
• The outlier score of an object is given by the distance to its k-th nearest neighbor. The lowest value of the outlier score is 0; the highest value is the maximum possible value of the distance function (usually infinity).

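The k-th nearest neighbor score can be sketched in a few lines. This is an illustration only: it assumes Euclidean distance and a brute-force search, and the function name `knn_outlier_score` and the sample points are made up for the example.

```python
import math


def knn_outlier_score(points, index, k):
    """Distance from points[index] to its k-th nearest neighbor, excluding itself."""
    if not 1 <= k <= len(points) - 1:
        raise ValueError("k must be between 1 and len(points) - 1")
    target = points[index]
    distances = sorted(
        math.dist(target, other)
        for j, other in enumerate(points)
        if j != index
    )
    return distances[k - 1]


# Four points on a unit square and one point far away from them.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
scores = [knn_outlier_score(points, i, k=2) for i in range(len(points))]
for point, score in zip(points, scores):
    print(point, round(score, 3))
```

The square's corners each have two neighbors at distance 1, so their score is small, while the distant point receives by far the largest score.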
Proximity-based Approach (2/4)
Approach: compute the distance between every pair of data points.
There are various ways to define outliers:
• Data points for which there are fewer than p neighboring points within a distance D
• The top n data points whose distance to the k-th nearest neighbor is greatest
• The top n data points whose average distance to the k nearest neighbors is greatest

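The three definitions above can be compared side by side in a small sketch. It assumes Euclidean distance, counts a neighbor as "within a distance D" when its distance is at most D, and uses brute-force pairwise computation; all function names and the sample data are hypothetical.

```python
import math


def neighbor_distances(points, i):
    """Sorted distances from points[i] to every other point."""
    return sorted(math.dist(points[i], q) for j, q in enumerate(points) if j != i)


def fewer_than_p_within_d(points, p, d):
    """Indices of points with fewer than p neighbors within distance d."""
    flagged = []
    for i in range(len(points)):
        within = sum(1 for dist in neighbor_distances(points, i) if dist <= d)
        if within < p:
            flagged.append(i)
    return flagged


def top_n_by_kth_distance(points, n, k):
    """Indices of the n points with the greatest distance to their k-th nearest neighbor."""
    def score(i):
        return neighbor_distances(points, i)[k - 1]
    return sorted(range(len(points)), key=score, reverse=True)[:n]


def top_n_by_average_knn_distance(points, n, k):
    """Indices of the n points with the greatest average distance to their k nearest neighbors."""
    def score(i):
        return sum(neighbor_distances(points, i)[:k]) / k
    return sorted(range(len(points)), key=score, reverse=True)[:n]


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
print(fewer_than_p_within_d(points, p=2, d=2.0))
print(top_n_by_kth_distance(points, n=1, k=2))
print(top_n_by_average_knn_distance(points, n=1, k=2))
```

On this toy data all three definitions single out the same point (index 4); on real data they can disagree, which is one reason the choice of definition and of p, D, n, and k matters.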
Proximity-based Approach (3/4)
• The outlier score can be highly sensitive to the value of k.
• If k is too small (e.g., 1), then a small number of nearby outliers can cause a low outlier score.
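The sensitivity to k can be seen on a toy data set. The sketch below assumes Euclidean distance and uses made-up points: a small cluster plus two outliers that lie close to each other. The helper name `kth_nn_distance` is illustrative.

```python
import math


def kth_nn_distance(points, index, k):
    """Distance from points[index] to its k-th nearest neighbor, excluding itself."""
    distances = sorted(
        math.dist(points[index], other)
        for j, other in enumerate(points)
        if j != index
    )
    return distances[k - 1]


cluster = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
outlier_pair = [(10.0, 10.0), (10.0, 10.5)]  # far from the cluster, close to each other
points = cluster + outlier_pair

scores_by_k = {
    k: [kth_nn_distance(points, i, k) for i in range(len(points))]
    for k in (1, 2)
}
for k, scores in scores_by_k.items():
    print("k =", k, [round(s, 2) for s in scores])
```

With k = 1 each of the two outliers finds the other as its nearest neighbor, so both receive a lower score than the cluster points. With k = 2 the second-nearest neighbor lies in the distant cluster and their scores become large.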
