Anomaly Detection via Online Over-Sampling Principal Component Analysis
ABSTRACT:
Anomaly detection has been an important research topic in data mining and machine learning.
Many real-world applications such as intrusion or credit card fraud detection require an
effective and efficient framework to identify deviated data instances. However, most anomaly
detection methods are typically implemented in batch mode, and thus cannot easily be extended
to large-scale problems without incurring high computation and memory costs. In this
paper, we propose an online over-sampling principal component analysis (osPCA) algorithm to
address this problem, with the aim of detecting the presence of outliers in a large amount of
data via an online updating technique. Unlike prior PCA-based approaches, we do not store the
entire data matrix or covariance matrix, and thus our approach is especially of interest in online
or large-scale problems. By over-sampling the target instance and extracting the principal
direction of the data, the proposed osPCA allows us to determine whether the target instance is
anomalous according to the variation of the resulting dominant eigenvector. Since osPCA does not
need to perform eigen analysis explicitly, the proposed framework is well suited to online
applications with computation or memory limitations. Experimental results comparing osPCA with
the well-known power method for PCA and other popular anomaly detection algorithms verify the
feasibility of our proposed method in terms of both accuracy and efficiency.
GLOBALSOFT TECHNOLOGIES
IEEE PROJECTS & SOFTWARE DEVELOPMENTS
IEEE FINAL YEAR PROJECTS|IEEE ENGINEERING PROJECTS|IEEE STUDENTS PROJECTS|IEEE
BULK PROJECTS|BE/BTECH/ME/MTECH/MS/MCA PROJECTS|CSE/IT/ECE/EEE PROJECTS
CELL: +91 98495 39085, +91 99662 35788, +91 98495 57908, +91 97014 40401
Visit: www.finalyearprojects.org Mail to:ieeefinalsemprojects@gmail.com
EXISTING SYSTEM:
The existing approaches can be divided into three categories:
1. distribution-based (statistical),
2. distance-based, and
3. density-based methods.
Statistical approaches assume that the data follow some standard or predetermined
distribution, and aim to find the outliers that deviate from such distributions.
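Here is a minimal sketch of the statistical approach. It assumes a univariate Gaussian model and flags a value when its distance from the sample mean, in standard deviations (its z-score), exceeds a cutoff. The class name `ZScoreDetector` and the cutoff values are illustrative assumptions, not taken from the source.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical univariate Gaussian (z-score) outlier test. */
final class ZScoreDetector {

    /** Returns the indices of values whose |z-score| exceeds the cutoff. */
    static List<Integer> outliers(double[] values, double cutoff) {
        int n = values.length;
        double mean = 0.0;
        for (double v : values) mean += v;
        mean /= n;

        double var = 0.0;
        for (double v : values) var += (v - mean) * (v - mean);
        double std = Math.sqrt(var / n); // population standard deviation

        List<Integer> result = new ArrayList<>();
        if (std == 0.0) return result; // all values identical: nothing deviates
        for (int i = 0; i < n; i++) {
            if (Math.abs(values[i] - mean) / std > cutoff) result.add(i);
        }
        return result;
    }

    public static void main(String[] args) {
        double[] data = {10.1, 9.8, 10.0, 10.3, 9.9, 10.2, 25.0};
        System.out.println(outliers(data, 2.0)); // [6]
    }
}
```

With only seven values, no population z-score can exceed √6 ≈ 2.45. A common cutoff of 3 would therefore never fire on this sample, which is one reason the cutoff is left as a parameter.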
For distance-based methods, the distances between each data point of interest and its neighbors
are calculated. If the resulting distance is above some predetermined threshold, the target
instance is considered an outlier.
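A minimal sketch of the distance-based idea follows. It assumes Euclidean distance and uses the distance to the k-th nearest neighbor as the quantity compared against the threshold. The class name, `k`, and the threshold value are hypothetical; the text above does not fix them.

```java
import java.util.Arrays;

/** Hypothetical k-th nearest neighbour distance outlier test. */
final class KnnDistanceDetector {

    static double euclidean(double[] a, double[] b) {
        double s = 0.0;
        for (int j = 0; j < a.length; j++) {
            double d = a[j] - b[j];
            s += d * d;
        }
        return Math.sqrt(s);
    }

    /** Distance from point i to its k-th nearest neighbour (requires k <= n - 1). */
    static double kthNeighbourDistance(double[][] data, int i, int k) {
        double[] dists = new double[data.length - 1];
        int m = 0;
        for (int j = 0; j < data.length; j++) {
            if (j != i) dists[m++] = euclidean(data[i], data[j]);
        }
        Arrays.sort(dists);
        return dists[k - 1];
    }

    /** Flags every point whose k-th neighbour distance is above the threshold. */
    static boolean[] outliers(double[][] data, int k, double threshold) {
        boolean[] flags = new boolean[data.length];
        for (int i = 0; i < data.length; i++) {
            flags[i] = kthNeighbourDistance(data, i, k) > threshold;
        }
        return flags;
    }

    public static void main(String[] args) {
        double[][] data = {{0, 0}, {0, 1}, {1, 0}, {1, 1}, {8, 8}};
        System.out.println(Arrays.toString(outliers(data, 2, 3.0)));
        // [false, false, false, false, true]
    }
}
```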
A representative density-based method uses the local outlier factor (LOF) to measure the
outlierness of each data instance. Based on the local density of each data instance, the LOF
determines its degree of outlierness, which provides suspicious ranking scores for all samples.
The most important property of the LOF is its ability to estimate local data structure via
density estimation. This allows users to identify outliers that are sheltered under a global
data structure.
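The LOF computation can be sketched as follows. The sketch uses the standard definitions: k-distance, reachability distance max(k-distance(o), d(p, o)), local reachability density, and the ratio of neighbor densities to the point's own density. It simplifies by using exactly k neighbors, with ties broken by index, and it assumes no point has k exact duplicates. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical sketch of the local outlier factor (LOF). */
final class LocalOutlierFactor {

    static double dist(double[] a, double[] b) {
        double s = 0.0;
        for (int j = 0; j < a.length; j++) {
            double d = a[j] - b[j];
            s += d * d;
        }
        return Math.sqrt(s);
    }

    /** Indices of the k nearest neighbours of point i, excluding i itself. */
    static int[] neighbours(double[][] x, int i, int k) {
        Integer[] idx = new Integer[x.length - 1];
        int m = 0;
        for (int j = 0; j < x.length; j++) {
            if (j != i) idx[m++] = j;
        }
        // Stable sort by distance, so equal distances keep index order.
        Arrays.sort(idx, Comparator.comparingDouble(j -> dist(x[i], x[j])));
        int[] nn = new int[k];
        for (int t = 0; t < k; t++) nn[t] = idx[t];
        return nn;
    }

    /** LOF of every point: about 1 inside a cluster, clearly above 1 for outliers. */
    static double[] scores(double[][] x, int k) {
        int n = x.length;
        int[][] nn = new int[n][];
        double[] kDist = new double[n]; // distance to the k-th nearest neighbour
        for (int i = 0; i < n; i++) {
            nn[i] = neighbours(x, i, k);
            kDist[i] = dist(x[i], x[nn[i][k - 1]]);
        }
        // Local reachability density: inverse of the mean reachability distance.
        double[] lrd = new double[n];
        for (int i = 0; i < n; i++) {
            double sum = 0.0;
            for (int o : nn[i]) sum += Math.max(kDist[o], dist(x[i], x[o]));
            lrd[i] = k / sum;
        }
        // LOF: mean density of the neighbours relative to the point's own density.
        double[] lof = new double[n];
        for (int i = 0; i < n; i++) {
            double sum = 0.0;
            for (int o : nn[i]) sum += lrd[o];
            lof[i] = sum / (k * lrd[i]);
        }
        return lof;
    }

    public static void main(String[] args) {
        double[][] data = {{0, 0}, {0, 1}, {1, 0}, {1, 1}, {8, 8}};
        System.out.println(Arrays.toString(scores(data, 2)));
    }
}
```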
DISADVANTAGES OF EXISTING SYSTEM:
Most distribution models are assumed to be univariate, so their lack of robustness for
multidimensional data is a concern. Moreover, since these methods are typically implemented
directly in the original data space, their solution models might suffer from the noise present
in the data.
PROPOSED SYSTEM:
PCA is a well-known unsupervised dimension reduction method, which determines the
principal directions of the data distribution. However, the over-sampling strategy requires the
principal direction to be recomputed for every target instance, and repeatedly performing eigen
analysis in this way is computationally expensive. This would prohibit the use of our proposed
framework for real-world large-scale applications. Although the well-known power method is
able to produce approximate PCA solutions, it requires storage of the covariance matrix
and cannot easily be extended to applications with streaming data or online settings. Therefore,
we present an online updating technique for osPCA. This technique allows us to
efficiently calculate the approximate dominant eigenvector without performing eigen analysis
or storing the data covariance matrix.
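For comparison, here is a minimal sketch of the classic power method mentioned above. As the text notes, it works on an explicitly stored d×d covariance matrix. The class and method names, the uniform starting vector, and the convergence tolerance are illustrative assumptions.

```java
import java.util.Arrays;

/** Hypothetical sketch of the power method on an explicitly stored covariance matrix. */
final class PowerMethod {

    /** Population covariance matrix of the rows of x (one row per data instance). */
    static double[][] covariance(double[][] x) {
        int n = x.length, d = x[0].length;
        double[] mean = new double[d];
        for (double[] row : x)
            for (int j = 0; j < d; j++) mean[j] += row[j] / n;
        double[][] c = new double[d][d]; // the d x d storage the online update avoids
        for (double[] row : x)
            for (int a = 0; a < d; a++)
                for (int b = 0; b < d; b++)
                    c[a][b] += (row[a] - mean[a]) * (row[b] - mean[b]) / n;
        return c;
    }

    /**
     * Repeats v <- C v / ||C v|| until v stops changing. The uniform start vector is an
     * assumption; it must not be orthogonal to the dominant eigenvector.
     */
    static double[] dominantEigenvector(double[][] c, int maxIter, double tol) {
        int d = c.length;
        double[] v = new double[d];
        Arrays.fill(v, 1.0 / Math.sqrt(d));
        for (int it = 0; it < maxIter; it++) {
            double[] w = new double[d];
            for (int a = 0; a < d; a++)
                for (int b = 0; b < d; b++) w[a] += c[a][b] * v[b];
            double norm = 0.0;
            for (double wi : w) norm += wi * wi;
            norm = Math.sqrt(norm);
            if (norm == 0.0) return v; // C v = 0: no direction dominates
            double change = 0.0;
            for (int a = 0; a < d; a++) {
                w[a] /= norm;
                change += Math.abs(w[a] - v[a]);
            }
            v = w;
            if (change < tol) break;
        }
        return v;
    }

    public static void main(String[] args) {
        double[][] data = {{2.0, 1.9}, {-1.0, -1.1}, {0.5, 0.4}, {-1.5, -1.2}};
        System.out.println(Arrays.toString(dominantEigenvector(covariance(data), 1000, 1e-12)));
    }
}
```

Each iteration costs O(d²) time and the matrix costs O(d²) memory. If osPCA were run this way, the whole computation would also have to be repeated for every target instance.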
ADVANTAGES OF PROPOSED SYSTEM:
Compared to the power method or other popular anomaly detection algorithms, the
required computational costs and memory requirements are significantly reduced, and
thus our method is especially preferable for online, streaming-data, or large-scale
problems.
SYSTEM ARCHITECTURE:
ALGORITHMS USED:
Anomaly Detection via Online Over-Sampling PCA
MODULES
1. Cleaning Data
2. Detecting Outliers
3. Clustering
MODULES DESCRIPTION
MODULE - I
Cleaning Data
osPCA is applied to the data set to find its principal direction. In this method, the target
instance is duplicated multiple times in order to amplify the effect of an outlier on the
principal direction, while the effect of a normal instance remains small. A leave-one-out (LOO)
strategy is then used to measure the angle difference: adding or removing a single data instance
changes the principal direction, and the angle between the directions obtained with and without
the over-sampled target instance is computed. For normal data instances this angle difference
should be small, whereas for outliers it is likely to be larger.
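The comparison described above can be written as formulas under two illustrative assumptions. First, the score is one minus the absolute cosine between the two dominant directions. Second, the target is over-sampled with ⌈rn⌉ extra copies for an over-sampling ratio r. The document's St is written as $S_t$.

```latex
\begin{aligned}
\mathbf{u} &= \arg\max_{\|\mathbf{w}\|=1} \mathbf{w}^{\top}\Sigma\,\mathbf{w},
  && \Sigma = \operatorname{cov}\{\mathbf{x}_1,\dots,\mathbf{x}_n\},\\
\tilde{\mathbf{u}}_t &= \arg\max_{\|\mathbf{w}\|=1} \mathbf{w}^{\top}\tilde{\Sigma}_t\,\mathbf{w},
  && \tilde{\Sigma}_t = \operatorname{cov}\bigl(\{\mathbf{x}_1,\dots,\mathbf{x}_n\}
     \cup \underbrace{\{\mathbf{x}_t,\dots,\mathbf{x}_t\}}_{\lceil rn\rceil\ \text{copies}}\bigr),\\
S_t &= 1 - \frac{\lvert\langle \tilde{\mathbf{u}}_t, \mathbf{u}\rangle\rvert}
      {\lVert\tilde{\mathbf{u}}_t\rVert\,\lVert\mathbf{u}\rVert}.
\end{aligned}
```

For a normal instance, $\tilde{\mathbf{u}}_t$ stays close to $\mathbf{u}$, so $S_t$ is near 0. An outlier pulls $\tilde{\mathbf{u}}_t$ away from $\mathbf{u}$ and increases $S_t$.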
A set of data instances from the original data set is taken as the predefined input. This data
may be contaminated by noise, incorrect data labelling, and similar errors. Because it is going to
be used as training data, it should be error-free, so it is cleaned using the Over-Sampling
Principal Component Analysis (osPCA) method. The outlierness score St is then calculated for
each data instance, and the smallest St value is taken as the threshold value.
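The scoring part of this cleaning step can be sketched as a minimal batch illustration, under several assumptions:

- St is taken as one minus the absolute cosine between the dominant directions computed with and without over-sampling.
- Over-sampling gives the target ⌈ratio·n⌉ extra copies' worth of weight instead of physically duplicating it.
- The principal directions come from an explicit covariance matrix and power iteration, not from the online update.

The class and method names and the `ratio` value are hypothetical. The code only returns St for every instance; the threshold rule stated above is applied to these scores separately.

```java
import java.util.Arrays;

/** Hypothetical batch sketch of the osPCA outlierness score St for every instance. */
final class OsPcaScores {

    static double dot(double[] a, double[] b) {
        double s = 0.0;
        for (int j = 0; j < a.length; j++) s += a[j] * b[j];
        return s;
    }

    /** Weighted covariance: row t counts 1 + extra times (use t = -1, extra = 0 for none). */
    static double[][] covariance(double[][] x, int t, double extra) {
        int n = x.length, d = x[0].length;
        double total = n + extra;
        double[] mean = new double[d];
        for (int i = 0; i < n; i++) {
            double w = (i == t) ? 1.0 + extra : 1.0;
            for (int j = 0; j < d; j++) mean[j] += w * x[i][j] / total;
        }
        double[][] c = new double[d][d];
        for (int i = 0; i < n; i++) {
            double w = (i == t) ? 1.0 + extra : 1.0;
            for (int a = 0; a < d; a++)
                for (int b = 0; b < d; b++)
                    c[a][b] += w * (x[i][a] - mean[a]) * (x[i][b] - mean[b]) / total;
        }
        return c;
    }

    /** Power iteration started on the axis of largest variance (a heuristic start). */
    static double[] dominant(double[][] c, int iters) {
        int d = c.length, start = 0;
        for (int a = 1; a < d; a++) if (c[a][a] > c[start][start]) start = a;
        double[] v = new double[d];
        v[start] = 1.0;
        for (int it = 0; it < iters; it++) {
            double[] w = new double[d];
            for (int a = 0; a < d; a++)
                for (int b = 0; b < d; b++) w[a] += c[a][b] * v[b];
            double norm = Math.sqrt(dot(w, w));
            if (norm == 0.0) break;
            for (int a = 0; a < d; a++) v[a] = w[a] / norm;
        }
        return v;
    }

    /** St = 1 - |cos(u, u_t)| for each instance t, with ceil(ratio * n) extra copies. */
    static double[] scores(double[][] x, double ratio, int iters) {
        double[] u = dominant(covariance(x, -1, 0.0), iters);
        double extra = Math.ceil(ratio * x.length);
        double[] s = new double[x.length];
        for (int t = 0; t < x.length; t++) {
            double[] ut = dominant(covariance(x, t, extra), iters);
            s[t] = 1.0 - Math.abs(dot(u, ut)); // u and ut are unit vectors
        }
        return s;
    }

    public static void main(String[] args) {
        double[][] data = {{-2, 0}, {-1, 0}, {0, 0}, {1, 0}, {2, 0}, {0, 3}};
        System.out.println(Arrays.toString(scores(data, 0.1, 1000)));
    }
}
```

In the example, the five points on the x-axis barely move the direction when over-sampled. The point (0, 3) rotates it to the y-axis, so its St is close to 1.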
MODULE - II
Detection
This module detects the outlierness of the user input. When the user gives an input to the
system, the system calculates the St value for the new input and compares it with the threshold
value calculated earlier.
If the St value of the new data instance is above the threshold value, the input is identified
as an outlier and discarded by the system. Otherwise, it is considered a normal data instance,
and the PCA model (the principal direction) is updated with that instance accordingly.
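The detect-then-update loop can be sketched as follows, under assumptions that do not come from the source:

- The model state is a running mean, a direction vector `w`, and an energy term initialized as eigenvalue × count from training.
- The direction is updated with a PAST-style rank-1 recursive least-squares step (y = w·x, e ← e + y², w ← w + (y/e)(x − y·w)). This is a stand-in, not necessarily the update used in the paper.
- Over-sampling a new instance is approximated by applying that step `copies` times to a copy of the state.

The class and parameter names are hypothetical.

```java
/**
 * Hypothetical online detector for Module II. Only a mean vector, a direction
 * vector and a few scalars are kept; no data matrix or covariance matrix is stored.
 * The rank-1 update is a PAST-style stand-in, not necessarily the paper's formula.
 */
final class OnlineOsPcaDetector {
    private final double[] mean;     // running mean of accepted (normal) instances
    private final double[] w;        // current estimate of the dominant direction
    private double energy;           // running sum of squared projections
    private long count;              // number of accepted instances
    private final double threshold;  // St above this value means "outlier"
    private final int copies;        // how many times a new instance is over-sampled

    OnlineOsPcaDetector(double[] mean, double[] direction, double eigenvalue,
                        long count, double threshold, int copies) {
        this.mean = mean.clone();
        this.w = direction.clone();
        this.energy = eigenvalue * count; // n * lambda = sum of squared projections (centered data)
        this.count = count;
        this.threshold = threshold;
        this.copies = copies;
    }

    private static double dot(double[] a, double[] b) {
        double s = 0.0;
        for (int j = 0; j < a.length; j++) s += a[j] * b[j];
        return s;
    }

    /** In-place step on v: y = v.x, e += y^2, v += (y / e)(x - y v). Returns the new e. */
    private static double step(double[] v, double e, double[] x) {
        double y = dot(v, x);
        e += y * y;
        if (e > 0.0) {
            double gain = y / e;
            for (int j = 0; j < v.length; j++) v[j] += gain * (x[j] - y * v[j]);
        }
        return e;
    }

    private double[] centered(double[] x) {
        double[] c = new double[x.length];
        for (int j = 0; j < x.length; j++) c[j] = x[j] - mean[j];
        return c;
    }

    /** St of a new instance: direction change after over-sampling it, on a copy of the state. */
    double score(double[] x) {
        double[] c = centered(x);
        double[] trial = w.clone();
        double e = energy;
        for (int k = 0; k < copies; k++) e = step(trial, e, c);
        double cos = dot(w, trial) / Math.sqrt(dot(w, w) * dot(trial, trial));
        return 1.0 - Math.abs(cos);
    }

    /** Returns true if x is an outlier (x is discarded); otherwise folds x into the model. */
    boolean process(double[] x) {
        if (score(x) > threshold) return true;
        energy = step(w, energy, centered(x));
        count++;
        for (int j = 0; j < x.length; j++) mean[j] += (x[j] - mean[j]) / count;
        return false;
    }

    public static void main(String[] args) {
        // Hypothetical model: mean (0, 0), direction (1, 0), eigenvalue 1, 10 training points.
        OnlineOsPcaDetector det = new OnlineOsPcaDetector(
                new double[]{0, 0}, new double[]{1, 0}, 1.0, 10, 0.05, 10);
        System.out.println(det.process(new double[]{1, 5})); // true: discarded as an outlier
        System.out.println(det.process(new double[]{3, 0})); // false: accepted, model updated
    }
}
```

Scoring works on copies of `w` and `energy`, so an instance rejected as an outlier leaves the model untouched. Only accepted instances change the mean and direction. One limitation of this simple update: a centered input exactly orthogonal to `w` gives y = 0 and therefore causes no rotation.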
MODULE - III
Clustering
The training data are selected based only on our assumptions, so some outliers may be treated
as normal data by the previous module because of the training data used. A clustering method is
used to address this problem: clusters are formed from the input data instances, and the outlier
calculation is then applied within each cluster to identify outliers more accurately.
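The text does not say which clustering method is used. One possible choice, assumed here, is k-means (Lloyd's algorithm) to form the clusters; each resulting group would then be scored separately, for example with an osPCA score as described in Module I. The first-k-rows initialization and the class and method names are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical k-means grouping step for Module III (requires k <= number of rows). */
final class KMeansGrouping {

    static double sqDist(double[] a, double[] b) {
        double s = 0.0;
        for (int j = 0; j < a.length; j++) {
            double d = a[j] - b[j];
            s += d * d;
        }
        return s;
    }

    /** Returns the cluster label of every row; initial centroids are the first k rows. */
    static int[] cluster(double[][] x, int k, int maxIter) {
        int n = x.length, d = x[0].length;
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) centroids[c] = x[c].clone();
        int[] label = new int[n];
        Arrays.fill(label, -1);
        for (int it = 0; it < maxIter; it++) {
            // Assignment step: each row goes to its nearest centroid.
            boolean changed = false;
            for (int i = 0; i < n; i++) {
                int best = 0;
                for (int c = 1; c < k; c++)
                    if (sqDist(x[i], centroids[c]) < sqDist(x[i], centroids[best])) best = c;
                if (best != label[i]) {
                    label[i] = best;
                    changed = true;
                }
            }
            if (!changed) break;
            // Update step: centroid = mean of its members (empty clusters keep the old one).
            double[][] sum = new double[k][d];
            int[] size = new int[k];
            for (int i = 0; i < n; i++) {
                size[label[i]]++;
                for (int j = 0; j < d; j++) sum[label[i]][j] += x[i][j];
            }
            for (int c = 0; c < k; c++)
                if (size[c] > 0)
                    for (int j = 0; j < d; j++) centroids[c][j] = sum[c][j] / size[c];
        }
        return label;
    }

    /** Splits the rows by label so that outlier scores can then be computed per cluster. */
    static List<List<double[]>> groups(double[][] x, int[] label, int k) {
        List<List<double[]>> out = new ArrayList<>();
        for (int c = 0; c < k; c++) out.add(new ArrayList<double[]>());
        for (int i = 0; i < x.length; i++) out.get(label[i]).add(x[i]);
        return out;
    }

    public static void main(String[] args) {
        double[][] data = {{0, 0}, {10, 10}, {0, 1}, {10, 11}, {1, 0}, {11, 10}};
        int[] label = cluster(data, 2, 100);
        System.out.println(Arrays.toString(label)); // [0, 1, 0, 1, 0, 1]
        System.out.println(groups(data, label, 2).get(0).size()); // 3
    }
}
```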
SYSTEM CONFIGURATION:-
HARDWARE CONFIGURATION:-
• Processor - Pentium IV
• Speed - 1.1 GHz
• RAM - 256 MB (min)
• Hard Disk - 20 GB
• Keyboard - Standard Windows Keyboard
• Mouse - Two or Three Button Mouse
• Monitor - SVGA
SOFTWARE CONFIGURATION:-
• Operating System : Windows XP
• Programming Language : Java
• Java Version : JDK 1.6 and above
REFERENCE:
Yuh-Jye Lee, Yi-Ren Yeh, and Yu-Chiang Frank Wang, “Anomaly Detection via Online Over-Sampling
Principal Component Analysis,” IEEE Transactions on Knowledge and Data Engineering, 2013.
