SlideShare una empresa de Scribd logo
1 de 5
Descargar para leer sin conexión
DISTRIBUTED DATA MINING IN
  CREDIT CARD FRAUD DETECTION
INTRODUCTION
Credit card transactions grow in number, taking a larger share of any
country’s payment system and this is turn has led to a higher rate of
stolen account numbers and subsequent losses by banks. Hence,
improved fraud detection has become essential to maintain the
viability of the country’s payment system.
Banks have used early fraud warning systems for some years. Large-
scale data-mining techniques can improve on the state of the art in
commercial practice. Scalable techniques to analyze massive
amounts of transaction data that efficiently compute fraud detectors
in a timely manner is an important problem, especially for e-
commerce.
Besides scalability and efficiency, the fraud-detection task exhibits
technical problems that include skewed distributions of training data
and non-uniform cost per error, both of which have not been widely
studied in the knowledge-discovery and datamining community.
In this project, a deep survey is made and evaluates a number of
techniques that address these three main issues concurrently.
Our proposed methods of combining multiple learned fraud detectors
under a “cost model” are general and demonstrably useful; our
empirical results demonstrate that we can significantly reduce loss
due to fraud through distributed data mining of fraud models.


DATA MINING AND MACHINE LEARNING
The aim of data mining is to extract knowledge from large amounts of
data. This knowledge is nontrivial and hidden in the data. Machine
learning is often used in data mining.
DATA MINING: A DEFINITION
Art/Science of uncovering non-trivial, valuable information from
                       a large database
Emphasis on:
  Non-obvious (difficult)

  Useful (cost vs benefit)

  Large (automatic)



Yet, no rules, provided that the process is efficient in time, space and
human resources.
  Data Mining is the process of finding interesting trends or
  patterns in large datasets in order to guide future decisions.

  Related to exploratory data analysis (area of statistics) and
  knowledge discovery (area in artificial intelligence, machine
  learning).
  Data Mining is characterized by having VERY LARGE datasets.



DATA MINING VS. MACHINE LEARNING

  Size: Databases are usually very large so algorithms must scale
  well

  Design Purpose: Databases are not usually designed for data
  mining (but for other purposes), and thus, may not have
  convenient attributes
  Errors and Noise: Databases almost always contain errors


The aim of machine learning is to adapt to new circumstances, to
detect and extrapolate. A distinction can be made between
unsupervised and supervised machine learning algorithms.
PROPOSED SYSTEM

In today’s increasingly electronic society and with the rapid advances
of electronic commerce on the Internet, the use of credit cards for
purchases has become convenient and necessary.

Credit card transactions have become the de facto standard for
Internet and Webbased e-commerce. The US government estimates
that credit cards accounted for approximately US $13 billion in
Internet sales during 1998. This figure is expected to grow rapidly
each year.

However, the growing number of credit card transactions provides
more opportunity for thieves to steal credit card numbers and
subsequently commit fraud. When banks lose money because of
credit card fraud, cardholders pay for all of that loss through higher
interest rates, higher fees, and reduced benefits.

Hence, it is in both the banks’ and cardholders’ interest to reduce
illegitimate use of credit cards by early fraud detection. For many
years, the credit card industry has studied computing models for
automated detection systems; recently, these models have been the
subject of academic research, especially with respect to e-
commerce.

The credit card fraud-detection domain presents a number of
challenging issues for data mining:

  There are millions of credit card transactions processed each day.
  Mining such massive amounts of data requires highly efficient
  techniques that scale.

  The data are highly skewed—many more transactions are
  legitimate than fraudulent.

  Typical accuracy-based mining techniques can generate highly
  accurate fraud detectors by simply predicting that all transactions
  are legitimate, although this is equivalent to not detecting fraud at
  all.
Each transaction record has a different dollar amount and thus has
  a variable potential loss, rather than a fixed misclassification cost
  per error type, as is commonly assumed in cost-based mining
  techniques.


Our approach addresses the efficiency and scalability issues in
several ways. We divide large data set of labeled transactions (either
fraudulent or legitimate) into smaller subsets, apply mining
techniques to generate classifiers

in parallel, and combine the resultant base models by metalearning
from the classifiers’ behavior to generate a metaclassifier. Our
approach treats the classifiers as black boxes so that we can employ
a variety of learning algorithms.

Besides extensibility, combining multiple models computed over all
available data produces metaclassifiers that can offset the loss of
predictive performance that usually occurs when mining from data
subsets or sampling.

Furthermore, when we use the learned classifiers (for example,
during transaction authorization), the base classifiers can execute in
parallel, with the metaclassifier then combining their results. So, our
approach is highly efficient in generating these models and also
relatively efficient in applying them.

Another parallel approach focuses on parallelizing a particular
algorithm on a particular parallel architecture. However, a new
algorithm or architecture requires a substantial amount of parallel-
programming work.

Although our architecture and algorithm-independent approach is not
as efficient as some fine-grained parallelization approaches, it lets
users plug different off-the-shelf learning programs into a parallel and
distributed environment with relative ease and eliminates the need for
expensive parallel hardware.
We are going to use the ADACost algorithm.



SOFTWARE TOOLS
  • ASP .NET

  • Oracle Database


HARDWARE TOOLS
  • Pentium Server with Client

Más contenido relacionado

Más de ncct

Biomedical Wearable Device For Remote Monitoring Ofphysiological Signals
Biomedical Wearable Device For Remote Monitoring Ofphysiological SignalsBiomedical Wearable Device For Remote Monitoring Ofphysiological Signals
Biomedical Wearable Device For Remote Monitoring Ofphysiological Signalsncct
 
Digital Water Marking For Video Piracy Detection
Digital Water Marking For Video Piracy DetectionDigital Water Marking For Video Piracy Detection
Digital Water Marking For Video Piracy Detectionncct
 
Self Repairing Tree Topology Enabling Content Based Routing In Local Area Ne...
Self Repairing Tree Topology Enabling  Content Based Routing In Local Area Ne...Self Repairing Tree Topology Enabling  Content Based Routing In Local Area Ne...
Self Repairing Tree Topology Enabling Content Based Routing In Local Area Ne...ncct
 
Cockpit White Box
Cockpit White BoxCockpit White Box
Cockpit White Boxncct
 
Rail Track Inspector
Rail Track InspectorRail Track Inspector
Rail Track Inspectorncct
 
Botminer Clustering Analysis Of Network Traffic For Protocol And Structure...
Botminer   Clustering Analysis Of Network Traffic For Protocol  And Structure...Botminer   Clustering Analysis Of Network Traffic For Protocol  And Structure...
Botminer Clustering Analysis Of Network Traffic For Protocol And Structure...ncct
 
Bot Robo Tanker Sound Detector
Bot Robo  Tanker  Sound DetectorBot Robo  Tanker  Sound Detector
Bot Robo Tanker Sound Detectorncct
 
Distance Protection
Distance ProtectionDistance Protection
Distance Protectionncct
 
Bluetooth Jammer
Bluetooth  JammerBluetooth  Jammer
Bluetooth Jammerncct
 
Crypkit 1
Crypkit 1Crypkit 1
Crypkit 1ncct
 
I E E E 2009 Java Projects
I E E E 2009  Java  ProjectsI E E E 2009  Java  Projects
I E E E 2009 Java Projectsncct
 
B E Projects M C A Projects B
B E  Projects  M C A  Projects  BB E  Projects  M C A  Projects  B
B E Projects M C A Projects Bncct
 
J2 E E Projects, I E E E Projects 2009
J2 E E  Projects,  I E E E  Projects 2009J2 E E  Projects,  I E E E  Projects 2009
J2 E E Projects, I E E E Projects 2009ncct
 
J2 M E Projects, I E E E Projects 2009
J2 M E  Projects,  I E E E  Projects 2009J2 M E  Projects,  I E E E  Projects 2009
J2 M E Projects, I E E E Projects 2009ncct
 
Engineering College Projects, M C A Projects, B E Projects, B Tech Pr...
Engineering  College  Projects,  M C A  Projects,  B E  Projects,  B Tech  Pr...Engineering  College  Projects,  M C A  Projects,  B E  Projects,  B Tech  Pr...
Engineering College Projects, M C A Projects, B E Projects, B Tech Pr...ncct
 
B E M E Projects M C A Projects B
B E  M E  Projects  M C A  Projects  BB E  M E  Projects  M C A  Projects  B
B E M E Projects M C A Projects Bncct
 
I E E E 2009 Java Projects, I E E E 2009 A S P
I E E E 2009  Java  Projects,  I E E E 2009  A S PI E E E 2009  Java  Projects,  I E E E 2009  A S P
I E E E 2009 Java Projects, I E E E 2009 A S Pncct
 
Advantages Of Software Projects N C C T
Advantages Of  Software  Projects  N C C TAdvantages Of  Software  Projects  N C C T
Advantages Of Software Projects N C C Tncct
 
Engineering Projects
Engineering  ProjectsEngineering  Projects
Engineering Projectsncct
 
Software Projects Java Projects Mobile Computing
Software  Projects  Java  Projects  Mobile  ComputingSoftware  Projects  Java  Projects  Mobile  Computing
Software Projects Java Projects Mobile Computingncct
 

Más de ncct (20)

Biomedical Wearable Device For Remote Monitoring Ofphysiological Signals
Biomedical Wearable Device For Remote Monitoring Ofphysiological SignalsBiomedical Wearable Device For Remote Monitoring Ofphysiological Signals
Biomedical Wearable Device For Remote Monitoring Ofphysiological Signals
 
Digital Water Marking For Video Piracy Detection
Digital Water Marking For Video Piracy DetectionDigital Water Marking For Video Piracy Detection
Digital Water Marking For Video Piracy Detection
 
Self Repairing Tree Topology Enabling Content Based Routing In Local Area Ne...
Self Repairing Tree Topology Enabling  Content Based Routing In Local Area Ne...Self Repairing Tree Topology Enabling  Content Based Routing In Local Area Ne...
Self Repairing Tree Topology Enabling Content Based Routing In Local Area Ne...
 
Cockpit White Box
Cockpit White BoxCockpit White Box
Cockpit White Box
 
Rail Track Inspector
Rail Track InspectorRail Track Inspector
Rail Track Inspector
 
Botminer Clustering Analysis Of Network Traffic For Protocol And Structure...
Botminer   Clustering Analysis Of Network Traffic For Protocol  And Structure...Botminer   Clustering Analysis Of Network Traffic For Protocol  And Structure...
Botminer Clustering Analysis Of Network Traffic For Protocol And Structure...
 
Bot Robo Tanker Sound Detector
Bot Robo  Tanker  Sound DetectorBot Robo  Tanker  Sound Detector
Bot Robo Tanker Sound Detector
 
Distance Protection
Distance ProtectionDistance Protection
Distance Protection
 
Bluetooth Jammer
Bluetooth  JammerBluetooth  Jammer
Bluetooth Jammer
 
Crypkit 1
Crypkit 1Crypkit 1
Crypkit 1
 
I E E E 2009 Java Projects
I E E E 2009  Java  ProjectsI E E E 2009  Java  Projects
I E E E 2009 Java Projects
 
B E Projects M C A Projects B
B E  Projects  M C A  Projects  BB E  Projects  M C A  Projects  B
B E Projects M C A Projects B
 
J2 E E Projects, I E E E Projects 2009
J2 E E  Projects,  I E E E  Projects 2009J2 E E  Projects,  I E E E  Projects 2009
J2 E E Projects, I E E E Projects 2009
 
J2 M E Projects, I E E E Projects 2009
J2 M E  Projects,  I E E E  Projects 2009J2 M E  Projects,  I E E E  Projects 2009
J2 M E Projects, I E E E Projects 2009
 
Engineering College Projects, M C A Projects, B E Projects, B Tech Pr...
Engineering  College  Projects,  M C A  Projects,  B E  Projects,  B Tech  Pr...Engineering  College  Projects,  M C A  Projects,  B E  Projects,  B Tech  Pr...
Engineering College Projects, M C A Projects, B E Projects, B Tech Pr...
 
B E M E Projects M C A Projects B
B E  M E  Projects  M C A  Projects  BB E  M E  Projects  M C A  Projects  B
B E M E Projects M C A Projects B
 
I E E E 2009 Java Projects, I E E E 2009 A S P
I E E E 2009  Java  Projects,  I E E E 2009  A S PI E E E 2009  Java  Projects,  I E E E 2009  A S P
I E E E 2009 Java Projects, I E E E 2009 A S P
 
Advantages Of Software Projects N C C T
Advantages Of  Software  Projects  N C C TAdvantages Of  Software  Projects  N C C T
Advantages Of Software Projects N C C T
 
Engineering Projects
Engineering  ProjectsEngineering  Projects
Engineering Projects
 
Software Projects Java Projects Mobile Computing
Software  Projects  Java  Projects  Mobile  ComputingSoftware  Projects  Java  Projects  Mobile  Computing
Software Projects Java Projects Mobile Computing
 

Último

EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024The Digital Insurer
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024The Digital Insurer
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbuapidays
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Zilliz
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 

Último (20)

EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 

Distributed Data Mining In Credit Card Fraud Detection

  • 1. DISTRIBUTED DATA MINING IN CREDIT CARD FRAUD DETECTION INTRODUCTION Credit card transactions grow in number, taking a larger share of any country’s payment system and this is turn has led to a higher rate of stolen account numbers and subsequent losses by banks. Hence, improved fraud detection has become essential to maintain the viability of the country’s payment system. Banks have used early fraud warning systems for some years. Large- scale data-mining techniques can improve on the state of the art in commercial practice. Scalable techniques to analyze massive amounts of transaction data that efficiently compute fraud detectors in a timely manner is an important problem, especially for e- commerce. Besides scalability and efficiency, the fraud-detection task exhibits technical problems that include skewed distributions of training data and non-uniform cost per error, both of which have not been widely studied in the knowledge-discovery and datamining community. In this project, a deep survey is made and evaluates a number of techniques that address these three main issues concurrently. Our proposed methods of combining multiple learned fraud detectors under a “cost model” are general and demonstrably useful; our empirical results demonstrate that we can significantly reduce loss due to fraud through distributed data mining of fraud models. DATA MINING AND MACHINE LEARNING The aim of data mining is to extract knowledge from large amounts of data. This knowledge is nontrivial and hidden in the data. Machine learning is often used in data mining.
  • 2. DATA MINING: A DEFINITION Art/Science of uncovering non-trivial, valuable information from a large database Emphasis on: Non-obvious (difficult) Useful (cost vs benefit) Large (automatic) Yet, no rules, provided that the process is efficient in time, space and human resources. Data Mining is the process of finding interesting trends or patterns in large datasets in order to guide future decisions. Related to exploratory data analysis (area of statistics) and knowledge discovery (area in artificial intelligence, machine learning). Data Mining is characterized by having VERY LARGE datasets. DATA MINING VS. MACHINE LEARNING Size: Databases are usually very large so algorithms must scale well Design Purpose: Databases are not usually designed for data mining (but for other purposes), and thus, may not have convenient attributes Errors and Noise: Databases almost always contain errors The aim of machine learning is to adapt to new circumstances, to detect and extrapolate. A distinction can be made between unsupervised and supervised machine learning algorithms.
  • 3. PROPOSED SYSTEM In today’s increasingly electronic society and with the rapid advances of electronic commerce on the Internet, the use of credit cards for purchases has become convenient and necessary. Credit card transactions have become the de facto standard for Internet and Webbased e-commerce. The US government estimates that credit cards accounted for approximately US $13 billion in Internet sales during 1998. This figure is expected to grow rapidly each year. However, the growing number of credit card transactions provides more opportunity for thieves to steal credit card numbers and subsequently commit fraud. When banks lose money because of credit card fraud, cardholders pay for all of that loss through higher interest rates, higher fees, and reduced benefits. Hence, it is in both the banks’ and cardholders’ interest to reduce illegitimate use of credit cards by early fraud detection. For many years, the credit card industry has studied computing models for automated detection systems; recently, these models have been the subject of academic research, especially with respect to e- commerce. The credit card fraud-detection domain presents a number of challenging issues for data mining: There are millions of credit card transactions processed each day. Mining such massive amounts of data requires highly efficient techniques that scale. The data are highly skewed—many more transactions are legitimate than fraudulent. Typical accuracy-based mining techniques can generate highly accurate fraud detectors by simply predicting that all transactions are legitimate, although this is equivalent to not detecting fraud at all.
  • 4. Each transaction record has a different dollar amount and thus has a variable potential loss, rather than a fixed misclassification cost per error type, as is commonly assumed in cost-based mining techniques. Our approach addresses the efficiency and scalability issues in several ways. We divide large data set of labeled transactions (either fraudulent or legitimate) into smaller subsets, apply mining techniques to generate classifiers in parallel, and combine the resultant base models by metalearning from the classifiers’ behavior to generate a metaclassifier. Our approach treats the classifiers as black boxes so that we can employ a variety of learning algorithms. Besides extensibility, combining multiple models computed over all available data produces metaclassifiers that can offset the loss of predictive performance that usually occurs when mining from data subsets or sampling. Furthermore, when we use the learned classifiers (for example, during transaction authorization), the base classifiers can execute in parallel, with the metaclassifier then combining their results. So, our approach is highly efficient in generating these models and also relatively efficient in applying them. Another parallel approach focuses on parallelizing a particular algorithm on a particular parallel architecture. However, a new algorithm or architecture requires a substantial amount of parallel- programming work. Although our architecture and algorithm-independent approach is not as efficient as some fine-grained parallelization approaches, it lets users plug different off-the-shelf learning programs into a parallel and distributed environment with relative ease and eliminates the need for expensive parallel hardware.
  • 5. We are going to use the ADACost algorithm. SOFTWARE TOOLS • ASP .NET • Oracle Database HARDWARE TOOLS • Pentium Server with Client