Fraud Analytics
Alejandro Correa Bahnsen, PhD
Data Scientist
About me
• PhD in Machine Learning at Luxembourg University
• Data Scientist at Easy Solutions
• Worked for +8 years as a data scientist at GE Money, Scotiabank
and SIX Financial Services
• Bachelor and Master in Industrial Engineering
• Organizer of the Big Data & Data Science Bogota Meetup
2
About us
A leading global provider of electronic fraud prevention for financial
institutions and enterprise customers
Industry recognition
280+ customers
In 26 countries
75 million
Users protected
22+ billion
Online connections monitored in
last 12 months
3
Our Approach: Total Fraud Protection®
4
~1 billion USD
~171 million USD
~3 billion USD
Does fraud affect me?
5
Does fraud affect me?
6
[Figure: Europe fraud evolution, card-not-present (Internet) transactions, 2007 to 2012; vertical axis from €0 to €800]
7
[Figure: US fraud evolution, card-not-present (Internet) transactions, 2001 to 2012; vertical axis from $0 to $4,000]
8
Card Present vs. Card Not Present Fraud Rates
Year   Card Not Present   Card Present
2006   1.10%              0.09%
2007   1.30%              0.08%
2008   1.10%              0.08%
2009   0.90%              0.06%
2010   0.88%              0.05%
2011   0.87%              0.05%
US Online Banking (billions of transactions)
Year   Transactions
2009   23.3
2010   26.8
2011   30.0
2012   33.3
2013   35.0
US Mobile Banking (billions of transactions)
Year   Transactions
2009   1.2
2010   3.0
2011   5.6
2012   9.4
2013   14.0
9
10
Mobile banking continues to grow while traditional channels
lose users
Which channels do you use for banking operations / balance inquiries / service payments
/ tax payments or other payments or purchases?
11
Mobile Security Challenges
12
The main reason given by those who do NOT use the Internet for
transactions or purchases is fear of electronic fraud
Why do you NOT USE the Internet for banking operations, payments, or purchases?
There is a need for
better fraud
detection strategies
13
14
“War is ninety percent information”
• Napoleon Bonaparte
15
BigData?
16
17
18
Big data (Data Science) is like teenage sex:
everyone talks about it,
nobody really knows how to do it,
everyone thinks everyone else is doing it,
so everyone claims they are doing it...
19
20
21
Man on the Moon
Distance: 356,000 km
Never been there
before
Must return to Earth
22
Man on the Moon – Small Data!!
Apollo XI
Speed: 3,500 km/hour
Weight: 13,500 kg
Lots of complex data
Computer program: 64 KB, 2 KB RAM, Fortran
Must work the first time
Apollo XI, 1969
64 KB, 2 KB RAM
23
Man on the Moon – Small Data!!
iPhone 6
128GB, 2GB RAM
BigData Analytics
24
BigData Analytics is the use of methods and tools of Machine
Learning and Artificial Intelligence with the objective of
making data-driven decisions
25
Fraud detection
and prevention
26
Estimate the probability that a transaction is fraudulent by analyzing
customer patterns and recent fraudulent behavior
Issues when constructing a fraud detection system:
• Skewness of the data
• Cost-sensitivity
• Short response time of the system
• Dimensionality of the search space
• Feature preprocessing
• Model selection
27
Credit card fraud detection
[Figure: card transaction sent through the network, labeled "Fraud??"]
28
• Large European card processing
company
• 2012 & 2013 card present
transactions
• 20 million transactions
• 40,000 frauds
• 0.467% fraud rate
• ~2 million EUR lost due to fraud in the
test dataset
[Figure: timeline of the months January to December, divided into train and test periods]
Data
• “Purpose is to use facts and rules, taken from the knowledge
of many human experts, to help make decisions.”
• Examples of rules:
• More than 4 ATM transactions in one hour?
• More than 2 transactions in 5 minutes?
• Magnetic stripe transaction then internet transaction?
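The first two example rules can be sketched as simple count-in-window checks. This is a minimal illustration, not the rule engine used in the talk; the function names and the choice to count the current transaction as part of the window are assumptions.

```python
from datetime import datetime, timedelta

def exceeds_count(times, now, limit, window):
    """Return True if more than `limit` timestamps fall in (now - window, now]."""
    recent = [t for t in times if now - window < t <= now]
    return len(recent) > limit

def atm_rule(atm_times, now):
    # "More than 4 ATM transactions in one hour?"
    return exceeds_count(atm_times, now, limit=4, window=timedelta(hours=1))

def burst_rule(all_times, now):
    # "More than 2 transactions in 5 minutes?"
    return exceeds_count(all_times, now, limit=2, window=timedelta(minutes=5))
```

Each rule returns a hard yes/no flag, which is what makes expert rules easy to explain but insensitive to the amount at stake.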
30
If-Then rules (Expert rules)
Misclassification   Recall   Precision   F1-Score
1.04%               31%      17%         22%
31
If-Then rules (Expert rules)
Credit card fraud detection is a cost-sensitive problem, because the cost of a
false positive is different from the cost of a false negative.
• False positives: when a transaction is predicted as fraudulent but is in
fact legitimate, the financial institution incurs an administrative cost.
• False negatives: when a fraud is not detected, the amount of that
transaction is lost.
Moreover, it is not enough to assume a constant cost difference between
false positives and false negatives, as the amount of the transactions varies
quite significantly.
32
Financial evaluation
Cost matrix
$$\mathrm{Cost}(f(S)) = \sum_{i=1}^{N} \Big( y_i \big( c_i C_{TP_i} + (1 - c_i) C_{FN_i} \big) + (1 - y_i) \big( c_i C_{FP_i} + (1 - c_i) C_{TN_i} \big) \Big)$$
33
                               Actual Positive (y_i = 1)   Actual Negative (y_i = 0)
Predicted Positive (c_i = 1)   C_{TP_i} = C_a              C_{FP_i} = C_a
Predicted Negative (c_i = 0)   C_{FN_i} = Amt_i            C_{TN_i} = 0
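The cost matrix and the cost function can be written out directly in code. The sketch below assumes binary labels (1 = fraud), a single administrative cost `admin_cost` standing in for C_a, and per-transaction amounts; the function and argument names are illustrative only.

```python
def total_cost(y_true, y_pred, amounts, admin_cost):
    """Sum the example-dependent cost over all transactions.

    y_true[i] is y_i (1 = fraud), y_pred[i] is c_i (1 = flagged),
    amounts[i] is Amt_i, and admin_cost plays the role of C_a.
    """
    cost = 0.0
    for y, c, amt in zip(y_true, y_pred, amounts):
        c_tp = admin_cost  # fraud flagged: administrative cost
        c_fp = admin_cost  # legitimate flagged: administrative cost
        c_fn = amt         # fraud missed: the transaction amount is lost
        c_tn = 0.0         # legitimate accepted: no cost
        cost += y * (c * c_tp + (1 - c) * c_fn) \
            + (1 - y) * (c * c_fp + (1 - c) * c_tn)
    return cost
```

With this measure, missing one large fraud can outweigh many false alarms, which is something accuracy, recall, and precision do not capture.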
Financial evaluation
Cost     Total Losses
1.24 €   1.94 €

Misclassification   Recall   Precision   F1-Score
1.04%               31%      17%         22%
34
If-Then rules (Expert rules)
Fraud Analytics
35
Fraud Analytics is the use of statistical
and mathematical techniques (Machine
Learning) to discover patterns in data in
order to make predictions
Fraud Analytics
Raw features
37
Attribute name     Description
Transaction ID     Transaction identification number
Time               Date and time of the transaction
Account number     Identification number of the customer
Card number        Identification of the credit card
Transaction type   i.e. Internet, ATM, POS, ...
Entry mode         i.e. Chip and PIN, magnetic stripe, ...
Amount             Amount of the transaction in euros
Merchant code      Identification of the merchant type
Merchant group     Merchant group identification
Country            Country of the transaction
Country 2          Country of residence
Type of card       i.e. Visa debit, Mastercard, American Express, ...
Gender             Gender of the card holder
Age                Card holder age
Bank               Issuer bank of the card
Features
Transaction aggregation strategy
38
Raw Features
TrxId   Time        Type   Country   Amt
1       1/1 18:20   POS    Lux       250
2       1/1 20:35   POS    Lux       400
3       1/1 22:30   ATM    Lux       250
4       2/1 00:50   POS    Ger       50
5       2/1 19:18   POS    Ger       100
6       2/1 23:45   POS    Ger       150
7       3/1 06:00   POS    Lux       10
Aggregated Features
TrxId   No Trx last 24h   Amt last 24h   No Trx last 24h, same type and country   Amt last 24h, same type and country
1       0                 0              0                                        0
2       1                 250            1                                        250
3       2                 650            0                                        0
4       3                 900            0                                        0
5       3                 700            1                                        50
6       2                 150            2                                        150
7       3                 400            0                                        0
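The aggregation strategy can be sketched as a look-back over the previous 24 hours for each transaction. The example below uses the first six transactions from the slide; the year (2015), the day/month reading of the dates, and the function name are assumptions made only for illustration.

```python
from datetime import datetime, timedelta

# First six example transactions; the year is hypothetical.
transactions = [
    {"id": 1, "time": datetime(2015, 1, 1, 18, 20), "type": "POS", "country": "Lux", "amt": 250},
    {"id": 2, "time": datetime(2015, 1, 1, 20, 35), "type": "POS", "country": "Lux", "amt": 400},
    {"id": 3, "time": datetime(2015, 1, 1, 22, 30), "type": "ATM", "country": "Lux", "amt": 250},
    {"id": 4, "time": datetime(2015, 1, 2, 0, 50), "type": "POS", "country": "Ger", "amt": 50},
    {"id": 5, "time": datetime(2015, 1, 2, 19, 18), "type": "POS", "country": "Ger", "amt": 100},
    {"id": 6, "time": datetime(2015, 1, 2, 23, 45), "type": "POS", "country": "Ger", "amt": 150},
]

def aggregate(trxs, window=timedelta(hours=24)):
    """For each transaction, count and sum the earlier ones inside the window."""
    rows = []
    for cur in trxs:
        # Earlier transactions of the same card within the look-back window.
        recent = [p for p in trxs
                  if cur["time"] - window <= p["time"] < cur["time"]]
        # Restrict to the same transaction type and country.
        same = [p for p in recent
                if p["type"] == cur["type"] and p["country"] == cur["country"]]
        rows.append((len(recent), sum(p["amt"] for p in recent),
                     len(same), sum(p["amt"] for p in same)))
    return rows

aggregated = aggregate(transactions)
```

Each tuple in `aggregated` holds the four aggregated features in the same column order as the table. This quadratic loop is fine for a handful of rows; a production system would more likely use a sorted sliding window per card.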
Features
When is a customer expected to
make a new transaction?
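Considering a von Mises distribution with a period of 24 hours such that
$$P(time) \sim \mathrm{vonmises}(\mu, \sigma) = \frac{e^{\sigma \cos(time - \mu)}}{2\pi I_0(\sigma)}$$
where μ is the mean, σ is the concentration parameter (the larger σ, the
tighter the spread around μ), and I_0 is the modified Bessel function of the
first kind of order zero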
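A small sketch of how this periodic feature could be evaluated, using only the standard library. Hours are mapped onto a 24-hour circle, the mean is taken as a circular mean, and the density follows the formula above with `sigma` in the role of σ. The helper names and the series approximation of I_0 are choices made for this illustration, not the implementation behind the slides.

```python
import math

def bessel_i0(x, tol=1e-12):
    """Modified Bessel function of the first kind, order 0, via its power series."""
    term, total, k = 1.0, 1.0, 0
    while term > tol * total:
        k += 1
        term *= (x / 2.0) ** 2 / (k * k)
        total += term
    return total

def hours_to_angle(hour):
    """Map an hour of the day onto an angle in [0, 2*pi)."""
    return 2.0 * math.pi * (hour % 24.0) / 24.0

def circular_mean_hour(hours):
    """Mean time of day on the 24-hour circle (so 23:00 and 01:00 average to midnight)."""
    s = sum(math.sin(hours_to_angle(h)) for h in hours)
    c = sum(math.cos(hours_to_angle(h)) for h in hours)
    return (math.atan2(s, c) % (2.0 * math.pi)) * 24.0 / (2.0 * math.pi)

def von_mises_pdf(hour, mu_hour, sigma):
    """Density over the angle corresponding to `hour`, centered at `mu_hour`."""
    angle = hours_to_angle(hour) - hours_to_angle(mu_hour)
    return math.exp(sigma * math.cos(angle)) / (2.0 * math.pi * bessel_i0(sigma))
```

A transaction at an hour with low density relative to the customer's usual times can then be used as a feature, which an arithmetic mean of clock times would get wrong around midnight.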
39
Periodic features
40
Periodic features
[Figure: scatter plot of amount of the transaction versus number of transactions last day, with normal transactions and fraud marked]
41
42
[Figure: scatter plot of amount of the transaction versus number of transactions last day, with normal transactions and fraud marked]
43
[Figure: plot of amount of the transaction, number of transactions last day, and number of ATM transactions last week, with normal transactions and fraud marked]
Fraud Analytics
Algorithms
Fuzzy Rules
Neural Nets
Naive Bayes
*Random Forests
RF – with Cost-Proportionate
Rejection Sampling
*Cost-Sensitive Random Patches
Decision Trees
44
45
Decision Trees
X1 = Amount of the transaction
X2 = Number of transactions last day
A decision tree is a classification model that iteratively creates binary
decision rules that maximize certain criteria (Gini, entropy, …).
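[Figure: example decision tree. The initial node splits on X2 < 10 versus X2 ≥ 10; further nodes split on X1 < 100 versus X1 ≥ 100, X1 < 50 versus X1 ≥ 50, and X2 < 15 versus X2 ≥ 15]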
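To make the splitting criterion concrete, the sketch below scores candidate thresholds on a single feature by weighted Gini impurity and keeps the best one. It assumes binary labels and midpoint thresholds; the function names are illustrative and a real tree would repeat this over all features and recurse on the two children.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p1 = sum(labels) / n
    return 1.0 - p1 ** 2 - (1.0 - p1) ** 2

def best_split(x, y):
    """Return (threshold, weighted impurity) of the best binary rule x < threshold."""
    best_thr, best_imp = None, gini(y)
    values = sorted(set(x))
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2  # midpoint between consecutive observed values
        left = [yi for xi, yi in zip(x, y) if xi < thr]
        right = [yi for xi, yi in zip(x, y) if xi >= thr]
        imp = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if imp < best_imp:
            best_thr, best_imp = thr, imp
    return best_thr, best_imp
```

For example, `best_split([1, 2, 3, 4], [0, 0, 1, 1])` picks the threshold 2.5, which separates the two classes perfectly.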
A Random Forest is made by combining many different decision trees, each
one trained on a random subset of the initial dataset
46
Random Forests
47
Random Forests & Random Patches
[Figure: how the training set (rows 1 to 8) is sampled under bagging, random forest, and random patches]
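The difference between the sampling schemes can be sketched as index selection. In this illustration, bagging draws a bootstrap sample of rows and keeps every feature, while a random patch draws a subset of rows and a subset of features for each base tree. The fractions, function names, and the use of sampling with replacement for rows are assumptions.

```python
import random

def bagging_sample(n_rows, n_features, rng):
    """Bootstrap the rows (with replacement); keep all features."""
    rows = [rng.randrange(n_rows) for _ in range(n_rows)]
    return rows, list(range(n_features))

def random_patch(n_rows, n_features, row_frac, feat_frac, rng):
    """Draw a subset of rows and a subset of features for one base tree."""
    n_r = max(1, int(row_frac * n_rows))
    n_f = max(1, int(feat_frac * n_features))
    rows = [rng.randrange(n_rows) for _ in range(n_r)]
    feats = sorted(rng.sample(range(n_features), n_f))
    return rows, feats

rng = random.Random(0)  # fixed seed so the sketch is repeatable
bag_rows, bag_feats = bagging_sample(8, 4, rng)
patch_rows, patch_feats = random_patch(8, 4, 0.5, 0.5, rng)
```

A random forest sits between the two: rows are bootstrapped as in bagging, and a random subset of features is considered at each split rather than once per tree.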
48
Cost-Sensitive Decision Trees
• Standard decision trees create rules
that maximize either the Gini or the
entropy measures
• However, this assumes that all
misclassification errors carry the same
cost
• Not true in fraud detection
• Instead, the cost-sensitive decision tree
minimizes the cost of each rule,
$\mathrm{Cost}(f(\text{node}))$
[Figure: example decision tree with splits X2 < 10 / X2 ≥ 10, X1 < 100 / X1 ≥ 100, X1 < 50 / X1 ≥ 50, and X2 < 15 / X2 ≥ 15]
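One way to read "minimizes the cost of each rule" is sketched below. It reuses the cost matrix from the financial evaluation (flagging costs the administrative cost, a missed fraud costs its amount) and scores a node by the cheaper of the two constant predictions. Taking that minimum as the node cost, and the names used here, are assumptions of this sketch rather than something the slides spell out.

```python
def node_cost(amounts, labels, admin_cost):
    """Cost of a node under its cheapest constant prediction."""
    # Predict every transaction as legitimate: each fraud amount is lost.
    cost_all_negative = sum(a for a, y in zip(amounts, labels) if y == 1)
    # Predict every transaction as fraud: administrative cost for each one.
    cost_all_positive = admin_cost * len(labels)
    return min(cost_all_negative, cost_all_positive)

def split_cost(amounts, labels, feature, threshold, admin_cost):
    """Total cost of the two children produced by the rule feature < threshold."""
    left = [(a, y) for a, y, x in zip(amounts, labels, feature) if x < threshold]
    right = [(a, y) for a, y, x in zip(amounts, labels, feature) if x >= threshold]
    return sum(
        node_cost([a for a, _ in part], [y for _, y in part], admin_cost)
        for part in (left, right)
    )
```

A candidate rule is worth keeping when `split_cost` is lower than the `node_cost` of the parent, so a split that isolates one expensive fraud can be preferred over one that is purer in the Gini sense.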
[Figure: bar chart of % Savings and % Frauds (0% to 100%) for Expert Rules, Fuzzy Rules, Neural Nets, Naïve Bayes, Random Forests, RF - CP Random Sampling, and CS Random Patches]
49
• Fraud Analytics (ML) models are significantly
better than expert rules
• Models should be evaluated taking into
account real financial costs of the application
• Algorithms should be developed to
incorporate those financial costs
Conclusions
50
51
Questions?
Alejandro Correa Bahnsen, PhD
Data Scientist
acorrea@Easysol.net
52

Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Boston Institute of Analytics
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Cathrine Wilhelmsen
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...Amil Baba Dawood bangali
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsVICTOR MAESTRE RAMIREZ
 
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Boston Institute of Analytics
 
convolutional neural network and its applications.pdf
convolutional neural network and its applications.pdfconvolutional neural network and its applications.pdf
convolutional neural network and its applications.pdfSubhamKumar3239
 
Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksDigital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksdeepakthakur548787
 
Cyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataCyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataTecnoIncentive
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max PrincetonTimothy Spann
 

Último (20)

Principles and Practices of Data Visualization
Principles and Practices of Data VisualizationPrinciples and Practices of Data Visualization
Principles and Practices of Data Visualization
 
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
 
Unveiling the Role of Social Media Suspect Investigators in Preventing Online...
Unveiling the Role of Social Media Suspect Investigators in Preventing Online...Unveiling the Role of Social Media Suspect Investigators in Preventing Online...
Unveiling the Role of Social Media Suspect Investigators in Preventing Online...
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
 
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxThe Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
 
What To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptxWhat To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptx
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business Professionals
 
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
 
convolutional neural network and its applications.pdf
convolutional neural network and its applications.pdfconvolutional neural network and its applications.pdf
convolutional neural network and its applications.pdf
 
Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksDigital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing works
 
Cyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataCyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded data
 
Data Analysis Project: Stroke Prediction
Data Analysis Project: Stroke PredictionData Analysis Project: Stroke Prediction
Data Analysis Project: Stroke Prediction
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max Princeton
 

Fraud Detection with Cost-Sensitive Predictive Analytics

  • 1. Fraud Analytics Alejandro Correa Bahnsen, PhD Data Scientist
  • 2. About me • PhD in Machine Learning at Luxembourg University • Data Scientist at Easy Solutions • Worked for 8+ years as a data scientist at GE Money, Scotiabank and SIX Financial Services • Bachelor and Master in Industrial Engineering • Organizer of the Big Data & Data Science Bogota Meetup
  • 3. About us • A leading global provider of electronic fraud prevention for financial institutions and enterprise customers • Industry recognition • 280+ customers in 26 countries • 75 million users protected • 22+ billion online connections monitored in the last 12 months
  • 4. Our Approach: Total Fraud Protection®
  • 5. Does fraud affect me? ~1 billion USD, ~171 million USD, ~3 billion USD
  • 7. Europe fraud evolution: Card not present (Internet) transactions [Chart: yearly values for 2007 to 2012, axis from €0 to €800]
  • 8. US fraud evolution: Card not present (Internet) transactions [Chart: yearly values for 2001 to 2012, axis from $0 to $4,000]
  • 9. Card Present vs. Card Not Present Fraud Rates, 2006 to 2011: Card Not Present 1.10%, 1.30%, 1.10%, 0.90%, 0.88%, 0.87%; Card Present 0.09%, 0.08%, 0.08%, 0.06%, 0.05%, 0.05%. US Online Banking (billions of transactions), 2009 to 2013: 23.3, 26.8, 30.0, 33.3, 35.0. US Mobile Banking (billions of transactions), 2009 to 2013: 1.2, 3.0, 5.6, 9.4, 14.0.
  • 10. Mobile banking continues to grow while traditional channels lose users. Survey question: Which channels do you use for banking operations / balance inquiries / service payments / tax payments or other payments or purchases?
  • 11. Mobile Security Challenges
  • 12. The main reason given by those who do NOT use the Internet for transactions or purchases is fear of electronic fraud. Survey question: Why do you NOT USE the Internet for banking operations, payments or purchases?
  • 13. There is a need for better fraud detection strategies
  • 15. “War is ninety percent information” • Napoleon Bonaparte
  • 19. Big data (Data Science) is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it...
  • 22. Man on the Moon – Small Data!! • Man on the Moon: distance 356,000 km; never been there before; must return to Earth • Apollo XI: speed 3,500 km/hour; weight 13,500 kg; lots of complex data • Computer program: 64 KB, 2 KB RAM, Fortran; must work the first time
  • 23. Man on the Moon – Small Data!! • Apollo XI, 1969: 64 KB, 2 KB RAM • iPhone 6: 128 GB, 2 GB RAM
  • 25. Big Data Analytics is the use of Machine Learning and Artificial Intelligence methods and tools with the objective of making data-driven decisions
  • 27. Credit card fraud detection: Estimate the probability of a transaction being fraud based on analyzing customer patterns and recent fraudulent behavior. Issues when constructing a fraud detection system: • Skewness of the data • Cost-sensitivity • Short time response of the system • Dimensionality of the search space • Feature preprocessing • Model selection
  • 29. Data • Large European card processing company • 2012 & 2013 card present transactions • 20MM Transactions • 40,000 Frauds • 0.467% Fraud rate • ~ 2MM EUR lost due to fraud on test dataset [Figure: timeline of the months Jan to Dec showing the Train/Test split]
  • 30. If-Then rules (Expert rules) • “Purpose is to use facts and rules, taken from the knowledge of many human experts, to help make decisions.” • Example of rules • More than 4 ATM transactions in one hour? • More than 2 transactions in 5 minutes? • Magnetic stripe transaction then internet transaction?
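
The counting rules on slide 30 can be sketched as a sliding-window check over a card's transaction times. This is only an illustration: the function name `more_than_n_in_window` and the timestamps are hypothetical, and for brevity both rules are applied to the same list of timestamps (the ATM rule would normally see ATM transactions only).

```python
from datetime import datetime, timedelta

def more_than_n_in_window(times, n, window):
    """Return True if more than n transactions fall within any span of length `window`."""
    times = sorted(times)
    start = 0
    for end in range(len(times)):
        # Move the left edge forward until the span fits inside the window.
        while times[end] - times[start] > window:
            start += 1
        if end - start + 1 > n:
            return True
    return False

# Hypothetical transaction times for one card (invented for illustration).
history = [
    datetime(2013, 1, 1, 10, 0),
    datetime(2013, 1, 1, 10, 2),
    datetime(2013, 1, 1, 10, 4),
    datetime(2013, 1, 1, 15, 30),
]

# "More than 2 transactions in 5 minutes?"
flag_fast = more_than_n_in_window(history, n=2, window=timedelta(minutes=5))
# "More than 4 ... transactions in one hour?"
flag_hourly = more_than_n_in_window(history, n=4, window=timedelta(hours=1))
print(flag_fast, flag_hourly)
```

Rules of this kind are easy to audit, which is part of their appeal, but each threshold has to be chosen by hand.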
  • 31. If-Then rules (Expert rules): Misclassification 1.04%, Recall 31%, Precision 17%, F1-Score 22%
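
As a quick consistency check on slide 31, the F1-Score is the harmonic mean of precision and recall, so it can be recomputed from the two reported values. The helper name `f1_score` is just an illustrative choice.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Precision and recall as reported for the expert rules.
f1 = f1_score(precision=0.17, recall=0.31)
print(round(f1, 2))
```

The result rounds to the 22% shown on the slide.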
  • 32. Financial evaluation: Credit card fraud detection is a cost-sensitive problem, because the cost of a false positive is different from the cost of a false negative. • False positives: when a transaction is predicted as fraudulent but is in fact not a fraud, the financial institution incurs an administrative cost. • False negatives: when a fraud is not detected, the amount of that transaction is lost. Moreover, it is not enough to assume a constant cost difference between false positives and false negatives, as the amount of the transactions varies quite significantly.
  • 33. Financial evaluation: Cost matrix
                                      | Actual Positive ($y_i = 1$) | Actual Negative ($y_i = 0$)
      Predicted Positive ($c_i = 1$)  | $C_{TP_i} = C_a$            | $C_{FP_i} = C_a$
      Predicted Negative ($c_i = 0$)  | $C_{FN_i} = Amt_i$          | $C_{TN_i} = 0$
      $$Cost(f(S)) = \sum_{i=1}^{N} \Big( y_i \big( c_i C_{TP_i} + (1 - c_i) C_{FN_i} \big) + (1 - y_i) \big( c_i C_{FP_i} + (1 - c_i) C_{TN_i} \big) \Big)$$
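
The cost on slide 33 can be computed term by term. The sketch below assumes, as the cost matrix states, that both true and false positives cost the administrative cost C_a, that a false negative costs the transaction amount, and that a true negative costs nothing. The function name `total_cost` and the sample values are hypothetical.

```python
def total_cost(y_true, y_pred, amounts, admin_cost):
    """Example-dependent cost: sum over transactions of the cost-matrix entry."""
    cost = 0.0
    for y, c, amt in zip(y_true, y_pred, amounts):
        c_tp = admin_cost  # C_TP_i = C_a
        c_fp = admin_cost  # C_FP_i = C_a
        c_fn = amt         # C_FN_i = Amt_i
        c_tn = 0.0         # C_TN_i = 0
        cost += y * (c * c_tp + (1 - c) * c_fn) + (1 - y) * (c * c_fp + (1 - c) * c_tn)
    return cost

# Illustrative data: two frauds (one caught, one missed) and two legitimate transactions.
y_true = [1, 1, 0, 0]
y_pred = [1, 0, 1, 0]
amounts = [100.0, 250.0, 40.0, 10.0]
cost = total_cost(y_true, y_pred, amounts, admin_cost=5.0)
print(cost)
```

In this toy case the single missed fraud of 250 dominates the two administrative costs of 5 each, which is why a constant per-error cost would misrepresent the result.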
  • 34. If-Then rules (Expert rules): Cost 1.24 €, Total Losses 1.94 €; Misclassification 1.04%, Recall 31%, Precision 17%, F1-Score 22%
  • 36. Fraud Analytics is the use of statistical and mathematical techniques (Machine Learning) to discover patterns in data in order to make predictions
  • 37. Features: Raw features
      Attribute name   | Description
      Transaction ID   | Transaction identification number
      Time             | Date and time of the transaction
      Account number   | Identification number of the customer
      Card number      | Identification of the credit card
      Transaction type | e.g. Internet, ATM, POS, ...
      Entry mode       | e.g. Chip and pin, magnetic stripe, ...
      Amount           | Amount of the transaction in Euros
      Merchant code    | Identification of the merchant type
      Merchant group   | Merchant group identification
      Country          | Country of trx
      Country 2        | Country of residence
      Type of card     | e.g. Visa debit, Mastercard, American Express...
      Gender           | Gender of the card holder
      Age              | Card holder age
      Bank             | Issuer bank of the card
  • 38. Features: Transaction aggregation strategy
      Raw features:
      TrxId | Time      | Type | Country | Amt
      1     | 1/1 18:20 | POS  | Lux     | 250
      2     | 1/1 20:35 | POS  | Lux     | 400
      3     | 1/1 22:30 | ATM  | Lux     | 250
      4     | 2/1 00:50 | POS  | Ger     | 50
      5     | 2/1 19:18 | POS  | Ger     | 100
      6     | 2/1 23:45 | POS  | Ger     | 150
      7     | 3/1 06:00 | POS  | Lux     | 10
      Aggregated features (one row per transaction above, in the same order):
      TrxId | No. trx last 24h | Amt last 24h | No. trx last 24h, same type and country | Amt last 24h, same type and country
      1     | 0                | 0            | 0                                       | 0
      2     | 1                | 250          | 1                                       | 250
      3     | 2                | 650          | 0                                       | 0
      4     | 3                | 900          | 0                                       | 0
      5     | 3                | 700          | 1                                       | 50
      6     | 2                | 150          | 2                                       | 150
      7     | 3                | 400          | 0                                       | 0
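
The aggregation on slide 38 can be sketched as a look-back over the previous 24 hours for each transaction. This is an assumption-laden illustration: the year is invented (the slide gives only day/month and time), the window is taken as strictly the preceding 24 hours, and the function name `aggregate` is hypothetical. Under these assumptions the sketch reproduces the slide's values for transactions 1 to 6; for transaction 7 a strict 24-hour window gives 2 transactions and 250 in the first two columns, whereas the slide lists 3 and 400, so the slide may use a slightly different window.

```python
from datetime import datetime, timedelta

# Raw transactions from the slide: (TrxId, time, type, country, amount).
# The year 2013 is an assumption for illustration.
raw = [
    (1, datetime(2013, 1, 1, 18, 20), "POS", "Lux", 250),
    (2, datetime(2013, 1, 1, 20, 35), "POS", "Lux", 400),
    (3, datetime(2013, 1, 1, 22, 30), "ATM", "Lux", 250),
    (4, datetime(2013, 1, 2, 0, 50), "POS", "Ger", 50),
    (5, datetime(2013, 1, 2, 19, 18), "POS", "Ger", 100),
    (6, datetime(2013, 1, 2, 23, 45), "POS", "Ger", 150),
    (7, datetime(2013, 1, 3, 6, 0), "POS", "Lux", 10),
]

def aggregate(transactions, window=timedelta(hours=24)):
    """For each transaction, count and sum earlier transactions inside the window."""
    rows = []
    for _, t, typ, country, _ in transactions:
        # Earlier transactions within the look-back window.
        prev = [p for p in transactions if t - window <= p[1] < t]
        # Of those, the ones with the same type and country.
        same = [p for p in prev if p[2] == typ and p[3] == country]
        rows.append((len(prev), sum(p[4] for p in prev),
                     len(same), sum(p[4] for p in same)))
    return rows

features = aggregate(raw)
for trx, row in zip(raw, features):
    print(trx[0], row)
```

Other window lengths or grouping keys (for example merchant group) follow the same pattern by changing `window` or the filter on `same`.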
  • 39. Periodic features: When is a customer expected to make a new transaction? Considering a von Mises distribution with a period of 24 hours such that $P(time) \sim vonmises(\mu, \sigma) = \frac{e^{\sigma \cos(time - \mu)}}{2 \pi I_0(\sigma)}$, where $\mu$ is the mean, $\sigma$ is the standard deviation, and $I_0$ is the modified Bessel function of the first kind of order 0
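
The periodic feature on slide 39 can be sketched with the standard library alone. Hours are mapped to angles on a 24-hour circle, the mean is taken as a circular mean, and the density follows the formula on the slide. Two assumptions are worth flagging: the slide's σ is passed here as `kappa`, because in the formula as written it multiplies the cosine and so behaves like a concentration parameter (larger values give a narrower peak), and the Bessel function is evaluated with a truncated power series. All function names are hypothetical.

```python
import math

def bessel_i0(x, terms=50):
    """Modified Bessel function of the first kind, order 0, via its power series."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= (x / 2.0) ** 2 / ((k + 1) ** 2)
    return total

def hour_to_angle(hour):
    """Map an hour of the day (0 to 24) to an angle in radians."""
    return 2.0 * math.pi * hour / 24.0

def circular_mean_hour(hours):
    """Mean hour on the 24-hour circle (so 23:00 and 01:00 average to midnight)."""
    s = sum(math.sin(hour_to_angle(h)) for h in hours)
    c = sum(math.cos(hour_to_angle(h)) for h in hours)
    return (math.atan2(s, c) % (2.0 * math.pi)) * 24.0 / (2.0 * math.pi)

def von_mises_pdf(angle, mu, kappa):
    """Density exp(kappa * cos(angle - mu)) / (2 * pi * I0(kappa))."""
    return math.exp(kappa * math.cos(angle - mu)) / (2.0 * math.pi * bessel_i0(kappa))

# Illustrative customer who transacts around midnight.
hours = [23.0, 1.0]
mean_hour = circular_mean_hour(hours)
mu = hour_to_angle(mean_hour)
density_midnight = von_mises_pdf(hour_to_angle(0.0), mu, kappa=2.0)
density_noon = von_mises_pdf(hour_to_angle(12.0), mu, kappa=2.0)
print(density_midnight > density_noon)
```

An arithmetic mean of 23 and 1 would give noon, which is the main reason a circular distribution is used for time-of-day features.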
  • 41. [Scatter plot: amount of the transaction vs. number of transactions last day; classes: Normal Transaction, Fraud]
  • 42. [Scatter plot: amount of the transaction vs. number of transactions last day; classes: Normal Transaction, Fraud]
  • 43. [Scatter plot: amount of the transaction, number of transactions last day and number of ATM transactions last week; classes: Normal Transaction, Fraud]
  • 44. Fraud Analytics Algorithms: Fuzzy Rules; Neural Nets; Naive Bayes; *Random Forests; RF – with Cost-Proportionate Rejection Sampling; *Cost-Sensitive Random Patches Decision Trees
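
Slide 44 names cost-proportionate rejection sampling without describing it. A minimal sketch of the general technique, not necessarily the configuration used in the talk: each example is kept with probability equal to its cost divided by the largest cost, and a standard classifier is then trained on the kept examples. The per-example costs below are invented, and the function name is hypothetical.

```python
import random

def cost_proportionate_rejection_sample(examples, costs, rng):
    """Keep each example with probability cost / max(costs)."""
    z = max(costs)
    return [ex for ex, c in zip(examples, costs) if rng.random() < c / z]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
examples = list(range(1000))
# Hypothetical costs: every 100th example is expensive to misclassify.
costs = [500.0 if i % 100 == 0 else 5.0 for i in examples]
sample = cost_proportionate_rejection_sample(examples, costs, rng)
print(len(sample))
```

The expensive examples are always kept, so the resampled training set is dominated by the cases that matter most financially.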
  • 45. Decision Trees: A decision tree is a classification model that iteratively creates binary decision rules that maximize certain criteria (Gini, entropy, …). [Figure: example tree over X1 = amount of the transaction and X2 = number of transactions last day, starting at an initial node, with splits X2<10 / X2≥10, X1<100 / X1≥100, X1<50 / X1≥50, X2<15 / X2≥15]
  • 46. Random Forests: A Random Forest is made by combining many different decision trees, each one trained on a random subset of the initial dataset
  • 47. Random Forests & Random Patches [Figure: a training set and the samples drawn from it under Bagging, Random forest and Random patches]
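
The sampling schemes compared on slide 47 differ in what each base tree gets to see. A rough sketch, using row and feature indices only: bagging draws a bootstrap sample of rows and keeps every feature, while random patches draws a random subset of both rows and features for each tree. A random forest also bootstraps rows but typically redraws the candidate features at each split, which is not shown here. The function names, fractions, and the choice to draw patches without replacement are assumptions.

```python
import random

def bagging_sample(n_rows, n_features, rng):
    """Bootstrap rows (with replacement); every feature is kept."""
    rows = [rng.randrange(n_rows) for _ in range(n_rows)]
    return rows, list(range(n_features))

def random_patch(n_rows, n_features, row_frac, feat_frac, rng):
    """Random subset of rows and of features (drawn without replacement here)."""
    rows = rng.sample(range(n_rows), max(1, int(row_frac * n_rows)))
    feats = sorted(rng.sample(range(n_features), max(1, int(feat_frac * n_features))))
    return rows, feats

rng = random.Random(42)
bag_rows, bag_feats = bagging_sample(n_rows=8, n_features=4, rng=rng)
patch_rows, patch_feats = random_patch(n_rows=8, n_features=4,
                                       row_frac=0.5, feat_frac=0.5, rng=rng)
print(bag_rows, bag_feats)
print(patch_rows, patch_feats)
```

Each base tree would then be trained only on the rows and columns of its own sample.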
  • 48. Cost-Sensitive Decision Trees • Standard decision trees create rules that maximize either the Gini or the entropy measures • However, this assumes that all misclassification errors carry the same cost • This is not true in fraud detection • Instead, the cost-sensitive decision tree minimizes the cost of each rule, $Cost(f(node))$ [Figure: example tree starting at an initial node, with splits X2<10 / X2≥10, X1<100 / X1≥100, X1<50 / X1≥50, X2<15 / X2≥15]
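
One way to read "minimizes the cost of each rule" is as a cost-based split criterion. The sketch below is an interpretation, not the talk's exact algorithm: a node's cost is the cheaper of labelling all its transactions as fraud (an administrative cost each) or as legitimate (the amounts of the frauds it contains), and a split is scored by how much it lowers that cost. The function names and the data are hypothetical; the cost values follow the cost matrix on slide 33.

```python
def node_cost(labels, amounts, admin_cost):
    """Cost of a leaf that predicts one class for all of its examples."""
    cost_all_fraud = admin_cost * len(labels)  # C_TP = C_FP = C_a
    cost_all_legit = sum(a for y, a in zip(labels, amounts) if y == 1)  # C_FN = Amt_i
    return min(cost_all_fraud, cost_all_legit)

def split_gain(values, labels, amounts, threshold, admin_cost):
    """Reduction in cost from splitting on value < threshold vs. value >= threshold."""
    left = [i for i, v in enumerate(values) if v < threshold]
    right = [i for i, v in enumerate(values) if v >= threshold]

    def cost(idx):
        return node_cost([labels[i] for i in idx], [amounts[i] for i in idx], admin_cost)

    return node_cost(labels, amounts, admin_cost) - cost(left) - cost(right)

def best_threshold(values, labels, amounts, admin_cost):
    """Threshold with the largest cost reduction among the observed values."""
    candidates = sorted(set(values))
    return max(candidates, key=lambda t: split_gain(values, labels, amounts, t, admin_cost))

# Illustrative data: X2 = number of transactions last day, label 1 = fraud.
x2 = [1, 2, 3, 12, 15, 20]
labels = [0, 0, 0, 1, 1, 0]
amounts = [20.0, 30.0, 10.0, 300.0, 500.0, 40.0]
threshold = best_threshold(x2, labels, amounts, admin_cost=5.0)
gain = split_gain(x2, labels, amounts, threshold, admin_cost=5.0)
print(threshold, gain)
```

Unlike Gini or entropy, this criterion weighs a missed 500 EUR fraud far more heavily than a missed 10 EUR one.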
  • 50. Conclusions • Fraud Analytics (ML) models are significantly better than expert rules • Models should be evaluated taking into account real financial costs of the application • Algorithms should be developed to incorporate those financial costs
  • 52. Questions? Alejandro Correa Bahnsen, PhD Data Scientist acorrea@Easysol.net

Editor's notes

  1. Analytics at work. Davenport 2010.
  2. In 2015, the Internet and mobile technology solidified their status in Latin America as the most popular channels for banking operations, payments and purchases. Bank branches continue to lose usage, and fewer than 30% of users regularly use traditional channels such as ATMs or interactive voice response systems. Users of Internet transactions show a clear preference for eliminating cash from their transactions as much as possible, and credit card use appears to follow this trend as users increasingly prefer to handle their operations on computers and mobile devices. Although the use of mobile devices for financial operations continues to grow, users are still reluctant to use these devices in the same way as their computers, even though they are more convenient because they can be carried everywhere. The Internet remains the most frequently used channel, with an average use per person of 3.8 times per month. Anecdote about banks in Colombia and their queues.
  3. And users are right to be so cautious. A study conducted by Arxan Technologies says that 95% of the top financial mobile apps for Android (and 70% of those for iOS) have been hacked. In 2014, Trend Micro found that 77% of the 50 most downloaded free apps on Google Play had fake versions, making it very difficult for users to tell which of them are authentic and which are fraudulent.
  4. Analyzing the views and opinions of those who regularly use the Internet for banking and purchases is of great importance when designing a strategy that tries to take full advantage of the potential this channel offers. However, it is equally important to examine those users who, for a variety of reasons, do not use the Internet for finance or e-commerce. The main reason these users gave for not taking advantage of online banking services was fear of electronic fraud. If we consider that online banking portals offer greater convenience to users and that their lower operating costs benefit institutions, then it is imperative that banks keep looking for ways to promote the adoption of electronic banking channels. Fraud prevention in these channels is not only a way to prevent financial losses and protect the reputation of institutions; strong fraud protection can also give previously skeptical users the confidence they need to incorporate these channels into their normal banking routine, and give banks with higher adoption rates a wider competitive advantage.
  5. Analytics at work. Davenport 2010.
  6. http://tagul.com/
  7. The famous French general didn’t even live in the information age, and yet he attributed most of his military success to having the right information. When you’re battling for a competitive advantage in business, analytics data can be equally important to your success.
  8. http://www.kurzweilai.net/googles-self-driving-car-gathers-nearly-1-gbsec
  9. http://www.visualnews.com/2012/06/19/how-much-data-created-every-minute/?view=infographic