2. 1. AML Problem Statement
2. Set-Up: Basic Idea
3. Results / Success Factors / Next Steps
Content
3. • What is money laundering?
• According to the Financial Action Task Force (FATF), the goal of many criminal acts is to generate a profit for the individual or group that carries out the act. Money laundering is the processing of these criminal proceeds to disguise their illegal origin.
• Corporates
• Money laundering occurs for many reasons, such as concealing black money (money earned illegally) and evading tax, for example by hiding an organization's expenses.
• Countries
• The United Nations Office on Drugs and Crime (UNODC) estimates that between 2 and 5% of global GDP is laundered each year. That’s between EUR 715 billion
and 1.87 trillion each year.
• In 2009, the estimated global success rate of money laundering controls was a mere 0.2% (according to the UN and US State Department).
• Money laundering damages long-term economic development by distorting capital flows and the economy’s international trade (Bartlett, 2002).
Background – Money Laundering
4. • How is it done?
- Alerts are generated (based on predefined scenarios)
- Alerts are distributed to the Anti-Money Laundering Officer and analyzed
- Confirmed alerts are reported to the authorities (Suspicious Activity Reports)
• Business Problem:
- High number of false positives
- Cases that are not detected by the scenarios
- Alert analysis is subject to human bias
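As an illustration of scenario-based alert generation, here is a minimal sketch of a single predefined scenario. The scenario itself ("large cash transaction"), the threshold `CASH_THRESHOLD_EUR`, and the `Transaction` fields are hypothetical and not taken from the slides.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    customer_id: str
    amount: float  # amount in EUR
    is_cash: bool


# Hypothetical scenario parameter (an assumption, not from the slides).
CASH_THRESHOLD_EUR = 10_000.0


def generate_alerts(transactions):
    """Return the customer ids that trigger the 'large cash transaction' scenario."""
    alerted = []
    for tx in transactions:
        triggers = tx.is_cash and tx.amount >= CASH_THRESHOLD_EUR
        if triggers and tx.customer_id not in alerted:
            alerted.append(tx.customer_id)
    return alerted


transactions = [
    Transaction("C1", 12_500.0, True),
    Transaction("C2", 9_900.0, True),    # just below the threshold: no alert
    Transaction("C3", 50_000.0, False),  # not cash: outside this scenario
    Transaction("C1", 11_000.0, True),   # same customer, alerted only once
]
alerts = generate_alerts(transactions)
print(alerts)
```

A fixed rule like this shows both business problems in miniature: every customer above the threshold is alerted regardless of context (false positives), while a customer who stays just below it is never seen.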
Compliance Overview
5. AML ML PoC Modelling Approach
Set-Up
Input: Historical Customer, Account & Transaction data
UC 1: Classification Problem (Supervised Machine Learning)
• Target group: Alerted customers (Suspicious Activity Report)
• Step 1: The model is trained based on historical data (normal vs. suspicious customers)
• Step 2: New alerts can be classified automatically (e.g. Alert X: suspicious, 86% confidence)
UC 2: Peer Group Comparison (Unsupervised Machine Learning)
• Target group: Non-alerted customers
• Detect outliers within peer groups indicating suspicious activities with unsupervised Machine Learning
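The supervised use case (train on historical alerts, then score a new alert with a confidence) could be sketched as follows. This is a minimal illustration using a hand-written logistic regression in NumPy; the slides do not name the model used in the PoC, and the two features and all data values are invented for the example.

```python
import numpy as np


def train_logistic_regression(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by plain gradient descent on the log loss."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probabilities
        grad = p - y                            # gradient of the log loss w.r.t. the logits
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b


def suspicious_probability(x, w, b):
    """Probability that a single alert is suspicious."""
    return float(1.0 / (1.0 + np.exp(-(x @ w + b))))


# Hypothetical historical alerts: columns = [cash share, scaled monthly volume].
X_hist = np.array([[0.10, 0.20], [0.20, 0.10], [0.15, 0.30],
                   [0.90, 0.80], [0.80, 0.90], [0.85, 0.70]])
y_hist = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])  # 1 = confirmed suspicious

w, b = train_logistic_regression(X_hist, y_hist)

alert_x = np.array([0.9, 0.9])  # a new, unlabeled alert
confidence = suspicious_probability(alert_x, w, b)
label = "SUSPICIOUS" if confidence >= 0.5 else "NORMAL"
print(f"Alert X: {label} ({confidence:.0%} confidence)")
```

Any classifier that outputs a probability would fit the same two-step pattern; logistic regression is used here only because it is short enough to write out in full.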
6. AML ML PoC Modelling Approach
Data Construction
• Instead of a ‘transactional view’, a ‘customer view’ has been constructed.
Simplified workflow for Analytical Record creation
• Source data (databases or data tables): Customer, Account, Transaction, Alert
• Transformation steps: Standardization and Harmonization, Feature creation (e.g. transaction aggregations), Feature selection
• Result: Analytical Record on customer level
• Models using the Analytical Record: Alert scoring model, Peer Group comparison model, … ML model X
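The move from a transactional view to a customer view can be sketched as a simple aggregation. The input layout `(customer_id, amount, is_cash)` and the chosen aggregates are assumptions for illustration; the actual PoC feature set is not listed on the slide.

```python
from collections import defaultdict


def build_analytical_record(transactions):
    """Aggregate (customer_id, amount, is_cash) rows into one record per customer."""
    sums = defaultdict(lambda: {"n_tx": 0, "total": 0.0, "cash": 0.0, "max": 0.0})
    for customer_id, amount, is_cash in transactions:
        rec = sums[customer_id]
        rec["n_tx"] += 1
        rec["total"] += amount
        rec["max"] = max(rec["max"], amount)
        if is_cash:
            rec["cash"] += amount

    records = {}
    for customer_id, rec in sums.items():
        records[customer_id] = {
            "n_tx": rec["n_tx"],
            "total_amount": rec["total"],
            "avg_amount": rec["total"] / rec["n_tx"],
            "max_amount": rec["max"],
            # Share of the customer's volume that was transacted in cash.
            "cash_share": rec["cash"] / rec["total"] if rec["total"] else 0.0,
        }
    return records


transactions = [("C1", 100.0, False), ("C1", 300.0, True), ("C2", 50.0, False)]
records = build_analytical_record(transactions)
for customer_id, record in records.items():
    print(customer_id, record)
```

Each customer ends up as exactly one row, which is the shape both the alert scoring model and the peer group comparison model consume.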
7. Use Case Goal:
• Based on a feature set comparing a customer’s behavior with their historical transaction pattern, data-driven peer groups, and specific transaction categories, outliers are identified using unsupervised Machine Learning techniques.
• The goal of the use case is to provide a (color) coding of clients into high-risk, medium-risk, and low-risk, such that:
• 50% of clients categorized as high-risk are confirmed to be ‘relevant’.
• 33% of clients categorized as medium-risk correspond to ‘unusual transactional behavior’.
• The low-risk category should ideally contain no suspicious behavior.
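The targets above are hit rates per risk category, which can be checked once analysts have reviewed a sample of coded clients. The sketch below assumes reviewed cases arrive as `(risk_category, confirmed)` pairs; the sample data is invented, and only the 50% and 33% targets come from the slide.

```python
def hit_rates(cases):
    """Return the share of confirmed cases per risk category."""
    totals, hits = {}, {}
    for category, confirmed in cases:
        totals[category] = totals.get(category, 0) + 1
        hits[category] = hits.get(category, 0) + (1 if confirmed else 0)
    return {category: hits[category] / totals[category] for category in totals}


# Targets from the use case goal: 50% for high-risk, 33% for medium-risk.
TARGETS = {"high": 0.50, "medium": 0.33}

# Hypothetical analyst review results.
reviewed = [
    ("high", True), ("high", False), ("high", True), ("high", False),
    ("medium", True), ("medium", False), ("medium", False),
    ("low", False), ("low", False),
]

rates = hit_rates(reviewed)
targets_met = {category: rates[category] >= target for category, target in TARGETS.items()}
print(rates)
print(targets_met)
```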
Unsupervised Learning
Modelling Hypothesis and Rationale:
• The modelling hypothesis is that suspicious behavior in the AML context is
correlated with transaction data that strongly deviate from what is "typical" for a
certain set of customers.
8. • To detect customers showing "outlier" behavior, specific feature categories are constructed:
• The customer's current transaction profile
• The customer's transaction profile compared to the average transaction profile of their peer group (industry classification, company form), which is expected to show homogeneous transactional behavior
• The customer's present transaction profile compared to their past transaction profile, to detect sudden changes in transaction behavior
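The two comparison features could be computed along the following lines. This is a sketch under assumptions: the slides do not say how the comparison is measured, so a z-score against the peer group and a ratio against the customer's own history are used here as plausible stand-ins, and the function names are hypothetical.

```python
from statistics import mean, pstdev


def peer_zscore(value, peer_values):
    """Standardized deviation of a customer's value from the peer-group average."""
    mu = mean(peer_values)
    sigma = pstdev(peer_values)
    return 0.0 if sigma == 0 else (value - mu) / sigma


def history_ratio(current, past_values):
    """Current value relative to the customer's own historical average."""
    baseline = mean(past_values)
    return None if baseline == 0 else current / baseline


# Hypothetical monthly transaction volumes of one peer group
# (same industry classification and company form); the last value is the customer under review.
peer_volumes = [100.0, 110.0, 90.0, 100.0, 400.0]
z = peer_zscore(400.0, peer_volumes)

# The same customer's volume in earlier months.
ratio = history_ratio(400.0, [100.0, 100.0, 100.0])

print(f"peer z-score: {z:.2f}, ratio to own history: {ratio:.1f}")
```

The peer feature only makes sense if the peer group really is homogeneous; in a mixed group the standard deviation grows and genuine outliers are masked.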
Unsupervised Learning
9. The score of each customer is determined by calculating the following metrics for each data point after applying PCA.
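The metrics themselves are not given in the text of the slide. A common pairing for PCA-based outlier detection, assumed here, is the score distance (how far a point lies from the center inside the PCA subspace, scaled by the variance of each component) and the orthogonal distance (how far the point lies from the subspace, i.e. its reconstruction error). The following NumPy sketch computes both; the function name and the toy data are hypothetical.

```python
import numpy as np


def pca_outlier_metrics(X, n_components):
    """Return (score_distance, orthogonal_distance) for each row of X."""
    Xc = X - X.mean(axis=0)
    # SVD of the centered data: the rows of Vt are the principal directions.
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:n_components].T                        # shape (n_features, k)
    scores = Xc @ V                                # coordinates inside the PCA subspace
    variances = S[:n_components] ** 2 / (len(X) - 1)
    score_distance = np.sqrt((scores ** 2 / variances).sum(axis=1))
    residual = Xc - scores @ V.T                   # part of each point outside the subspace
    orthogonal_distance = np.linalg.norm(residual, axis=1)
    return score_distance, orthogonal_distance


# Toy data: 50 customers close to the line y = 2x, plus two unusual customers.
t = np.linspace(-2.0, 2.0, 50)
typical = np.column_stack([t, 2.0 * t + 0.05 * np.sin(np.arange(50))])
X = np.vstack([typical,
               [10.0, 20.0],   # index 50: follows the pattern, but at an extreme scale
               [0.0, 5.0]])    # index 51: breaks the pattern itself

sd, od = pca_outlier_metrics(X, n_components=1)
print("largest score distance at index", int(np.argmax(sd)))
print("largest orthogonal distance at index", int(np.argmax(od)))
```

The two metrics flag different kinds of unusual behavior, which is why a customer can stand out on one, the other, or both. Note that plain PCA is itself influenced by the outliers it is meant to find; a robust variant may be preferable on real data.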
Unsupervised Learning
10. • Customers with unexpected behavior will have a high score and/or a high orthogonal loss.
• Hence, customers far away from the main clusters can represent potential SARs.
• 1% - Red Zone
• 3% - Yellow Zone
• 96% - Green Zone
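One way to arrive at such a split is to rank customers by their outlier score and cut the ranking at fixed shares. The sketch below assumes that the zones are rank-based and that a single combined score per customer is available; neither is stated on the slide, and `assign_zones` is a hypothetical helper.

```python
def assign_zones(scores, red_share=0.01, yellow_share=0.03):
    """Map customer id -> zone, ranking by outlier score (highest = most unusual)."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    n_red = round(len(ranked) * red_share)
    n_yellow = round(len(ranked) * yellow_share)
    zones = {}
    for rank, customer_id in enumerate(ranked):
        if rank < n_red:
            zones[customer_id] = "red"
        elif rank < n_red + n_yellow:
            zones[customer_id] = "yellow"
        else:
            zones[customer_id] = "green"
    return zones


# Hypothetical outlier scores for 200 customers.
scores = {f"C{i}": float(i) for i in range(200)}
zones = assign_zones(scores)

counts = {zone: list(zones.values()).count(zone) for zone in ("red", "yellow", "green")}
print(counts)
```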
Result - Unsupervised Learning
Unsupervised Machine Learning models overcome the drawback of supervised ML models, which detect only the patterns they have been trained on. Unsupervised ML, on the other hand, can detect new AML typologies where supervised models may fail to highlight suspicious activities (slide 5).