
Nitin Sharma - Deep Learning Applications to Online Payment Fraud Detection

453 views

Published in: Technology


  1. Deep Learning Applications to Online Payment Fraud Detection
  2. Agenda
  3. Part 1 - Problem Background & Motivation
  4. PayPal Ecosystem (1) - Complex Social Graph of Consumers & Merchants. ©2017 PayPal Inc. Confidential and proprietary.
     • Establish confidence and trust for millions of account holders to connect and transact in different modes, at scale, in markets all over the world.
     • Personal accounts (PayPal Personal Account): send money, receive money, make purchases, defer payments (PayPal Credit); PayPal Mobile App.
     • Business accounts: different needs of different users; collecting funds in exchange for goods or services; connecting at cash registers through mobile for web-based checkouts, app-based flows, or credit card readers. Examples: a person selling goods online, a food truck collecting payments on a tablet, a landscaping service taking payment on a phone, major retailers with checkout flows.
     • A heterogeneous ecosystem along several axes. Who? Person or business. What? Money transfer; goods (digital or tangible); services; local/small scale or retail/large scale; international or US-based; credit. Where? Online or in-store; web or mobile.
     • The underlying question for every connection: good user or fraudster?
  5. PayPal Ecosystem (2) - Massive Scale of E-Commerce. 2018 full-year statistics: $15.45B revenue; $578B total payment volume; 9.9B transactions; $227B mobile payment volume; 3.7B mobile payment transactions; 246M consumer accounts; 21M merchant accounts; multiple countries/regions, multiple currencies, multiple funding instruments.
  6. Problem Formulation - Fraud Detection: The Business Bottom Line
     • Reliably facilitate large-scale e-commerce between buyers and sellers: protect the identity of transacting entities; establish trust between them; scale across countries, currencies, products, and modes of transaction; facilitate e-commerce or money exchange swiftly.
     • This boils down to reliably separating good customers from potentially bad ones.
     • Maximize decline of bad transactions or fraudulent entities - addressing complex fraud patterns across countries, currencies, products, and modes of transaction, including temporally evolving fraud patterns on different platforms.
     • Maximize approval of good transactions or legitimate entities - approve good transactions up front quickly for the best user experience, and reduce false positives (good-user declines).
     • Key question: what is the modus operandi, or behavior, of good and bad customers - what is it, and how does it manifest?
  7. Complexity of Risks in the PayPal Ecosystem - How Different Fraud Behaviors Manifest
     • Risks span buyers and sellers: fraud risk (account takeover, stolen identity, stolen financials), credit risk, buyer abuse, bounced checks, protections-policy abuse (items not received or significantly different, friendly fraud, collusion), seller mal-intent, and bankruptcy.
     • Account takeover. What is it? Gaining unauthorized access to an existing account and transacting - or logging in and out without transacting (attack preparation, masking with stolen financials, layering). How is it monetized? Use the account's existing funding sources to buy goods from legitimate sellers; send money to oneself; sell the account to others. How is the identity stolen? Data breaches and phishing. A complication: bank transfers take time and give no immediate response (does the account exist? does it have enough balance?), so insufficient funds surface late.
     • Stolen financials. What is it? Stealing a financial instrument (credit/debit card or bank account) and adding it to a new account. How is it monetized? Buy goods from legitimate sellers; send money to another PayPal or bank account; age the account before attacking. How are financials stolen? Data breaches and phishing. Stolen identities are also used to apply for credit (credit fraud).
     • Credit risk. Will consumers and merchants pay on time? Assess credit-worthiness and allocate credit lines - under heavy regulation in modeling.
  8. Market for Fraudsters - Credentials Available Online for a Price. Source: SecureWorks Underground Hacker Markets Annual Report, April 2016.
  9. Sustaining Model Performance - Performance Deteriorates with Time. A model trained offline on Jan-Dec 2016 data and tested on April 2017 holds up initially, but once live (May-Nov 2017) its TPR/FPR trade-off degrades month over month (05/17, 07/17, 09/17) as the conceptual population P(x, Y) drifts away from the training distribution.
  10. Time-Varying Ecosystem - Why Does the Population Change?
     • Technology: gradual ramp-up of new features or products; evolving fraud (desktop vs. mobile); short- and long-term seasonality.
     • The traditional pipeline - domain, features, model - builds features manually from raw event data plus time aggregation. This bakes in assumptions: a round-about view of time and memory, a long/short-term seasonal distinction, anomaly definitions, and always-on time decay.
     • Resulting problems: high dimensionality of the initial feature space (correlation, redundancy); feature generation removed from the training process; no clear view of how features change with systemic fraud evolution.
     • Open questions: Can we learn features that are robust across time and distribution shifts? What about cross-domain learning - discovering new features as a common representation across domains? How can we explicitly reduce good-user declines? Can we learn from past intelligence? What are ways to address class imbalance?
     • How is fraud data different from, say, images? Representation (no pixel-like consistency) and temporality (in both X and Y).
  11. Part 2 - Applications of Deep Learning Architectures
  12. Addressing Class Imbalance - Small Bad-to-Good Ratio
     • Given the low ratio of fraudulent to legitimate transactions, the modeling context poses a class imbalance problem. Two broad remedies: (1) oversampling, (2) weighted loss functions.
     • SMOTE (Chawla et al., 2002): introduce synthetic examples along the line segments joining a minority-class sample to its k minority-class nearest neighbors. Depending on how much oversampling is needed, neighbors are randomly chosen from the k-NN set: take the difference between the feature vector and its neighbor, multiply by a uniform random number in (0, 1), and add the result to the feature vector under consideration. This forces the decision region of the minority class to be more general. In the fraud context, consider the dollar value of fraud and high-risk regions for sampling bias.
     • Cleaning variants: Edited NN removes instances whose class label differs from the majority of their k nearest neighbors; Tomek-link removal drops the majority-class member of each Tomek link (a pair of nearest neighbors with different classes).
     • Related oversamplers: ADAptive SYNthetic (ADASYN), Adaptive Neighbor Synthetic (ANS), Borderline-SMOTE, Safe-Level SMOTE, DBSMOTE.
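The SMOTE interpolation step described on the slide can be sketched in a few lines of NumPy (a minimal illustration, not PayPal's implementation; the function name and parameters are ours):

```python
import numpy as np

def smote_oversample(X_min, k=5, n_new=100, seed=None):
    """Minimal SMOTE: create synthetic minority samples on the line
    segments joining each chosen sample to one of its k nearest
    minority-class neighbors."""
    rng = np.random.default_rng(seed)
    n = len(X_min)
    # pairwise distances within the minority class
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)               # a point is not its own neighbor
    nn = np.argsort(d, axis=1)[:, :k]         # k nearest neighbors per sample
    base = rng.integers(0, n, size=n_new)     # samples to interpolate from
    neigh = nn[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))              # URN(0, 1) interpolation factor
    return X_min[base] + gap * (X_min[neigh] - X_min[base])
```

Because every synthetic point is a convex combination of two real minority samples, the minority decision region becomes more general without inventing values outside the observed feature range.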
  13. Opportunities for Improvement - Manual Feature Engineering: The Prologue
     • Example: the time property. Features aggregate an event sequence (..., E8, E9, E10) over fixed time windows with constant weights (w1, w2, w3) applied to raw features - an event-level perspective on temporality or rawness.
     • Event features are created before, and independently of, model training - introducing correlation and redundancy, and hard-coding assumptions such as "always decay?".
     • Can we instead learn the function, and all its underlying complexities, from scratch? This motivates representation learning for temporality.
  14. Temporal Representation Learning Using LSTM - Feature Discovery (event-driven deep learning; cf. Yuan et al., 2017)
     • LSTMs learn long-term dependencies, so they can leverage long sequences of user behavior (good or bad) and classify behaviors even with lags of unknown duration between key events (specific fraud behaviors). Cells remember event behaviors over arbitrary time intervals, for both homogeneous and heterogeneous event sequences.
     • Event sequences are the input; the model predicts either future sequences or labels. Using raw event sequences imposes no restriction on the learned function or time decay, and the discovered features replace manually engineered features built on assumptions.
     • Setup: use payment-attempt event data (raw features) for all transactions; replace the manually generated features with less than half as many raw features; sequence-train an LSTM architecture on the raw features; then, using features from the newly discovered feature hierarchies plus other features, train another model.
     • Result: approximately 7-10% relative increase in performance. For model M3 (LSTM feature learning + NN), relative performance across segments P1-P6: 1.0747, 1.0665, 1.0419, 1.0720, 1.1374, 1.1094.
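To make the mechanism concrete, here is a bare NumPy forward pass of a single LSTM cell over a raw event sequence; the final hidden state plays the role of the learned temporal feature vector that replaces window-based aggregates (a sketch, not the production architecture - weight shapes and names are ours):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_event_features(events, W, U, b):
    """Run one LSTM cell over a (T, d) sequence of raw event vectors.
    W: (4h, d) input weights, U: (4h, h) recurrent weights, b: (4h,) bias.
    Returns the final hidden state h -- the learned temporal representation."""
    h_dim = U.shape[1]
    h, c = np.zeros(h_dim), np.zeros(h_dim)
    for x_t in events:
        z = W @ x_t + U @ h + b
        i, f, o, g = np.split(z, 4)        # input, forget, output, candidate
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        c = f * c + i * np.tanh(g)         # cell state: memory over arbitrary intervals
        h = o * np.tanh(c)
    return h
```

The forget gate f is what lets the cell retain or drop behavior across lags of unknown duration between key events, rather than applying a fixed time-window decay.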
  15. Robust Feature Learning to Address Post-Deployment Shifts - Discover Stable Feature Spaces
     • Train a stacked denoising autoencoder to reconstruct the input from a corrupted version of it. Corruption is based on past systemic behavior, or random - for example, build models that are robust to IP corruption.
     • This forces the hidden layer to discover more robust features - hence more stable models - and simulates feature shifts and scenarios that occur post-deployment.
     • Use the learned weights, rather than random initialization, for a second-stage supervised multi-task learning problem; combine with feature selection (ensembles, recursive feature elimination) during training.
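A minimal single-layer denoising autoencoder in NumPy illustrates the idea: corrupt a random subset of features at train time (random masking here, standing in for an unreliable signal such as an IP-derived feature - our choice for illustration), and train tied weights to reconstruct the clean input:

```python
import numpy as np

def corrupt(X, p, rng):
    """Randomly zero out roughly a fraction p of feature values,
    simulating a feature source going bad after deployment."""
    return X * (rng.random(X.shape) > p)

def train_denoising_ae(X, hid=4, epochs=800, lr=0.05, p=0.3, seed=0):
    """Tied-weight autoencoder trained to map corrupt(X) back to X,
    pushing the hidden layer toward corruption-robust features."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.normal(0.0, 0.1, (d, hid))
    for _ in range(epochs):
        Xc = corrupt(X, p, rng)
        H = np.tanh(Xc @ W)                # robust hidden representation
        err = H @ W.T - X                  # reconstruction error vs. CLEAN input
        # gradient through both the decoder (W.T) and encoder (W) paths
        gW = err.T @ H + Xc.T @ ((err @ W) * (1.0 - H ** 2))
        W -= lr * gW / n
    return W
```

As the slide describes, the learned W can then initialize a second-stage supervised network instead of random weights.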
  16. Multi-Task & Transfer Learning - Multi-Input Multi-Output Modeling Architectures
     • Stacked architectures learn robust hierarchical features from long-term fraud patterns, combined with multi-task cross-domain feature learners, hard-example-mining learners, and short-term modus-operandi-specific models.
     • This is iteratively better than learning ensembles from sub-sampling and then weighting scores linearly.
     • Cross-stitch networks (Misra et al., 2016): at each layer, learn a linear combination of the activation maps from each task, so that the next layer's filters operate on a shared representation.
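A cross-stitch unit is small enough to show directly: each task's next-layer input is a learned 2x2 linear mix of both tasks' activations (a sketch; in a real network alpha is a trainable parameter per layer):

```python
import numpy as np

def cross_stitch(xA, xB, alpha):
    """Cross-stitch unit (Misra et al., 2016): mix two tasks' activation
    vectors with a 2x2 matrix so subsequent layers operate on a shared
    representation. alpha near identity => mostly task-specific;
    alpha near uniform => mostly shared."""
    mixed = alpha @ np.stack([xA, xB])     # (2, 2) @ (2, n) -> (2, n)
    return mixed[0], mixed[1]
```

With alpha = [[0.9, 0.1], [0.1, 0.9]], each task keeps 90% of its own activations and borrows 10% from the other; gradient descent can move alpha anywhere between fully separate and fully shared.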
  17. Model Performance Comparison - Robust Feature Learning Using Hybridized Architectures

      Monthly (18 months):
        Model   | Std. deviation | Proportion > cutoff1 | Proportion > cutoff2
        M01_AE  | x              | 2.50x                | 1.39x
        BM 1    | 1.98x          | 1.33x                | 0.98x
        BM 2    | 2.85x          | x                    | x

      Weekly (78 weeks):
        Model   | Std. deviation | Proportion > cutoff1 | Proportion > cutoff2
        M01_AE  | x              | 2.61x                | 1.71x
        BM 1    | 2.24x          | 1.28x                | 1.05x
        BM 2    | 2.26x          | x                    | x
  18. Reducing Good User Declines (1) - Explore the DNN Search Space via Cost Functions
     • General objective of a machine learning algorithm: find parameters or weights that optimize (here, minimize) a loss function measuring how far the prediction is from the ground truth; gradient search is directed to optimize that loss.
     • Beyond canned loss functions: can a loss function be designed that explicitly penalizes false positives? The search is then directed to optimize a loss that minimizes the gap between ground truth and prediction while constraining the search to regions where false positives are lower.
     • For the fraud context: improve TP (maximize fraud catch rate) while constraining the search to regions where FP (good-user declines) is lowest - steering toward the low-FPR, optimal-catch region.
     • Caveat in some cases: no free lunch (FP vs. FN).
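One simple way to make the false-positive penalty explicit is to up-weight the loss term that fires when a good user is scored toward fraud (a sketch of the idea only; the weight w_fp is a hypothetical tuning knob, not a value from the deck):

```python
import numpy as np

def fp_weighted_bce(y_true, p_pred, w_fp=5.0, eps=1e-12):
    """Binary cross-entropy whose 'good user scored as fraud' term is
    multiplied by w_fp, steering gradient search toward low-FPR regions.
    w_fp = 1 recovers the standard loss."""
    p = np.clip(p_pred, eps, 1.0 - eps)
    per_example = -(y_true * np.log(p)                          # missed fraud (FN direction)
                    + w_fp * (1.0 - y_true) * np.log(1.0 - p))  # declined good user (FP direction)
    return per_example.mean()
```

The no-free-lunch caveat shows up directly: raising w_fp pushes scores down for everyone near the boundary, trading some catch rate for fewer good-user declines.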
  19. Reducing Good User Declines (2) - Transfer Learning Using Generative Modeling Contexts
     • Discriminative models M1, M2, ..., Mk answer: what is the probability of a transaction being fraudulent? Their rejection region (score >= k) declines fraudsters but also some good users near the decision boundary.
     • Generative framing: what is the distribution of features that generates fraud? And what is the distribution of features that generates good users who get declined by M1 ... Mk?
     • Approaches: deep autoencoder learning, and transfer learning (as feature learners, or as a prediction override for cases distant from the decision boundary).
  20. Reducing Good User Declines (3) - Hard Example Mining (from object detection; Shrivastava et al., 2016)
     • Hard examples here are good users who got declined (scored P(Y = 1 | X) > k). Two passes: first identify the good users the model declines, then improve the classifier so these hard examples are re-classified as good users.
     • Procedure: train the model; freeze the network and identify hard examples; create a minibatch of them (with variations based on segmentation, risky business domain, and dollar value of fraud); unfreeze and continue training, backpropagating only the hard examples.
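The mining step itself reduces to "rank candidates by current loss, keep the top k" (a sketch of the selection logic only, with names of our choosing; the freeze/unfreeze mechanics are framework-specific):

```python
import numpy as np

def mine_hard_examples(y_true, p_pred, k, eps=1e-12):
    """Rank examples by their current cross-entropy loss and return the
    indices of the k hardest -- in the fraud setting, declined good users
    (y = 0 with a high fraud score) surface at the top."""
    p = np.clip(p_pred, eps, 1.0 - eps)
    losses = -(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))
    return np.argsort(losses)[::-1][:k]    # highest-loss examples first
```

These indices form the minibatch that is backpropagated after unfreezing, so the second pass concentrates gradient signal on re-classifying declined good users.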
  21. Model Performance Comparison (Catch vs. FPR) - FPR Ratios Across Methods

      Catch rate (FPR in parentheses), relative to the DNN_CFU baseline:
        Model     | P1              | P2              | P3              | P4              | P5
        DNN_CFU   | 1.000 (1.000)   | 1.000 (1.000)   | 1.000 (1.000)   | 1.000 (1.000)   | 1.000 (1.000)
        DNN_RRFL  | 1.0074 (0.9488) | 1.0052 (0.9702) | 1.0108 (0.9701) | 0.9900 (1.0474) | 1.0186 (0.8362)
        DNN_OHEM  | 1.0131 (0.8279) | 1.0141 (0.9007) | 1.0229 (0.8856) | 1.0141 (0.7905) | 1.0342 (0.6595)

     • Online hard example mining consistently delivers low FPR while retaining a high catch rate, beating the status-quo champion.
     • Cost-function-based optimizers involve locally weighting the data batch by batch and need significant tuning - they often cause variability in FPR.
     • Rejection feature learning needs further tuning; the current combination is essentially a feature learner.
  22. Part 3 - Conclusions
  23. Deep Learning Applications to Fraud Detection - Conclusions
     • Key goals: the next step-function increase in performance, and performance that scales robustly to rapidly evolving fraud patterns.
     • Deep learning architectures offer a significant performance boost: far less trade-off between performance and robustness, and performance scales very well with more data or better hardware.
     • No pre-training initial assumptions (legacy): temporally and systemically robust features are learned during training; manual feature engineering (assumption-driven, static definitions) is significantly reduced; cross-domain features are learned with less domain-centric restriction (segmentation, tagging).
     • Past intelligence is better utilized through transfer learning and domain adaptation, and catch rate is boosted while good-user declines are also reduced.
     • Traditional ML constrains the extent of learning at the domain, feature, and training stages; DNN architectures over raw event data, learning across domains, sit in the sweet zone of the performance-stability trade-off.
  24. References
  25. References - Research Papers
     [1] Abhinav Shrivastava, Abhinav Gupta, and Ross Girshick. "Training Region-based Object Detectors with Online Hard Example Mining," arXiv:1604.03540 [cs.CV], 2016.
     [2] Ishan Misra, Abhinav Shrivastava, Abhinav Gupta, and Martial Hebert. "Cross-stitch Networks for Multi-task Learning," arXiv:1604.03539 [cs.CV], 2016.
     [3] Dell SecureWorks. Underground Hacker Markets Annual Report, April 2016.
     [4] Nitesh V. Chawla, Kevin W. Bowyer, Lawrence O. Hall, and W. Philip Kegelmeyer. "SMOTE: Synthetic Minority Over-sampling Technique," Journal of Artificial Intelligence Research 16, 2002 (arXiv:1106.1813 [cs.AI]).
     [5] Shuhan Yuan, Panpan Zheng, Xintao Wu, and Yang Xiang. "Wikipedia Vandal Early Detection: From User Behavior to User Embedding," arXiv:1706.00887 [cs.CR], 2017.