2. Agenda
• Fraud detection and risk management use cases
• Why ML?
• Intro to the AWS ML stack
• Natural Language Processing (NLP) approaches
• How FINRA uses NLP to accelerate investigations
• Fraud Classification Modeling Approach
• How Intuit decreased model development time by 90%
• Demo – build a credit card fraud detection model
4. Example Use-case: CC Transaction Fraud
• Goal – determine whether a transaction is valid or fraudulent based on details:
• Date and time
• Amount ($)
• Geo location
• Transaction description (Gas Station, Electronics Store, etc.)
• Historical data
• Potential Actions: Approve, Deny, Manual Review
5. Rules-based Approach to CC Fraud Detection
• Card Reported Stolen? Yes → Deny
• Over Limit? Yes → Deny
• Unrecognized Location? Yes → Deny
• …
• Otherwise → Approve
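The rule chain on this slide can be sketched in a few lines of Python. This is only an illustration: the `Transaction` fields and the order of the checks are assumptions made for the example, and a real rules engine would carry many more conditions (the "…" on the slide).

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    # Hypothetical fields, chosen only to mirror the three checks on the slide.
    amount: float
    available_credit: float
    card_reported_stolen: bool
    location_recognized: bool


def decide(tx: Transaction) -> str:
    """Walk the rule chain; any 'Yes' answer denies the transaction."""
    if tx.card_reported_stolen:          # Card Reported Stolen?
        return "Deny"
    if tx.amount > tx.available_credit:  # Over Limit?
        return "Deny"
    if not tx.location_recognized:       # Unrecognized Location?
        return "Deny"
    return "Approve"                     # no rule fired


ok = Transaction(amount=40.0, available_credit=500.0,
                 card_reported_stolen=False, location_recognized=True)
print(decide(ok))
```

Every new fraud pattern needs another hand-written `if`, which is the maintenance burden that motivates the machine learning approach.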
7. Solution: Machine Learning Approach
• Understand your data
• Algorithmically discover hidden patterns
• Generalize to make decisions on new data
• Adapt to Changing Data
• Make predictions
8. The Amazon ML stack: Broadest & deepest set of capabilities
AI Services
• Vision: Amazon Rekognition, Amazon Textract
• Speech: Amazon Polly, Amazon Transcribe
• Language: Amazon Translate, Amazon Comprehend
• Chatbots: Amazon Lex
• Forecasting: Amazon Forecast
• Recommendations: Amazon Personalize
Platform Services: Amazon SageMaker
• Build: Pre-built algorithms & notebooks; Data labeling (Ground Truth); Algorithms & models (AWS Marketplace for Machine Learning)
• Train: One-click model training & tuning; Optimization (Neo); Reinforcement learning
• Deploy: One-click deployment & hosting
Frameworks & Infrastructure
• Frameworks
• Interfaces
• Infrastructure: Amazon EC2 P3 & P3N, Amazon EC2 C5, Amazon EC2 FPGAs, Amazon Greengrass, Amazon Elastic Inference
10. FINRA
• Investor protection
• Market integrity
• Up to 135 billion events per day
• 30+ PB of storage
• Monitoring 99% of equities & 65% of options in the US
• Reconstructing trillions of market nodes & edges
11. FINRA Challenge: Unstructured Data
• High volumes of unstructured content
• ~1M documents each year from stock brokers and investors to be reviewed
• Documents contain incredibly useful information but mining is a challenge
• Complex cases might have thousands of associated records
• Numerous features of interest
• Finding information about the Who, What, Where, When and How is labor intensive
• Documents: Reference Data, Forms, Free Style Text
13. NLP to Accelerate Investigations
John Doe alleged he did not understand how the
two fixed annuities sold to him by William Alex
Smith worked and did not understand the impact.
At that time William Smith was employed by
Company, Inc.
• Investor: John Doe
• Broker: William Alex Smith, ID 12345
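To make the extraction step concrete, here is a toy sketch that pulls the investor and broker out of the example sentence. It is a stand-in, not FINRA's method: a production system would use a trained entity-recognition model rather than regular expressions. The `BROKER_REGISTRY` table and the idea that the broker ID comes from a reference-data lookup are assumptions made for the illustration.

```python
import re

TEXT = (
    "John Doe alleged he did not understand how the two fixed annuities "
    "sold to him by William Alex Smith worked and did not understand the "
    "impact. At that time William Smith was employed by Company, Inc."
)

# Hypothetical reference data: registered broker name -> broker ID.
BROKER_REGISTRY = {"William Alex Smith": "12345"}

# One or more capitalized words, e.g. "John Doe" or "William Alex Smith".
NAME = r"[A-Z][a-z]+(?: [A-Z][a-z]+)*"


def extract_roles(text, registry):
    """Return the investor, the broker, and the broker's ID if registered."""
    roles = {}
    investor = re.search(rf"^({NAME}) alleged", text)
    if investor:
        roles["investor"] = investor.group(1)
    broker = re.search(rf"sold to (?:him|her|them) by ({NAME})", text)
    if broker:
        roles["broker"] = broker.group(1)
        # Link the extracted name to reference data to obtain the ID.
        roles["broker_id"] = registry.get(broker.group(1))
    return roles


print(extract_roles(TEXT, BROKER_REGISTRY))
```

Hand-written patterns like these break as soon as the phrasing changes, which is why learned NLP models are attractive at a volume of ~1M documents a year.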
16. Intuit Fraud Detection Use-Case
• Intuit builds fraud detection models for:
• Account Takeover
• Identity Theft
• Require real-time predictions for TurboTax Client
• Previous model deployment process was complex and time-consuming (6 Months):
• Provision Compute Environment
• Connect to Data Lake
• Create Microservices
• Traffic Load-Balancing
• Architect the Service
17. Amazon SageMaker
A fully managed service that enables data scientists and developers to quickly and easily build machine-learning-based models into smart production applications.
18. Amazon SageMaker Components
• BUILD: Pre-built notebooks for common problems; Built-in, high performance algorithms
• TRAIN: One-click training; Hyperparameter optimization
• DEPLOY: One-click deployment; Fully managed hosting with auto-scaling
19. How Intuit Cut Model Development Time by 90%
• Amazon EMR
• Reader, Cleanser, Processor
• Calculate features
• Feature store
• Training
• Model training (SM)
• Model
• Model hosting (SM)
• Client service
• Data Look-up
• Real-time predictions
• 6 month process ⇢ ~1 week
20. Classifying Fraud with Supervised Learning
• ML approach that trains a model using data labeled with the correct answers
• Labeled data required!
• Continues learning as it sees more data
• Adjusts dynamically to new ground truth
• SageMaker built-in algorithms:
• XGBoost
• LinearLearner
• Factorization Machines
• Or bring your own!
Labeled Training Data:
Date      Amount  Att1  Att2  …  Fraud
02-28-18  $137    4.5   12    …  0
03-01-18  $5      0.2   1.4   …  1
…
03-28-18  $627    -2.1  3.7   …  0
Labeled Training Data → Model Training → Model
New transactions → Model → Predictions
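The train-then-predict flow above can be sketched without any cloud service. The example below fits a plain logistic regression by gradient descent in pure Python. It is an assumption chosen for brevity, not one of the SageMaker built-in algorithms, although conceptually it is a linear model. The feature rows are synthetic and made up for the illustration; they are not taken from the table on the slide.

```python
import math

# Synthetic, illustrative rows: (Att1, Att2) -> Fraud label. Not real data.
TRAIN_X = [(0.1, 0.2), (0.2, 0.1), (0.3, 0.3), (0.2, 0.4),
           (0.9, 0.8), (0.8, 0.9), (0.7, 0.7), (1.0, 0.6)]
TRAIN_Y = [0, 0, 0, 0, 1, 1, 1, 1]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(xs, ys, lr=0.5, epochs=2000):
    """Model Training: batch gradient descent on the logistic loss."""
    w = [0.0] * len(xs[0])
    b = 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in zip(xs, ys):
            # Prediction error for this labeled row.
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(model, x):
    """Estimated probability that a transaction is fraudulent."""
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def predict(model, x, threshold=0.5):
    return 1 if predict_proba(model, x) >= threshold else 0


model = train(TRAIN_X, TRAIN_Y)                # Labeled Training Data -> Model
new_transactions = [(0.15, 0.15), (0.85, 0.85)]
print([predict(model, x) for x in new_transactions])  # Model -> Predictions
```

Retraining on freshly labeled rows is how such a model adjusts to new ground truth. With a managed service the same loop runs as a training job rather than in a local function.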
22. Tips for Getting Started
• You don’t need to be an expert to begin building ML-enabled applications
• Take iterative approaches
• Many models have already been developed for general use cases and you don’t need to train something from scratch
• Use general purpose models/APIs first, then customize if needed as your use-case matures
• Utilize SageMaker examples and modify to suit your use-case
• Identify projects with clear business outcomes
23. Next Steps for Building a Proof-of-Concept
1. Get Trained: https://aws.amazon.com/training/learning-paths/machine-learning/
2. Build something: https://aws.amazon.com/getting-started/
11 Tutorials, 1 Project, 5 Digital Trainings