Jim Dowling
CEO / Co-Founder
Logical Clocks
Associate Prof at KTH – Royal Institute of Technology
Anti Money Laundering and GANs
Berlin Meetup
@jim_dowling
● Problem: Increase detection rate and reduce costs for AML.
● Solution: We used the Hopsworks platform to train GANs to classify
transactions as suspected of money laundering or not. We have worked with
a large transaction dataset (~40 TB) and the solution uses Spark for Feature
Engineering and TensorFlow/GPUs to train a binary classifier, classifying
transactions as either clean or dirty. We use the open-source Hopsworks
platform to manage features, scale-out training, and manage models.
● Reference: Whitepaper
Agenda
● Money laundering involves turning “dirty” money into “clean” money, either
through an obscure sequence of banking transfers or through commercial
transactions.
● The three broad stages of money laundering* are:
○ Placement (smurf it)
○ Layering (spread it out fast)
○ Integration (buy stuff)
What is Money Laundering?
*https://towardsdatascience.com/the-art-of-engineering-features-for-a-strong-machine-learning-model-a47a876e654c
Rule-Based AML vs Deep Learning AML
AML as a Supervised ML Problem
● Anti-money laundering (AML) is a pattern
matching problem
● AML systems should automatically flag ‘suspect’
financial transactions
○ Followed by manual investigation
● Historical transaction datasets have massive
data imbalance between the number of ‘clean’
transactions versus ‘dirty’ transactions
Clean Transactions: Millions or Billions
Dirty Transactions: 100s or 1000s
Implications of AML as a Binary Classification Problem
True Positive
Reality:  A Money Laundering Transaction 
Prediction: “Dirty” transaction predicted
Result: Good
False Positive
Reality: Not a Money Laundering Transaction 
Prediction: “Dirty” transaction predicted
Result: Unnecessary work and cost!
False Negative
Reality: A Money Laundering Transaction 
Prediction: “Clean” transaction predicted
Result: Fines/jail by authorities/regulator!
True Negative
Reality: Not a Money Laundering Transaction 
Prediction: “Clean” transaction predicted
Result: Good
Confusion matrix of our Binary AML Classifier with all possible predictions and their consequences.
We use a variant of the F1 score to evaluate models (precision, recall, fallout should not be weighted equally).
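The slide does not say which F1 variant is used. One common choice when recall and precision should not be weighted equally is the F-beta score, so the following is a minimal sketch under that assumption. The function name `confusion_metrics` and all counts are hypothetical and only illustrate how precision, recall, and fall-out follow from the four confusion-matrix cells.

```python
def confusion_metrics(tp, fp, fn, tn, beta=2.0):
    """Return precision, recall, fall-out and F-beta from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Fall-out is the false positive rate: clean transactions wrongly flagged.
    fallout = fp / (fp + tn) if (fp + tn) else 0.0
    b2 = beta * beta
    denom = b2 * precision + recall
    # beta > 1 weights recall (missed laundering) more heavily than precision.
    f_beta = (1 + b2) * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall,
            "fallout": fallout, "f_beta": f_beta}


# Hypothetical counts for a heavily imbalanced dataset (not from the talk).
f2 = confusion_metrics(tp=80, fp=920, fn=20, tn=998_980, beta=2.0)
f1 = confusion_metrics(tp=80, fp=920, fn=20, tn=998_980, beta=1.0)
print(f2)
print(f1["f_beta"])
```

With beta above 1, a false negative (a potential fine) costs more than a false positive (unnecessary manual investigation). The right beta depends on the relative cost of the two errors.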
AML as an Anomaly Detection Problem
“Anomaly detection follows
quite naturally from a good
unsupervised model”
Alex Graves (Deep Mind)
Traditional unsupervised
approaches do not scale:
k-means clustering and
principal component
analysis
[Image from Ruff et al., “Deep Semi-Supervised Anomaly Detection”, https://arxiv.org/pdf/1906.02694.pdf]
AML - Semi-Supervised Anomaly Detection
AML is not a classical use-case for anomaly
detection as we typically have labelled
datasets, albeit imbalanced.
“Semi-supervised learning is a class of machine
learning tasks and techniques that also make
use of unlabeled data for training – typically a
small amount of labeled data with a large
amount of unlabeled data.” Wikipedia
GANs and Other Methods for Anomaly Detection
● Variational Auto-Encoders for Anomaly Detection
○ Easier to train, performance not state-of-the-art
● Generative Adversarial Networks (GANs)
○ Learn the manifold of normal samples (what to do if anomaly-free
dataset is polluted)
○ One-Class Classifier for Novelty Detection GAN
○ BiGAN, BigGAN, BigBiGAN, GANOMALY, f-AnoGAN, GANs for Fraud
“[For GANs] the Convolutional Neural Network architecture is more important than how you
train them”, Marc Aurelio Ranzato (Facebook) at NeurIPS 2018.
GAN Discriminator-Based Anomaly Detection
GANs are hard to train
● Pick the right GAN Architecture
● Risk of mode-collapse
● Hard to tune Hyperparameters
Different Hyperparameter Tuning Strategies
GANs are hard to train
● Mode collapse
○ Transactional data distributions are multimodal. There will be multiple types of
transactional behaviour that will be perfectly normal.
○ The original GAN is based on a zero-sum, non-cooperative game. In this setting,
when the mini-max game reaches the Nash equilibrium too soon, the generator
learns to produce only a limited number of modes, and mode collapse occurs.
● GANs are highly sensitive to the hyperparameters.
○ Finding good hyperparameters takes time, especially for GANs. A list of possible
hyperparameters and tricks is available at https://github.com/soumith/ganhacks
○ It is essential to have a good optimization and hyperparameter tuning engine
How to address the mode collapse problem
● MO-GAAL [Liu et al.] proposed using multiple generators, where different
generators are in charge of learning different modes of the distribution.
● Schlegl et al., in f-AnoGAN, proposed replacing DCGAN with WGAN-GP and
introducing an encoder that is trained sequentially (after the GAN) for
image-to-latent-space mapping.
● Berg et al. improved f-AnoGAN by training the generator and encoder jointly, as
well as employing a progressive growing GAN.
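Encoder-based approaches such as f-AnoGAN score a sample by how badly the generator reconstructs it. The score combines a residual in input space with a weighted residual in the discriminator's feature space. The sketch below shows that scoring step only, in NumPy. The `encoder`, `generator`, and `disc_features` callables are hypothetical toy stand-ins, not trained networks, and `kappa` is an assumed weighting parameter.

```python
import numpy as np


def anomaly_score(x, encoder, generator, disc_features, kappa=1.0):
    """Input-space residual plus kappa-weighted discriminator-feature residual."""
    x_hat = generator(encoder(x))          # reconstruction G(E(x))
    recon = np.mean((x - x_hat) ** 2)      # mean squared residual in input space
    feat = np.mean((disc_features(x) - disc_features(x_hat)) ** 2)
    return recon + kappa * feat


# Toy stand-ins (assumptions): "normal" samples lie on the line x2 = 2 * x1.
direction = np.array([1.0, 2.0]) / np.sqrt(5.0)


def encoder(x):
    return x @ direction                   # project onto the normal manifold


def generator(z):
    return z * direction                   # map the latent back to input space


def disc_features(x):
    return np.tanh(x)                      # placeholder for an intermediate layer


normal_score = anomaly_score(np.array([1.0, 2.0]), encoder, generator, disc_features)
odd_score = anomaly_score(np.array([2.0, -1.0]), encoder, generator, disc_features)
print(normal_score, odd_score)
```

A sample on the learned manifold reconstructs almost perfectly and scores near zero, while an off-manifold sample gets a large score. In practice a threshold on this score would have to be chosen on validation data.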
WGAN-Gradient-Penalty Based Anomaly Detection
[Image from Berg et Al - https://arxiv.org/pdf/1905.11034.pdf ]
Will GANs help improve AML predictions?
Expected results from using GANs (Anomaly Detection at Spark/AI EU Summit 2019)
“[In China] two commercial
banks have reduced losses
of about 10 million RMB in
twelve weeks and
significantly improved their
business reputation”
GAN-based telecom fraud
detection at the receiving
bank
Online
Feature Store
Offline
Feature Store
Train,
Batch App
Feature Store
<10ms
TBs/PBs
How can we manage the Features between Training/Serving?
Recent transaction counts
(Streaming App)
Streaming App pushes CDC data
Pandas App updates every hour
Batch PySpark App pushes
updates every day
Low
Latency
Features
High
Latency
Features
Real-time features
(cust IDs, amount, type, datetime)
Real-time
Data
Event Data
SQL
S3, HDFS
Online AML
App
SQL DW
DataFrameAPI
HOPSWORKS
Offline FS
Apache Hive
HopsFS
Read and Join Features
Online FS
MySQL Cluster
(External)
Spark Cluster
fs.get_features(["name", "Pclass",
"Sex", "Balance", "Survived"])
Storage
(S3, HopsFS, HDFS, ADLS)
.npy, .tfrecords, .csv
Create AML Training Datasets
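Conceptually, reading and joining features from several feature groups amounts to a key-based join followed by a column selection. The sketch below shows that idea in plain Python. It is not the Hopsworks API: `join_feature_groups` and the feature names (`cust_id`, `tx_count_1h`, `avg_amount_30d`) are hypothetical.

```python
def join_feature_groups(feature_groups, key, features):
    """Inner-join lists of row dicts on `key`, then keep only `features`."""
    joined = None
    for rows in feature_groups:
        by_key = {row[key]: row for row in rows}
        if joined is None:
            joined = {k: dict(v) for k, v in by_key.items()}
        else:
            # Keep only keys present in every feature group (inner join).
            joined = {k: {**joined[k], **by_key[k]} for k in joined if k in by_key}
    joined = joined or {}
    return [{name: row[name] for name in [key] + features} for row in joined.values()]


# Hypothetical feature groups keyed by customer ID.
recent_activity = [
    {"cust_id": "C1", "tx_count_1h": 7},
    {"cust_id": "C2", "tx_count_1h": 1},
]
spending_profile = [
    {"cust_id": "C1", "avg_amount_30d": 120.5, "country": "SE"},
    {"cust_id": "C3", "avg_amount_30d": 40.0, "country": "DE"},
]

training_rows = join_feature_groups(
    [recent_activity, spending_profile],
    key="cust_id",
    features=["tx_count_1h", "avg_amount_30d"],
)
print(training_rows)
```

A feature store performs this join at scale (for example on Spark) and materializes the result as a training dataset in a file format such as .tfrecords or .csv.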
Model Development Lifecycle in Hopsworks
Hopsworks Conventions
/training_datasets
/models
/logs
/notebooks
/featurestore
Conventions and Implicit Provenance in Hopsworks*
*https://www.usenix.org/conference/opml20/presentation/ormenisan
dataset = tf.data.Dataset.list_files("training_datasets/resnet/*.tfrecord")
tf.saved_model.save(model, 'models/ResNet')
maggy.lagom(....)
Exploration
Experimentation
Model
Training
Explainability
Validation
Serving
Feature
Pipelines
ML Model Development Lifecycle
Hyperparameter Tuning for GANs
Explore
and Design
Experimentation:
Tune and Search
Model Training
(Distributed)
Explainability and
Ablation Studies
ML Model Dev Lifecycle is Iterative
Rewrite your code at each stage => Iteration is impossible!
EDA | HParam Tuning | Training (Dist) | Ablation Studies
It’s the Frameworks’ fault – they make us rewrite it!
OBLIVIOUS TRAINING FUNCTION
# RUNS ON THE WORKERS
def train():
    def input_fn(): # return dataset
    model = …
    optimizer = …
    model.compile(…)
    ….
Oblivious Training Function – Write Once, Reuse Many Times
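import numpy as np
import tensorflow as tf
from tensorflow.keras.losses import SparseCategoricalCrossentropy
from tensorflow.keras.optimizers import SGD

def dataset(batch_size):
    # Assumption: load_data() on the slide is the Keras MNIST loader
    # (28x28 images, 60,000 training samples); the test split is ignored.
    (x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
    # Scale pixel values from [0, 255] to [0, 1].
    x_train = x_train / np.float32(255)
    y_train = y_train.astype(np.int64)
    train_dataset = (
        tf.data.Dataset.from_tensor_slices((x_train, y_train))
        .shuffle(60000)
        .repeat()
        .batch(batch_size)
    )
    return train_dataset

def build_and_compile_cnn_model(lr):
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(28, 28)),
        # Conv2D expects a channel dimension, so add one.
        tf.keras.layers.Reshape(target_shape=(28, 28, 1)),
        tf.keras.layers.Conv2D(32, 3, activation='relu'),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation='relu'),
        tf.keras.layers.Dense(10)
    ])
    model.compile(
        loss=SparseCategoricalCrossentropy(from_logits=True),
        optimizer=SGD(learning_rate=lr))
    return model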
NO CHANGES!
What is Transparent Code in Practice?
def aml(kernel, pool, dropout, reporter):
    # This is your training iteration loop
    for i in range(number_iterations):
        ...
        # add the maggy reporter to report the metric to be optimized
        reporter.broadcast(metric=accuracy)
        ...
    # Return the same final metric
    return accuracy

from maggy import experiment, Searchspace
sp = Searchspace(kernel=('INTEGER', [2, 8]), pool=('INTEGER', [2, 8]))
result = experiment.lagom(train=aml, searchspace=sp, optimizer='randomsearch',
                          direction='max', num_trials=15, name='MNIST')
Maggy for HParam Optimization
Maggy is built on top of PySpark
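To make the idea behind a random-search optimizer concrete, here is a minimal sequential sketch in plain Python. It is not Maggy's implementation, which distributes trials over Spark executors. `random_search` and `toy_objective` are hypothetical names, and the integer ranges simply mirror the `kernel` and `pool` search space shown above.

```python
import random


def random_search(train_fn, space, num_trials, direction="max", seed=0):
    """Sample integer hyperparameters uniformly and keep the best trial."""
    rng = random.Random(seed)
    best_params, best_metric = None, None
    for _ in range(num_trials):
        # Draw each hyperparameter independently from its inclusive range.
        params = {name: rng.randint(lo, hi) for name, (lo, hi) in space.items()}
        metric = train_fn(**params)
        if best_metric is None:
            improved = True
        elif direction == "max":
            improved = metric > best_metric
        else:
            improved = metric < best_metric
        if improved:
            best_params, best_metric = params, metric
    return best_params, best_metric


def toy_objective(kernel, pool):
    # Hypothetical stand-in for a training run; peaks at kernel=5, pool=3.
    return 1.0 - 0.01 * ((kernel - 5) ** 2 + (pool - 3) ** 2)


best_params, best_metric = random_search(
    toy_objective, {"kernel": (2, 8), "pool": (2, 8)}, num_trials=15
)
print(best_params, best_metric)
```

Because the training function only receives hyperparameters and returns a metric, the same function can be handed to a different optimizer or run in parallel without being rewritten. This is the point of the oblivious training function.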
Get Started: Paysim AML Dataset (Kaggle)
● Graph-based Candidate Features, Concatenated Features
○ Link the origin account, destination account, and transaction type to track
the problem of smurfing and the higher cash withdrawals
● Frequency Candidate Features
○ Learn how frequently the account is used
● Amount Features
○ Magnitude of the amount of transactions.
● Time-Since Features
○  Learn the speed of transactions
● Velocity-Change Features
○ Identify a sudden change in the behaviour of accounts
https://www.kaggle.com/ntnu-testimon/paysim1?select=PS_20174392719_1491204439457_log.csv
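The frequency, time-since, and velocity-change features listed above can be sketched in a few lines of plain Python. This is an illustrative example, not the feature pipeline from the talk. It assumes PaySim-style records with the `step`, `nameOrig`, and `amount` columns; the output feature names and sample transactions are hypothetical.

```python
from collections import defaultdict


def account_features(transactions):
    """Compute per-transaction features from each origin account's history."""
    history = defaultdict(list)
    rows = []
    for tx in sorted(transactions, key=lambda t: t["step"]):
        past = history[tx["nameOrig"]]
        mean_amount = sum(p["amount"] for p in past) / len(past) if past else None
        rows.append({
            "nameOrig": tx["nameOrig"],
            "step": tx["step"],
            # Frequency: how often the account has been used so far.
            "tx_count_so_far": len(past),
            # Time-since: steps elapsed since the account's previous transaction.
            "steps_since_last": tx["step"] - past[-1]["step"] if past else None,
            # Velocity change: current amount relative to the account's past mean.
            "amount_vs_mean": tx["amount"] / mean_amount if mean_amount else None,
        })
        past.append(tx)
    return rows


# Hypothetical transactions in PaySim's column naming.
sample = [
    {"step": 1, "nameOrig": "C1", "amount": 100.0},
    {"step": 2, "nameOrig": "C2", "amount": 50.0},
    {"step": 3, "nameOrig": "C1", "amount": 100.0},
    {"step": 4, "nameOrig": "C1", "amount": 1000.0},
]
features = account_features(sample)
for row in features:
    print(row)
```

The last transaction of account C1 is ten times its historical mean, which is the kind of sudden behaviour change the velocity-change features aim to expose. At dataset scale the same logic would typically be expressed as window functions in Spark.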
Hopsworks Cluster
Project-Based Multi-Tenant Security
API
KEY
IAM Profile
Users
Jobs
Dev Feature Store
Staging Feature Store
Prod Feature Store
User
Login
(LDAP, AD,
OAuth2, 2FA)
databricks
SageMaker
Kubeflow
Amazon EMR
Delta Lake
Snowflake
Amazon S3
Amazon Redshift
Full Featured
AGPL-v3 License Model
Hopsworks Community
Kubernetes Support
• Model Serving
• Other services for robustness (Jupyter, more coming)
Authentication (LDAP, Kerberos, OAuth2)
Github support
Hopsworks Enterprise
Managed SAAS platform (currently only on AWS)
Hopsworks.ai
Hopsworks – open-source or managed platform
Thank you.
github.com/logicalclocks/hopsworks
-
@logicalclocks
-
www.logicalclocks.com

Más contenido relacionado

La actualidad más candente

REST API Pentester's perspective
REST API Pentester's perspectiveREST API Pentester's perspective
REST API Pentester's perspectiveSecuRing
 
Alamo ACE - Threat Hunting with CVAH
Alamo ACE - Threat Hunting with CVAHAlamo ACE - Threat Hunting with CVAH
Alamo ACE - Threat Hunting with CVAHBrandon DeVault
 
Threat Hunting with Splunk Hands-on
Threat Hunting with Splunk Hands-onThreat Hunting with Splunk Hands-on
Threat Hunting with Splunk Hands-onSplunk
 
JSMVCOMFG - To sternly look at JavaScript MVC and Templating Frameworks
JSMVCOMFG - To sternly look at JavaScript MVC and Templating FrameworksJSMVCOMFG - To sternly look at JavaScript MVC and Templating Frameworks
JSMVCOMFG - To sternly look at JavaScript MVC and Templating FrameworksMario Heiderich
 
Threat hunting and achieving security maturity
Threat hunting and achieving security maturityThreat hunting and achieving security maturity
Threat hunting and achieving security maturityDNIF
 
Effective Threat Hunting with Tactical Threat Intelligence
Effective Threat Hunting with Tactical Threat IntelligenceEffective Threat Hunting with Tactical Threat Intelligence
Effective Threat Hunting with Tactical Threat IntelligenceDhruv Majumdar
 
Is Machine learning useful for Fraud Prevention?
Is Machine learning useful for Fraud Prevention?Is Machine learning useful for Fraud Prevention?
Is Machine learning useful for Fraud Prevention?Andrea Dal Pozzolo
 
Synthetic Data Generation with DoppelGanger
Synthetic Data Generation with DoppelGangerSynthetic Data Generation with DoppelGanger
Synthetic Data Generation with DoppelGangerQuantUniversity
 
AI and Cybersecurity - Food for Thought
AI and Cybersecurity - Food for ThoughtAI and Cybersecurity - Food for Thought
AI and Cybersecurity - Food for ThoughtNUS-ISS
 
OWASP AppSecCali 2015 - Marshalling Pickles
OWASP AppSecCali 2015 - Marshalling PicklesOWASP AppSecCali 2015 - Marshalling Pickles
OWASP AppSecCali 2015 - Marshalling PicklesChristopher Frohoff
 
How to Hunt for Lateral Movement on Your Network
How to Hunt for Lateral Movement on Your NetworkHow to Hunt for Lateral Movement on Your Network
How to Hunt for Lateral Movement on Your NetworkSqrrl
 
PowerShell for Practical Purple Teaming
PowerShell for Practical Purple TeamingPowerShell for Practical Purple Teaming
PowerShell for Practical Purple TeamingNikhil Mittal
 
Implementing Vulnerability Management
Implementing Vulnerability Management Implementing Vulnerability Management
Implementing Vulnerability Management Argyle Executive Forum
 
Enterprise JavaScript Error Handling (Ajax Experience 2008)
Enterprise JavaScript Error Handling (Ajax Experience 2008)Enterprise JavaScript Error Handling (Ajax Experience 2008)
Enterprise JavaScript Error Handling (Ajax Experience 2008)Nicholas Zakas
 

La actualidad más candente (20)

REST API Pentester's perspective
REST API Pentester's perspectiveREST API Pentester's perspective
REST API Pentester's perspective
 
Fraud detection
Fraud detectionFraud detection
Fraud detection
 
Alamo ACE - Threat Hunting with CVAH
Alamo ACE - Threat Hunting with CVAHAlamo ACE - Threat Hunting with CVAH
Alamo ACE - Threat Hunting with CVAH
 
Threat Hunting with Splunk Hands-on
Threat Hunting with Splunk Hands-onThreat Hunting with Splunk Hands-on
Threat Hunting with Splunk Hands-on
 
JSMVCOMFG - To sternly look at JavaScript MVC and Templating Frameworks
JSMVCOMFG - To sternly look at JavaScript MVC and Templating FrameworksJSMVCOMFG - To sternly look at JavaScript MVC and Templating Frameworks
JSMVCOMFG - To sternly look at JavaScript MVC and Templating Frameworks
 
Threat hunting and achieving security maturity
Threat hunting and achieving security maturityThreat hunting and achieving security maturity
Threat hunting and achieving security maturity
 
Effective Threat Hunting with Tactical Threat Intelligence
Effective Threat Hunting with Tactical Threat IntelligenceEffective Threat Hunting with Tactical Threat Intelligence
Effective Threat Hunting with Tactical Threat Intelligence
 
Is Machine learning useful for Fraud Prevention?
Is Machine learning useful for Fraud Prevention?Is Machine learning useful for Fraud Prevention?
Is Machine learning useful for Fraud Prevention?
 
Synthetic Data Generation with DoppelGanger
Synthetic Data Generation with DoppelGangerSynthetic Data Generation with DoppelGanger
Synthetic Data Generation with DoppelGanger
 
Trends in AML Compliance
Trends in AML ComplianceTrends in AML Compliance
Trends in AML Compliance
 
AI and Cybersecurity - Food for Thought
AI and Cybersecurity - Food for ThoughtAI and Cybersecurity - Food for Thought
AI and Cybersecurity - Food for Thought
 
Malware Detection using Machine Learning
Malware Detection using Machine Learning	Malware Detection using Machine Learning
Malware Detection using Machine Learning
 
Owasp zap
Owasp zapOwasp zap
Owasp zap
 
OWASP AppSecCali 2015 - Marshalling Pickles
OWASP AppSecCali 2015 - Marshalling PicklesOWASP AppSecCali 2015 - Marshalling Pickles
OWASP AppSecCali 2015 - Marshalling Pickles
 
How to Hunt for Lateral Movement on Your Network
How to Hunt for Lateral Movement on Your NetworkHow to Hunt for Lateral Movement on Your Network
How to Hunt for Lateral Movement on Your Network
 
Trends in AML Compliance and Technology
Trends in AML Compliance and TechnologyTrends in AML Compliance and Technology
Trends in AML Compliance and Technology
 
PowerShell for Practical Purple Teaming
PowerShell for Practical Purple TeamingPowerShell for Practical Purple Teaming
PowerShell for Practical Purple Teaming
 
Implementing Vulnerability Management
Implementing Vulnerability Management Implementing Vulnerability Management
Implementing Vulnerability Management
 
Enterprise JavaScript Error Handling (Ajax Experience 2008)
Enterprise JavaScript Error Handling (Ajax Experience 2008)Enterprise JavaScript Error Handling (Ajax Experience 2008)
Enterprise JavaScript Error Handling (Ajax Experience 2008)
 
Fraud Analytics
Fraud AnalyticsFraud Analytics
Fraud Analytics
 

Similar a GANs for Anti Money Laundering

Predictive analytics semi-supervised learning with GANs
Predictive analytics   semi-supervised learning with GANsPredictive analytics   semi-supervised learning with GANs
Predictive analytics semi-supervised learning with GANsterek47
 
Using GANs to improve generalization in a semi-supervised setting - trying it...
Using GANs to improve generalization in a semi-supervised setting - trying it...Using GANs to improve generalization in a semi-supervised setting - trying it...
Using GANs to improve generalization in a semi-supervised setting - trying it...PyData
 
Semi-supervised learning with GANs
Semi-supervised learning with GANsSemi-supervised learning with GANs
Semi-supervised learning with GANsterek47
 
Understanding Parallelization of Machine Learning Algorithms in Apache Spark™
Understanding Parallelization of Machine Learning Algorithms in Apache Spark™Understanding Parallelization of Machine Learning Algorithms in Apache Spark™
Understanding Parallelization of Machine Learning Algorithms in Apache Spark™Databricks
 
B4UConference_machine learning_deeplearning
B4UConference_machine learning_deeplearningB4UConference_machine learning_deeplearning
B4UConference_machine learning_deeplearningHoa Le
 
Computation graphs - Tensorflow & CNTK
Computation graphs - Tensorflow & CNTKComputation graphs - Tensorflow & CNTK
Computation graphs - Tensorflow & CNTKA H M Forhadul Islam
 
Everything you need to know about AutoML
Everything you need to know about AutoMLEverything you need to know about AutoML
Everything you need to know about AutoMLArpitha Gurumurthy
 
Machine learning-for-dummies-andrews-sobral-activeeon
Machine learning-for-dummies-andrews-sobral-activeeonMachine learning-for-dummies-andrews-sobral-activeeon
Machine learning-for-dummies-andrews-sobral-activeeonActiveeon
 
Presentation
PresentationPresentation
Presentationbutest
 
Machine Learning Interpretability - Mateusz Dymczyk - H2O AI World London 2018
Machine Learning Interpretability - Mateusz Dymczyk - H2O AI World London 2018Machine Learning Interpretability - Mateusz Dymczyk - H2O AI World London 2018
Machine Learning Interpretability - Mateusz Dymczyk - H2O AI World London 2018Sri Ambati
 
When We Spark and When We Don’t: Developing Data and ML Pipelines
When We Spark and When We Don’t: Developing Data and ML PipelinesWhen We Spark and When We Don’t: Developing Data and ML Pipelines
When We Spark and When We Don’t: Developing Data and ML PipelinesStitch Fix Algorithms
 
Machine Learning for Dummies (without mathematics)
Machine Learning for Dummies (without mathematics)Machine Learning for Dummies (without mathematics)
Machine Learning for Dummies (without mathematics)ActiveEon
 
GNA 13552928 deep learning for GAN a.ppt
GNA 13552928 deep learning for GAN a.pptGNA 13552928 deep learning for GAN a.ppt
GNA 13552928 deep learning for GAN a.pptManiMaran230751
 
AI & ML in Defence Systems - Sunil Chomal
AI & ML in Defence Systems   - Sunil ChomalAI & ML in Defence Systems   - Sunil Chomal
AI & ML in Defence Systems - Sunil ChomalSunil Chomal
 
deepnet-lourentzou.ppt
deepnet-lourentzou.pptdeepnet-lourentzou.ppt
deepnet-lourentzou.pptyang947066
 
Web Traffic Time Series Forecasting
Web Traffic  Time Series ForecastingWeb Traffic  Time Series Forecasting
Web Traffic Time Series ForecastingBillTubbs
 

Similar a GANs for Anti Money Laundering (20)

Predictive analytics semi-supervised learning with GANs
Predictive analytics   semi-supervised learning with GANsPredictive analytics   semi-supervised learning with GANs
Predictive analytics semi-supervised learning with GANs
 
Data Parallel Deep Learning
Data Parallel Deep LearningData Parallel Deep Learning
Data Parallel Deep Learning
 
Using GANs to improve generalization in a semi-supervised setting - trying it...
Using GANs to improve generalization in a semi-supervised setting - trying it...Using GANs to improve generalization in a semi-supervised setting - trying it...
Using GANs to improve generalization in a semi-supervised setting - trying it...
 
Semi-supervised learning with GANs
Semi-supervised learning with GANsSemi-supervised learning with GANs
Semi-supervised learning with GANs
 
AI and Deep Learning
AI and Deep Learning AI and Deep Learning
AI and Deep Learning
 
Understanding Parallelization of Machine Learning Algorithms in Apache Spark™
Understanding Parallelization of Machine Learning Algorithms in Apache Spark™Understanding Parallelization of Machine Learning Algorithms in Apache Spark™
Understanding Parallelization of Machine Learning Algorithms in Apache Spark™
 
C3 w1
C3 w1C3 w1
C3 w1
 
B4UConference_machine learning_deeplearning
B4UConference_machine learning_deeplearningB4UConference_machine learning_deeplearning
B4UConference_machine learning_deeplearning
 
Computation graphs - Tensorflow & CNTK
Computation graphs - Tensorflow & CNTKComputation graphs - Tensorflow & CNTK
Computation graphs - Tensorflow & CNTK
 
Everything you need to know about AutoML
Everything you need to know about AutoMLEverything you need to know about AutoML
Everything you need to know about AutoML
 
AutoML lectures (ACDL 2019)
AutoML lectures (ACDL 2019)AutoML lectures (ACDL 2019)
AutoML lectures (ACDL 2019)
 
Machine learning-for-dummies-andrews-sobral-activeeon
Machine learning-for-dummies-andrews-sobral-activeeonMachine learning-for-dummies-andrews-sobral-activeeon
Machine learning-for-dummies-andrews-sobral-activeeon
 
Presentation
PresentationPresentation
Presentation
 
Machine Learning Interpretability - Mateusz Dymczyk - H2O AI World London 2018
Machine Learning Interpretability - Mateusz Dymczyk - H2O AI World London 2018Machine Learning Interpretability - Mateusz Dymczyk - H2O AI World London 2018
Machine Learning Interpretability - Mateusz Dymczyk - H2O AI World London 2018
 
When We Spark and When We Don’t: Developing Data and ML Pipelines
When We Spark and When We Don’t: Developing Data and ML PipelinesWhen We Spark and When We Don’t: Developing Data and ML Pipelines
When We Spark and When We Don’t: Developing Data and ML Pipelines
 
Machine Learning for Dummies (without mathematics)
Machine Learning for Dummies (without mathematics)Machine Learning for Dummies (without mathematics)
Machine Learning for Dummies (without mathematics)
 
GNA 13552928 deep learning for GAN a.ppt
GNA 13552928 deep learning for GAN a.pptGNA 13552928 deep learning for GAN a.ppt
GNA 13552928 deep learning for GAN a.ppt
 
AI & ML in Defence Systems - Sunil Chomal
AI & ML in Defence Systems   - Sunil ChomalAI & ML in Defence Systems   - Sunil Chomal
AI & ML in Defence Systems - Sunil Chomal
 
deepnet-lourentzou.ppt
deepnet-lourentzou.pptdeepnet-lourentzou.ppt
deepnet-lourentzou.ppt
 
Web Traffic Time Series Forecasting
Web Traffic  Time Series ForecastingWeb Traffic  Time Series Forecasting
Web Traffic Time Series Forecasting
 

Más de Jim Dowling

ARVC and flecainide case report[EI] Jim.docx.pdf
ARVC and flecainide case report[EI] Jim.docx.pdfARVC and flecainide case report[EI] Jim.docx.pdf
ARVC and flecainide case report[EI] Jim.docx.pdfJim Dowling
 
PyData Berlin 2023 - Mythical ML Pipeline.pdf
PyData Berlin 2023 - Mythical ML Pipeline.pdfPyData Berlin 2023 - Mythical ML Pipeline.pdf
PyData Berlin 2023 - Mythical ML Pipeline.pdfJim Dowling
 
Serverless ML Workshop with Hopsworks at PyData Seattle
Serverless ML Workshop with Hopsworks at PyData SeattleServerless ML Workshop with Hopsworks at PyData Seattle
Serverless ML Workshop with Hopsworks at PyData SeattleJim Dowling
 
PyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdf
PyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdfPyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdf
PyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdfJim Dowling
 
_Python Ireland Meetup - Serverless ML - Dowling.pdf
_Python Ireland Meetup - Serverless ML - Dowling.pdf_Python Ireland Meetup - Serverless ML - Dowling.pdf
_Python Ireland Meetup - Serverless ML - Dowling.pdfJim Dowling
 
Building Hopsworks, a cloud-native managed feature store for machine learning
Building Hopsworks, a cloud-native managed feature store for machine learning Building Hopsworks, a cloud-native managed feature store for machine learning
Building Hopsworks, a cloud-native managed feature store for machine learning Jim Dowling
 
Real-Time Recommendations with Hopsworks and OpenSearch - MLOps World 2022
Real-Time Recommendations  with Hopsworks and OpenSearch - MLOps World 2022Real-Time Recommendations  with Hopsworks and OpenSearch - MLOps World 2022
Real-Time Recommendations with Hopsworks and OpenSearch - MLOps World 2022Jim Dowling
 
Ml ops and the feature store with hopsworks, DC Data Science Meetup
Ml ops and the feature store with hopsworks, DC Data Science MeetupMl ops and the feature store with hopsworks, DC Data Science Meetup
Ml ops and the feature store with hopsworks, DC Data Science MeetupJim Dowling
 
Hops fs huawei internal conference july 2021
Hops fs huawei internal conference july 2021Hops fs huawei internal conference july 2021
Hops fs huawei internal conference july 2021Jim Dowling
 
Hopsworks MLOps World talk june 21
Hopsworks MLOps World talk june 21Hopsworks MLOps World talk june 21
Hopsworks MLOps World talk june 21Jim Dowling
 
Hopsworks Feature Store 2.0 a new paradigm
Hopsworks Feature Store  2.0   a new paradigmHopsworks Feature Store  2.0   a new paradigm
Hopsworks Feature Store 2.0 a new paradigmJim Dowling
 
Metadata and Provenance for ML Pipelines with Hopsworks
Metadata and Provenance for ML Pipelines with Hopsworks Metadata and Provenance for ML Pipelines with Hopsworks
Metadata and Provenance for ML Pipelines with Hopsworks Jim Dowling
 
Berlin buzzwords 2020-feature-store-dowling
Berlin buzzwords 2020-feature-store-dowlingBerlin buzzwords 2020-feature-store-dowling
Berlin buzzwords 2020-feature-store-dowlingJim Dowling
 
Invited Lecture on GPUs and Distributed Deep Learning at Uppsala University
Invited Lecture on GPUs and Distributed Deep Learning at Uppsala UniversityInvited Lecture on GPUs and Distributed Deep Learning at Uppsala University
Invited Lecture on GPUs and Distributed Deep Learning at Uppsala UniversityJim Dowling
 
Hopsworks data engineering melbourne april 2020
Hopsworks   data engineering melbourne april 2020Hopsworks   data engineering melbourne april 2020
Hopsworks data engineering melbourne april 2020Jim Dowling
 
The Bitter Lesson of ML Pipelines
The Bitter Lesson of ML Pipelines The Bitter Lesson of ML Pipelines
The Bitter Lesson of ML Pipelines Jim Dowling
 
Asynchronous Hyperparameter Search with Spark on Hopsworks and Maggy
Asynchronous Hyperparameter Search with Spark on Hopsworks and MaggyAsynchronous Hyperparameter Search with Spark on Hopsworks and Maggy
Asynchronous Hyperparameter Search with Spark on Hopsworks and MaggyJim Dowling
 
Hopsworks at Google AI Huddle, Sunnyvale
Hopsworks at Google AI Huddle, SunnyvaleHopsworks at Google AI Huddle, Sunnyvale
Hopsworks at Google AI Huddle, SunnyvaleJim Dowling
 
Hopsworks in the cloud Berlin Buzzwords 2019
Hopsworks in the cloud Berlin Buzzwords 2019 Hopsworks in the cloud Berlin Buzzwords 2019
Hopsworks in the cloud Berlin Buzzwords 2019 Jim Dowling
 
HopsML Meetup talk on Hopsworks + ROCm/AMD June 2019
HopsML Meetup talk on Hopsworks + ROCm/AMD June 2019HopsML Meetup talk on Hopsworks + ROCm/AMD June 2019
HopsML Meetup talk on Hopsworks + ROCm/AMD June 2019Jim Dowling
 

Más de Jim Dowling (20)

ARVC and flecainide case report[EI] Jim.docx.pdf
ARVC and flecainide case report[EI] Jim.docx.pdfARVC and flecainide case report[EI] Jim.docx.pdf
ARVC and flecainide case report[EI] Jim.docx.pdf
 
PyData Berlin 2023 - Mythical ML Pipeline.pdf
PyData Berlin 2023 - Mythical ML Pipeline.pdfPyData Berlin 2023 - Mythical ML Pipeline.pdf
PyData Berlin 2023 - Mythical ML Pipeline.pdf
 
Serverless ML Workshop with Hopsworks at PyData Seattle
Serverless ML Workshop with Hopsworks at PyData SeattleServerless ML Workshop with Hopsworks at PyData Seattle
Serverless ML Workshop with Hopsworks at PyData Seattle
 
PyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdf
PyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdfPyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdf
PyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdf
 
_Python Ireland Meetup - Serverless ML - Dowling.pdf
_Python Ireland Meetup - Serverless ML - Dowling.pdf_Python Ireland Meetup - Serverless ML - Dowling.pdf
_Python Ireland Meetup - Serverless ML - Dowling.pdf
 
Building Hopsworks, a cloud-native managed feature store for machine learning
Building Hopsworks, a cloud-native managed feature store for machine learning Building Hopsworks, a cloud-native managed feature store for machine learning
Building Hopsworks, a cloud-native managed feature store for machine learning
 
Real-Time Recommendations with Hopsworks and OpenSearch - MLOps World 2022
Real-Time Recommendations  with Hopsworks and OpenSearch - MLOps World 2022Real-Time Recommendations  with Hopsworks and OpenSearch - MLOps World 2022
Real-Time Recommendations with Hopsworks and OpenSearch - MLOps World 2022
 


GANs for Anti Money Laundering

  • 1. Jim Dowling CEO / Co-Founder Logical Clocks Associate Prof at KTH – Royal Institute of Technology Anti Money Laundering and GANs Berlin Meetup @jim_dowling
  • 2. Agenda ● Problem: Increase detection rate and reduce costs for AML. ● Solution: We used the Hopsworks platform to train GANs to classify transactions as suspected of money laundering or not. We worked with a large transaction dataset (~40 TB); the solution uses Spark for feature engineering and TensorFlow/GPUs to train a binary classifier that labels transactions as either clean or dirty. We use the open-source Hopsworks platform to manage features, scale out training, and manage models. ● Reference: Whitepaper
  • 3. ● Money laundering involves turning the “dirty” money into “clean” money either through an obscure sequence of banking transfers or through commercial transactions. ● The three broad stages of money laundering* are: ○ Placement (smurf it) ○ Layering (spread it out fast) ○ Integration (buy stuff) What is Money Laundering? *https://towardsdatascience.com/the-art-of-engineering-features-for-a-strong-machine-learning-model-a47a876e654c
  • 4. Rule-Based AML vs Deep Learning AML
  • 5. AML as a Supervised ML Problem ● Anti-money laundering (AML) is a pattern matching problem ● AML systems should automatically flag ‘suspect’ financial transactions ○ Followed by manual investigation ● Historical transaction datasets have massive data imbalance between the number of ‘clean’ transactions versus ‘dirty’ transactions Clean Transactions Dirty Transactions Millions or Billions 100s or 1000s
  • 6. Implications of AML as a Binary Classification Problem
      ○ True Positive. Reality: a money laundering transaction. Prediction: “dirty” transaction predicted. Result: good.
      ○ False Positive. Reality: not a money laundering transaction. Prediction: “dirty” transaction predicted. Result: unnecessary work and cost!
      ○ False Negative. Reality: a money laundering transaction. Prediction: “clean” transaction predicted. Result: fines/jail by authorities/regulator!
      ○ True Negative. Reality: not a money laundering transaction. Prediction: “clean” transaction predicted. Result: good.
      Confusion matrix of our binary AML classifier with all possible predictions and their consequences. We use a variant of the F1 score to evaluate models (precision, recall, and fallout should not be weighted equally).
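The slide does not say which F1 variant is used. As a minimal sketch, assuming an F-beta score (one common way to weight recall above precision), the metrics can be computed directly from confusion-matrix counts. The function names, the choice of beta = 2, and the counts are illustrative assumptions, not values from the talk.

```python
def fbeta_from_counts(tp, fp, fn, beta=2.0):
    """F-beta score from confusion-matrix counts; beta > 1 favours recall."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def fallout(fp, tn):
    """False positive rate: share of clean transactions flagged as dirty."""
    return fp / (fp + tn) if (fp + tn) else 0.0


# Hypothetical counts for a heavily imbalanced dataset (not from the talk).
tp, fp, fn, tn = 80, 400, 20, 999_500
print("F1:", fbeta_from_counts(tp, fp, fn, beta=1.0))
print("F2:", fbeta_from_counts(tp, fp, fn, beta=2.0))
print("fallout:", fallout(fp, tn))
```

With beta above 1, a missed laundering transaction (false negative) lowers the score more than an unnecessary investigation (false positive), which matches the cost asymmetry described on the slide.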
  • 7. AML as an Anomaly Detection Problem “Anomaly detection follows quite naturally from a good unsupervised model” Alex Graves (Deep Mind) Traditional unsupervised approaches do not scale: k-means clustering and principal component analysis [Image from Ruff et al., “Deep Semi-Supervised Anomaly Detection”, https://arxiv.org/pdf/1906.02694.pdf]
  • 8. AML - Semi-Supervised Anomaly Detection AML is not a classical use-case for anomaly detection as we typically have labelled datasets, albeit imbalanced. “Semi-supervised learning is a class of machine learning tasks and techniques that also make use of unlabeled data for training – typically a small amount of labeled data with a large amount of unlabeled data.” Wikipedia
  • 9. GANs and Other Methods for Anomaly Detection ● Variational Auto-Encoders for Anomaly Detection ○ Easier to train, performance not state-of-the-art ● Generative Adversarial Networks (GANs) ○ Learn the manifold of normal samples (what to do if anomaly-free dataset is polluted) ○ One-Class Classifier for Novelty Detection GAN ○ BiGAN, BigGAN, BigBiGAN, GANOMALY, f-AnoGAN, GANs for Fraud “[For GANs] the Convolutional Neural Network architecture is more important than how you train them”, Marc Aurelio Ranzato (Facebook) at NeurIPS 2018.
  • 12. GANs are hard to train ● Pick the right GAN Architecture ● Risk of mode-collapse ● Hard to tune Hyperparameters Different Hyperparameter Tuning Strategies
  • 13. GANs are hard to train ● Mode collapse ○ Transactional data distributions are multimodal: there are multiple types of transactional behaviour that are perfectly normal. ○ The original GAN is based on a zero-sum non-cooperative game. In this setting, when the mini-max game reaches the Nash equilibrium too soon, the generator learns to produce only a limited number of modes and mode collapse occurs. ● GANs are highly sensitive to hyperparameters. ○ Finding good hyperparameters takes time, especially for GANs. A list of possible hyperparameters and tricks is available at https://github.com/soumith/ganhacks ○ It is essential to have a good optimization and hyperparameter tuning engine
  • 14. How to address the mode collapse problem ● MO-GAAL [Liu et al.] proposed using multiple generators, where different generators are in charge of learning different modes of the distribution. ● Schlegl et al., in f-AnoGAN, proposed replacing DCGAN with WGAN-GP and introducing an encoder, trained sequentially, for image-to-latent-space mapping. ● Berg et al. improved f-AnoGAN by training the generator and encoder jointly, as well as employing a progressively growing GAN.
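For reference, the WGAN-GP critic loss that these methods build on is usually written as below (following Gulrajani et al., "Improved Training of Wasserstein GANs"). The notation is not taken from the slides: D is the critic, P_r the real data distribution, P_g the generator distribution, and λ the penalty weight.

```latex
L_D = \mathbb{E}_{\tilde{x} \sim P_g}\big[D(\tilde{x})\big]
    - \mathbb{E}_{x \sim P_r}\big[D(x)\big]
    + \lambda \, \mathbb{E}_{\hat{x} \sim P_{\hat{x}}}
      \Big[\big(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1\big)^2\Big],
\qquad
\hat{x} = \epsilon x + (1 - \epsilon)\tilde{x}, \quad \epsilon \sim U[0, 1]
```

The last term penalises critic gradients whose norm deviates from 1 on points interpolated between real and generated samples, replacing the weight clipping of the original WGAN.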
  • 15. WGAN-Gradient-Penalty Based Anomaly Detection [Image from Berg et al., https://arxiv.org/pdf/1905.11034.pdf]
  • 16. WGAN-Gradient-Penalty Based Anomaly Detection [Image from Berg et al., https://arxiv.org/pdf/1905.11034.pdf]
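The two image slides above do not survive in text form. As a rough sketch of how an encoder-based GAN anomaly score in the style of f-AnoGAN can be computed, the snippet below combines a reconstruction residual with a feature-space residual. The encoder, generator, and feature extractor here are toy stand-in functions and `kappa` is an assumed weighting; none of them come from the talk or from a trained model.

```python
def mean_squared_diff(a, b):
    """Mean of squared element-wise differences between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / len(a)


def anomaly_score(x, encode, generate, features, kappa=1.0):
    """Reconstruction residual plus kappa times the feature-space residual."""
    x_rec = generate(encode(x))  # map to latent space and back
    residual = mean_squared_diff(x, x_rec)
    feature_residual = mean_squared_diff(features(x), features(x_rec))
    return residual + kappa * feature_residual


# Toy stand-ins (assumptions): the "model" can only reproduce constant vectors,
# so anything that is not constant reconstructs poorly and scores high.
encode = lambda x: [sum(x) / len(x)]   # latent code = mean of the input
generate = lambda z: [z[0]] * 4        # decode to a constant vector of length 4
features = lambda x: [min(x), max(x)]  # stand-in for critic features

print(anomaly_score([1.0, 1.0, 1.0, 1.0], encode, generate, features))
print(anomaly_score([1.0, 1.0, 1.0, 9.0], encode, generate, features))
```

In a real system the three functions would be the trained encoder, generator, and an intermediate layer of the critic, and a threshold on the score would decide which transactions are flagged for investigation.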
  • 17. Will GANs help improve AML predictions? Expected results from using GANs (Anomaly Detection at Spark/AI EU Summit 2019) “[In China] two commercial banks have reduced losses of about 10 million RMB in twelve weeks and significantly improved their business reputation” GAN-based telecom fraud detection at the receiving bank
  • 18. Online Feature Store Offline Feature Store Train, Batch App Feature Store <10ms TBs/PBs How can we manage the Features between Training/Serving? Recent transaction counts (Streaming App) Streaming App pushes CDC data Pandas App updates every hour Batch PySpark App pushes updates every day Low Latency Features High Latency Features Real-time features (cust IDs, amount, type, datetime) Real-time Data Event Data SQL S3, HDFS Online AML App SQL DW DataFrameAPI
  • 19. HOPSWORKS Offline FS Apache Hive HopsFS Read and Join Features Online FS MySQL Cluster (External) Spark Cluster fs.get_features(["name", "Pclass", "Sex", "Balance", "Survived"]) Storage (S3, HopsFS, HDFS, ADLS) .npy, .tfrecords, .csv Create AML Training Datasets
  • 21. Hopsworks Conventions /training_datasets /models /logs /notebooks /featurestore Conventions and Implicit Provenance in Hopsworks* *https://www.usenix.org/conference/opml20/presentation/ormenisan
        dataset = tf.data.Dataset.list_files("training_datasets/resnet/*.tfrecord")
        tf.saved_model.save(model, 'models/ResNet')
        maggy.lagom(....)
  • 25. Explore and Design Experimentation: Tune and Search Model Training (Distributed) Explainability and Ablation Studies ML Model Dev Lifecycle is Iterative
  • 26. Explore and Design Experimentation: Tune and Search Model Training (Distributed) Explainability and Ablation Studies Rewrite your code at each stage => Iteration is impossible!
  • 27. Ablation Studies EDA HParam Tuning Training (Dist) It’s the Frameworks’ fault – they make us rewrite it!
  • 28. Oblivious Training Function – Write Once, Reuse Many Times (Ablation Studies, EDA, HParam Tuning, Training (Dist))
        # RUNS ON THE WORKERS
        def train():
            def input_fn():
                # return dataset
            model = …
            optimizer = …
            model.compile(…)
            …
  • 29. What is Transparent Code in Practice? The slide shows the code below twice, side by side, with NO CHANGES between the two copies.
        import numpy as np
        import tensorflow as tf
        from tensorflow.keras.losses import SparseCategoricalCrossentropy
        from tensorflow.keras.optimizers import SGD

        def dataset(batch_size):
            # load_data() is a user-supplied helper that returns (images, labels)
            (x_train, y_train) = load_data()
            x_train = x_train / np.float32(255)  # scale pixels to [0, 1]
            y_train = y_train.astype(np.int64)
            train_dataset = (
                tf.data.Dataset.from_tensor_slices((x_train, y_train))
                .shuffle(60000)
                .repeat()
                .batch(batch_size)
            )
            return train_dataset

        def build_and_compile_cnn_model(lr):
            model = tf.keras.Sequential([
                tf.keras.Input(shape=(28, 28)),
                # Conv2D needs a channel axis; this layer is not on the slide
                tf.keras.layers.Reshape(target_shape=(28, 28, 1)),
                tf.keras.layers.Conv2D(32, 3, activation='relu'),
                tf.keras.layers.Flatten(),
                tf.keras.layers.Dense(128, activation='relu'),
                tf.keras.layers.Dense(10)
            ])
            model.compile(
                loss=SparseCategoricalCrossentropy(from_logits=True),
                optimizer=SGD(learning_rate=lr))
            return model
  • 30. Maggy for HParam Optimization
        def aml(kernel, pool, dropout, reporter):
            # This is your training iteration loop
            for i in range(number_iterations):
                ...
                # add the maggy reporter to report the metric to be optimized
                reporter.broadcast(metric=accuracy)
                ...
            # Return the same final metric
            return accuracy

        from maggy import experiment, Searchspace
        sp = Searchspace(kernel=('INTEGER', [2, 8]), pool=('INTEGER', [2, 8]))
        result = experiment.lagom(train=aml, searchspace=sp, optimizer='randomsearch',
                                  direction='max', num_trials=15, name='MNIST')
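To make the 'randomsearch' optimizer concrete, here is a minimal single-process sketch of random search over an integer search space like the one on the slide. It is not Maggy's implementation, which runs trials on Spark executors; `random_search`, `toy_objective`, and the seed are hypothetical names and values chosen for illustration.

```python
import random


def random_search(train_fn, searchspace, num_trials, direction="max", seed=0):
    """Sample integer hyperparameters uniformly and keep the best trial."""
    rng = random.Random(seed)
    best_params, best_metric = None, None
    for _ in range(num_trials):
        # Draw one value per hyperparameter from its inclusive [low, high] range.
        params = {name: rng.randint(low, high)
                  for name, (low, high) in searchspace.items()}
        metric = train_fn(**params)
        if best_metric is None:
            improved = True
        elif direction == "max":
            improved = metric > best_metric
        else:
            improved = metric < best_metric
        if improved:
            best_params, best_metric = params, metric
    return best_params, best_metric


def toy_objective(kernel, pool):
    """Stand-in for a training run: peaks at kernel=5, pool=3."""
    return 1.0 - ((kernel - 5) ** 2 + (pool - 3) ** 2) / 100.0


space = {"kernel": (2, 8), "pool": (2, 8)}
best_params, best_metric = random_search(toy_objective, space, num_trials=15)
print(best_params, best_metric)
```

Each trial is independent of the others, which is what makes random search straightforward to distribute across workers.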
  • 31. Maggy is built on top of PySpark
  • 32. Get Started: Paysim AML Dataset (Kaggle) ● Graph-based Candidate Features, Concatenated Features ○ Link the origin account, destination account, and transaction type to track the problem of smurfing and the higher cash withdrawals ● Frequency Candidate Features ○ Learn how frequently the account is used ● Amount Features ○ Magnitude of the amount of transactions. ● Time-Since Features ○  Learn the speed of transactions ● Velocity-Change Features ○ Identify a sudden change in the behaviour of accounts https://www.kaggle.com/ntnu-testimon/paysim1?select=PS_20174392719_1491204439457_log.csv
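A few of the feature families listed above can be sketched in plain Python. The column names (`step`, `type`, `amount`, `nameOrig`, `nameDest`) follow the PaySim CSV; the rows, the derived feature names, and the exact definitions are illustrative assumptions rather than the features used in the talk.

```python
from collections import defaultdict

# Hypothetical rows using PaySim column names (step is one hour of simulated time).
transactions = [
    {"step": 1, "type": "TRANSFER", "amount": 100.0, "nameOrig": "C1", "nameDest": "C9"},
    {"step": 2, "type": "TRANSFER", "amount": 120.0, "nameOrig": "C1", "nameDest": "C9"},
    {"step": 2, "type": "PAYMENT", "amount": 50.0, "nameOrig": "C2", "nameDest": "M1"},
    {"step": 3, "type": "CASH_OUT", "amount": 9000.0, "nameOrig": "C1", "nameDest": "C7"},
]


def add_account_features(rows):
    """Add per-origin-account features computed from earlier transactions only."""
    history = defaultdict(list)  # origin account -> list of (step, amount)
    out = []
    for row in sorted(rows, key=lambda r: r["step"]):
        past = history[row["nameOrig"]]
        prev_mean = sum(a for _, a in past) / len(past) if past else None
        feat = dict(row)
        # Concatenated feature: origin, destination, and transaction type.
        feat["link"] = f'{row["nameOrig"]}|{row["nameDest"]}|{row["type"]}'
        # Frequency feature: how many times the account was used before.
        feat["tx_count_before"] = len(past)
        # Time-since feature: steps elapsed since the account's last transaction.
        feat["time_since_last"] = row["step"] - past[-1][0] if past else None
        # Velocity-change proxy: amount relative to the account's past mean.
        feat["amount_vs_mean"] = row["amount"] / prev_mean if prev_mean else None
        out.append(feat)
        past.append((row["step"], row["amount"]))
    return out


feats = add_account_features(transactions)
for f in feats:
    print(f["link"], f["tx_count_before"], f["time_since_last"], f["amount_vs_mean"])
```

Using only earlier transactions for each row avoids leaking future information into the features. At the scale mentioned in the talk, the same logic would be expressed as Spark window functions rather than a Python loop.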
  • 34. Hopsworks Cluster Project-Based Multi-Tenant Security API KEY IAM Profile Users Jobs Dev Feature Store Staging Feature Store Prod Feature Store User Login (LDAP, AD, OAuth2, 2FA) Databricks SageMaker Kubeflow Amazon EMR Delta Lake Snowflake Amazon S3 Amazon Redshift
  • 35. Hopsworks – open-source or managed platform. Hopsworks Community: Full Featured, AGPL-v3 License Model. Hopsworks Enterprise: Kubernetes Support • Model Serving • Other services for robustness (Jupyter, more coming), Authentication (LDAP, Kerberos, OAuth2), GitHub support. Hopsworks.ai: Managed SaaS platform (currently only on AWS)