This document discusses how knowledge graphs and graph analytics can be used for anomaly detection in financial services. It describes building time-sequenced graph data models (TSGDMs) from a base knowledge graph to model customer behavior over time. A champion model is trained on each time window to learn a statistical distribution, and outliers in that distribution, i.e., graph structural changes that are hard to reproduce, can indicate anomalous financial behavior worth investigating, such as money laundering. Scaling the graph snapshots by collections of nodes and edges allows behavior to be analyzed at levels ranging from micro to macro.
1. Graphs and Financial Services Analytics
Michael Moore, Ph.D. – Executive Director, Enterprise Knowledge Graphs + AI, EY Performance Improvement Advisory
Omar Azhar, M.S. – Manager, Machine Learning and Advanced Analytics, EY Financial Services Organization
Miguel Perez, Ph.D. (DND), M.S. – Senior, Machine Learning and Advanced Analytics, EY Financial Services Organization
8. Graph Analytics Use Cases
Common use cases for graph analytics:
► Recommendation engines
► Supply chain and network optimization
► Fraud networks
► Community detection (social network analysis)
► Impact analysis / network contagion
► Anomaly detection (the focus of this talk)
9. Anomalous Behavior Detection in Dynamic Graphs in Financial Services
Anomalies are not always about finding bad behavior. We are trying to find change in a network, or in behavior, that indicates a significant change in our assumptions.
• Customer Behavior: A life event such as a new job, a new house, or a marriage. Significant life changes are indicated by customers behaving in ways they previously did not; these are opportunities to provide new services.
• Transaction Networks at Scale: What defines an efficient flow of funds versus an inefficient one? Is efficiency correlated with the type of behavior?
How should we think about structuring this as a graph problem?
10. Let's start with a model everyone is familiar with: Customer 360
[Diagram: a Customer 360 graph connecting an FA Hub, Corporate Wiki, Call Logs, E-mail Logs, Social Network Data, a Financial Hub, an Accounts Hub and Transaction Logs]
Now that we have our graph model, we need to consider scale.
11. Scaling determines what snapshots you take of the graph for analysis
Micro: looking at the graph at the account level.
12. Scaling determines what snapshots you take of the graph for analysis
Moving up the scale: looking at the customer level.
14. How do we think about scaling in a graph problem?
Consider the business-defined scale:
• Scaling by collections of nodes: clumping nodes together -> a household node (see the sketch after this list)
• Generally defined by business and domain expertise
• Scaling by collections of edges: clumping edges together -> geometric time-averaging
• Requires business / domain knowledge as well as a little investigation: how do you tell what constitutes a full time cycle?
[Scale axis: micro (account) to macro (firm); coarse- versus fine-grained tuning]
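To make node-collection scaling concrete, here is a minimal sketch that collapses an account-level snapshot into a household-level snapshot by summing transfer amounts across household pairs. It assumes networkx; the account and household identifiers, and the rule of summing amounts, are illustrative assumptions rather than the talk's specific method.

```python
# A minimal sketch of "scaling by collections of nodes": collapsing
# account-level nodes into household-level nodes.
import networkx as nx

# Account-level snapshot: accounts as nodes, transfers as weighted edges.
G = nx.Graph()
G.add_edge("acct_1", "acct_2", amount=500.0)   # same household
G.add_edge("acct_1", "acct_3", amount=120.0)   # cross-household
G.add_edge("acct_2", "acct_3", amount=80.0)

# Business-defined mapping from accounts to households (hypothetical).
household = {"acct_1": "hh_A", "acct_2": "hh_A", "acct_3": "hh_B"}

# Coarsen: merge nodes that share a household, summing edge amounts.
H = nx.Graph()
for u, v, data in G.edges(data=True):
    hu, hv = household[u], household[v]
    if hu == hv:
        continue  # intra-household activity disappears at this scale
    if H.has_edge(hu, hv):
        H[hu][hv]["amount"] += data["amount"]
    else:
        H.add_edge(hu, hv, amount=data["amount"])

print(list(H.edges(data=True)))  # [('hh_A', 'hh_B', {'amount': 200.0})]
```

The same pattern extends to edge-collection scaling: instead of grouping nodes, group each node pair's edges over a time window and replace them with one time-averaged edge.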
15. Understanding your graph snapshot: different data models of the same underlying knowledge graph
Explore your graph snapshots. You will notice natural separation into clusters / segments in each snapshot; most of this is already captured by the segmentation models in place at most firms.
Can we use similar graph snapshots to describe expected behavior?
[Example snapshots: checking accounts; credit cards; similar customers by spend; households with similar incomes]
16. But how is this any different from what is already done today? Why graph?
Let's investigate how a single household shows up across two separate snapshots.
[Figure annotations: the parents; the college student; they all belong to the same household]
17. How should change in one snapshot change the nodes in another snapshot?
What does it mean for a node in one snapshot to change its data and move to another location in its snapshot? Can we model that?
18. We should expect diffusion of information across our graph data models (GDMs)
Example: a household moves to a lower-cost state -> the household retains its income but is effectively wealthier in the new state.
19. Information should spread across GDMs. It should go both ways, but not necessarily with the same weight
Example: a college student graduates and moves back in with his parents.
20. Can we now model this as expected change across our GDMs?
Identify node changes. What other types of change have a small impact in one GDM and a large impact in the other? Example: one family member moves -> household income is represented differently in one model versus another.
21-22. Expressing Behavior with Graph Snapshots
Compare graph snapshots to identify node behavioral change:
• Similar GDMs can give you a context-dependent way of expressing behavioral change. This means we can compute it directly from the graphs themselves.
• Expressing behavioral change is now deeply connected to expressing structural change across similar GDMs that are supported by the same underlying knowledge graph (a minimal sketch of snapshot comparison follows).
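As an illustration of comparing snapshots, the sketch below scores each node by how much its neighborhood changed between two snapshots of the same GDM. The Jaccard-distance scoring rule is an assumption chosen for simplicity, not the talk's specific measure; networkx is assumed.

```python
# A minimal sketch of scoring per-node behavioral change between two
# graph snapshots via neighborhood Jaccard distance.
import networkx as nx

def node_change_scores(g_prev: nx.Graph, g_curr: nx.Graph) -> dict:
    """Score each node by how much its neighborhood changed."""
    scores = {}
    for node in set(g_prev) | set(g_curr):
        prev = set(g_prev.neighbors(node)) if node in g_prev else set()
        curr = set(g_curr.neighbors(node)) if node in g_curr else set()
        union = prev | curr
        # Jaccard distance: 0 = identical neighborhood, 1 = fully changed.
        scores[node] = 1 - len(prev & curr) / len(union) if union else 0.0
    return scores

g1 = nx.Graph([("cust_a", "cust_b"), ("cust_a", "cust_c")])
g2 = nx.Graph([("cust_a", "cust_b"), ("cust_a", "cust_d")])
print(sorted(node_change_scores(g1, g2).items(), key=lambda kv: -kv[1]))
```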
23. Behavioral Change over Time
Time-Sequenced Graph Data Models (TSGDMs)
• A sequence of graph data models provides the context for behavioral change over time.
24-28. TSGDM – Assumptions – Semantic Compatibility
Time-Sequenced Graph Data Models – Necessary Conditions:
• (1) Intuitive edges that are semantically compatible with the parent KG and entity resolution
• (2) Obeys information-theoretic concerns about information propagation on a geometric structure
• (3) Uses an unsupervised architecture that correctly diffuses information in each time step (see the sketch after this list)
• (4) The architecture learns how we should describe behavioral change, not the other way around
• (5) Uses the learned statistical distribution to identify outliers
• (6) Ranks those outliers
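Conditions (2) and (3) can be illustrated with a single information-diffusion step of the kind used in graph convolutional architectures. This is a generic sketch using numpy with symmetric adjacency normalization; it is not the specific learned architecture described in the talk.

```python
# A minimal sketch of one information-diffusion step on a graph:
# features are averaged over each node's neighborhood using the
# symmetrically normalized adjacency matrix (GCN-style propagation).
import numpy as np

A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]], dtype=float)   # toy adjacency matrix
X = np.array([[1.0], [0.0], [0.0]])      # one feature per node

A_hat = A + np.eye(3)                    # add self-loops
d = A_hat.sum(axis=1)
D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
P = D_inv_sqrt @ A_hat @ D_inv_sqrt      # normalized propagation operator

X_next = P @ X                           # one diffusion (time) step
print(X_next)                            # node 0's signal has spread to its neighbors
```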
29. TSGDM – Using a learned statistical distribution to identify outliers
Take your customer transaction data and build a parent knowledge graph.
30. TSGDM – Using a learned statistical distribution to identify outliers
Scaling experimentation lets us study different schemas for candidate TSGDMs.
Comparing two similar GDMs provides context for the behavioral change of a node.
31. TSGDM – Using a learned statistical distribution to identify outliers
Apply the selected schema to each month of data, or another appropriate time scale: Month 1, Month 2, Month 3, Month 4, ..., Month X (see the sketch below).
Memory constraints will fix the number of time windows your architecture can learn from.
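A minimal sketch of applying one schema per month: transactions are grouped by calendar month and each batch is materialized as a snapshot graph. The column names and the accounts-and-transfers schema are hypothetical; pandas and networkx are assumed.

```python
# A minimal sketch of building a time-sequenced list of monthly
# graph snapshots from raw transaction records.
import pandas as pd
import networkx as nx

tx = pd.DataFrame({
    "src": ["a1", "a1", "a2", "a3"],
    "dst": ["a2", "a3", "a3", "a1"],
    "amount": [100.0, 50.0, 75.0, 20.0],
    "date": pd.to_datetime(
        ["2023-01-05", "2023-01-20", "2023-02-03", "2023-02-15"]),
})

snapshots = []
for month, batch in tx.groupby(tx["date"].dt.to_period("M")):
    g = nx.DiGraph()
    for row in batch.itertuples():
        # Schema: accounts are nodes, monthly transfers are weighted edges.
        if g.has_edge(row.src, row.dst):
            g[row.src][row.dst]["amount"] += row.amount
        else:
            g.add_edge(row.src, row.dst, amount=row.amount)
    snapshots.append((str(month), g))

for month, g in snapshots:
    print(month, g.number_of_nodes(), "nodes,", g.number_of_edges(), "edges")
```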
32. TSGDM – Using a learned statistical distribution to identify outliers
Learn a champion model on each time window batch.
33. TSGDM – Using a learned statistical distribution to identify outliers
Apply the champion model to each TSGDM and investigate the tail of each distribution:
• The log-scale compression error, or reconstruction error, tends to follow a power-law distribution.
• Graph structural changes that are harder to reproduce tend to be outliers! (A sketch follows.)
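To illustrate the reconstruction-error idea, the sketch below uses a truncated-SVD reconstruction of the adjacency matrix as a stand-in for a learned champion model; nodes whose connectivity the low-rank model cannot reproduce land in the tail of the error distribution. SVD is an assumption for illustration, not the unsupervised architecture used in the work.

```python
# A minimal sketch of scoring nodes by reconstruction error, with a
# rank-k SVD standing in for the learned champion model.
import numpy as np

rng = np.random.default_rng(0)
n, k = 50, 5
A = (rng.random((n, n)) < 0.1).astype(float)   # toy random graph
A = np.triu(A, 1)
A = A + A.T                                    # symmetric, no self-loops

U, s, Vt = np.linalg.svd(A)
A_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]  # rank-k reconstruction

# Per-node reconstruction error: how hard is each node to reproduce?
err = np.linalg.norm(A - A_hat, axis=1)
ranking = np.argsort(-err)                     # hardest-to-reproduce first
print("top-5 outlier candidates:", ranking[:5])
```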
34. TSGDM – Using a learned statistical distribution to identify outliers
• Create multiple champion models with some overlap in their time windows
• The overlap in the cumulative error between champion models identifies the outliers of interest
• Rank all nodes by their cumulative error for each champion model (see the sketch after this list)
• Key Takeaway: the harder a financial behavior is to replicate in this framework, the more likely that behavior is an anomaly
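The sketch below illustrates combining overlapping champion models: each model assigns every node a cumulative error over its window, and nodes that rank in the error tail of more than one model are flagged. The error values here are hypothetical stand-ins for real model outputs.

```python
# A minimal sketch of cross-model outlier ranking: nodes in the error
# tail of two overlapping champion models are flagged for review.
import numpy as np

nodes = [f"acct_{i}" for i in range(8)]
rng = np.random.default_rng(1)

# Hypothetical cumulative per-node errors from two champion models
# whose time windows overlap (e.g., months 1-6 and months 4-9).
err_model_a = dict(zip(nodes, rng.pareto(3.0, len(nodes))))
err_model_b = dict(zip(nodes, rng.pareto(3.0, len(nodes))))

def top_k(err: dict, k: int) -> set:
    """Return the k nodes with the largest cumulative error."""
    return set(sorted(err, key=err.get, reverse=True)[:k])

# Nodes in the tail of both models are the outliers of interest.
candidates = top_k(err_model_a, 3) & top_k(err_model_b, 3)
print("flagged for review:", candidates)
```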
35. Use Case: Anti-Money Laundering
Existing business problem: financial institutions are responsible for monitoring the transaction activity of client accounts in order to detect money laundering. Rule-based systems generate too many false-positive alerts that require expensive and subjective manual review; industry-standard performance is around 1:1,000 (roughly one productive alert per thousand).
36. Aggregated activity in real-world networks can demonstrate the efficiency of money flow in certain pockets of our economy
37. Aggregated activity in real-world networks can demonstrate the efficiency of money flow in certain pockets of our economy
[Figure: a normal, random dispersion of money flow that follows a natural path]
38. A few regions of high interconnectivity connected to spoke-like hubs: low reproducibility, potentially anomalous
39. A few regions of high interconnectivity connected to spoke-like hubs: low reproducibility, potentially anomalous
[Figure annotation: potentially higher connectedness than normal]
40. EY Cross-Sector Graph Experience: MDM, 360°, AML/Fraud, Recommenders

Fortune 100 Tech Company
Use Case: Global B2B Account 360° view and marketing attribution
Approach: Neo4j graph with 500M nodes and 2.2B relationships, representing all known business accounts, contacts and marketing touches. Mastered data from 17 disparate transactional sources in Azure Data Lake. Supported in-graph analytics for marketing attribution and next-best-action recommendations across global geographies.
Duration: 16 weeks to working graph

Fortune 100 Footwear Company
Use Case: Converged brick & mortar + online Shopper 360° view
Approach: Neo4j graph with 2B nodes and relationships, representing sales transactions for 40M shoppers across 275 physical stores and the ecommerce platform. Algorithmic extraction and profiling from raw XML records in AWS Hadoop, MDM record concordance and in-graph analytics for product associations, store analytics and recommendation services.
Duration: 12 weeks to working graph

Fortune 500 Cruise Line Company
Use Case: Shipboard and shoreside recommendation engine
Approach: Neo4j graph deployable to shipboard VMware data centers, with streaming updates from a large shoreside Neo4j graph integrating data from Azure Cerebro, Adobe Experience Manager and legacy transactional systems. In-graph analytics, a services API, and a recommendation engine for next best activity for passengers, surfaced via a mobile app.
Duration: 12 weeks to working graph

Fortune 100 Investment Firm
Use Case: Enhanced anti-money laundering and fraud detection using Graph+AI
Approach: Neo4j graph of an account 360° view representing the activity of 2M accounts over 4 years. MDM and entity extraction for account and party identity elements from an enterprise Oracle system. Network clustering, feature engineering and graph embeddings feeding a TensorFlow deep learning classifier for suspicious activity patterns across accounts and between parties.
Duration: 16 weeks to working graph

Fortune 100 Tech Company
Use Case: B2B local marketing events recommendation engine
Approach: Neo4j graph and personalized next-best-event recommendation engine for B2B field marketers. Reconciles physical and digital event attendees with corporate account structures for 10K accounts and 5M contacts. Entities mastered from transactional data in SQL Server and Azure Data Lake. Microservices APIs support data syndication to martech applications and PowerBI reporting.
Duration: 10 weeks to working graph
Editor's notes
Consider your different snapshots, consider the scalings that make sense for your data and the connectedness available in your data. Customer snapshots might not make sense for too small a time scale, so you have to investigate it
Does the anomalous structural change of a node over a 5 month window mean the same thing as the anomalous structural change over a 13 month window? Clearly not. It’s a contextual window for resolving what the architecture means by anomaly
Monthly snapshots make sense, so use them.
This is a non-convex optimization problem over the model weights and learned operators, making model performance very sensitive to initial conditions and complicating reproducibility.
This is hard to understand
If I apply
Here are some toy examples
Active area of research – here are some of the ideas guiding our R&D.
Here are some examples of what you might see
Redraw them – blue square – use eraser – make the scales all the same for the red ones
Hard to understand
We expect different
Simplify this down to one champion model – it ascribes what
Why is anomaly detection important?
Robust pattern detection -