1. Approach to AML Rule Thresholds
By Mayank Johri, Amin Ahmadi, Kevin Kinkade, Sam Day, Michael Spieler, Erik DeMonte
January 12, 2016
Introduction
Institutions are constantly facing the challenge of managing growing alert volumes from automated transaction
monitoring systems, new money laundering typologies to surveil, and more robust regulatory guidance. The question is
how will BSA/AML departments scale to meet demand while managing compliance cost? In order to effectively set
baseline thresholds for new detection scenario configuration or improve the efficacy of existing scenarios, apply
statistical techniques and industry standards to identify the cut-off between “normal” and “abnormal” or “suspicious”
activity. These estimated thresholds are then either challenged or reinforced by the qualitative judgement of professional
investigators during a simulated ‘pseudo’ investigation or ‘qualitative assessment’.
An effective AML transaction monitoring program includes a standardized process for tuning, optimizing, and testing
AML scenarios/typologies that is understandable, repeatable and consistent.
An appropriately tuned or optimized scenario seeks a balance between maximizing the identification of suspicious
activity while simultaneously maximizing resource efficiency. The two competing objectives of tuning and optimization
which must remain in constant balance are:
(1) Reduce the number of ‘false positives’ or alerts generated on transactions that do not require further investigation
or the filing a Suspicious Activity Report (SAR).
(2) Reduce the number of ‘false negatives’ or ‘transactions that were not alerted’ but that do require further
investigation or the filing a SAR.
Phases
The following outlines the eight phase process for the initial tuning:
Phase 0 | Planning. The Policy Office (PO) works closely with the Analytics team to strategize the scenario,
stratification, and parameters that will be used to conduct a threshold analysis.
Phase 1 | Assess Data. Analytics communicates which data fields will be required to perform this analysis to
Information Technology (IT). IT then determines if the ETL of these fields into Transaction Monitoring System is
a near or long term process.
Phase 2 | Query Data. Analytics queries the required transactional data for analysis.
Phase 3 | Quantitative Analysis. Analytics stratifies the data as required (such as grouping like-attributes or ‘non-
tunable parameters’ such as entity/consumer, cash intensive businesses/non-CIB, Cr/Db, high/medium risk
destinations etc.) to account for like-attribute behavior patterns.
Transformation
Once stratified, Analytics performs transformations to the data as required (such as 90 day rolling count/sum/standard
deviation etc).
2. Exploratory Data Analysis
Analytics performs a variety of visual and statistical exploratory data analysis (EDA) techniques to analyze the dataset to
understand the correlation and impact that one or more parameters may have on the scenario, and therefore ultimately
on the alert-to-case efficacy. The objective of EDA is to further explore the recommended parameters (count, amount,
standard deviation, etc.) proposed during the planning phase to determine with greater statistical precision the best
combination of parameters
Segmentation
Once stratified and transformed, Analytics clusters the data’s ‘tunable parameters’ to account for ‘skewness’ in the data
population caused by outliers in order to yield a statistically accurate threshold that is representative of the 85th
percentile.
The 85th percentile is used as a standard when establishing a new rule to set an initial baseline threshold for defining the
cutoff between “normal” transactional data and “unusual” transactional data. For normally distributed data with a bell-
shaped curve (as depicted in the middle diagram below, figure 1.1), the mean value (i.e., the “expected” value) represents
the central tendency of the data, and the 85th percentile represents one standard deviation (σ) from this central tendency.
The 85th percentile could represent a reasonably conservative cutoff line or “threshold” for unusual activity. This
baseline simply provides a starting point for further analysis, and is later refined through qualitative judgement and alert-
to-case efficacy.
If transactional data were always normally distributed, it would be easy to calculate one standard deviation above the
mean to identify where to draw the line representing the 85th percentile of the data (this technique is often referred to as
‘quantiling’), thus establishing the threshold. However, in real world applications transactional data is often not normally
distributed. Transactional data is frequently skewed by outliers (such as uniquely high-value customers), therefore, if
statistical techniques that assume normal distribution (such as quantile) are applied while determining the 85th percentile
(+1 standard deviation from the mean), the result will yield a misrepresentative ‘threshold’ which is offset by the
outlier(s).
Figure 1.1 Distribution affected by Skewness
Clustering
To account for skewness in the data, employ the clustering technique known as ‘Partition around Medoid’ (PAM), or
more specifically, ‘Clustering Large Application’s (CLARA). Clustering is an alternative method of data segmentation
which is not predicated on the assumption that the data is normally distributed or that it has constant variance.
Clustering works by breaking the dataset into groups of distinct clusters around one common entity of the dataset
(which represents the group). This partition more accurately allows the assignment of a boundary (such as a target
threshold to distinguish normal from unusual activity).
The first step of the clustering model is to understand the number of clusters to partition the data by. The methodology
used to identify the optimal number of clusters takes into account two variables:
3. Approximation – How the clustering model fits to the current data set (“Error Measure”)
Generalization – Cost of how well the clustering model could be re-performed with another similar data set
The model for clustering can be seen in the figure below. As the number of clusters increases, (x-axis) the model will
become more complex and thus less stable. Increasing the number of clusters creates a more customized model which is
catered to the current data set, resulting in a high level of approximation. However, in this situation cost will increase as
the flexibility to re-perform using a similar data set will become more difficult. Inversely, the fewer clusters the less
representative the model is for the current data set, but the more scalable it is for future similar data sets. An objective
function curve is plotted to map the tradeoff between the two competing objectives. This modelling methodology is
used to identify the inflection point of the objective function of the two variables - the optimal number of clusters that
will accommodate both the current data set (approximation) and future data sets (generalization). Refer to figure 1.2
below for the conceptual visual of the modelling methodology used for identifying the optimal number of clusters.
Figure 1.2 Cluster Modeling – Identification of Number of Clusters
The basic approach to CLARA clustering is to partition objects/observations into several similar subsets. Data is
partitioned based on ‘Euclidean’ distance to a common data point (called a medoid). Medoid, rather than being a
calculated quantity (as it is the case with “mean”), is a data point in the cluster which happens to have the minimal
average dissimilarity to all other data points assigned to the same cluster. Euclidean distance is the most common
measure of dissimilarity. The advantage of using medoid-based cluster analysis is the fact that no assumption is made
about the structure of the data. In the case of mean-based cluster analysis, however, one makes the implicit restrictive
assumption that the data follows a Gaussian (bell-shape) distribution.
The next step is to determine the number of dimensions for parameter threshold analysis and to translate the
transactional data into ‘events’. An event is defined as a unique combination of all parameters for the identified scenario
or rule. The full transactional data set is translated into a population of events. Event bands are formed based on the
distribution of total events within the clusters. Event bands can be thought of as the boundaries between the clusters
(such that one or more parameters exhibit similarity).
Event Banding with One Parameter
When a scenario only has one tunable parameter (such as ‘amount’), bands for this parameter are ideally generated in 5%
increments beginning at the 50th percentile, resulting in six bands – P50, P55, P60, P65, P70, P75, P80, P85, P90, and
P95. The 50th percentile is chosen as a starting point to allow room for adjustment towards a more conservative
cluster/threshold, pending the results of the qualitative analysis. In other words, it is important to include clusters well
below, but still within reasonable consideration to the target threshold definition of transaction activity that will be
considered quantitatively suspicious. Refer to Figure 1.3 below.
4. Figure 1.3 85
th
Percentile and BTL/ATL
Some parameters such as ‘transaction count’ have a discrete range of values, and therefore the bands may not be able to
be established exactly at the desired percentile level. In these cases, judgment is necessary to establish reasonable bands.
Depending on the values of the bands, they will often be rounded to nearby numbers of a similar order of magnitude
but that are more easily socialized with internal and external business partners. Each of these bands corresponds to a
parameter value to be tested as a prospective threshold for the scenario.
If the six clusters have ranges that are drastically different from one another, adjustment to the bands may be necessary
to make the clusters more reasonable while still maintaining a relatively evenly distributed volume across the event
bands. This process is subjective and will differ from scenario-to-scenario, especially in cases where a specific value for a
parameter is inherent in the essence of the rule (e.g., $10,000 for cash structuring). In many cases the nature of the
customer segment and activity being monitored may support creating fewer than 6 event bands due to the lack of
volume of activity for that segment.
Figure 1.4 Event Banding of 1 Parameter ‘Amount’
Event Banding with Two Parameters
When a scenario has two tunable parameters (such as ‘count’ and ‘amount’), two independent sets of bands need to be
established for each parameter, similar to the method used for one parameter.
5. Analysis of two tunable parameters may be thought of as ‘two-dimensional’, whereas one parameter event banding is
based only on a single parameter (one axis), event banding with two parameters is affected by two axes (x & y axis). For
example, ‘count’ may represent the x-axis, while ‘amount’ may represent the y-axis. In this sense, the ultimate threshold
is determined by a combination of both axes, and so are the event bands. Including additional parameters will likewise
add additional dimensions and complexity.
As discussed above, while the 85th percentile is used to determine the threshold line, bands are created through
clustering techniques starting at the 50th percentile to account for those data points below, but still within reasonable
consideration to the target threshold definition of transaction activity that will be considered quantitatively suspicious. In
the diagram below, we see banding between two parameters, count and value. Once the data is clustered, the 85th
percentile is identified per the distribution (upper right hand table in Figure 1.5 below) and qualitative judgement is
exercised in order to set exact thresholds within the range that creates a model conducive for re-performance (Refer
above to discussion on “generalization” in the discussion of clustering modelling).
Figure 1.5 Event Banding of 2 Parameters ‘Value’/‘Count’
Event Banding with more than Two Parameters
When the scenario has more than two tunable parameters (such as count, amount and standard deviation), more than
two independent sets of bands need to be established for each parameter, similarly to the method used for two
parameters.
Select Threshold(s)
The output of phase three is the ‘threshold’, or ‘event’ characteristics (combination of thresholds based in the case of
multiple parameters) which serve as the baseline for ‘suspicious’ activity. Too many alerts may be generated which
creates extraneous noise and strains BSA/AML investigative resources. Conversely if the threshold is set too high,
suspicious activity may not generate alerts.
6. Phase 4 | Sampling. Analytics applies the thresholds determined during the quantitative analysis phase to the
historical data in order to identify potential events for Above-the-Line (ATL) and Below-the-Line (BTL) analysis.
These indicators, when flagged as ‘ATL’ are essentially the same thing as alerts, except since they are applied using
historical data they are referred to as ‘pseudo alerts’. The number of transactions which fall into the ATL or BTL
category will determine the number of random samples required for a statistically significant qualitative assessment.
The purpose of the samples is to evaluate the efficacy of Analytics’ calculated thresholds. In other words, if the
threshold is appropriately tuned, then a larger percentage of events marked ‘ATL’ should be classified as ‘suspicious’
by an independent FIU investigator compared to the ‘BTL’ events. Analytics then packages these sample ATL and
BTL transactions into a format which is understandable and readable by an FIU investigator (samples must include
the transactional detail fields required for FIU to determine the nature of the transactions).
Phase 5 | Training. Analytics orients the FIU investigators to the scenario, parameters and overall intent/spirit of
each rule so that during the qualitative analysis phase, the FIU investigators render appropriate independent
judgements for ATL and BTL samples.
Phase 6 | Qualitative Analysis. FIU assesses the sampled transactions from a qualitative perspective. During this
phase, an independent FIU investigator analyzes each sampled pseudo alert as they would treat real alerts (without
any bias regardless of the alert’s classification as ATL or BTL). The investigator’s evaluation must include
consideration for the intent of each rule, and should include an assessment of both the qualitative and quantitative
fields associated with each alert. The FIU investigator will generally evaluate each transaction through a lens akin to
“Given what is known from KYC, origin/destination of funds, beneficiary, etcetera, is it explainable that this
consumer/entity would transact this dollar amount at this ...frequency, velocity, pattern etc...” FIU provides
feedback to Analytics for each pseudo alert classified as (a) ‘Escalate-to-Case’ (b) ‘Alert Cleared – No Investigation
Required (false positive)’, (c) ‘Alert Cleared – Error, or (d) ‘Insufficient Information’. If the efficacy is deemed
appropriate, then Business Review Session is scheduled to vote the rule into production.
Phase 7 | Business Review Session. PO, Analytics and FIU present their findings for business review to voting
members.
Phase 8 | Implementation. Analytics provides functional specifications to IT to implement the scenario within
Transaction Monitoring System.