The Shapley algorithm is an interpretation method well recognized by both industry and academia. However, given its exponential runtime complexity, existing implementations take a very long time to generate feature contributions for even a single instance, which has limited its practical use in industry.
4. Introduction
Cristine Dewar
is an applied machine learning scientist on Affirm’s
fraud ML team.
Xiang Huang
is an applied machine learning scientist working on
underwriting problems for Affirm.
5. Introduction
Affirm offers point of sale loans for our customers.
Our applied machine learning team creates models for credit risk
decisioning and for fraud detection and builds recommendation systems
to personalize a customer’s experience.
6. Introduction
For both fraud and credit, it is extremely important to have a
model that is fair and interpretable.
7. Introduction
We have millions of rows of data and hundreds of features.
We need a solution that allows us to interpret how our models are
impacting individual users at scale and can serve up results quickly.
8. What we need
We need a solution that:
▪ Allows us to interpret the effect of features on individual users
▪ Does so in a timely manner
9. What we need
In cooperative game theory, there is the problem of how to allocate the
surplus generated by the cooperation of the players.
10. What we need
We want the following properties when allocating marginal contribution
to players:
▪ Symmetry
▪ Dummy
▪ Additivity
▪ Efficiency
11. What we need
Symmetry - two players that contribute equally will be paid out equally
12. What we need
Dummy - a player that does not contribute will have a value of zero
13. What we need
Additivity - a player’s average marginal contribution across individual games is the same as evaluating that player over the entire season
[Figure: Avg(.3, .2, .25, .25) = .25 - averaging the player’s per-game contributions gives the season value]
14. What we need
Efficiency - the marginal contributions for each feature, summed with the
average prediction, give that sample’s prediction
[Figure: average prediction .5, feature contributions +.3, -.4, -.2, +.1; this user’s prediction = .5 + .3 - .4 - .2 + .1 = .3]
15. What we need
▪ Symmetry
▪ Dummy
▪ Additivity
▪ Efficiency
17. What is a Shapley value?
A Shapley value is a way to define payments proportional to each player’s
marginal contribution for all members of the group.
18. Four feature model example
▪ FICO score
▪ Number of delinquencies
▪ Loan Amount
▪ Repaid Affirm
19. MATH!
Shapley value equation:

\phi_j = \sum_{S \subseteq F \setminus \{j\}} \frac{|S|! \, (|F| - |S| - 1)!}{|F|!} \left( v(S \cup \{j\}) - v(S) \right)

where |F| is the number of features, the sum runs over the possible permutation orders for feature j (each subset S is the set of features placed before j in the permutation order), the fraction is the fraction of permutations with the features in that order, v(S ∪ {j}) is the score with feature j, v(S) is the score prior to adding feature j, and their difference is the marginal contribution of feature j.
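For a small number of features, the equation above can be implemented directly. A sketch in Python, where the feature names, the per-feature effects, and the value function v are all made up for illustration:

```python
from itertools import combinations
from math import factorial

def shapley_value(j, features, v):
    """Exact Shapley value of feature j under value function v.

    The weight |S|! * (n - |S| - 1)! / n! is the fraction of feature
    permutations in which exactly the features in S precede j.
    """
    n = len(features)
    others = [f for f in features if f != j]
    phi = 0.0
    for size in range(n):
        for subset in combinations(others, size):
            S = frozenset(subset)
            weight = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
            phi += weight * (v(S | {j}) - v(S))
    return phi

# Toy additive value function: the score is a sum of fixed per-feature
# effects, so each feature's Shapley value equals its own effect.
effects = {"fico": 0.3, "delinquencies": -0.4, "loan_amount": -0.2, "repaid": 0.1}
v = lambda S: sum(effects[f] for f in S)
print(shapley_value("fico", list(effects), v))  # ~0.3 for this additive game
```

Because the sum ranges over every subset of the other features, the cost is exponential in the feature count, which is exactly why an approximation is needed at scale.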
20. Why does permutation order matter?
We are not only trying to see how well a feature works on its own.
We are trying to measure how well a feature collaborates with the others.
27. Makes sense, sounds great
A way to get the marginal contribution for individual rows, not just a
generalized feature importance.
Even with approximation it seems very computationally expensive; how
do we deal with that?
29. Monte-Carlo approximation for Shapley value
Shparkley Implementation
[Figure: Joe’s instance (FICO 660, loan amount $500, repaid Affirm: Yes, delinquencies: 2) goes through the black-box model to get a Shapley value for FICO score]
30. Monte-Carlo approximation for Shapley value
Shparkley Implementation
[Figure: a background instance, Sally (FICO 700, loan amount $300, repaid: Yes, delinquencies: 0), is sampled from the dataset along with a random permutation order]
31. Monte-Carlo approximation for Shapley value
Shparkley Implementation
[Figure: following the sampled order, two hybrid instances are assembled from Joe’s and Sally’s features, each feature labeled "From Joe" or "From Sally": (660, $500, Yes, 0) and (700, $500, Yes, 0)]
32. Monte-Carlo approximation for Shapley value
Shparkley Implementation
[Figure: the two hybrids differ only in FICO score - the instance with Joe’s FICO score (660, $500, Yes, 0) and the instance without it (700, $500, Yes, 0); their score difference is one Monte-Carlo sample of the FICO contribution]
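The Joe/Sally walkthrough above can be sketched in plain Python. The rows carry the slide values; the scoring function here is a hypothetical linear stand-in for the black-box model:

```python
import random

def one_mc_sample(model, instance, background_row, feature_names, j, rng):
    """One Monte-Carlo sample of feature j's marginal contribution.

    Features preceding j in a random permutation come from `instance`
    (Joe); the rest come from `background_row` (Sally). The two hybrid
    instances differ only in feature j.
    """
    order = feature_names[:]
    rng.shuffle(order)
    before_j = set(order[: order.index(j)])
    with_j, without_j = {}, {}
    for f in feature_names:
        source = instance if f in before_j else background_row
        with_j[f] = without_j[f] = source[f]
    with_j[j] = instance[j]           # hybrid with Joe's value for j
    without_j[j] = background_row[j]  # hybrid without it
    return model(with_j) - model(without_j)

# Slide values for Joe and Sally; the model is a made-up linear scorer.
joe = {"fico": 660, "loan_amount": 500, "repaid": 1, "delinquencies": 2}
sally = {"fico": 700, "loan_amount": 300, "repaid": 1, "delinquencies": 0}
model = lambda x: (0.001 * x["fico"] - 0.0002 * x["loan_amount"]
                   + 0.1 * x["repaid"] - 0.05 * x["delinquencies"])

rng = random.Random(0)
samples = [one_mc_sample(model, joe, sally, list(joe), "fico", rng)
           for _ in range(1000)]
print(sum(samples) / len(samples))  # estimated FICO contribution for Joe
```

For this linear stand-in every sample equals 0.001 * (660 - 700) = -0.04, so the estimate is exact; for a real black-box model the samples vary and are averaged over many permutations and background rows.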
36. Shparkley Implementation
[Figure: for the instance to investigate (FICO 600, loan amount 300, repaid: No, delinquencies: 1) and a row in the partition (FICO 660, loan amount 1000, repaid: Yes, delinquencies: 0), both rows are rearranged by the sampled permutation order and two hybrids are assembled: the feature set with Loan Amount (1, 600, 300, Yes) and the feature set without Loan Amount (1, 600, 1000, Yes)]
37. Shparkley Implementation
[Figure: the black-box model scores both feature sets - output 0.8 with Loan Amount and 0.7 without; the marginal contribution from this row is 0.8 - 0.7 = 0.1]
39. Shparkley Implementation
▪ Highlights
○ Spark-based implementation that scales with the dataset
○ Leverages runtime advantages from batch prediction
○ Reuses predictions to calculate Shapley values for all features
○ Supports Shapley values with sample weights
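The "reuses predictions" highlight can be illustrated outside Spark: for one sampled permutation, swapping in the investigated instance's features one at a time costs only n + 1 model calls and yields marginal contributions for every feature at once. A plain-Python sketch with made-up Joe/Sally rows and a hypothetical linear model (per the highlights, the actual implementation batches these predictions inside Spark partitions):

```python
import random

def contributions_one_permutation(model, instance, background_row,
                                  feature_names, rng):
    """Marginal contributions for ALL features from one sampled permutation.

    Walking through the permutation and swapping in one feature of
    `instance` at a time needs n + 1 model calls; consecutive score
    differences give each feature's marginal contribution, so each
    prediction is shared between two features instead of recomputed.
    """
    order = feature_names[:]
    rng.shuffle(order)
    hybrid = dict(background_row)  # start with no features from the instance
    prev_score = model(hybrid)
    contribs = {}
    for f in order:                # add instance features one by one
        hybrid[f] = instance[f]
        score = model(hybrid)
        contribs[f] = score - prev_score
        prev_score = score
    return contribs

joe = {"fico": 660, "loan_amount": 500, "repaid": 1, "delinquencies": 2}
sally = {"fico": 700, "loan_amount": 300, "repaid": 1, "delinquencies": 0}
model = lambda x: (0.001 * x["fico"] - 0.0002 * x["loan_amount"]
                   + 0.1 * x["repaid"] - 0.05 * x["delinquencies"])

c = contributions_one_permutation(model, joe, sally, list(joe),
                                  random.Random(1))
# Telescoping sum: the contributions add up to model(joe) - model(sally),
# which is the efficiency property holding for this single sample.
print(sum(c.values()))
```

Averaging these per-permutation contributions over many sampled permutations and background rows gives the Monte-Carlo Shapley estimates.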
41. Runtime and convergence
Runtime comparison: shap BruteForce Explainer vs. Shparkley

Feature Name                        | Value Difference (%) | Rank Difference
Fico Score                          | 3.7                  | 0
No. of Delinquencies                | 1.1                  |
Length on Credit Report             | 2.9                  |
No. of Inquiries in Last Six Months | 5.1                  |
Loan Amount                         | 0.4                  |
User Has Repaid Affirm              | 2.5                  |
Merchant Category                   | 0.5                  |

Cluster config: 10 machines (1 master, 9 workers)
Machine spec: r5.4xlarge EC2 instance (16 cores, 128 GB memory)
42. Conclusion
Compared to a brute-force explanation, our implementation:
▪ improves the runtime by 50-60x
▪ shows minimal difference in Shapley values
Our open-source implementation, by Niloy Gupta, Isaac Joseph, Adam
Johnston, Xiang Huang, and Cristine Dewar, is available at
github.com/Affirm/shparkley
44. References
● Interpretable Machine Learning: Shapley Values
● An Efficient Explanation of Individual Classifications using Game Theory
● SHAP (SHapley Additive exPlanations)