The Shapley algorithm is an interpretation method well recognized by both industry and academia. However, given its exponential runtime complexity, existing implementations take a very long time to generate feature contributions for even a single instance, which has limited its practical use in industry.
4. Introduction
Cristine Dewar
is an applied machine learning scientist on Affirm’s
fraud ML team.
Xiang Huang
is an applied machine learning scientist working on
underwriting problems for Affirm.
5. Introduction
Affirm offers point of sale loans for our customers.
Our applied machine learning team creates models for credit risk
decisioning and for fraud detection and builds recommendation systems
to personalize a customer’s experience.
6. Introduction
For both fraud and credit, it is extremely important to have a
model that is fair and interpretable.
7. Introduction
We have millions of rows of data and hundreds of features.
We need a solution that allows us to interpret how our models are
impacting individual users at scale and can serve up results quickly.
8. What we need
We need a solution that:
▪ Allows us to interpret the effect of features on individual users
▪ Does so in a timely manner
9. What we need
In cooperative game theory, there is the problem of how to allocate the
surplus generated by the cooperation of the players.
10. What we need
We want the following properties when allocating marginal contribution
to players:
▪ Symmetry
▪ Dummy
▪ Additivity
▪ Efficiency
11. What we need
Symmetry - two players that contribute equally will be paid out equally
12. What we need
Dummy - a player that does not contribute will have a value of zero
13. What we need
Additivity - a player’s average marginal contribution across individual games is the same as evaluating that player over the entire season
[Figure: Avg(.3, .2, .25, .25) = .25 - averaging the player’s per-game contributions gives the season value]
14. What we need
Efficiency - the marginal contributions for each feature, summed with the
average prediction, give that sample’s prediction
[Figure: average prediction .5, feature contributions +.3, -.4, -.2, +.1; this user’s prediction = .5 + .3 - .4 - .2 + .1 = .3]
15. What we need
▪ Symmetry
▪ Dummy
▪ Additivity
▪ Efficiency
17. What is a Shapley value?
A Shapley value is a way to define payments proportional to each player’s
marginal contribution for all members of the group.
18. Four feature model example
▪ FICO score
▪ Number of delinquencies
▪ Loan Amount
▪ Repaid Affirm
19. MATH!
Shapley value equation:

\phi_j = \sum_{S \subseteq F \setminus \{j\}} \frac{|S|! \, (|F| - |S| - 1)!}{|F|!} \left( v(S \cup \{j\}) - v(S) \right)

where |F| is the number of features, the sum runs over the possible permutation orders for feature j (each subset S is the set of features placed before j in the permutation order), the fraction is the fraction of permutations with the features in that order, v(S ∪ {j}) is the score with feature j, v(S) is the score prior to adding feature j, and their difference is the marginal contribution of feature j.
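For a small number of features, the equation above can be implemented directly. A sketch in Python, where the feature names, the per-feature effects, and the value function v are all made up for illustration:

```python
from itertools import combinations
from math import factorial

def shapley_value(j, features, v):
    """Exact Shapley value of feature j under value function v.

    The weight |S|! * (n - |S| - 1)! / n! is the fraction of feature
    permutations in which exactly the features in S precede j.
    """
    n = len(features)
    others = [f for f in features if f != j]
    phi = 0.0
    for size in range(n):
        for subset in combinations(others, size):
            S = frozenset(subset)
            weight = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
            phi += weight * (v(S | {j}) - v(S))
    return phi

# Toy additive value function: the score is a sum of fixed per-feature
# effects, so each feature's Shapley value equals its own effect.
effects = {"fico": 0.3, "delinquencies": -0.4, "loan_amount": -0.2, "repaid": 0.1}
v = lambda S: sum(effects[f] for f in S)
print(shapley_value("fico", list(effects), v))  # ~0.3 for this additive game
```

Because the sum ranges over every subset of the other features, the cost is exponential in the feature count, which is exactly why an approximation is needed at scale.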
20. Why does permutation order matter?
We are not only trying to see how well a feature works on its own.
We are trying to measure how well a feature collaborates with the others.
27. Makes sense, sounds great
A way to get the marginal contribution for individual rows, not just a
generalized feature importance.
Even with approximation it seems very computationally expensive; how
do we deal with that?
29. Monte-Carlo approximation for Shapley value
Shparkley Implementation
[Figure: Joe’s instance (FICO 660, loan amount $500, repaid Affirm: Yes, delinquencies: 2) goes through the black-box model to get a Shapley value for FICO score]
30. Monte-Carlo approximation for Shapley value
Shparkley Implementation
[Figure: a background instance, Sally (FICO 700, loan amount $300, repaid: Yes, delinquencies: 0), is sampled from the dataset along with a random permutation order]
31. Monte-Carlo approximation for Shapley value
Shparkley Implementation
[Figure: following the sampled order, two hybrid instances are assembled from Joe’s and Sally’s features, each feature labeled "From Joe" or "From Sally": (660, $500, Yes, 0) and (700, $500, Yes, 0)]
32. Monte-Carlo approximation for Shapley value
Shparkley Implementation
[Figure: the two hybrids differ only in FICO score - the instance with Joe’s FICO score (660, $500, Yes, 0) and the instance without it (700, $500, Yes, 0); their score difference is one Monte-Carlo sample of the FICO contribution]
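The Joe/Sally walkthrough above can be sketched in plain Python. The rows carry the slide values; the scoring function here is a hypothetical linear stand-in for the black-box model:

```python
import random

def one_mc_sample(model, instance, background_row, feature_names, j, rng):
    """One Monte-Carlo sample of feature j's marginal contribution.

    Features preceding j in a random permutation come from `instance`
    (Joe); the rest come from `background_row` (Sally). The two hybrid
    instances differ only in feature j.
    """
    order = feature_names[:]
    rng.shuffle(order)
    before_j = set(order[: order.index(j)])
    with_j, without_j = {}, {}
    for f in feature_names:
        source = instance if f in before_j else background_row
        with_j[f] = without_j[f] = source[f]
    with_j[j] = instance[j]           # hybrid with Joe's value for j
    without_j[j] = background_row[j]  # hybrid without it
    return model(with_j) - model(without_j)

# Slide values for Joe and Sally; the model is a made-up linear scorer.
joe = {"fico": 660, "loan_amount": 500, "repaid": 1, "delinquencies": 2}
sally = {"fico": 700, "loan_amount": 300, "repaid": 1, "delinquencies": 0}
model = lambda x: (0.001 * x["fico"] - 0.0002 * x["loan_amount"]
                   + 0.1 * x["repaid"] - 0.05 * x["delinquencies"])

rng = random.Random(0)
samples = [one_mc_sample(model, joe, sally, list(joe), "fico", rng)
           for _ in range(1000)]
print(sum(samples) / len(samples))  # estimated FICO contribution for Joe
```

For this linear stand-in every sample equals 0.001 * (660 - 700) = -0.04, so the estimate is exact; for a real black-box model the samples vary and are averaged over many permutations and background rows.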
36. Shparkley Implementation
[Figure: for the instance to investigate (FICO 600, loan amount 300, repaid: No, delinquencies: 1) and a row in the partition (FICO 660, loan amount 1000, repaid: Yes, delinquencies: 0), both rows are rearranged by the sampled permutation order and two hybrids are assembled: the feature set with Loan Amount (1, 600, 300, Yes) and the feature set without Loan Amount (1, 600, 1000, Yes)]
37. Shparkley Implementation
[Figure: the black-box model scores both feature sets - output 0.8 with Loan Amount and 0.7 without; the marginal contribution from this row is 0.8 - 0.7 = 0.1]
39. Shparkley Implementation
▪ Highlights
○ Spark-based implementation that scales with the dataset
○ Leverages runtime advantages from batch prediction
○ Reuses predictions to calculate Shapley values for all features
○ Supports Shapley values with sample weights
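The "reuses predictions" highlight can be illustrated outside Spark: for one sampled permutation, swapping in the investigated instance's features one at a time costs only n + 1 model calls and yields marginal contributions for every feature at once. A plain-Python sketch with made-up Joe/Sally rows and a hypothetical linear model (per the highlights, the actual implementation batches these predictions inside Spark partitions):

```python
import random

def contributions_one_permutation(model, instance, background_row,
                                  feature_names, rng):
    """Marginal contributions for ALL features from one sampled permutation.

    Walking through the permutation and swapping in one feature of
    `instance` at a time needs n + 1 model calls; consecutive score
    differences give each feature's marginal contribution, so each
    prediction is shared between two features instead of recomputed.
    """
    order = feature_names[:]
    rng.shuffle(order)
    hybrid = dict(background_row)  # start with no features from the instance
    prev_score = model(hybrid)
    contribs = {}
    for f in order:                # add instance features one by one
        hybrid[f] = instance[f]
        score = model(hybrid)
        contribs[f] = score - prev_score
        prev_score = score
    return contribs

joe = {"fico": 660, "loan_amount": 500, "repaid": 1, "delinquencies": 2}
sally = {"fico": 700, "loan_amount": 300, "repaid": 1, "delinquencies": 0}
model = lambda x: (0.001 * x["fico"] - 0.0002 * x["loan_amount"]
                   + 0.1 * x["repaid"] - 0.05 * x["delinquencies"])

c = contributions_one_permutation(model, joe, sally, list(joe),
                                  random.Random(1))
# Telescoping sum: the contributions add up to model(joe) - model(sally),
# which is the efficiency property holding for this single sample.
print(sum(c.values()))
```

Averaging these per-permutation contributions over many sampled permutations and background rows gives the Monte-Carlo Shapley estimates.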
41. Runtime and convergence
Runtime comparison: shap BruteForce Explainer vs. Shparkley

Feature Name                        | Value Difference (%) | Rank Difference
Fico Score                          | 3.7                  | 0
No. of Delinquencies                | 1.1                  |
Length on Credit Report             | 2.9                  |
No. of Inquiries in Last Six Months | 5.1                  |
Loan Amount                         | 0.4                  |
User Has Repaid Affirm              | 2.5                  |
Merchant Category                   | 0.5                  |

Cluster config: 10 machines (1 master, 9 workers)
Machine spec: r5.4xlarge EC2 instance (16 cores, 128 GB memory)
42. Conclusion
Compared to a brute-force explanation, our implementation:
▪ improves the runtime by 50-60x
▪ shows minimal difference in Shapley values
Our open-source implementation, by Niloy Gupta, Isaac Joseph, Adam
Johnston, Xiang Huang, and Cristine Dewar, is available at
github.com/Affirm/shparkley
44. References
● Interpretable Machine Learning: Shapley Values
● An Efficient Explanation of Individual Classifications using Game Theory
● SHAP (SHapley Additive exPlanations)