4. Why Explainability?
• More use of ML/AI models by laypersons.
• Laypersons need explanations.
• Developers also need quick explanations to debug models faster.
• There may be a legal need for explanations:
  • If you deny someone a loan, you may need to explain the reason for the denial.
7. Explainability vs. Performance Tradeoff
• Some machine learning models are more explainable than others.
[Figure: models plotted on a performance vs. explainability axis — deep learning models offer high performance but low explainability, while linear models and decision trees are more explainable.]
9. What Features? Interpretable Features
• We need interpretable features.
• Raw feature spaces (e.g. word embeddings) are difficult for laypersons to understand.
• Humans are good at understanding the presence or absence of components.
10. Interpretable Instance
• E.g.:
  • For text: convert to a binary vector indicating the presence or absence of words.
  • For images: convert to a binary vector indicating the presence or absence of pixels or contiguous regions.
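For text, this conversion is just a presence check against a vocabulary. A minimal sketch (the vocabulary and whitespace tokenization here are illustrative assumptions):

```python
def to_binary_vector(text, vocabulary):
    """Interpretable representation of a document:
    1 if the vocabulary word occurs in the text, else 0."""
    words = set(text.lower().split())
    return [1 if w in words else 0 for w in vocabulary]

vocab = ["loan", "denied", "credit", "approved"]
print(to_binary_vector("Your loan was denied", vocab))  # [1, 1, 0, 0]
```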
11. Method 1: LIME
Local Interpretable Model-agnostic Explanations
From https://github.com/marcotcr/lime
Ribeiro, M.T., Singh, S. and Guestrin, C., 2016, August. Why Should I Trust You?: Explaining the Predictions of Any Classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 1135-1144). ACM.
12. Method 1: LIME
[Figure: an instance is converted to a binary vector (e.g. 1 1 0 1 1 0 1 0 0 1 0), perturbed copies of it are scored by any classifier, and a local linear model is fit with enforced sparsity; the surviving weights of the linear classifier (e.g. -2.1, 2.2, -3, 5.6) then give us feature importances.]
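The perturb-and-reweight idea can be sketched in plain Python. This is a simplified stand-in for the paper's method: instead of fitting a weighted sparse linear model, it scores each feature by the proximity-weighted difference in the black-box output with the feature present vs. absent; the exponential kernel and sample count are arbitrary choices, not LIME's defaults:

```python
import math
import random

def lime_importances(f, x, n_samples=2000, seed=0):
    """Simplified LIME-style sketch: perturb the binary representation x,
    query the black-box f on each perturbation, and score feature i by the
    proximity-weighted difference in f when feature i is on vs. off."""
    rng = random.Random(seed)
    d = len(x)
    on = [[0.0, 0.0] for _ in range(d)]   # [weighted sum of f, total weight] when feature on
    off = [[0.0, 0.0] for _ in range(d)]  # same, when feature off
    for _ in range(n_samples):
        z = [b if rng.random() < 0.5 else 0 for b in x]  # random perturbation of x
        dist = sum(a != b for a, b in zip(z, x))
        w = math.exp(-dist)               # proximity kernel: closer samples weigh more
        y = f(z)
        for i in range(d):
            bucket = on[i] if z[i] else off[i]
            bucket[0] += w * y
            bucket[1] += w
    return [on[i][0] / on[i][1] - off[i][0] / off[i][1]
            if on[i][1] and off[i][1] else 0.0
            for i in range(d)]

# Toy black box that is secretly linear: feature 0 matters most.
f = lambda z: 3.0 * z[0] + 1.0 * z[1] + 0.0 * z[2]
print(lime_importances(f, [1, 1, 1]))  # roughly [3.0, 1.0, 0.0]
```

For a linear black box the recovered importances match the underlying coefficients, which is exactly the sanity check one would want from a local linear explanation.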
15. Explanations for Multi-Label Classifiers
Ribeiro, M.T., Singh, S. and Guestrin, C., 2016, August. Why Should I Trust You?: Explaining the Predictions of Any Classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 1135-1144). ACM.
16. Using LIME for Debugging (E.g. 1)
19. Method 2: SHAP
Unifies many different feature attribution methods and has some desirable properties:
1. LIME
2. Integrated Gradients
3. Shapley values
4. DeepLIFT
Lundberg, S.M. and Lee, S.I., 2017. A unified approach to interpreting model predictions. In Advances in Neural Information Processing Systems (pp. 4765-4774).
20. Method 2: SHAP
• Derives from game-theoretic foundations.
• Shapley values used in game theory to assign values to players
in cooperative games.
21. What are Shapley values?
• Suppose there is a set S of N players
participating in a game with payoff for any S
subset of players participating in the game
given by:
• Shapley values provide one fair
way of dividing up the total
payoff among the N players.
22. Shapley Value
The Shapley value for player i averages, over all coalitions T that exclude i, the difference between the payoff for the group including player i and the payoff for the same group without player i:

φ_i = Σ_{T ⊆ S∖{i}} [|T|! (N − |T| − 1)! / N!] · (v(T ∪ {i}) − v(T))
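For a small number of players the formula can be evaluated exactly by enumerating coalitions. A minimal sketch (the toy game and player names are illustrative assumptions):

```python
from itertools import combinations
from math import factorial

def shapley_values(players, v):
    """Exact Shapley values by enumerating every coalition T not containing i:
    phi_i = sum over T of |T|! (N - |T| - 1)! / N! * (v(T + {i}) - v(T))."""
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for k in range(n):                     # coalition sizes 0 .. N-1
            for T in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                coalition = frozenset(T)
                total += weight * (v(coalition | {i}) - v(coalition))
        phi[i] = total
    return phi

# Additive toy game: a coalition's payoff is the sum of its members'
# stand-alone payoffs, so each Shapley value is the player's own payoff.
solo = {"a": 4.0, "b": 1.0, "c": 5.0}
v = lambda T: sum(solo[p] for p in T)
print(shapley_values(list(solo), v))  # each value equals the player's solo payoff
```

Note the efficiency property: the values sum to v(S), the total payoff of the grand coalition, which is what makes this a "fair division" of the payoff.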
24. SHAP Implementation (https://github.com/slundberg/shap)
Different kinds of explainers:
1. TreeExplainer: fast and exact SHAP values for tree ensembles
2. KernelExplainer: approximate explainer for black-box estimators
3. DeepExplainer: high-speed approximate explainer for deep learning models
4. ExpectedGradients: a SHAP-based extension of integrated gradients
25. XGBoost on UCI Income Dataset
Output is the probability of income over 50k.
[Figure: SHAP force plot showing features (f87, f23, f3, f34, f41) pushing the output away from the base value.]
27. Is This Form of Explainability Enough?
• Explainability does not provide us with recourse.
• Recourse: the information needed to change a specific prediction to a desired value.
• "If you had paid your credit card balance in full for the last three months, you would have got that loan."
28. Issues with SHAP and LIME
• For non-linear models, SHAP and LIME values are highly variable across instances that are very similar.
On the Robustness of Interpretability Methods: https://arxiv.org/abs/1806.08049
30. Issues with SHAP and LIME
• SHAP and LIME values don't provide insight into how the model will behave on new instances.
High-Precision Model-Agnostic Explanations: https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/view/16982
31. Take-home message
• Explainability is possible and need not come at the cost of performance.
• Explainability alone is not enough: we also need recourse, etc.
33. Fairness and Bias in Machine Learning
1. Bias in this context means unfairness (more or less).
2. Note we are not talking about standard statistical bias in machine learning (the bias in the bias vs. variance tradeoff).
3. For completeness, one definition of statistical bias in machine learning is:
   • Bias = expected value of the model − true value
34. Definitions of Fairness or Bias
1. Many, many, many definitions exist.
2. They are application dependent: no single definition is best.
3. See the "21 Definitions of Fairness" tutorial by Arvind Narayanan, ACM FAT* 2018.
   • Key point: dozens of definitions exist (not just 21).
35. Setting
1. A classifier C with binary output d ∈ {+, −} and a real-valued score s.
   1. Instances or data points are generally humans.
   2. The + class is desired and the − class is not.
2. Input X, including one or more sensitive/protected attributes G (e.g. gender) that are part of the input, e.g. possible values G ∈ {m, f}.
3. The set of instances sharing a common sensitive-attribute value that receives more + labels is privileged; the other set, receiving fewer + labels, is unprivileged.
4. True output Y.
36. 1. Fairness through Unawareness
• Simple idea: do not consider any sensitive attributes when building the model.
• Advantage: some support in the law (disparate treatment)?
• Disadvantage: other attributes may be correlated with sensitive attributes (such as job history, geographical location, etc.).
37. 2. Statistical Parity Difference
• Different groups should have the same proportion (or probability) of positive and negative labels. Ideally the value P(d = + | G = unprivileged) − P(d = + | G = privileged) should be close to zero.
• Advantages: legal support in the form of a rule known as the four-fifths rule. May remove historical bias.
• Disadvantages:
  • Trivial classifiers, such as classifiers that randomly assign the same proportion of labels across different groups, satisfy this definition.
  • A perfect classifier (d = Y) may not be allowed if ground-truth label rates differ across groups.
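The definition above is straightforward to compute from predictions and group membership. A minimal sketch (the '+'/'−' label encoding and the group names are illustrative assumptions):

```python
def statistical_parity_difference(preds, groups, unpriv="f", priv="m"):
    """P(d = + | G = unprivileged) - P(d = + | G = privileged);
    values near zero indicate statistical parity."""
    def pos_rate(g):
        d = [p for p, grp in zip(preds, groups) if grp == g]
        return sum(p == "+" for p in d) / len(d)
    return pos_rate(unpriv) - pos_rate(priv)

preds  = ["+", "-", "+", "+", "-", "-", "+", "-"]
groups = ["m", "m", "m", "m", "f", "f", "f", "f"]
print(statistical_parity_difference(preds, groups))  # 0.25 - 0.75 = -0.5
```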
38. 3. Equal Opportunity Difference
• Different groups should have the same true positive rate. Ideally the value TPR(unprivileged) − TPR(privileged) should be close to zero.
• Advantages:
  • A perfect classifier is allowed.
• Disadvantages:
  • May perpetuate historical biases.
  • E.g. a hiring application with 100 privileged and 100 unprivileged candidates, but 40 qualified among the privileged and only 4 among the unprivileged.
  • Hiring 20 of the privileged and 2 of the unprivileged satisfies this definition (equal TPR of 0.5 in both groups).
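The TPR comparison can also be computed directly. A minimal sketch (label encodings and group names are illustrative assumptions):

```python
def equal_opportunity_difference(y_true, y_pred, groups, unpriv="f", priv="m"):
    """TPR(unprivileged) - TPR(privileged), where TPR = P(d = + | Y = +, G = g);
    values near zero indicate equal opportunity."""
    def tpr(g):
        preds = [p for t, p, grp in zip(y_true, y_pred, groups)
                 if grp == g and t == "+"]
        return sum(p == "+" for p in preds) / len(preds)
    return tpr(unpriv) - tpr(priv)

y_true = ["+", "+", "-", "+", "+", "-"]
y_pred = ["+", "-", "-", "+", "+", "-"]
groups = ["m", "m", "m", "f", "f", "f"]
print(equal_opportunity_difference(y_true, y_pred, groups))  # 1.0 - 0.5 = 0.5
```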
39. 4. False Negative Error Balance
• Appropriate if the application is punitive in nature.
• Different groups should have the same false negative rates.
• Example:
  • The proportion of black defendants who don't recidivate yet receive high risk scores should be the same as the proportion of white defendants who don't recidivate yet receive high risk scores.
41. Impossibility Results
• Core of the debate in COMPAS.
• ProPublica: false negative rates should be the same across different groups.
• Northpointe: scores should have the same meaning across groups (test fairness).
• Result: if prevalence rates (the ground-truth proportion of labels across different groups) differ, and test fairness is satisfied, then false negative rates will differ across groups.
Chouldechova, A., 2017. Fair prediction with disparate impact: A study of bias in recidivism prediction instruments. Big Data, 5(2), pp.153-163.
42. Tools for Measuring Bias
AI Fairness 360 (AIF360): measuring bias.
https://github.com/IBM/AIF360
43. Mitigation: Removing Bias
• Mitigation can happen in three different places:
  • Before the model is built, in the training data
  • In the model itself
  • After the model is built, with the predictions
45. Before the model is built
• Reweighing (roughly, at a high level):
  • Increase weights for some instances:
    • Unprivileged with positive labels
    • Privileged with negative labels
  • Decrease weights for some instances:
    • Unprivileged with negative labels
    • Privileged with positive labels
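One standard way to realize this scheme (after Kamiran and Calders) weights each (group, label) cell by P(G = g) · P(Y = y) / P(G = g, Y = y), so that cells that are over-represented relative to independence are down-weighted. A minimal sketch (the group and label encodings are illustrative assumptions):

```python
from collections import Counter

def reweighing_weights(groups, labels):
    """Reweighing sketch: weight each (group, label) cell by
    P(G = g) * P(Y = y) / P(G = g, Y = y), which up-weights
    unprivileged-positive and privileged-negative examples and
    down-weights the other two cells."""
    n = len(labels)
    pg = Counter(groups)                  # counts per group
    py = Counter(labels)                  # counts per label
    pgy = Counter(zip(groups, labels))    # counts per (group, label) cell
    return [(pg[g] / n) * (py[y] / n) / (pgy[g, y] / n)
            for g, y in zip(groups, labels)]

groups = ["m", "m", "m", "m", "f", "f", "f", "f"]
labels = ["+", "+", "+", "-", "+", "-", "-", "-"]
print(reweighing_weights(groups, labels))
# privileged-positive cells get 2/3, unprivileged-positive cells get 2.0, etc.
```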
47. In the model
Zhang, B.H., Lemoine, B. and Mitchell, M., 2018, December. Mitigating
unwanted biases with adversarial learning. In Proceedings of the 2018
AAAI/ACM Conference on AI, Ethics, and Society (pp. 335-340). ACM.
49. After the model is built
• Reject option classification:
• Assume the classifier outputs a probability score.
• If the classifier score is within a small band around 0.5:
• If unprivileged then predict positive
• If privileged then predict negative
[Figure: plot of the probability of the + label vs. the probability of the − label for unprivileged instances, on [0, 1] axes, showing the reject band around 0.5.]
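The reject-option rule above fits in a few lines. A minimal sketch (the band width and group encoding are illustrative assumptions):

```python
def reject_option_predict(score, group, band=0.1, unpriv="f"):
    """Reject option classification sketch: inside the uncertainty band
    around 0.5, flip predictions in favour of the unprivileged group;
    outside it, keep the classifier's own decision."""
    if abs(score - 0.5) < band:
        return "+" if group == unpriv else "-"
    return "+" if score >= 0.5 else "-"

print(reject_option_predict(0.55, "f"))  # "+": uncertain, unprivileged
print(reject_option_predict(0.55, "m"))  # "-": uncertain, privileged
print(reject_option_predict(0.9, "m"))   # "+": confident, keep prediction
```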
52. Take-home message
• Many forms of fairness and bias exist; most of them are incompatible with each other.
• Bias can be decreased with algorithms (usually with some loss in performance).