Manipulating and measuring model interpretability

Microsoft Research NYC
Forough Poursabzi-Sangdeh, Dan Goldstein, Jake Hofman, Jenn Wortman Vaughan, Hanna Wallach

INTERPRETABLE MACHINE LEARNING
• Simple models, e.g., generalized additive models (Lou et al. 2012 and 2013)
• Post-hoc explanations, e.g., LIME (Ribeiro et al. 2016)

INTERPRETABILITY?
• What makes a model or explanation interpretable?

DIFFERENT SCENARIOS, DIFFERENT PEOPLE, DIFFERENT NEEDS
• Scenarios: explain a prediction, understand model, make better decisions, debug model, de-bias model, inspire trust
• People: CEOs, data scientists, laypeople, regulators
• Approaches: Approach A, Approach B, Approach C

INTERPRETABILITY AS A LATENT PROPERTY
• Properties of model and system design: number of features, linearity, black-box vs. clear, visualizations, types of features, …
• Properties of human behavior: trust, ability to debug, ability to simulate, ability to explain, ability to detect mistakes, …
• We need interdisciplinary approaches

FOCUS ON LAYPEOPLE
• Randomized human-subject experiments

USER EXPERIMENT, PREDICTIVE TASK
• Predict the price of apartments in NYC with the help of a model

EXPERIMENTAL CONDITIONS
CLEAR-2 feature BB-2 feature
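
As a rough illustration of how participants might be randomized across conditions, here is a minimal sketch. It assumes the four conditions named later in the talk's results (CLEAR-2, CLEAR-8, BB-2, BB-8) and a balanced round-robin assignment. The function name, seed, and balancing scheme are hypothetical and are not taken from the study.

```python
import itertools
import random

# The four conditions named in the talk: model transparency x number of features.
CONDITIONS = [f"{t}-{k}" for t, k in itertools.product(["CLEAR", "BB"], [2, 8])]


def assign_conditions(participant_ids, seed=0):
    """Randomly assign participants to conditions in near-equal numbers.

    Hypothetical helper: balanced assignment is an assumption, not the
    study's documented procedure.
    """
    rng = random.Random(seed)
    ids = list(participant_ids)
    rng.shuffle(ids)
    # Deal shuffled participants round-robin so group sizes differ by at most one.
    return {pid: CONDITIONS[i % len(CONDITIONS)] for i, pid in enumerate(ids)}


assignment = assign_conditions(range(12))
```

Fixing the seed makes the assignment reproducible, which is convenient when an analysis plan is registered in advance.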

TIGHTLY CONTROLLED EXPERIMENTS

USER INTERFACE AND INTERACTIONS
• Training phase: participants get familiar with the model
• Testing phase step 1: simulate the model’s prediction
• Testing phase step 2: observe the model’s prediction and predict the actual selling price

PRE-REGISTERED HYPOTHESES
• CLEAR-2 feature will be easiest for participants to simulate
• Participants will trust CLEAR-2 feature more than BB-8 feature
• Participants’ behaviors will vary when they see unusual examples where the model makes inaccurate predictions
https://aspredicted.org/xy5s6.pdf

SIMULATION ERROR
• Hypothesis: CLEAR-2 feature will be easiest for participants to simulate
• Simulation error: |m - u_m|, where m is the model’s prediction and u_m is the participant’s simulation of it
[Figure: mean simulation error ($0k to $200k) for CLEAR-2, CLEAR-8, BB-2, and BB-8]
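
The metric on this slide can be sketched in a few lines. The sketch assumes simulation error is the absolute difference between the model's prediction m and the participant's simulation u_m, averaged within each condition. The records and the function name are hypothetical and do not come from the study's data.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (condition, model prediction m, participant's simulation u_m), in dollars.
records = [
    ("CLEAR-2", 500_000, 480_000),
    ("CLEAR-2", 900_000, 950_000),
    ("BB-8", 500_000, 350_000),
    ("BB-8", 900_000, 1_100_000),
]


def mean_simulation_error(rows):
    """Return the mean of |m - u_m| for each condition."""
    errors = defaultdict(list)
    for condition, m, u_m in rows:
        errors[condition].append(abs(m - u_m))
    return {condition: mean(values) for condition, values in errors.items()}


result = mean_simulation_error(records)
```

A lower value means participants' simulations landed closer to what the model actually predicted.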

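TRUST (DEVIATION FROM THE MODEL)
• Hypothesis: participants will trust CLEAR-2 feature more than BB-8 feature
• Deviation: |m - u_a|, where m is the model’s prediction and u_a is the participant’s prediction of the actual price
[Figure: mean deviation from the model ($0k to $150k)]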
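
Deviation from the model, used here as a behavioral stand-in for trust, might be computed as below. The sketch assumes deviation is the absolute gap between the model's prediction m and the participant's own price prediction u_a. The dollar amounts and the function name are hypothetical.

```python
def deviation_from_model(m, u_a):
    """Absolute gap between the model's prediction m and the participant's prediction u_a."""
    return abs(m - u_a)


# Hypothetical dollar amounts for one participant across three apartments.
model_predictions = [700_000, 1_200_000, 450_000]
participant_predictions = [650_000, 1_200_000, 600_000]

deviations = [
    deviation_from_model(m, u_a)
    for m, u_a in zip(model_predictions, participant_predictions)
]
mean_deviation = sum(deviations) / len(deviations)
```

Under this reading, a smaller mean deviation means the participant followed the model more closely.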

DETECTION OF MISTAKES
• Hypothesis: participants’ behaviors will vary when they see unusual examples where the model makes inaccurate predictions
[Figure: results for Apartment 12 (1 bed, 3 bath), $0k to $300k]
• When participants see unusual examples, they are less likely to correct inaccurate predictions made by clear models than those made by black-box models

CONJECTURE: ANCHORING EFFECT
User’s simulation of the model’s prediction

• We remove potential anchors

PRE-REGISTERED HYPOTHESES
• Explicit attention checks on unusual inputs will affect participants’ ability to detect the model’s mistakes
• Model transparency affects participants’ ability to detect the model’s mistakes, both with and without attention checks
https://aspredicted.org/5xy8y.pdf

[Figure: mean participant prediction ($0M to $1.5M), with and without attention checks, for Apartment 6 (1 bed, 3 bath, 726 sq ft) and Apartment 8 (1 bed, 3 bath, 350 sq ft); legend: model’s prediction, CLEAR, BB]
• No attention checks: clear models lower users’ ability to correct the model’s mistakes
• Attention checks improve users’ ability to correct the model’s mistakes
• With attention checks, there is no difference between clear and black-box models

SUMMARY OF RESULTS
• A clear model with a small number of features is easier for participants to simulate
- People have a better understanding of simple and transparent models
• No significant difference in participants’ trust in the model
- Contrary to intuition, people do not necessarily trust simple and transparent models more
• Participants were less able to correct inaccurate predictions of a clear model than a black-box model
- Too much transparency can be harmful
- Design implications (e.g., highlighting unusual inputs, display model internals on demand)

TAKEAWAYS
• Interpretability is not a purely computational problem
- We need interdisciplinary research to understand interpretability
• Our surprising results underscore that interpretability research is much more complicated than intuition suggests
- We need more empirical studies
- Other scenarios, domains, models, factors, outcomes

https://csel.cs.colorado.edu/~fopo5620/
forough.poursabzi@microsoft.com
Thanks!
