Explainability
for NLP
Isabelle Augenstein*
augenstein@di.ku.dk
@IAugenstein
http://isabelleaugenstein.github.io/
*partial slide credit: Pepa Atanasova, Nils Rethmeier
ALPS Winter School
22 January 2021
Explainability – what is it and why do we need it?
Architect Trainer End User
Terminology borrowed from Strobelt et al. (2018), “LSTMVis: A tool for visual analysis of hidden state dynamics in recurrent
neural networks. IEEE transactions on visualization and computer graphics.”
Explainability – what is it and why do we need it?
Claim: “In the COVID-19 crisis, ‘only 20%
of African Americans had jobs where they
could work from home.’”
Evidence: “20% of black workers said they
could work from home in their primary
job, compared to 30% of white workers.”
Claim: “Children don’t seem to be getting
this virus.”
Evidence: “There have been no reported
incidents of infection in children.”
Claim: “Taylor Swift had a fatal car
accident.”
Reason: overfitting to spurious patterns
(celebrity death hoaxes are common)
Claim: “Michael Jackson is still alive,
appears in daughter’s selfie.”
Reason: overfitting to spurious patterns
(celebrity death hoaxes are common)
Right prediction Wrong prediction
Right reasons
Wrong reasons
Explainability – what is it and why do we need it?
Architect Trainer End User
• Model Understanding
• What features and parameters has a model learned?
• How do these features and parameters relate to model outputs
generally?
• Decision Understanding
• How does the model arrive at predictions for specific instances?
• Which features and parameters influence a specific prediction?
Types of Explainability
• Black Box Explainability Methods
• No access to the model parameters, only predictions
• Observing output changes via different inputs
• White Box Explainability Methods
• Access to the model features and parameters
• Correlating outputs with specific features and parameters
Types of Explainability
• Joint Explainability Methods
• Explanation produced jointly with target task
• Post-Hoc Explainability Methods
• Explanation produced for a trained model
Types of Explainability
• Model Understanding
• What features and parameters has a model learned?
• Methods:
• Feature visualisation methods (white box)
• Adversarial examples (black or white box)
• Decision Understanding
• How does the model arrive at predictions for specific instances?
• Methods
• Probing tasks (black or white box)
• Correlating inputs with gradients/attention weights/etc. (white box)
• Generating text explaining predictions (white box)
Types of Explainability
Overview of Today’s Talk
• Introduction
• Explainability – what is it and why do we need it?
• Part 1: Decision understanding
• Instance-level explainability for text classification and fact checking
• Language generation based explanations
• Evaluating instance-level explanations
• Part 2: Model understanding
• Model-wide explainability for text classification and fact checking
• Finding model-wide explanations
• Visualising model-wide explanations
Part 1:
Decision Understanding
Generating Fact Checking
Explanations
Pepa Atanasova, Jakob Grue Simonsen,
Christina Lioma, Isabelle Augenstein
ACL 2020
Fact Checking
https://www.politifact.com/factchecks/2020/apr/30/donald-trump/trumps-boasts-testing-spring-cherry-picking-data/
Terminology
https://www.politifact.com/factchecks/2020/apr/30/donald-trump/trumps-boasts-testing-spring-cherry-picking-data/
Automating Fact Checking
Statement: “The last quarter, it was just
announced, our gross domestic product was below
zero. Who ever heard of this? It’s never below zero.”
Speaker: Donald Trump
Context: presidential announcement speech
Label: Pants on Fire
Wang, William Yang. "“Liar, Liar Pants on Fire”: A New Benchmark Dataset for Fake News Detection." Proceedings ACL’2017.
Automating Fact Checking
Claim:
We’ve tested more than every
country combined.
Metadata:
topic=healthcare, Coronavirus,
speaker=Donald Trump,
job=president,
context=White House briefing,
history={4, 10, 14, 21, 34, 14}
Hybrid CNN Veracity Label
Fact checking macro F1 score (validation / test): Majority 0.204 / 0.208; Wang et al. 0.247 / 0.274.
Wang, William Yang. "“Liar, Liar Pants on Fire”: A New Benchmark Dataset for Fake News Detection." Proceedings ACL’2017.
Automating Fact Checking
Alhindi, Tariq, Savvas Petridis, and Smaranda Muresan. "Where is your Evidence: Improving Fact-checking by Justification
Modeling." Proceedings of FEVER’2018.
Claim:
We’ve tested more than every country
combined.
Justification:
Trump claimed that the United States has
"tested more than every country combined."
There is no reasonable way to conclude that the
American system has run more diagnostics than
"all other major countries combined." Just by
adding up a few other nations’ totals, you can
quickly see Trump’s claim fall apart.
Plus, focusing on the 5 million figure distracts
from the real issue — by any meaningful metric
of diagnosing and tracking, the United States is
still well behind countries like Germany and
Canada.
The president’s claim is not only inaccurate but
also ridiculous. We rate it Pants on Fire!
Logistic Regression Veracity Label
Fact checking macro F1 score (validation / test): Majority 0.204 / 0.208; Wang et al. 0.247 / 0.274; Alhindi et al. 0.370 / 0.370.
How about generating an explanation?
Claim:
We’ve tested more than every country
combined.
Justification:
Trump claimed that the United States has
"tested more than every country combined."
There is no reasonable way to conclude that the
American system has run more diagnostics than
"all other major countries combined." Just by
adding up a few other nations’ totals, you can
quickly see Trump’s claim fall apart.
Plus, focusing on the 5 million figure distracts
from the real issue — by any meaningful metric
of diagnosing and tracking, the United States is
still well behind countries like Germany and
Canada.
The president’s claim is not only inaccurate but
also ridiculous. We rate it Pants on Fire!
Model
Veracity Label
Explanation
Generating Explanations from Ruling Comments
Claim:
We’ve tested more than every country
combined.
Ruling Comments:
Responding to weeks of criticism over his administration’s
COVID-19 response, President Donald Trump claimed at a
White House briefing that the United States has well
surpassed other countries in testing people for the virus.
"We’ve tested more than every country combined,"
Trump said April 27 […] We emailed the White House for
comment but never heard back, so we turned to the data.
Trump’s claim didn’t stand up to scrutiny.
In raw numbers, the United States has tested more
people than any other individual country — but
nowhere near more than "every country combined" or,
as he said in his tweet, more than "all major countries
combined.”[…] The United States has a far bigger
population than many of the "major countries" Trump often
mentions. So it could have run far more tests but still
have a much larger burden ahead than do nations like
Germany, France or Canada.[…]
Joint Model
Veracity Label
Justification/
Explanation
Related Studies on Generating Explanations
• Camburu et al. and Rajani et al. generate abstractive
explanations
• Short input text and explanations;
• Large amount of annotated data.
• Real world fact checking datasets are of limited size and
the input consists of long documents
• We take advantage of the LIAR-PLUS dataset:
• Use the summary of the ruling comments as a gold explanation;
• Formulate the problem as extractive summarization.
• Camburu, Oana-Maria, Tim Rocktäschel, Thomas Lukasiewicz, and Phil Blunsom. "e-SNLI: Natural language inference with natural language
explanations." In Advances in Neural Information Processing Systems. 2018.
• Rajani, Nazneen Fatema, Bryan McCann, Caiming Xiong, and Richard Socher. "Explain Yourself! Leveraging Language Models for
Commonsense Reasoning." In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 4932-4942. 2019.
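A minimal sketch of how such an extractive oracle could be built, greedily adding ruling-comment sentences that increase ROUGE overlap with the gold justification; rouge_score is a hypothetical helper and the exact selection criterion in the paper may differ:

def greedy_oracle(sentences, justification, rouge_score, max_sentences=4):
    # rouge_score: hypothetical helper returning e.g. mean ROUGE-1/2/L F1 of a summary vs. the justification
    selected, best = [], 0.0
    while len(selected) < max_sentences:
        gains = [(rouge_score(" ".join(selected + [s]), justification), s)
                 for s in sentences if s not in selected]
        if not gains:
            break
        score, sentence = max(gains)
        if score <= best:                         # stop once adding sentences no longer helps
            break
        selected.append(sentence)
        best = score
    return selected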
Example of an Oracle’s Gold Summary
Claim: “The president promised that if he spent money on a stimulus program that
unemployment would go to 5.7 percent or 6 percent. Those were his words.”
Label: Mostly-False
Just: Bramnick said “the president promised that if he spent money on a stimulus
program that unemployment would go to 5.7 percent or 6 percent. Those were his
words.” Two economic advisers estimated in a 2009 report that with the stimulus
plan, the unemployment rate would peak near 8 percent before dropping to less than
6 percent by now. Those are critical details Bramnick’s statement ignores. To comment
on this ruling, go to NJ.com.
Oracle: “The president promised that if he spent money on a stimulus program that
unemployment would go to 5.7 percent or 6 percent. Those were his
words,” Bramnick said in a Sept. 7 interview on NJToday. But with the stimulus plan,
the report projected the nation’s jobless rate would peak near 8 percent in 2009
before falling to about 5.5 percent by now. So the estimates in the report were
wrong.
Which sentences should be selected for the
explanation?
What is the veracity of the claim?
Joint Explanation and Veracity Prediction
Ruder, Sebastian, et al. "Latent multi-task architecture learning." Proceedings of the AAAI’2019.
Cross-stitch layer
Multi-task objective
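A minimal sketch of the joint (multi-task) objective: a shared encoder feeds an explanation-sentence head and a veracity head, and their cross-entropy losses are summed. This only illustrates the joint training idea; the actual architecture additionally uses cross-stitch layers for sharing:

import torch
import torch.nn.functional as F

def joint_loss(sentence_logits, sentence_labels, veracity_logits, veracity_label, alpha=1.0):
    # sentence_logits: (n_sentences, 2) - should each sentence be part of the explanation?
    # veracity_logits: (n_classes,)     - predicted veracity of the claim
    explanation_loss = F.cross_entropy(sentence_logits, sentence_labels)
    veracity_loss = F.cross_entropy(veracity_logits.unsqueeze(0), veracity_label.unsqueeze(0))
    return explanation_loss + alpha * veracity_loss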
Manual Evaluation
• Explanation Quality
• Coverage. The explanation contains important, salient
information and does not miss any important points that
contribute to the fact check.
• Non-redundancy. The summary does not contain any
information that is redundant/repeated/not relevant to the
claim and the fact check.
• Non-contradiction. The summary does not contain any pieces
of information contradictory to the claim and the fact check.
• Overall. Rank the explanations by their overall quality.
• Explanation Informativeness. Provide a veracity label
for a claim based on a veracity explanation coming
from the justification, the Explain-MT, or the Explain-
Extractive system.
Explanation Quality
[Bar chart: Mean Average Rank (MAR) of the Justification, Extractive, and Extractive-MT explanations for Coverage, Non-redundancy, Non-contradiction, and Overall quality. Lower MAR is better (higher rank).]
Explanation Informativeness
[Bar chart: manual veracity labelling given a particular explanation (Justification, Extractive, Extractive-MT), as percentages of annotator predictions that Agree-Correct, Agree-Not Correct, Agree-Not Sufficient, or Disagree.]
Summary
• First study on generating veracity explanations
• Jointly training veracity prediction and explanation
• improves the performance of the classification system
• improves the coverage and overall performance of the
generated explanations
Future Work
• Can we generate better or even abstractive
explanations given limited resources?
• How to automatically evaluate the properties of the
explanations?
• Can explanations be extracted from evidence pages
only (lots of irrelevant and multi-modal results)?
A Diagnostic Study of Explainability
Techniques for Text Classification
Pepa Atanasova, Jakob Grue Simonsen,
Christina Lioma, Isabelle Augenstein
EMNLP 2020
Explainability Datasets for Decision Understanding
Post-Hoc Explainability Methods for Decision Understanding
Example: Twitter Sentiment Extraction (TSE)
• How can explainability methods be evaluated?
• Proposal: set of diagnostic properties
• What are characteristics of different explainability methods?
• How do explanations for models with different architectures differ?
• How do automatically and manually generated explanations differ?
Post-Hoc Explainability for Decision Understanding:
Research Questions
• Compute gradient of input w.r.t. output
• Gradient is computed for each element of vector
• Different aggregation methods used to produce one score per input
token (mean average, L2 norm aggregation)
• Common approaches
• Saliency
• see above
• InputX-Gradient
• additionally multiplies gradient with input
• Guided Backpropagation
• over-writes the gradients of ReLU functions so that only non-negative gradients are
backpropagated
Post-Hoc Explainability Methods for Decision Understanding:
Gradient-Based Approaches
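As an illustration of the gradient-based family, here is a minimal PyTorch-style sketch (not from the papers above) that back-propagates the target-class logit to the input embeddings; model, embeddings_layer, and the inputs_embeds interface are assumptions about the classifier:

import torch

def gradient_saliency(model, embeddings_layer, input_ids, target_class, multiply_by_input=False):
    # input_ids: tensor of shape (1, seq_len)
    model.eval()
    embeddings = embeddings_layer(input_ids).detach().requires_grad_(True)  # (1, seq_len, dim)
    logits = model(inputs_embeds=embeddings)      # assumed to return class logits
    logits[0, target_class].backward()            # gradient of the target logit w.r.t. the embeddings
    grads = embeddings.grad[0]                    # (seq_len, dim)
    if multiply_by_input:                         # InputX-Gradient variant
        return (grads * embeddings[0].detach()).sum(dim=-1)
    return grads.norm(p=2, dim=-1)                # L2-norm aggregation: one score per token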
• Replace tokens in input with other tokens to compute their relative
contributions
• Common approaches
• Occlusion
• replaces each token with a baseline token and measures change in output
• Shapley Value Sampling
• computes average marginal contribution of each word across word perturbations
Post-Hoc Explainability Methods for Decision Understanding:
Perturbation-Based Approaches
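A minimal sketch of the occlusion idea, assuming a hypothetical predict_proba helper that maps a sequence of token ids to class probabilities:

import torch

def occlusion_scores(predict_proba, input_ids, baseline_id, target_class):
    base_prob = predict_proba(input_ids)[target_class]
    scores = []
    for i in range(input_ids.shape[0]):
        perturbed = input_ids.clone()
        perturbed[i] = baseline_id                # occlude token i with a baseline (e.g. pad/mask) token
        scores.append(base_prob - predict_proba(perturbed)[target_class])  # large drop = salient token
    return torch.tensor(scores)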
• Train local linear models to approximate local decision boundaries
• Common approaches
• LIME
• train one linear model per instance
Post-Hoc Explainability Methods for Decision Understanding:
Simplification-Based Approaches
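A minimal LIME-style sketch, again assuming a hypothetical predict_proba helper; it samples token-dropout neighbourhoods of the input and fits a weighted ridge regression whose coefficients serve as token importances (the actual LIME library differs in details):

import numpy as np
from sklearn.linear_model import Ridge

def lime_like_scores(predict_proba, tokens, target_class, n_samples=500, keep_prob=0.7):
    rng = np.random.default_rng(0)
    masks = rng.random((n_samples, len(tokens))) < keep_prob      # which tokens each sample keeps
    probs, weights = [], []
    for mask in masks:
        kept = [t for t, m in zip(tokens, mask) if m]
        probs.append(predict_proba(kept)[target_class])
        weights.append(np.exp(-(1.0 - mask.mean())))              # samples closer to the input weigh more
    surrogate = Ridge(alpha=1.0)                                  # local linear surrogate model
    surrogate.fit(masks.astype(float), np.array(probs), sample_weight=np.array(weights))
    return surrogate.coef_                                        # one importance score per token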
• Agreement with Human Rationales (HA)
• Degree of overlap between human and automatic saliency scores
• Confidence Indication (CI)
• Predictive power of produced explanations for model’s confidence
• Faithfulness (F)
• Mask most salient tokens, measure drop in performance
• Rationale Consistency (RC)
• Difference between explanations for models trained with different
random seeds, with model with random weights
• Dataset Consistency (DC)
• Difference between explanations for similar instances
Post-Hoc Explainability Methods for Decision Understanding:
Diagnostic Properties
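As one concrete example, a minimal sketch of the Faithfulness property, assuming hypothetical predict_proba and saliency inputs; the paper aggregates over several masking thresholds, whereas this shows a single top-k masking step:

import torch

def faithfulness_drop(predict_proba, input_ids, saliency, mask_id, target_class, k=3):
    # Mask the k most salient tokens and measure the drop in the predicted probability;
    # a faithful explanation should cause a larger drop than masking random tokens.
    base_prob = predict_proba(input_ids)[target_class]
    top_k = torch.topk(saliency, k=min(k, saliency.shape[0])).indices
    masked = input_ids.clone()
    masked[top_k] = mask_id
    return base_prob - predict_proba(masked)[target_class]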
Post-Hoc Explainability Methods for Decision Understanding:
Selected Results
Spider chart for Transformer model on e-SNLI
HA: Agreement with
human rationales
CI: Confidence indication
F: Faithfulness
RC: Rationale Consistency
DC: Dataset Consistency
Post-Hoc Explainability Methods for Decision Understanding:
Aggregated Results
[Bar chart: mean of the diagnostic property measures on e-SNLI for the Transformer, CNN, and LSTM models.]
Summary
• Diagnostic properties make it possible to assess different aspects of
explainability techniques
• Gradient-based methods outperform perturbation-based and
simplification-based ones for most properties across model
architectures and datasets
• Exception: Shapley Value Sampling and LIME better for Confidence
Indication property
• Gradient-based methods also fastest to compute
Part 2:
Model Understanding
Overview of Today’s Talk
• Introduction
• Explainability – what is it and why do we need it?
• Part 1: Decision understanding
• Instance-level explainability for text classification and fact checking
• Language generation based explanations
• Evaluating instance-level explanations
• Part 2: Model understanding
• Model-wide explainability for text classification and fact checking
• Finding model-wide explanations
• Visualising model-wide explanations
Universal Adversarial Trigger
Generation for Fact Checking
Pepa Atanasova*, Dustin Wright*,
Isabelle Augenstein
EMNLP 2020
*equal contributions
Generating Adversarial Claims
• Fact checking models can overfit to spurious patterns
• Making the right predictions for the wrong reasons
• This leads to vulnerabilities, which can be exploited by adversaries
(e.g. agents spreading mis- and disinformation)
• How can one reveal such vulnerabilities?
• Generating instance-level explanations for fact checking models
(first part of talk)
• Generating adversarial claims (this work)
Previous Work
• Universal adversarial attacks (Gao and Oates, 2019;
Wallace et al, 2019)
• Single perturbation changes that can be applied to many instances
• Change the meaning of the input instances and thus produce
label-incoherent claims
• Are not per se semantically well-formed
• Rule-based perturbations (Ribeiro et al., 2018)
• Semantically well-formed, but require hand-crafting patterns
Previous Work
• FEVER 2.0 shared task (Thorne et al., 2019)
• Builders / breakers setup
• Methods of submitted systems:
• Producing claims requiring multi-hop reasoning (Niewinski et al., 2019)
• Generating adversarial claims manually (Kim and Allan, 2019)
Goals of this Work
• Generate claims fully automatically
• Preserve the meaning of the source text
• Produce semantically well-formed claims
Model
RoBERTa-based
FEVER model to
predict FC label
1) HotFlip attack
model to find triggers
2) STS auxiliary model
to preserve FC label
Claim generation
conditioned on
evidence and triggers
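A minimal sketch of the gradient-guided (HotFlip-style) candidate search for trigger tokens; variable names are illustrative, and the full method additionally filters candidates with the STS objective to preserve the claim's meaning and label:

import torch

def hotflip_candidates(trigger_grad, embedding_matrix, k=10):
    # trigger_grad: gradient of the adversarial objective w.r.t. the current trigger embedding, shape (dim,)
    # embedding_matrix: (vocab_size, dim)
    # First-order approximation: swapping in token v changes the objective by roughly (e_v - e_cur)^T grad,
    # so ranking candidates by e_v^T grad gives the same ordering for a fixed current token.
    scores = embedding_matrix @ trigger_grad
    return torch.topk(scores, k).indices          # top-k replacement tokens to evaluate exactly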
Examples of Generated Claims
SUPPORTS Claims
• Evidence: Since the 19th century, some Romani have also migrated to the Americas.
  Triggers: don, already, more, during, home
  Generated claim: Romani have moved to the Americas during the 19th century.
• Evidence: Cyprus is a major tourist destination in the Mediterranean.
  Triggers: foreign, biggest, major, every, friends
  Generated claim: Cyprus is a major tourist destination.
• Evidence: The first Nobel Prize in Chemistry was awarded in 1901 to Jacobus Henricus van’t Hoff, of the Netherlands, “for his discovery of the laws of chemical dynamics and osmotic pressure in solutions.”
  Triggers: later, already, quite, altern, whereas
  Generated claim: Henricus Van’t Hoff was already awarded the Nobel Prize.

REFUTES Claims
• Evidence: California Attorney General Kamala Harris defeated Sanchez, 61.6% to 38.4%.
  Triggers: phys, incarn, not, occasionally, something
  Generated claim: Kamala Harris did not defeat Sanchez, 61.6% to 38.4%.
• Evidence: Uganda is in the African Great Lakes region.
  Triggers: unless, endorsed, picks, pref, against
  Generated claim: Uganda is against the African Great Lakes region.
• Evidence: Times Higher Education World University Rankings is an annual publication of university rankings by Times Higher Education (THE) magazine.
  Triggers: interested, reward, visit, consumer, conclusion
  Generated claim: Times Higher Education World University Rankings is a consumer magazine.

NOT ENOUGH INFO Claims
• Evidence: The KGB was a military service and was governed by army laws and regulations, similar to the Soviet Army or MVD Internal Troops.
  Triggers: nowhere, only, none, no, nothing
  Generated claim: The KGB was only controlled by a military service.
• Evidence: The series revolves around …
  Triggers: says, said, take, say, is
  Generated claim: Take Me High is about …
Results: Manual Evaluation
• How well can we generate universal adversarial triggers?
• Trade-off between how potent the attack is (reduction in F1) vs. how
semantically coherent the claim is (STS)
• Reduction in F1 for both trigger generation methods
• Macro F1 of generated w.r.t. original claim: 56.6 (FC Objective); 60.7
(FC + STS Objective) -- STS Objective preserves meaning more often
[Bar charts: Semantic Textual Similarity with the original claim (higher = better) and F1 of the FC model (lower = better), overall and per class (SUPPORTS, REFUTES, NEI), comparing the FC Objective with the FC + STS Objective.]
Key Take-Aways
• Novel extension to the HotFlip attack for universal
adversarial trigger generation (Ebrahimi et al., 2018)
• Conditional language model, which takes trigger tokens
and evidence, and generates a semantically coherent claim
• Resulting model generates semantically coherent claims
containing universal triggers, which preserve the label
• Trade-off between how well-formed the claim is and how
potent the attack is
TX-Ray: Quantifying and Explaining
Model-Knowledge Transfer in
(Un-)Supervised NLP
Nils Rethmeier, Vageesh Kumar Saxena,
Isabelle Augenstein
UAI 2020
Problems of supervised probing task evaluation setup
● only measures expected (probed) model knowledge and semantics
● is misleading when model and probe domains mismatch
● probing annotation cannot scale to uncover unforeseen semantics
Goal: instead can we visualise, quantify and explore how a (language) model
● learns → RQ (1) How does self-supervision abstract knowledge?
● applies → RQ (2) How is knowledge (zero-shot) applied to new inputs X?
● adapts → RQ (3) How is knowledge adapted by supervision?
Approach: un-/ self-supervised interpretability
● visualise what input (features) each neuron prefers (maximally activates on)
-- i.e. activation maximisation by Erhan, 2009 -- used on RBMs [1]
Motivation: observe knowledge acquisition of
neural nets
Goal: Visualise & measure transfer in neural nets
(0): random init
How does each neuron
RQ (1) abstract/ learn (textual) knowledge?
● during initial/ pre-training on text Xpre
text in E
random encoder E
Goal: Visualise & measure transfer in neural nets
Xpre
pretrain E on Xpre
(1): pre-training
E
How does each neuron
RQ (1) abstract/ learn (textual) knowledge?
● during initial/ pre-training on text Xpre
learning encoder E
E trained as
language model fits
knowledge
(0): random init
Xpre
E
random encoder E
I like cake better than biscuits.
The learning-cake has cherries inside.
Supervision is the cake’s frosting.
Self-supervision is the cake’s corpus.
The cake is a lie.
Goal: Visualise & measure transfer in neural nets
How does each neuron
RQ (1) abstract/ learn (textual) knowledge?
● during initial/ pre-training on text Xpre
cake
pk = token activation probability, while training
Neuron nn := token-preference distribution
How does each neuron
RQ (1) abstract/ learn (textual) knowledge?
● during initial/ pre-training on text Xpre
I like cake better than biscuits.
The learning-cake has cherries inside.
Supervision is the cake’s frosting.
Self-supervision is the cake’s corpus.
The cake is a lie.
biscuits
Xpre
pretrain E on Xpre
(1): pre-training
E
learning encoder E
Loss: drops/ changes
Knowledge
abstraction
starts
Per token, only record the
maximally activated (preferred)
neuron -- i.e. activation-
maximization (Erhan, 2009)
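A minimal sketch of this recording step, assuming per-token activation vectors from the encoder; names are illustrative:

from collections import defaultdict, Counter

def record_preferences(activations, tokens, preferences=None):
    # activations: array of shape (seq_len, n_neurons); tokens: list of length seq_len
    preferences = preferences if preferences is not None else defaultdict(Counter)
    for token, acts in zip(tokens, activations):
        preferred_neuron = int(acts.argmax())     # the neuron that activates most strongly on this token
        preferences[preferred_neuron][token] += 1 # accumulate per-neuron token counts
    return preferences                            # normalising the counts gives token-preference distributions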
Goal: Visualise & measure transfer in neural nets
cake
Xpre
pretrain E on Xpre
(1): pre-training
E
pk = token activation probability, while training
How does each neuron
RQ (1) abstract/ learn (textual) knowledge?
● during initial/ pre-training on text Xpre
I like cake better than biscuits.
The learning - cake contains cherries.
Supervision is the cake’s frosting.
Self-supervision is the cake’s corpus.
The cake is a lie.
cookies
learning encoder E
Neuron nn := token-preference distribution
Loss: drops/ changes
Knowledge
abstraction
changes
Goal: Visualise & measure transfer in neural nets
cake
Xpre
pretrain E on Xpre
(1): pre-training
E
pk = token activation probability, while training
How does each neuron
RQ (1) abstract/ learn (textual) knowledge?
● during initial/ pre-training on text Xpre
I like cake better than cookies.
The learning - cake contains cherries.
Supervision is the cake’s icing.
Self-supervision is the cake’s corpus.
The cake is a lie.
cookies
supervision
learning encoder E
Neuron nn := token-preference distribution
Loss: drops/ changes
Knowledge
abstraction
changes
Goal: Visualise & measure transfer in neural nets
cake
Xpre
pretrain E on Xpre
(1): pre-training
E
pk = token activation probability, post training
How does each neuron
RQ (1) abstract/ learn (textual) knowledge?
● during initial/ pre-training on text Xpre
converged encoder E
I like cake better and biscuits.
The learning - cake contains cherries.
Supervision is the cake’s icing.
Self - supervision is the cake’s corpus.
The cake is a lie.
cookies
supervision self
Neuron nn := token-preference distribution
Loss: drops/ changes
Knowledge abstraction
converges
Goal: Visualise & measure transfer in neural nets
f1 f2 f3 fk f_
Xpre
pretrain E on Xpre
(1): pre-training
E
pk
nn:= token-prefer dist.
How does each neuron
RQ (1) abstract/ learn (textual) knowledge?
● during initial/ pre-training on text Xpre
pre-trained encoder E
Take-away (1): (pre-)training builds neural knowledge (feature activation distributions).
Token-activation distributions visualize the knowledge abstraction of each neuron.
Goal: Visualise & measure transfer in neural nets
f1 f2 f3 fk f_
Xpre
pretrain E on Xpre
f1 f2 f3 fk f_
Xend
eval. E on new Xend
(2): zero-shot
(1): pre-training
E
pk
fk := token / POS
nn:= token-activate dist.
E
How does each neuron
RQ (1) abstract/ learn (textual) knowledge?
● during initial/ pre-training on text Xpre
RQ (2) apply learned knowledge
● to new domain text Xend
● feed new text Xend to the frozen pretrained encoder
● observe how each neuron’s activation distribution changes
● no neuron transfer if the change is large (over-specialisation)
● neuron transfers if the change is small (OOD-generalisation)
We use Hellinger H( , )
distance as change
measure -- i.e. a
symmetrical KLD
nn:= token-prefer dist.
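A minimal sketch of the Hellinger distance used as the per-neuron change measure, where p and q are a neuron's token-preference distributions before and after the shift:

import numpy as np

def hellinger(p, q):
    # p, q: probability vectors over the same token vocabulary (each summing to 1)
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))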
Take-away (2): transfer occurs if the encoder generalises well (activates similarly) on the new text; this shows how well each neuron generalises to the new data distribution.
Goal: Visualise & measure transfer in neural nets
zero-shot transfer
supervised transfer
f1 f2 f3 fk f_
Xpre
pretrain E on Xpre
f1 f2 f3 fk f_
Xend
eval. E on new Xend
f1 f2 f3 fk f_
Xend
eval. Eend on Xend
Yend
Eend : E fit on Yend
(2): zero-shot (3): supervised
(1): pre-training
E
pk p1= .05
fk := token / POS pk := token probability
nn:= token-activate dist.
E E
How does each neuron
RQ (1) abstract/ learn (textual) knowledge?
● during initial/ pre-training on text Xpre
RQ (2) apply learned knowledge
● to new domain text Xend
RQ (3) adapt its knowledge
● to suit a supervision
signal Yend on the new
domain Xend
(no more LM loss/ objective)
Take-away (3): supervision specialises (re-fits) knowledge, adds new knowledge, and ‘avoids’ (sparsifies) old knowledge.
nn:= token-prefer dist.
Experiment: XAI to guide pruning
Pretrain, then supervise
● pretrain language model on Wikitext-2 (1)
● fine tune to IMDB binary reviews (3)
Prune neurons that post supervision …
● were specialized (re-fit)
● became preferred (activated)
● were ‘avoided’ (now empty distrib.)
→ i.e., the neuron has no max activations post supervision
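A minimal sketch of the third pruning criterion, assuming the per-neuron token-preference counts collected as above; neurons with an empty post-supervision distribution are zeroed out:

import torch

def prune_avoided_neurons(weight, preferences, n_neurons):
    # weight: (n_neurons, dim) parameter matrix of the layer being pruned
    # preferences: {neuron_id: Counter of preferred tokens} collected after supervised fine-tuning
    avoided = [n for n in range(n_neurons) if len(preferences.get(n, {})) == 0]
    mask = torch.ones(n_neurons)
    mask[avoided] = 0.0
    return weight * mask.unsqueeze(-1)            # zero out the rows of neurons no token prefers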
Take-Aways
TX-Ray can explore generalisation and specialisation at individual neuron-level
● self-supervised pre-training builds general knowledge
○ preference spread across many neurons -- 89% maximally active (preferred) neurons
● zero-shot application shows match of model knowledge vs new domain
○ preference less spread -- 88% of neurons preferred, partial generalisation
● supervised knowledge fine-tuning sparsifies (concentrates) activation
○ preference peaked -- only 45% of neurons preferred, many become domain over-specialised
preference activation sum (mass) per neuron -- sorted by sum
Wrap-Up
Overall Take-Aways
• Why explainability?
• understanding if a model is right for the right reasons
• Generated explanations can help users understand:
• inner workings of a model (model understanding)
• how a model arrived at a prediction (decision understanding)
• Explainability can enable:
• human-in-the-loop model development
• human-in-the-loop data selection
Overall Take-Aways
• Caveats:
• There can be more than one correct explanation
• Different explainability methods provide different explanations
• Different streams of explainability methods have different
benefits and downsides
• Black box vs. white box
• Hypothesis testing vs. bottom-up understanding
• Requirements for annotated training data
• Joint vs. post-hoc explanation generation
• One-time analysis vs. continuous monitoring
• Perform well w.r.t. different properties
Where to from here?
• Making explanations useful to model trainers and
end users
• How to interpret different explanations?
• Explanations for models with large number of parameters
• What-if analyses
• Some research on generating explanations, but relatively little
work on understanding in what contexts they are useful
• (Automatically) evaluating explanations
• Human-in-the-loop development
Thank you!
isabelleaugenstein.github.io
augenstein@di.ku.dk
@IAugenstein
github.com/isabelleaugenstein
Thanks to my PhD students and collaborators!
CopeNLU
https://copenlu.github.io/
Pepa Atanasova Dustin Wright Jakob Grue Simonsen Christina Lioma
Vageesh Kumar Saxena
Nils Rethmeier
Presented Papers
Pepa Atanasova, Jakob Grue Simonsen, Christina Lioma, Isabelle
Augenstein. Generating Fact Checking Explanations. In Proceedings of
ACL 2020.
Pepa Atanasova, Jakob Grue Simonsen, Christina Lioma, Isabelle
Augenstein. A Diagnostic Study of Explainability Techniques for Text
Classification. EMNLP 2020.
Pepa Atanasova*, Dustin Wright*, Isabelle Augenstein. Generating Label
Cohesive and Well-Formed Adversarial Claims. EMNLP 2020.
Nils Rethmeier, Vageesh Kumar Saxena, Isabelle Augenstein. TX-Ray:
Quantifying and Explaining Model-Knowledge Transfer in
(Un)Supervised NLP. In Proceedings of the Conference on UAI 2020.
*equal contributions
Hiring
2 PhD students, 1 postdoc – explainable stance detection
funded by Danish Research Council (DFF) Sapere Aude grant
application deadline PhD: 31 January 2021
start date: Summer/Autumn 2021
PhD: https://employment.ku.dk/faculty/?show=153150
Postdoc: https://tinyurl.com/yd6jw5nm
1 postdoc – explainable IE & QA
funded by Innovation Foundation Denmark
application deadline: 28 February 2021
start date: Autumn 2020
Postdoc: https://employment.ku.dk/faculty/?show=153308
Reading Materials on Explainability
Decision Understanding
• Camburu, O. M., Shillingford, B., Minervini, P., Lukasiewicz, T., &
Blunsom, P. (2020, July). Make Up Your Mind! Adversarial Generation
of Inconsistent Natural Language Explanations. In Proceedings of the
58th Annual Meeting of the Association for Computational Linguistics
(pp. 4157-4165).
- Reverse explanations to help prevent inconsistencies
• Slack, D., Hilgard, S., Jia, E., Singh, S., & Lakkaraju, H. (2020, February).
Fooling LIME and SHAP: Adversarial Attacks on Post hoc Explanation
Methods. In Proceedings of the AAAI/ACM Conference on AI, Ethics,
and Society (pp. 180-186).
- Perturbation-based explanation methods can be fooled easily and are potentially unreliable
Reading Materials on Explainability
Model Understanding
- Geva, M., Schuster, R., Berant, J., & Levy, O. (2020). Transformer
Feed-Forward Layers Are Key-Value Memories. arXiv preprint
arXiv:2012.14913.
- how to use activation maximisation to analyse memorisation in Transformers
- Durrani, N., Sajjad, H., Dalvi, F., & Belinkov, Y. (2020). Analyzing
Individual Neurons in Pre-trained Language Models. arXiv preprint
arXiv:2010.02695.
- pruning neurons and how many neurons are used per task
Reading Materials on Explainability
Surveys, Opinion Papers & Benchmarks
• Eraser benchmark with explainability datasets:
https://www.eraserbenchmark.com/
• Jacovi, A., & Goldberg, Y. (2020). Towards Faithfully Interpretable NLP
Systems: How should we define and evaluate faithfulness? In ACL
2020.
• Jacovi, A., Marasović, A., Miller, T., & Goldberg, Y. (2020). Formalizing
Trust in Artificial Intelligence: Prerequisites, Causes and Goals of
Human Trust in AI. arXiv preprint arXiv:2010.07487.
• Kotonya, N., & Toni, F. (2020). Explainable Automated Fact-Checking:
A Survey. In Coling 2020.
Reading Materials on Explainability
Tutorials
• Yonatan Belinkov, Sebastian Gehrmann, Ellie Pavlick. Interpretability
and Analysis in Neural NLP. Tutorial at ACL 2020.
https://www.aclweb.org/anthology/2020.acl-tutorials.1/
• Eric Wallace, Matt Gardner, Sameer Singh. Interpreting Predictions of
NLP Models. Tutorial at EMNLP 2020.
https://www.aclweb.org/anthology/2020.emnlp-tutorials.3/
• Hima Lakkaraju, Julius Adebayo, Sameer Singh. Explaining Machine Learning
Predictions -- State-of-the-art, Challenges, and Opportunities.
Tutorials at NeurIPS 2020, AAAI 2021. https://explainml-
tutorial.github.io/
• Jay Alammar. Interfaces for Explaining Transformer Language Models.
Blog, 2020. https://jalammar.github.io/explaining-transformers/
SemEval 2017 Task 10: ScienceIE – Extracting Keyphrases and Relations from Sc...
 
1st Workshop for Women and Underrepresented Minorities (WiNLP) at ACL 2017 - ...
1st Workshop for Women and Underrepresented Minorities (WiNLP) at ACL 2017 - ...1st Workshop for Women and Underrepresented Minorities (WiNLP) at ACL 2017 - ...
1st Workshop for Women and Underrepresented Minorities (WiNLP) at ACL 2017 - ...
 
1st Workshop for Women and Underrepresented Minorities (WiNLP) at ACL 2017 - ...
1st Workshop for Women and Underrepresented Minorities (WiNLP) at ACL 2017 - ...1st Workshop for Women and Underrepresented Minorities (WiNLP) at ACL 2017 - ...
1st Workshop for Women and Underrepresented Minorities (WiNLP) at ACL 2017 - ...
 
Machine Reading Using Neural Machines (talk at Microsoft Research Faculty Sum...
Machine Reading Using Neural Machines (talk at Microsoft Research Faculty Sum...Machine Reading Using Neural Machines (talk at Microsoft Research Faculty Sum...
Machine Reading Using Neural Machines (talk at Microsoft Research Faculty Sum...
 
Weakly Supervised Machine Reading
Weakly Supervised Machine ReadingWeakly Supervised Machine Reading
Weakly Supervised Machine Reading
 
USFD at SemEval-2016 - Stance Detection on Twitter with Autoencoders
USFD at SemEval-2016 - Stance Detection on Twitter with AutoencodersUSFD at SemEval-2016 - Stance Detection on Twitter with Autoencoders
USFD at SemEval-2016 - Stance Detection on Twitter with Autoencoders
 
Distant Supervision with Imitation Learning
Distant Supervision with Imitation LearningDistant Supervision with Imitation Learning
Distant Supervision with Imitation Learning
 
Extracting Relations between Non-Standard Entities using Distant Supervision ...
Extracting Relations between Non-Standard Entities using Distant Supervision ...Extracting Relations between Non-Standard Entities using Distant Supervision ...
Extracting Relations between Non-Standard Entities using Distant Supervision ...
 
Information Extraction with Linked Data
Information Extraction with Linked DataInformation Extraction with Linked Data
Information Extraction with Linked Data
 
Lodifier: Generating Linked Data from Unstructured Text
Lodifier: Generating Linked Data from Unstructured TextLodifier: Generating Linked Data from Unstructured Text
Lodifier: Generating Linked Data from Unstructured Text
 
Relation Extraction from the Web using Distant Supervision
Relation Extraction from the Web using Distant SupervisionRelation Extraction from the Web using Distant Supervision
Relation Extraction from the Web using Distant Supervision
 

Último

Where developers are challenged, what developers want and where DevEx is going
Where developers are challenged, what developers want and where DevEx is goingWhere developers are challenged, what developers want and where DevEx is going
Where developers are challenged, what developers want and where DevEx is goingFrancesco Corti
 
TrustArc Webinar - How to Live in a Post Third-Party Cookie World
TrustArc Webinar - How to Live in a Post Third-Party Cookie WorldTrustArc Webinar - How to Live in a Post Third-Party Cookie World
TrustArc Webinar - How to Live in a Post Third-Party Cookie WorldTrustArc
 
Introduction - IPLOOK NETWORKS CO., LTD.
Introduction - IPLOOK NETWORKS CO., LTD.Introduction - IPLOOK NETWORKS CO., LTD.
Introduction - IPLOOK NETWORKS CO., LTD.IPLOOK Networks
 
Design and Modeling for MySQL SCALE 21X Pasadena, CA Mar 2024
Design and Modeling for MySQL SCALE 21X Pasadena, CA Mar 2024Design and Modeling for MySQL SCALE 21X Pasadena, CA Mar 2024
Design and Modeling for MySQL SCALE 21X Pasadena, CA Mar 2024Alkin Tezuysal
 
My key hands-on projects in Quantum, and QAI
My key hands-on projects in Quantum, and QAIMy key hands-on projects in Quantum, and QAI
My key hands-on projects in Quantum, and QAIVijayananda Mohire
 
Oracle Database 23c Security New Features.pptx
Oracle Database 23c Security New Features.pptxOracle Database 23c Security New Features.pptx
Oracle Database 23c Security New Features.pptxSatishbabu Gunukula
 
Stobox 4: Revolutionizing Investment in Real-World Assets Through Tokenization
Stobox 4: Revolutionizing Investment in Real-World Assets Through TokenizationStobox 4: Revolutionizing Investment in Real-World Assets Through Tokenization
Stobox 4: Revolutionizing Investment in Real-World Assets Through TokenizationStobox
 
Flow Control | Block Size | ST Min | First Frame
Flow Control | Block Size | ST Min | First FrameFlow Control | Block Size | ST Min | First Frame
Flow Control | Block Size | ST Min | First FrameKapil Thakar
 
Technical SEO for Improved Accessibility WTS FEST
Technical SEO for Improved Accessibility  WTS FESTTechnical SEO for Improved Accessibility  WTS FEST
Technical SEO for Improved Accessibility WTS FESTBillieHyde
 
Scenario Library et REX Discover industry- and role- based scenarios
Scenario Library et REX Discover industry- and role- based scenariosScenario Library et REX Discover industry- and role- based scenarios
Scenario Library et REX Discover industry- and role- based scenariosErol GIRAUDY
 
Explore the UiPath Community and ways you can benefit on your journey to auto...
Explore the UiPath Community and ways you can benefit on your journey to auto...Explore the UiPath Community and ways you can benefit on your journey to auto...
Explore the UiPath Community and ways you can benefit on your journey to auto...DianaGray10
 
Outage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedIn
Outage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedInOutage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedIn
Outage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedInThousandEyes
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfCheryl Hung
 
Automation Ops Series: Session 2 - Governance for UiPath projects
Automation Ops Series: Session 2 - Governance for UiPath projectsAutomation Ops Series: Session 2 - Governance for UiPath projects
Automation Ops Series: Session 2 - Governance for UiPath projectsDianaGray10
 
UiPath Studio Web workshop Series - Day 3
UiPath Studio Web workshop Series - Day 3UiPath Studio Web workshop Series - Day 3
UiPath Studio Web workshop Series - Day 3DianaGray10
 
Webinar: The Art of Prioritizing Your Product Roadmap by AWS Sr PM - Tech
Webinar: The Art of Prioritizing Your Product Roadmap by AWS Sr PM - TechWebinar: The Art of Prioritizing Your Product Roadmap by AWS Sr PM - Tech
Webinar: The Art of Prioritizing Your Product Roadmap by AWS Sr PM - TechProduct School
 
AI Workshops at Computers In Libraries 2024
AI Workshops at Computers In Libraries 2024AI Workshops at Computers In Libraries 2024
AI Workshops at Computers In Libraries 2024Brian Pichman
 
EMEA What is ThousandEyes? Webinar
EMEA What is ThousandEyes? WebinarEMEA What is ThousandEyes? Webinar
EMEA What is ThousandEyes? WebinarThousandEyes
 
IT Service Management (ITSM) Best Practices for Advanced Computing
IT Service Management (ITSM) Best Practices for Advanced ComputingIT Service Management (ITSM) Best Practices for Advanced Computing
IT Service Management (ITSM) Best Practices for Advanced ComputingMAGNIntelligence
 
How to release an Open Source Dataweave Library
How to release an Open Source Dataweave LibraryHow to release an Open Source Dataweave Library
How to release an Open Source Dataweave Libraryshyamraj55
 

Último (20)

Where developers are challenged, what developers want and where DevEx is going
Where developers are challenged, what developers want and where DevEx is goingWhere developers are challenged, what developers want and where DevEx is going
Where developers are challenged, what developers want and where DevEx is going
 
TrustArc Webinar - How to Live in a Post Third-Party Cookie World
TrustArc Webinar - How to Live in a Post Third-Party Cookie WorldTrustArc Webinar - How to Live in a Post Third-Party Cookie World
TrustArc Webinar - How to Live in a Post Third-Party Cookie World
 
Introduction - IPLOOK NETWORKS CO., LTD.
Introduction - IPLOOK NETWORKS CO., LTD.Introduction - IPLOOK NETWORKS CO., LTD.
Introduction - IPLOOK NETWORKS CO., LTD.
 
Design and Modeling for MySQL SCALE 21X Pasadena, CA Mar 2024
Design and Modeling for MySQL SCALE 21X Pasadena, CA Mar 2024Design and Modeling for MySQL SCALE 21X Pasadena, CA Mar 2024
Design and Modeling for MySQL SCALE 21X Pasadena, CA Mar 2024
 
My key hands-on projects in Quantum, and QAI
My key hands-on projects in Quantum, and QAIMy key hands-on projects in Quantum, and QAI
My key hands-on projects in Quantum, and QAI
 
Oracle Database 23c Security New Features.pptx
Oracle Database 23c Security New Features.pptxOracle Database 23c Security New Features.pptx
Oracle Database 23c Security New Features.pptx
 
Stobox 4: Revolutionizing Investment in Real-World Assets Through Tokenization
Stobox 4: Revolutionizing Investment in Real-World Assets Through TokenizationStobox 4: Revolutionizing Investment in Real-World Assets Through Tokenization
Stobox 4: Revolutionizing Investment in Real-World Assets Through Tokenization
 
Flow Control | Block Size | ST Min | First Frame
Flow Control | Block Size | ST Min | First FrameFlow Control | Block Size | ST Min | First Frame
Flow Control | Block Size | ST Min | First Frame
 
Technical SEO for Improved Accessibility WTS FEST
Technical SEO for Improved Accessibility  WTS FESTTechnical SEO for Improved Accessibility  WTS FEST
Technical SEO for Improved Accessibility WTS FEST
 
Scenario Library et REX Discover industry- and role- based scenarios
Scenario Library et REX Discover industry- and role- based scenariosScenario Library et REX Discover industry- and role- based scenarios
Scenario Library et REX Discover industry- and role- based scenarios
 
Explore the UiPath Community and ways you can benefit on your journey to auto...
Explore the UiPath Community and ways you can benefit on your journey to auto...Explore the UiPath Community and ways you can benefit on your journey to auto...
Explore the UiPath Community and ways you can benefit on your journey to auto...
 
Outage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedIn
Outage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedInOutage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedIn
Outage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedIn
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
Automation Ops Series: Session 2 - Governance for UiPath projects
Automation Ops Series: Session 2 - Governance for UiPath projectsAutomation Ops Series: Session 2 - Governance for UiPath projects
Automation Ops Series: Session 2 - Governance for UiPath projects
 
UiPath Studio Web workshop Series - Day 3
UiPath Studio Web workshop Series - Day 3UiPath Studio Web workshop Series - Day 3
UiPath Studio Web workshop Series - Day 3
 
Webinar: The Art of Prioritizing Your Product Roadmap by AWS Sr PM - Tech
Webinar: The Art of Prioritizing Your Product Roadmap by AWS Sr PM - TechWebinar: The Art of Prioritizing Your Product Roadmap by AWS Sr PM - Tech
Webinar: The Art of Prioritizing Your Product Roadmap by AWS Sr PM - Tech
 
AI Workshops at Computers In Libraries 2024
AI Workshops at Computers In Libraries 2024AI Workshops at Computers In Libraries 2024
AI Workshops at Computers In Libraries 2024
 
EMEA What is ThousandEyes? Webinar
EMEA What is ThousandEyes? WebinarEMEA What is ThousandEyes? Webinar
EMEA What is ThousandEyes? Webinar
 
IT Service Management (ITSM) Best Practices for Advanced Computing
IT Service Management (ITSM) Best Practices for Advanced ComputingIT Service Management (ITSM) Best Practices for Advanced Computing
IT Service Management (ITSM) Best Practices for Advanced Computing
 
How to release an Open Source Dataweave Library
How to release an Open Source Dataweave LibraryHow to release an Open Source Dataweave Library
How to release an Open Source Dataweave Library
 

Explainability for NLP

  • 16. Automating Fact Checking Statement: “The last quarter, it was just announced, our gross domestic product was below zero. Who ever heard of this? Its never below zero.” Speaker: Donald Trump Context: presidential announcement speech Label: Pants on Fire Wang, William Yang. "“Liar, Liar Pants on Fire”: A New Benchmark Dataset for Fake News Detection." Proceedings ACL’2017.
  • 17. Automating Fact Checking Claim: We’ve tested more than every country combined. Metadata: topic=healthcare, Coronavirus; speaker=Donald Trump; job=president; context=White House briefing; history={4, 10, 14, 21, 34, 14}. Hybrid CNN → Veracity Label. [Bar chart: fact checking macro F1 (validation/test). Majority: 0.204/0.208; Wang et al. (Hybrid CNN): 0.247/0.274] Wang, William Yang. "“Liar, Liar Pants on Fire”: A New Benchmark Dataset for Fake News Detection." Proceedings of ACL 2017.
  • 18. Automating Fact Checking Alhindi, Tariq, Savvas Petridis, and Smaranda Muresan. "Where is your Evidence: Improving Fact-checking by Justification Modeling." Proceedings of FEVER 2018. Claim: We’ve tested more than every country combined. Justification: Trump claimed that the United States has "tested more than every country combined." There is no reasonable way to conclude that the American system has run more diagnostics than "all other major countries combined." Just by adding up a few other nations’ totals, you can quickly see Trump’s claim fall apart. Plus, focusing on the 5 million figure distracts from the real issue — by any meaningful metric of diagnosing and tracking, the United States is still well behind countries like Germany and Canada. The president’s claim is not only inaccurate but also ridiculous. We rate it Pants on Fire! Logistic Regression → Veracity Label. [Bar chart: fact checking macro F1 (validation/test). Majority: 0.204/0.208; Wang et al.: 0.247/0.274; Alhindi et al.: 0.370/0.370]
  • 19. How about generating an explanation? Claim: We’ve tested more than every country combined. Justification: Trump claimed that the United States has "tested more than every country combined." There is no reasonable way to conclude that the American system has run more diagnostics than "all other major countries combined." Just by adding up a few other nations’ totals, you can quickly see Trump’s claim fall apart. Plus, focusing on the 5 million figure distracts from the real issue — by any meaningful metric of diagnosing and tracking, the United States is still well behind countries like Germany and Canada. The president’s claim is not only inaccurate but also ridiculous. We rate it Pants on Fire! Model Veracity Label Explanation
  • 20. Generating Explanations from Ruling Comments Claim: We’ve tested more than every country combined. Ruling Comments: Responding to weeks of criticism over his administration’s COVID-19 response, President Donald Trump claimed at a White House briefing that the United States has well surpassed other countries in testing people for the virus. "We’ve tested more than every country combined," Trump said April 27 […] We emailed the White House for comment but never heard back, so we turned to the data. Trump’s claim didn’t stand up to scrutiny. In raw numbers, the United States has tested more people than any other individual country — but nowhere near more than "every country combined" or, as he said in his tweet, more than "all major countries combined.”[…] The United States has a far bigger population than many of the "major countries" Trump often mentions. So it could have run far more tests but still have a much larger burden ahead than do nations like Germany, France or Canada.[…] Joint Model Veracity Label Justification/ Explanation
  • 21. Related Studies on Generating Explanations • Camburu et al. and Rajani et al. generate abstractive explanations • Short input text and explanations; • Large amount of annotated data. • Real-world fact checking datasets are of limited size and the input consists of long documents • We take advantage of the LIAR-PLUS dataset: • Use the summary of the ruling comments as a gold explanation; • Formulate the problem as extractive summarization. • Camburu, Oana-Maria, Tim Rocktäschel, Thomas Lukasiewicz, and Phil Blunsom. "e-SNLI: Natural language inference with natural language explanations." In Advances in Neural Information Processing Systems. 2018. • Rajani, Nazneen Fatema, Bryan McCann, Caiming Xiong, and Richard Socher. "Explain Yourself! Leveraging Language Models for Commonsense Reasoning." In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 4932-4942. 2019.
  • 22. Example of an Oracle’s Gold Summary Claim: “The president promised that if he spent money on a stimulus program that unemployment would go to 5.7 percent or 6 percent. Those were his words.” Label: Mostly-False Just: Bramnick said “the president promised that if he spent money on a stimulus program that unemployment would go to 5.7 percent or 6 percent. Those were his words.” Two economic advisers estimated in a 2009 report that with the stimulus plan, the unemployment rate would peak near 8 percent before dropping to less than 6 percent by now. Those are critical details Bramnick’s statement ignores. To comment on this ruling, go to NJ.com. Oracle: “The president promised that if he spent money on a stimulus program that unemployment would go to 5.7 percent or 6 percent. Those were his words,” Bramnick said in a Sept. 7 interview on NJToday. But with the stimulus plan, the report projected the nation’s jobless rate would peak near 8 percent in 2009 before falling to about 5.5 percent by now. So the estimates in the report were wrong.
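The oracle can be thought of as a greedy sentence selector: starting from an empty summary, it repeatedly adds the ruling-comment sentence that most improves n-gram overlap with the gold justification. A minimal sketch of this idea in Python, assuming a simple bigram-F1 overlap rather than the exact ROUGE variant and stopping criterion used in the paper (greedy_oracle and overlap_f1 are illustrative helpers):

    # Greedy oracle for selecting explanation sentences (a sketch, not the
    # paper's exact procedure): repeatedly add the ruling-comment sentence
    # that most improves bigram-F1 overlap with the gold justification.

    def bigrams(text):
        toks = text.lower().split()
        return set(zip(toks, toks[1:]))

    def overlap_f1(selected, reference):
        pred, ref = bigrams(" ".join(selected)), bigrams(reference)
        common = len(pred & ref)
        if common == 0:
            return 0.0
        p, r = common / len(pred), common / len(ref)
        return 2 * p * r / (p + r)

    def greedy_oracle(ruling_sentences, justification, max_sentences=4):
        selected = []
        while len(selected) < max_sentences:
            best, best_score = None, overlap_f1(selected, justification)
            for sent in ruling_sentences:
                if sent in selected:
                    continue
                score = overlap_f1(selected + [sent], justification)
                if score > best_score:
                    best, best_score = sent, score
            if best is None:  # no remaining sentence improves the overlap
                break
            selected.append(best)
        return selected

Swapping the bigram overlap for ROUGE scores would bring this closer to the standard oracles used in extractive summarization.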
  • 23. Which sentences should be selected for the explanation?
  • 24. What is the veracity of the claim?
  • 25. Joint Explanation and Veracity Prediction Ruder, Sebastian, et al. "Latent multi-task architecture learning." Proceedings of the AAAI’2019. Cross-stitch layer Multi-task objective
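A cross-stitch layer learns how much the veracity-prediction and explanation-extraction networks share their hidden representations. A minimal PyTorch sketch, assuming two task encoders with the same hidden size (the class and the initial mixing values below are illustrative, not the exact configuration from the paper):

    import torch
    import torch.nn as nn

    class CrossStitch(nn.Module):
        """Learns a 2x2 mixing matrix that linearly combines the hidden states
        of the two tasks (veracity prediction and explanation extraction)."""
        def __init__(self):
            super().__init__()
            # initialised close to the identity, so each task initially keeps
            # mostly its own features and learns how much to share
            self.alpha = nn.Parameter(torch.tensor([[0.9, 0.1],
                                                    [0.1, 0.9]]))

        def forward(self, h_veracity, h_explanation):
            out_veracity = self.alpha[0, 0] * h_veracity + self.alpha[0, 1] * h_explanation
            out_explanation = self.alpha[1, 0] * h_veracity + self.alpha[1, 1] * h_explanation
            return out_veracity, out_explanation

    # multi-task objective: a weighted sum of the two task losses, e.g.
    # loss = lambda_veracity * loss_veracity + lambda_explanation * loss_explanation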
  • 26. Manual Evaluation • Explanation Quality • Coverage. The explanation contains important, salient information and does not miss any important points that contribute to the fact check. • Non-redundancy. The summary does not contain any information that is redundant/repeated/not relevant to the claim and the fact check. • Non-contradiction. The summary does not contain any pieces of information contradictory to the claim and the fact check. • Overall. Rank the explanations by their overall quality. • Explanation Informativeness. Provide a veracity label for a claim based on a veracity explanation coming from the justification, the Explain-MT, or the Explain-Extractive system.
  • 27. Explanation Quality [Bar chart: Mean Average Rank (MAR), lower is better (higher rank), of the Justification, Extractive, and Extractive-MT explanations for Coverage, Non-redundancy, Non-contradiction, and Overall quality.]
  • 28. Explanation Informativeness [Bar chart: manual veracity labelling given a particular explanation, as percentages of the dis/agreeing annotator predictions; categories: Agree-Correct, Agree-Not Correct, Agree-Not Sufficient, Disagree; systems: Justification, Extractive, Extractive-MT.]
  • 29. Summary • First study on generating veracity explanations • Jointly training veracity prediction and explanation • improves the performance of the classification system • improves the coverage and overall performance of the generated explanations
  • 30. Future Work • Can we generate better or even abstractive explanations given limited resources? • How to automatically evaluate the properties of the explanations? • Can explanations be extracted from evidence pages only (lots of irrelevant and multi-modal results)?
  • 31. A Diagnostic Study of Explainability Techniques for Text Classification Pepa Atanasova, Jakob Grue Simonsen, Christina Lioma, Isabelle Augenstein EMNLP 2020 31
  • 32. Explainability Datasets for Decision Understanding
  • 33. Post-Hoc Explainability Methods for Decision Understanding Example: Twitter Sentiment Extraction (TSE)
  • 34. • How can explainability methods be evaluated? • Proposal: set of diagnostic properties • What are characteristics of different explainability methods? • How do explanations for models with different architectures differ? • How do automatically and manually generated explanations differ? Post-Hoc Explainability for Decision Understanding: Research Questions
  • 35. • Compute the gradient of the output w.r.t. the input • The gradient is computed for each element of the input vector • Different aggregation methods are used to produce one score per input token (mean, L2 norm aggregation) • Common approaches • Saliency • see above • InputX-Gradient • additionally multiplies the gradient with the input • Guided Backpropagation • over-writes the gradients of ReLU functions so that only non-negative gradients are backpropagated Post-Hoc Explainability Methods for Decision Understanding: Gradient-Based Approaches
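A sketch of how such a gradient-based score can be obtained in PyTorch, assuming a model that maps input embeddings directly to class logits (the call signature is illustrative):

    import torch

    def saliency_scores(model, embeddings, target_class):
        """Gradient of the target-class logit w.r.t. the input embeddings,
        aggregated to one score per token via the L2 norm."""
        embeddings = embeddings.clone().detach().requires_grad_(True)
        logits = model(embeddings)            # assumed shape: (1, num_classes)
        logits[0, target_class].backward()
        return embeddings.grad.norm(dim=-1).squeeze(0)   # one score per token

    # InputX-Gradient would instead use (embeddings.grad * embeddings).sum(-1);
    # Guided Backpropagation additionally zeroes out negative ReLU gradients.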
  • 36. • Replace tokens in input with other tokens to compute their relative contributions • Common approaches • Occlusion • replaces each token with a baseline token and measures change in output • Shapley Value Sampling • computes average marginal contribution of each word across word perturbations Post-Hoc Explainability Methods for Decision Understanding: Perturbation-Based Approaches
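A sketch of the occlusion variant, assuming a model that maps a tensor of token ids to class logits and a baseline id such as [PAD] or [MASK] (names are illustrative):

    import torch

    def occlusion_scores(model, input_ids, target_class, baseline_id=0):
        """Replace each token with a baseline token and record the drop
        in the target-class probability."""
        scores = []
        with torch.no_grad():
            base_prob = model(input_ids).softmax(-1)[0, target_class]
            for i in range(input_ids.size(1)):
                occluded = input_ids.clone()
                occluded[0, i] = baseline_id
                prob = model(occluded).softmax(-1)[0, target_class]
                scores.append((base_prob - prob).item())  # larger drop = more salient
        return scores

Shapley Value Sampling follows the same idea but averages each token's marginal contribution over many random orderings of the remaining tokens.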
  • 37. • Train local linear models to approximate local decision boundaries • Common approaches • LIME • train one linear model per instance Post-Hoc Explainability Methods for Decision Understanding: Simplification-Based Approaches
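A LIME-style sketch with scikit-learn: drop random subsets of tokens, query the black-box classifier on each perturbed text, and fit a proximity-weighted linear surrogate whose coefficients act as per-token scores. predict_proba is an assumed callable returning class probabilities; the kernel width and sample count are arbitrary choices:

    import numpy as np
    from sklearn.linear_model import Ridge

    def lime_token_scores(predict_proba, tokens, target_class, n_samples=500):
        """predict_proba: callable mapping a text string to class probabilities."""
        rng = np.random.default_rng(0)
        masks = rng.integers(0, 2, size=(n_samples, len(tokens)))  # 1 = keep the token
        masks[0] = 1                                               # include the original text
        probs, weights = [], []
        for mask in masks:
            text = " ".join(tok for tok, keep in zip(tokens, mask) if keep)
            probs.append(predict_proba(text)[target_class])
            # proximity kernel: perturbations closer to the original count more
            weights.append(np.exp(-((1 - mask.mean()) ** 2) / 0.25))
        surrogate = Ridge(alpha=1.0)
        surrogate.fit(masks, probs, sample_weight=weights)
        return surrogate.coef_                                     # one score per token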
  • 38. • Agreement with Human Rationales (HA) • Degree of overlap between human and automatic saliency scores • Confidence Indication (CI) • Predictive power of produced explanations for model’s confidence • Faithfulness (F) • Mask most salient tokens, measure drop in performance • Rationale Consistency (RC) • Difference between explanations for models trained with different random seeds, with model with random weights • Dataset Consistency (DC) • Difference between explanations for similar instances Post-Hoc Explainability Methods for Decision Understanding: Diagnostic Properties
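As an illustration of one of these properties, Faithfulness can be approximated by masking the most salient tokens and measuring how much the target-class probability drops. A rough sketch follows, with a hypothetical model_predict callable and a fixed masking fraction rather than the thresholds and aggregation used in the paper:

    def faithfulness(model_predict, input_ids, saliency_scores, target_class,
                     mask_id, fraction=0.2):
        """Mask the most salient tokens; a faithful explanation should cause a
        large drop in the target-class probability."""
        n_mask = max(1, int(fraction * len(saliency_scores)))
        top = sorted(range(len(saliency_scores)),
                     key=lambda i: saliency_scores[i], reverse=True)[:n_mask]
        masked = list(input_ids)
        for i in top:
            masked[i] = mask_id
        original = model_predict(input_ids)[target_class]
        perturbed = model_predict(masked)[target_class]
        return original - perturbed   # larger drop = more faithful explanation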
  • 39. Post-Hoc Explainability Methods for Decision Understanding: Selected Results Spider chart for Transformer model on e-SNLI HA: Agreement with human rationales CI: Confidence indication F: Faithfulness RC: Rationale Consistency DC: Dataset Consistency
  • 40. Post-Hoc Explainability Methods for Decision Understanding: Aggregated Results [Bar chart: mean of the diagnostic property measures on e-SNLI for Transformer, CNN, and LSTM models.]
  • 41. Summary • Diagnostic properties make it possible to assess different aspects of explainability techniques • Gradient-based methods outperform perturbation-based and simplification-based ones for most properties across model architectures and datasets • Exception: Shapley Value Sampling and LIME are better for the Confidence Indication property • Gradient-based methods are also the fastest to compute
  • 43. Overview of Today’s Talk • Introduction • Explainability – what is it and why do we need it? • Part 1: Decision understanding • Instance-level explainability for text classification and fact checking • Language generation based explanations • Evaluating instance-level explanations • Part 2: Model understanding • Model-wide explainability for text classification and fact checking • Finding model-wide explanations • Visualising model-wide explanations
  • 44. Universal Adversarial Trigger Generation for Fact Checking Pepa Atanasova*, Dustin Wright*, Isabelle Augenstein EMNLP 2020 *equal contributions 44
  • 45. Generating Adversarial Claims • Fact checking models can overfit to spurious patterns • Making the right predictions for the wrong reasons • This leads to vulnerabilities, which can be exploited by adversaries (e.g. agents spreading mis- and disinformation) • How can one reveal such vulnerabilities? • Generating instance-level explanations for fact checking models (first part of talk) • Generating adversarial claims (this work)
  • 46. Previous Work • Universal adversarial attacks (Gao and Oates, 2019; Wallace et al., 2019) • Single perturbation changes that can be applied to many instances • Change the meaning of the input instances and thus produce label-incoherent claims • Are not per se semantically well-formed • Rule-based perturbations (Ribeiro et al., 2018) • Semantically well-formed, but require hand-crafted patterns
  • 47. Previous Work • FEVER 2.0 shared task (Thorne et al., 2019) • Builders / breakers setup • Methods of submitted systems: • Producing claims requiring multi-hop reasoning (Niewinski et al., 2019) • Generating adversarial claims manually (Kim and Allan, 2019)
  • 48. Goals of this Work • Generate claims fully automatically • Preserve the meaning of the source text • Produce semantically well-formed claims
  • 49. Model RoBERTa-based FEVER model to predict FC label 1) HotFlip attack model to find triggers 2) STS auxiliary model to preserve FC label Claim generation conditioned on evidence and triggers
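The trigger search follows the HotFlip idea (Ebrahimi et al., 2018): the effect of swapping a trigger token is approximated to first order from the gradient of the adversarial objective w.r.t. that token's embedding, and the vocabulary entry with the highest approximate gain is chosen. A small sketch of that candidate-selection step; tensor names are illustrative, and the full model additionally scores candidates with the STS objective to preserve the label:

    import torch

    def best_trigger_flip(loss_grad, trigger_embedding, embedding_matrix):
        """loss_grad: gradient of the adversarial loss w.r.t. the current trigger
        token's embedding, shape (d,). embedding_matrix: shape (vocab_size, d).
        Returns the vocabulary id with the largest first-order loss increase."""
        # first-order approximation: delta_loss(v) ~ (e_v - e_trigger)^T grad
        scores = (embedding_matrix - trigger_embedding) @ loss_grad
        return torch.argmax(scores).item()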
  • 50. Examples of Generated Claims
SUPPORTS claims
- Evidence: "Since the 19th century, some Romani have also migrated to the Americas." | Triggers: don, already, more, during, home | Generated claim: "Romani have moved to the Americas during the 19th century."
- Evidence: "Cyprus is a major tourist destination in the Mediterranean." | Triggers: foreign, biggest, major, every, friends | Generated claim: "Cyprus is a major tourist destination."
- Evidence: "The first Nobel Prize in Chemistry was awarded in 1901 to Jacobus Henricus van’t Hoff, of the Netherlands, “for his discovery of the laws of chemical dynamics and osmotic pressure in solutions.”" | Triggers: later, already, quite, altern, whereas | Generated claim: "Henricus Van’t Hoff was already awarded the Nobel Prize."
REFUTES claims
- Evidence: "California Attorney General Kamala Harris defeated Sanchez, 61.6% to 38.4%." | Triggers: phys, incarn, not, occasionally, something | Generated claim: "Kamala Harris did not defeat Sanchez, 61.6% to 38.4%."
- Evidence: "Uganda is in the African Great Lakes region." | Triggers: unless, endorsed, picks, pref, against | Generated claim: "Uganda is against the African Great Lakes region."
- Evidence: "Times Higher Education World University Rankings is an annual publication of university rankings by Times Higher Education (THE) magazine." | Triggers: interested, reward, visit, consumer, conclusion | Generated claim: "Times Higher Education World University Rankings is a consumer magazine."
NOT ENOUGH INFO claims
- Evidence: "The KGB was a military service and was governed by army laws and regulations, similar to the Soviet Army or MVD Internal Troops." | Triggers: nowhere, only, none, no, nothing | Generated claim: "The KGB was only controlled by a military service."
- Evidence: "The series revolves around ..." | Triggers: says, said, take, say, is | Generated claim: "Take Me High is about ..."
  • 51. Results: Manual Evaluation • How well can we generate universal adversarial triggers? • Trade-off between how potent the attack is (reduction in F1) vs. how semantically coherent the claim is (STS) • Reduction in F1 for both trigger generation methods • Macro F1 of generated w.r.t. original claim: 56.6 (FC Objective); 60.7 (FC + STS Objective) -- the STS Objective preserves meaning more often [Charts: Semantic Textual Similarity with the original claim (higher = better) and F1 of the FC model (lower = better), per label (Overall, SUPPORTS, REFUTES, NEI), for the FC Objective vs. the FC + STS Objective.]
  • 52. Key Take-Aways • Novel extension to the HotFlip attack (Ebrahimi et al., 2018) for universal adversarial trigger generation • Conditional language model, which takes trigger tokens and evidence, and generates a semantically coherent claim • Resulting model generates semantically coherent claims containing universal triggers, which preserve the label • Trade-off between how well-formed the claim is and how potent the attack is
  • 53. TX-Ray: Quantifying and Explaining Model-Knowledge Transfer in (Un-)Supervised NLP Nils Rethmeier, Vageesh Kumar Saxena, Isabelle Augenstein UAI 2020 53
  • 54. Problems of supervised probing task evaluation setup ● only measures expected (probed) model knowledge and semantics ● is misleading when model and probe domains mismatch ● probing annotation can not scale to uncover unforeseen semantics Goal: instead can we visualise, quantify and explore how a (language) model ● learns → RQ (1) How does self-supervision abstract knowledge? ● applies → RQ (2) How is knowledge (zero-shot) applied to new inputs X? ● adapts → RQ (3) How is knowledge adapted by supervision? Approach: un-/ self-supervised interpretability ● visualise what input (features) each neuron prefers (maximally activates on) -- i.e. activation maximisation by Erhan, 2009 -- used on RBMs [1] Motivation: observe knowledge acquisition of neural nets !
  • 55. Goal: Visualise & measure transfer in neural nets (0): random init How does each neuron RQ (1) abstract/ learn (textual) knowledge? ● during initial/ pre-training on text Xpre text in E random encoder E
  • 56. Goal: Visualise & measure transfer in neural nets Xpre pretrain E on Xpre (1): pre-training E How does each neuron RQ (1) abstract/ learn (textual) knowledge? ● during initial/ pre-training on text Xpre learning encoder E E trained as language model fits knowledge (0): random init Xpre E random encoder E I like cake better than biscuits. The learning-cake has cherries inside. Supervision is the cake’s frosting. Self-supervision is the cake’s corpus. The cake is a lie.
  • 57. Goal: Visualise & measure transfer in neural nets How does each neuron RQ (1) abstract/ learn (textual) knowledge? ● during initial/ pre-training on text Xpre How does each neuron RQ (1) abstract/ learn (textual) knowledge? ● during initial/ pre-training on text Xpre I like cake better than biscuits. The learning-cake has cherries inside. Supervision is the cake’s frosting. Self-supervision is the cake’s corpus. The cake is a lie. Xpre pretrain E on Xpre (1): pre-training E learning encoder E Loss: drops/ changes
  • 58. Goal: Visualise & measure transfer in neural nets How does each neuron RQ (1) abstract/ learn (textual) knowledge? ● during initial/ pre-training on text Xpre cake pk = token activation probability, while training Neuron nn := token-preference distribution How does each neuron RQ (1) abstract/ learn (textual) knowledge? ● during initial/ pre-training on text Xpre I like cake better than biscuits. The learning-cake has cherries inside. Supervision is the cake’s frosting. Self-supervision is the cake’s corpus. The cake is a lie. biscuits Xpre pretrain E on Xpre (1): pre-training E learning encoder E Loss: drops/ changes Knowledge abstraction starts Per token, only record the maximally activated (preferred) neuron -- i.e. activation-maximization (Erhan, 2009)
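A sketch of how such token-preference distributions can be collected, assuming an encode callable that returns a (seq_len, hidden_dim) NumPy array of activations (this is illustrative, not the TX-Ray implementation):

    from collections import Counter, defaultdict

    def token_preferences(encode, corpus):
        """encode: callable mapping a list of tokens to a (seq_len, hidden_dim)
        NumPy array of activations. Returns, per neuron, a Counter over the
        tokens for which that neuron was the maximally activated unit."""
        prefs = defaultdict(Counter)
        for tokens in corpus:
            activations = encode(tokens)            # (seq_len, hidden_dim)
            winners = activations.argmax(axis=1)    # preferred neuron per token
            for token, neuron in zip(tokens, winners):
                prefs[int(neuron)][token] += 1
        return prefs  # normalising each Counter yields the neuron's token distribution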
  • 59. Goal: Visualise & measure transfer in neural nets cake Xpre pretrain E on Xpre (1): pre-training E pk = token activation probability, while training How does each neuron RQ (1) abstract/ learn (textual) knowledge? ● during initial/ pre-training on text Xpre I like cake better than biscuits. The learning - cake contains cherries. Supervision is the cake’s frosting. Self-supervision is the cake’s corpus. The cake is a lie. cookies learning encoder E Neuron nn := token-preference distribution Loss: drops/ changes Knowledge abstraction changes
  • 60. Goal: Visualise & measure transfer in neural nets cake Xpre pretrain E on Xpre (1): pre-training E pk = token activation probability, while training How does each neuron RQ (1) abstract/ learn (textual) knowledge? ● during initial/ pre-training on text Xpre I like cake better than cookies. The learning - cake contains cherries. Supervision is the cake’s icing. Self-supervision is the cake’s corpus. The cake is a lie. cookies supervision learning encoder E Neuron nn := token-preference distribution Loss: drops/ changes Knowledge abstraction changes
  • 61. Goal: Visualise & measure transfer in neural nets cake Xpre pretrain E on Xpre (1): pre-training E pk = token activation probability, post training How does each neuron RQ (1) abstract/ learn (textual) knowledge? ● during initial/ pre-training on text Xpre converged encoder E I like cake better and biscuits. The learning - cake contains cherries. Supervision is the cake’s icing. Self - supervision is the cake’s corpus. The cake is a lie. cookies supervision self Neuron nn := token-preference distribution Loss: drops/ changes Knowledge abstraction converges
  • 62. Goal: Visualise & measure transfer in neural nets f1 f2 f3 fk f_ Xpre pretrain E on Xpre (1): pre-training E pk nn:= token-prefer dist. How does each neuron RQ (1) abstract/ learn (textual) knowledge? ● during initial/ pre-training on text Xpre pre-trained encoder E Take-away (1): ____________ (pre)-trained builds neural knowledge (feature activation distributions) Token-activation distributions visualize the knowledge abstraction of each neuron
  • 63. Goal: Visualise & measure transfer in neural nets f1 f2 f3 fk f_ Xpre pretrain E on Xpre (1): pre-training E pk nn:= token-prefer dist. How does each neuron RQ (1) abstract/ learn (textual) knowledge? ● during initial/ pre-training on text Xpre pre-trained encoder E Take-away (1): ____________ (pre)-trained builds neural knowledge (feature activation distributions) Token-activation distributions visualize the knowledge abstraction of each neuron
  • 64. Goal: Visualise & measure transfer in neural nets f1 f2 f3 fk f_ Xpre pretrain E on Xpre (1): pre-training E pk nn:= token-activate dist. How does each neuron RQ (1) abstract/ learn (textual) knowledge? ● during initial/ pre-training on text Xpre RQ (2) apply learned knowledge ● to new domain text Xend nn:= token-prefer dist.
  • 65. Goal: Visualise & measure transfer in neural nets f1 f2 f3 fk f_ Xpre pretrain E on Xpre f1 f2 f3 fk f_ Xend eval. E on new Xend (2): zero-shot (1): pre-training E pk fk := token / POS nn:= token-activate dist. E How does each neuron RQ (1) abstract/ learn (textual) knowledge? ● during initial/ pre-training on text Xpre RQ (2) apply learned knowledge ● to new domain text Xend ● no neuron transfer if large change vs. ● neuron transfers if small change vs. ● activation distribution changes ● feed new text Xend to frozen pretrained encoder We use Hellinger H( , ) distance as change measure -- i.e. a symmetrical KLD nn:= token-prefer dist.
  • 66. Goal: Visualise & measure transfer in neural nets f1 f2 f3 fk f_ Xpre pretrain E on Xpre f1 f2 f3 fk f_ Xend eval. E on new Xend (2): zero-shot (1): pre-training E pk fk := token / POS nn:= token-activate dist. E How does each neuron RQ (1) abstract/ learn (textual) knowledge? ● during initial/ pre-training on text Xpre RQ (2) apply learned knowledge ● to new domain text Xend ● no neuron transfer if large change vs. ● neuron transfers if small change vs. ● activation distribution changes ● feed new text Xend to frozen pretrained encoder We use Hellinger H( , ) distance as change measure -- i.e. a symmetrical KLD nn:= token-prefer dist.
  • 67. Goal: Visualize & measure transfer in neural nets zero-shot transfer f1 f2 f3 fk f_ Xpre pretrain E on Xpre f1 f2 f3 fk f_ Xend eval. E on new Xend (2): zero-shot (1): pre-training E pk fk := token / POS nn:= token-activate dist. E How does each neuron RQ (1) abstract/ learn (textual) knowledge? ● during initial/ pre-training on text Xpre RQ (2) apply learned knowledge ● to new domain text Xend ● no neuron transfer if large change vs. ● neuron transfers if small change vs. ● activation distribution changes ● feed new text Xend to frozen pretrained encoder We use Hellinger H( , ) distance as change measure -- i.e. a symmetrical KLD nn:= token-prefer dist. We use Hellinger H( , ) distance as change/ transfer measure -- i.e. a symmetric KLD
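The Hellinger distance between a neuron's token-preference distribution before and after transfer can be computed as below; the sketch assumes the two distributions are aligned over the same token vocabulary:

    import numpy as np

    def hellinger(p, q):
        """Symmetric distance in [0, 1] between two discrete distributions."""
        p = np.asarray(p, dtype=float)
        q = np.asarray(q, dtype=float)
        p, q = p / p.sum(), q / q.sum()
        return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

    # small distance: the neuron activates similarly on the new domain (transfer)
    # large distance: the neuron is over-specialised to the pre-training domain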
  • 68. Goal: Visualise & measure transfer in neural nets zero-shot transfer f1 f2 f3 fk f_ Xpre pretrain E on Xpre f1 f2 f3 fk f_ Xend eval. E on new Xend (2): zero-shot (1): pre-training E pk fk := token / POS nn:= token-activate dist. E How does each neuron RQ (1) abstract/ learn (textual) knowledge? ● during initial/ pre-training on text Xpre RQ (2) apply learned knowledge ● to new domain text Xend We use Hellinger H( , ) distance as change measure -- i.e. a symmetrical KLD nn:= token-prefer dist. ● no neuron transfer if large change vs. ● neuron transfers if small change vs. ● activation distribution changes ● feed new text Xend to frozen pretrained encoder Over-specialisation: OOD-generalisation:
  • 69. Goal: Visualise & measure transfer in neural nets zero-shot transfer f1 f2 f3 fk f_ Xpre pretrain E on Xpre f1 f2 f3 fk f_ Xend eval. E on new Xend (2): zero-shot (1): pre-training E pk fk := token / POS nn:= token-activate dist. E How does each neuron RQ (1) abstract/ learn (textual) knowledge? ● during initial/ pre-training on text Xpre RQ (2) apply learned knowledge ● to new domain text Xend Transfer if encoder generalises well (activates similarly) on the new text Shows how able each neuron is generalise to new data(-distrib.)! Take-away (2): _____________ nn:= token-prefer dist.
  • 70. Goal: Visualise & measure transfer in neural nets zero-shot transfer supervised transfer f1 f2 f3 fk f_ Xpre pretrain E on Xpre f1 f2 f3 fk f_ Xend eval. E on new Xend f1 f2 f3 fk f_ Xend eval. Eend on Xend Yend Eend : E fit on Yend (2): zero-shot (3): supervised (1): pre-training E pk p1= .05 fk := token / POS pk := token probability nn:= token-activate dist. E E How does each neuron RQ (1) abstract/ learn (textual) knowledge? ● during initial/ pre-training on text Xpre RQ (2) apply learned knowledge ● to new domain text Xend RQ (3) adapt its knowledge ● to suit a supervision signal Yend on the new domain Xend (no more LM loss/ objective) nn:= token-prefer dist.
  • 71. Goal: Visualise & measure transfer in neural nets zero-shot transfer supervised transfer f1 f2 f3 fk f_ Xpre pretrain E on Xpre f1 f2 f3 fk f_ Xend eval. E on new Xend f1 f2 f3 fk f_ Xend eval. Eend on Xend Yend Eend : E fit on Yend (2): zero-shot (3): supervised (1): pre-training E pk p1= .05 fk := token / POS pk := token probability nn:= token-activate dist. E E How does each neuron RQ (1) abstract/ learn (textual) knowledge? ● during initial/ pre-training on text Xpre RQ (2) apply learned knowledge ● to new domain text Xend RQ (3) adapt its knowledge ● to suit a supervision signal Yend on the new domain Xend (no more LM loss/ objective) supervision specialises (re-fits) knowl. adds new knowledge and ‘avoids’ (sparsifies) old knowledge Take-away (3): _____________ nn:= token-prefer dist.
  • 72. Experiment: XAI to guide pruning Pretrain, then supervise ● pretrain language model on Wikitext-2 (1) ● fine-tune to IMDB binary reviews (3)
  • 73. Experiment: XAI to guide pruning Pretrain, then supervise ● pretrain language model on Wikitext-2 (1) ● fine-tune to IMDB binary reviews (3) Prune neurons that post supervision … ● were specialized (re-fit) ● became preferred (activated) ● were ‘avoided’ (now empty distrib.) → i.e., neuron has no max activations post supervision
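A sketch of how such per-neuron statistics could guide pruning, assuming a 2-D weight matrix whose rows correspond to neurons, a per-neuron record of post-supervision token preferences, and per-neuron Hellinger shifts (all names and the threshold are illustrative; the actual experiment compares several pruning criteria):

    import torch

    def prune_neurons(weight, token_counts, hellinger_shift, shift_threshold=0.9):
        """weight: 2-D tensor whose rows correspond to neurons.
        token_counts: tokens each neuron maximally activated on after fine-tuning
        (empty = 'avoided'). hellinger_shift: per-neuron distribution change
        between pre-training and fine-tuning."""
        mask = torch.ones(weight.size(0), device=weight.device)
        for n in range(weight.size(0)):
            avoided = len(token_counts.get(n, {})) == 0      # no max activations any more
            over_specialised = hellinger_shift[n] > shift_threshold
            if avoided or over_specialised:
                mask[n] = 0.0
        return weight * mask.unsqueeze(-1)  # zero out the pruned neurons' rows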
  • 74. Take-Aways TX-Ray can explore generalisation and specialisation at individual neuron-level ● self-supervised pre-training builds general knowledge ○ preference spread across many neurons -- 89% maximally active (preferred) neurons ● zero-shot application shows match of model knowledge vs new domain ○ preference less spread -- 88% of neurons preferred, partial generalisation ● supervised knowledge fine-tuning sparsifies (concentrates) activation ○ preference peaked -- only 45% of neurons preferred, many become domain over-specialised preference activation sum (mass) per neuron -- sorted by sum !
  • 76. Overall Take-Aways • Why explainability? • understanding if a model is right for the right reasons • Generated explanations can help users understand: • inner workings of a model (model understanding) • how a model arrived at a prediction (decision understanding) • Explainability can enable: • human-in-the-loop model development • human-in-the-loop data selection
  • 77. Overall Take-Aways • Caveats: • There can be more than one correct explanation • Different explainability methods provide different explanations • Different streams of explainability methods have different benefits and downsides • Black box vs. white box • Hypothesis testing vs. bottom-up understanding • Requirements for annotated training data • Joint vs. post-hoc explanation generation • One-time analysis vs. continuous monitoring • Perform well w.r.t. different properties
  • 78. Where to from here? • Making explanations useful to model trainers and end users • How to interpret different explanations? • Explanations for models with large number of parameters • What-if analyses • Some research on generating explanations, relatively little work on understanding in what context they are useful • (Automatically) evaluating explanations • Human-in-the-loop development
  • 80. Thanks to my PhD students and collaborators! 80 CopeNLU https://copenlu.github.io/ Pepa Atanasova Dustin Wright Jakob Grue Simonsen Christina Lioma Vageesh Kumar Saxena Nils Rethmeier
  • 81. Presented Papers Pepa Atanasova, Jakob Grue Simonsen, Christina Lioma, Isabelle Augenstein. Generating Fact Checking Explanations. In Proceedings of ACL 2020. Pepa Atanasova, Jakob Grue Simonsen, Christina Lioma, Isabelle Augenstein. A Diagnostic Study of Explainability Techniques for Text Classification. EMNLP 2020. Pepa Atanasova*, Dustin Wright*, Isabelle Augenstein. Generating Label Cohesive and Well-Formed Adversarial Claims. EMNLP 2020. Nils Rethmeier, Vageesh Kumar Saxena, Isabelle Augenstein. TX-Ray: Quantifying and Explaining Model-Knowledge Transfer in (Un)Supervised NLP. In Proceedings of the Conference on UAI 2020. *equal contributions
  • 82. Hiring 2 PhD students, 1 postdoc – explainable stance detection funded by Danish Research Council (DFF) Sapere Aude grant application deadline PhD: 31 January 2021 start date: Summer/Autumn 2021 PhD: https://employment.ku.dk/faculty/?show=153150 Postdoc: https://tinyurl.com/yd6jw5nm 1 postdoc – explainable IE & QA funded by Innovation Foundation Denmark application deadline: 28 February 2021 start date: Autumn 2021 Postdoc: https://employment.ku.dk/faculty/?show=153308
  • 83. Reading Materials on Explainability Decision Understanding • Camburu, O. M., Shillingford, B., Minervini, P., Lukasiewicz, T., & Blunsom, P. (2020, July). Make Up Your Mind! Adversarial Generation of Inconsistent Natural Language Explanations. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (pp. 4157-4165). - Reverse explanations to help prevent inconsistencies • Slack, D., Hilgard, S., Jia, E., Singh, S., & Lakkaraju, H. (2020, February). Fooling LIME and SHAP: Adversarial Attacks on Post hoc Explanation Methods. In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society (pp. 180-186). - Perturbation-based explanation methods can be fooled easily and are potentially unreliable
  • 84. Reading Materials on Explainability Model Understanding - Geva, M., Schuster, R., Berant, J., & Levy, O. (2020). Transformer Feed-Forward Layers Are Key-Value Memories. arXiv preprint arXiv:2012.14913. - how to use activation maximisation to analyse memorisation in Transformers - Durrani, N., Sajjad, H., Dalvi, F., & Belinkov, Y. (2020). Analyzing Individual Neurons in Pre-trained Language Models. arXiv preprint arXiv:2010.02695. - pruning neurons and how many neurons are used per task
  • 85. Reading Materials on Explainability Surveys, Opinion Papers & Benchmarks • Eraser benchmark with explainability datasets: https://www.eraserbenchmark.com/ • Jacovi, A., & Goldberg, Y. (2020). Towards Faithfully Interpretable NLP Systems: How should we define and evaluate faithfulness? In ACL 2020. • Jacovi, A., Marasović, A., Miller, T., & Goldberg, Y. (2020). Formalizing Trust in Artificial Intelligence: Prerequisites, Causes and Goals of Human Trust in AI. arXiv preprint arXiv:2010.07487. • Kotonya, N., & Toni, F. (2020). Explainable Automated Fact-Checking: A Survey. In Coling 2020.
  • 86. Reading Materials on Explainability Tutorials • Yonatan Belinkov, Sebastian Gehrmann, Ellie Pavlick. Interpretability and Analysis in Neural NLP. Tutorial at ACL 2020. https://www.aclweb.org/anthology/2020.acl-tutorials.1/ • Eric Wallace, Matt Gardner, Sameer Singh. Interpreting Predictions of NLP Models. Tutorial at EMNLP 2020. https://www.aclweb.org/anthology/2020.emnlp-tutorials.3/ • Hima Lakkaraju, Julius Adebayo, Sameer Singh. Explaining Machine Learning Predictions -- State-of-the-art, Challenges, and Opportunities. Tutorials at NeurIPS 2020, AAAI 2021. https://explainml-tutorial.github.io/ • Jay Alammar. Interfaces for Explaining Transformer Language Models. Blog, 2020. https://jalammar.github.io/explaining-transformers/