Explaining Character-Aware Neural Networks for Word-Level Prediction:
Do They Discover Linguistic Rules?
Frederic Godin, Kris Demuynck, Joni Dambre, Wesley De Neve and Thomas Demeester
Department of Electronics and Information Systems
Ghent University, Belgium
Introduction
Example: Rule-based tagger for PoS tagging
Brill's (1994) transformation-based error-driven tagger
Template: Change the most-likely tag X to Y if the last (1, 2, 3, 4) characters of the word are x.
Rule: Change the tag common noun to plural common noun if the word has suffix -s.
Easily interpretable
Interpretability in NLP used to be easy
Rule-based/Tree-based models
Shallow statistical models (e.g., logistic regression, CRF)
Very transparent: follow the trace
Essentially: weight + feature
Current NLP interpretability...
Our proposed method
We present contextual decomposition (CD) for CNNs
- Extends CD for LSTMs (Murdoch et al. 2018)
- White box approach to interpretability
We trace back morphological tagging decisions to the
character-level
- Which characters are important?
- Do the learned patterns match known linguistic rules?
- Is there a difference between the CNN and the BiLSTM?
Contextual decomposition
for CNNs
Contextual decomposition
Idea: every output value can be “decomposed” into
- Relevant contributions, originating from the input we are interested in (e.g., some characters)
- Irrelevant contributions, originating from all the other inputs (e.g., all the other characters in a word)
[Figure: the word "economicas" is fed to a CNN that predicts number=plural; the prediction is traced back to relevant contributions (the characters of interest) and irrelevant contributions (all other characters).]
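Concretely, following the notation of Murdoch et al. (2018) (the symbols below are introduced here for illustration, not taken from the slides), every value z computed by the network is written as a sum of two parts:

z = \beta + \gamma

where \beta collects the contributions of the characters we are interested in (relevant) and \gamma collects the contributions of all other characters (irrelevant). The decomposition described next propagates this split through every layer of the CNN.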
Contextual decomposition for CNNs
Three main components of CNN
- Convolution
- Activation function
- Max-over-time pooling
Classification layer
[Figure: character inputs ^ e c o n o m i c a s $ are processed by CNN filters, followed by max-over-time pooling and a fully connected (FC) layer that outputs the prediction, e.g. gender=feminine.]
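As a rough illustration of this architecture, here is a minimal PyTorch sketch; the layer sizes, filter width and label set are placeholders chosen for the example, not the settings used in the paper.

import torch
import torch.nn as nn

class CharCNNTagger(nn.Module):
    """Character-level CNN for word-level morphological tagging (sketch)."""
    def __init__(self, n_chars, n_classes, emb_dim=50, n_filters=100, width=3):
        super().__init__()
        self.emb = nn.Embedding(n_chars, emb_dim)          # character embeddings
        self.conv = nn.Conv1d(emb_dim, n_filters, width)   # CNN filters over characters
        self.fc = nn.Linear(n_filters, n_classes)          # classification layer

    def forward(self, char_ids):                 # char_ids: (batch, word_length)
        x = self.emb(char_ids).transpose(1, 2)   # (batch, emb_dim, word_length)
        h = torch.relu(self.conv(x))             # convolution + activation function
        h, _ = h.max(dim=2)                      # max-over-time pooling
        return self.fc(h)                        # class scores, e.g. over gender values

# Example: score one 12-character word over 3 hypothetical gender labels.
model = CharCNNTagger(n_chars=60, n_classes=3)
logits = model(torch.randint(0, 60, (1, 12)))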
Contextual decomposition for CNNs: Convolution
Output of a single convolutional filter at timestep t:
Legend: n = filter size; S = indices of the relevant inputs; Wi = the i-th column of filter W.
Example for ^ e c o n o m i c a s $ with the filter spanning indices 8, 9, 10, 11: index 9 (the character "a") is relevant, while indices 8, 10 and 11 are irrelevant.
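Written out with this legend (a sketch of the decomposition; the paper keeps the bias as a separate term that is handled during the later linearization step):

z_t = \sum_{i=1}^{n} W_i \, x_{t+i-1} + b
\beta_t = \sum_{i:\, t+i-1 \in S} W_i \, x_{t+i-1} \quad (\text{relevant})
\gamma_t = \sum_{i:\, t+i-1 \notin S} W_i \, x_{t+i-1} \quad (\text{irrelevant})

so that z_t = \beta_t + \gamma_t + b.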
Contextual decomposition for CNNs: Activation func.
Goal: linearize the activation function so that its output can still be split into relevant and irrelevant parts.
Linearization formula:
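One way to write this two-term split, following the Shapley-style linearization of Murdoch et al. (2018) (the paper's exact handling of the bias term may differ):

f(\beta_t + \gamma_t) = L_f(\beta_t) + L_f(\gamma_t)
L_f(\beta_t) = \tfrac{1}{2}\big[ f(\beta_t) + \big( f(\beta_t + \gamma_t) - f(\gamma_t) \big) \big]
L_f(\gamma_t) = \tfrac{1}{2}\big[ f(\gamma_t) + \big( f(\beta_t + \gamma_t) - f(\beta_t) \big) \big]

By construction, the two linearized terms sum exactly to the original activation f(\beta_t + \gamma_t).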
Contextual decomposition for CNNs: Max pooling
Max-over-time pooling:
Determine the maximizing timestep t on the full output first, then simply copy that timestep's relevant/irrelevant split:
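In the notation used above:

t^{*} = \arg\max_t \big( \beta_t + \gamma_t \big)
\beta_{pool} = \beta_{t^{*}}, \qquad \gamma_{pool} = \gamma_{t^{*}}

so the pooling position is chosen on the full activation, and the relevant/irrelevant split at that position is passed on unchanged.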
Contextual decomposition of classification layer
Probability of a certain class: softmax over the outputs of the classification layer.
We simplify: keep only the relevant contribution to class j.
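In symbols, with z the pooled feature vector decomposed as z = \beta + \gamma (a sketch, not the paper's exact derivation):

p_j = \mathrm{softmax}(W z + b)_j
\text{relevant contribution to class } j \approx (W \beta)_j

The softmax normalization and the bias are shared by the relevant and irrelevant parts, so they can be dropped when ranking character patterns for a given class.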
Experiments
Task
Morphological tagging: predict morphological labels for a word (gender,
tense, singular/plural, ...)
Example: economicas (lemma=económico, gender=feminine, number=plural)
For a subset of words, we have manual segmentations and annotations.
Datasets
Universal Dependencies 1.4:
- Finnish, Spanish and Swedish
- Select all unique words and their morphological labels
Manual annotations and segmentations of 300 test set words
Architectures: CNN vs BiLSTM
[Figure: both models take the character sequence ^ e c o n o m i c a s $ as input. The CNN applies its filters, max-over-time pooling and an FC layer to predict, e.g., gender=feminine; the BiLSTM reads the same characters and feeds an FC layer to make the same prediction.]
Do the NN patterns follow manual segmentations?
All = every possible combination of characters
Cons = all consecutive character n-grams
Visualizing contributions: 1 character
[Figure: per-character contributions for the Spanish word ^ g r a t u i t a $, label gender=feminine.]
Visualizing contributions: 2 characters (Swedish)
[Figure: contribution matrices for all pairs of two characters of the Swedish word ^ k r o n o r $, label number=plural, shown for the CNN and the BiLSTM.]
Most important patterns per language: Spanish
Linguistic rules for feminine gender:
- Feminine adjectives often end with “a”
- Nouns ending with “dad” or “ión” are often feminine
Found pattern:
- “a” is a very important pattern
- “dad” and “sió” are important trigrams
Most important patterns per language: Swedish
Linguistic rules for plural form:
- 5 suffixes: or, ar, (e)r, n, and no ending
- “na” is the definite article in plural forms
Found pattern:
- “or” and “ar”
- But also “na” and “rn”
Interactions/compositions of patterns
How do positive and negative patterns interact?
Consider the Spanish verb “gusta”
- Gender=Not Applicable (NA)
- We know that the suffix “a” is an indicator of gender=feminine
Consider the most positive/negative set of characters per class:
The stem provides counterevidence for gender=feminine
Conclusion
Summary
We introduced a white box approach to understanding CNNs
We showed that:
- BiLSTMs and CNNs sometimes choose different patterns
- The learned patterns coincide with our linguistic knowledge
- Sometimes other plausible patterns are used
Questions?
Fréderic Godin
Ph.D. Researcher Deep Learning and NLP
IDLab
E frederic.godin@ugent.be
@frederic_godin
www.fredericgodin.com
idlab.technology / idlab.ugent.be
