The document discusses building computational models that process noisy text as robustly as the human brain does. It presents a character-level recurrent neural network, the semi-Character RNN (scRNN), which accurately recognizes words even when letters are reordered, inserted, or deleted, and which shows robustness similar to humans' in an experiment where reading difficulty increases from intact to progressively noisier text. The work is then extended to the word level, by incorporating repair actions into a dependency parser to handle grammatical errors, and to the sentence level, with a grammatical error correction model trained by neural reinforcement learning.
10. 1. Character-level robust processing
Robsut Wrod Reocginiton via semi-Character Recurrent Neural Network (AAAI 2017)
Keisuke Sakaguchi, Kevin Duh, Matt Post, Benjamin Van Durme
11. Aoccdrnig to a rscheearch at Cmabrigde Uinervtisy, it deosn’t
mttaer in waht oredr the ltteers in a wrod are, the olny
iprmoetnt tihng is taht the frist and lsat ltteer be at the rghit
pclae. The rset can be a toatl mses and you can sitll raed it
wouthit porbelm. Tihs is bcuseae the huamn mnid deos not
raed ervey lteter by istlef, but the wrod as a wlohe.
(Cambridge University Effect: Davis 2003)
14. Human (brain) is good at dealing with noisy input robustly.
Question: Can we build a computational model which
replicates this robust mechanism?
15. Masked priming studies
Swap (Forster et al., 1987): gadren - GARDEN
Shuffle (Perea and Lupker, 2004): caniso - CASINO
Delete (Humphreys et al., 1990): blck - BLACK
Insert (Van Assche and Grainger, 2006): juastice - JUSTICE
17. semi-Character RNN (scRNN)
A simple RNN, except that the input vector for the n-th word concatenates three subvectors:

$x_n = \begin{bmatrix} b_n \\ i_n \\ e_n \end{bmatrix}$

where $b_n$ and $e_n$ are one-hot vectors for the first and last characters, and $i_n$ is a bag (count vector) of the internal characters.
e.g., "University" is represented as (see the sketch below)
b_n = {U = 1}
e_n = {y = 1}
i_n = {e = 1, i = 2, n = 1, r = 1, s = 1, t = 1, v = 1}
19. Exp1: Spelling Correction
Three noise conditions at test time (generation sketched below):
- Jumble (Cambridge → Cmbarigde)
- Delete (Cambridge → Camridge)
- Insert (Cambridge → Cambpridge)
Results (accuracy, %):
               Jumble   Delete   Insert
CharCNN         16.2     19.8     35.5
Enchant         57.6     35.4     89.6
Commercial A    54.8     60.2     93.5
Commercial B    54.3     71.7     73.5
scRNN           99.4     85.6     97.0
20. Exp1: Spelling Correction (error analysis)
Delete is the hardest condition for scRNN because deletion can collapse distinct words onto one ambiguous surface form:
place → pace
miss, mass, mess → mss
25. Eye tracking study (Rayner et al., 2006)
Condition  Example                                                     #fixations  Regression (%)  Avg. fixation (ms)
N          The boy could not solve the problem so he asked for help.   10.4        15.0            236
INT        The boy cuold not slove the probelm so he aksed for help.   11.4*       17.6*           244*
END        The boy coudl not solev the problme so he askde for help.   12.6†       17.5*           246†
BEG        The boy oculd not oslve the rpoblem so he saked for help.   13.0‡       21.5†           259‡
*, †, ‡: p < 0.01, respectively
Reading difficulty: N < INT ≤ END < BEG
26. Exp2: Comparison with eye tracking
Reading difficulty (human): N < INT ≤ END < BEG
scRNN trained and tested with four jumbling conditions (see the sketch below):
INT: internal characters jumbled (same as Exp. 1)
BEG: only the last character is fixed
END: only the first character is fixed
ALL: whole word as a bag of characters
28. Exp2: Comparison with eye tracking
Reading difficulty (human): N < INT ≤ END < BEG
Condition  Example                                                                      Accuracy (%)
INT        As a relust , the lnik beewetn the fureuts and sctok mretkas rpiped arapt .  98.96
END        As a rtelus , the lkni betwene the feturus and soctk msatrek rpepid atarp .  98.68*
BEG        As a lesurt , the lnik bweteen the utufers and tocsk makrtes pipred arpat .  98.12†
ALL        As a strule , the lnik eewtneb the eftusur and okcst msretak ipdepr prtaa .  96.79‡
*: p = 0.07; †, ‡: p < 0.01, respectively
Reading difficulty (scRNN): INT ≤ END < BEG < ALL
29. Summary so far …
1. Huamn mnid deos not raed
ervey lteter by istlef, but the
wrod as a wlohe.
2. scRNN recognizes noisy
words robustly.
3. There is a similarity
between scRNN and human
word recognition mechanism.
[Figure: masked priming paradigm — forward mask "########" (500 milliseconds), prime "gadren" (60 milliseconds), target "GARDEN"]
31. 2. Word-level robust processing
Error-repair Dependency Parsing for Ungrammatical Texts (ACL 2017)
Keisuke Sakaguchi, Matt Post, Benjamin Van Durme
32. Dependency Parsing
Text → Tree (with labels)
Economic news had little effect on financial markets .
33. Background & Motivation
I look in forward hear from you.
I look forward to hearing from you.
Error correction
↓
Parsing
Pipeline
Error-repair
parsing
Joint training
33
34. Error-repair Dependency Parsing
1. Non-directional easy-first parsing (Goldberg and Elhadad, 2010)
2. Three new actions to repair errors
37. Non-directional Easy-first Parsing
Actions: ATTACHRIGHT(i), ATTACHLEFT(i)
[Figure: partial parses of "a brown fox jumped with joy", attaching the easiest pending edges first]
Iteratively take actions until a complete tree is built (see the sketch below).
50. Three new actions to repair errors
SUBSTITUTE(w_i): replaces a token with another (grammatically more probable) token
DELETE(w_i): removes an unnecessary token
INSERT(i): inserts a new token at index i
(A sketch of these actions as list operations follows.)
51-65. Worked example: repairing "I look in forward hear from you"
[Figure sequence: attachment and repair actions applied step by step to the pending tokens]
1. ATTACHRIGHT / ATTACHLEFT build the uncontroversial edges first.
2. SUBSTITUTE: hear → hearing ("I look in forward hearing from you")
3. DELETE: remove "in" ("I look forward hearing from you")
4. INSERT: add "to" ("I look forward to hearing from you")
5. Remaining ATTACHLEFT / ATTACHRIGHT actions complete the tree.
67. We are ready to parse noisy texts … ?
Wait!! The new actions may cause infinite loops:
SUB → SUB → SUB → …
INS → DEL → INS → DEL → …
Heuristic constraints to avoid infinite loops (see the sketch below):
1. Limit the number of repair-action operations
2. A substituted token cannot be substituted again
68. Training the parser
The model learns which action to take at each time step.
Learning: structured perceptron + learning with exploration (Goldberg and Nivre, 2013)
Features: basic linguistic features (Goldberg and Elhadad, 2010)
69. Training the parser
How do we know which action is correct (i.e., a valid oracle action)?
ATTACHLEFT & ATTACHRIGHT (Goldberg and Elhadad, 2010):
1. the proposed edge is in the gold parse, and
2. the child (to be attached) already has all its children
SUBSTITUTE, DELETE, & INSERT:
3. the proposed action decreases the (word-level) edit distance to the gold (grammatical) sentence (see the sketch below)
70. Experiment 1 (simulated data)
Dependency parsing on a noisy Penn Treebank
Errors injected similarly to Foster and Andersen (2009):
the 5 most frequent grammatical error types (CoNLL-2013 shared task)
• Determiner (substitution, deletion, insertion)
• Preposition (substitution, deletion, insertion)
• Noun number (singular vs. plural)
• Verb form (tense and aspect)
• Subject-verb agreement
Eval: UAS by SParseval (Roark et al., 2006; Favre et al., 2010)
Baseline: pipeline approach (error correction → parsing)
72. Experiment 2 (real data)
Grammaticality improvement on a real ESL corpus:
Treebank of Learner English (Berzak et al., 2016)
Grammaticality score (Heilman et al., 2014):
a regression model with linguistic features, ranging from 1 (incomprehensible) to 4 (perfect)
74. Summary so far
Error-repair Dependency Parsing
1. Non-directional Easy-first Parsing
2. Three new actions to repair errors
Experimental results
1. more robust against grammatical errors
2. improves grammaticality
[Figure: the running example "I look in forward hear from you" with its repaired parse]
76. 3. Sentence-level robust processing
3.3. Building a GEC model
Grammatical Error Correction with Neural Reinforcement Learning (IJCNLP 2017)
Keisuke Sakaguchi, Matt Post, Benjamin Van Durme
78. Grammatical Error Correction (GEC)
Ungrammatical sentence → Grammatical & fluent sentence
o Rule-based model
o Classifiers
o Phrase-based MT
o Neural MT
80-82. Neural MT for GEC (encoder-decoder with attention)
[Figure: the encoder reads the source tokens x_1, x_2, …, x_{S-1}, x_S; the decoder starts from NULL and emits y_1, y_2, … one step at a time, attending over the encoder states]
84. Neural MT for GEC (encoder-decoder with attention)
Training objective: Maximum Likelihood Estimation, i.e., maximize the log-likelihood of the gold label at each step:

$\mathcal{L}_{\text{MLE}} = \sum_{t=1}^{T} \log p(y_t \mid y_{<t}, x)$

[Figure: decoder with gold-label log-probabilities log p(y_1), …, log p(y_{T-1}), log p(y_T) at each step]
85. Two Drawbacks in MLE
#1 Word-level optimization (not sentence-level): the objective decomposes into per-word terms log p(y_t | y_<t, x), so the model is never directly scored on the quality of the whole output sentence.
[Figure: same decoder diagram, with a gold-label log-probability at each step]
86. Two Drawbacks in MLE
#2 Exposure bias (gold in training, argmax in test): during training the decoder is fed the gold word y_t at each step, but at test time it is fed its own prediction y'_t, which might be erroneous.
[Figure: decoder fed y'_1 = y_1, then its own predictions y'_2, …, y'_T in place of the gold y_2, …, y_T]
90. REINFORCE (Williams, 1992)
Maximize the expected reward (metric score):

$J(\theta) = \mathbb{E}_{y \sim p(y \mid x;\theta)}[r(y)], \qquad \theta \leftarrow \theta + \alpha \, r(y) \nabla_\theta \log p(y \mid x;\theta)$

where α is the learning rate (a sketch of this update follows).
Relevance to Minimum Risk Training in NMT:
the learning rate α in REINFORCE corresponds to the smoothing parameter in MRT.
See the appendix.
91. Experiment
Data:
Training: Cambridge Learner Corpus (FCE), NUCLE Corpus, Lang-8 Corpus
Dev & Test: JFLEG Corpus
Model (hyper)parameters:
Embedding: 512, Hidden: 1000, Dropout: 0.2
(for NRL) Sample size: 20, warm start after 600k MLE updates
Metric (= score, reward): GLEU (Napoles et al., 2015)
97. Summary so far…
Grammatical Error Correction with NRL
✓ Sentence-level objective.
✓ Direct optimization toward the metric.
✓ NRL > Maximum Likelihood Estimation
98. Conclusions
Robust Text Correction for Grammar and Fluency
1. Character-level
2. Word-level
3. Sentence (phrase)-level (fluency)