文法および流暢性を考慮した頑健なテキスト誤り訂正 (第15回ステアラボ人工知能セミナー)

文法および流暢性を考慮した
頑健なテキスト誤り訂正
坂口慶祐
1

Natural Language Processing
Algorithm
Algorithm
Algorithm
Algorithm
Algorithm
…
POS
Parse
Sentiment
Paraphrase
Translation
…
2

Natural Language Processing
Algorithm
Algorithm
Algorithm
Algorithm
Algorithm
…
POS
Parse
Sentiment
Paraphrase
Translation
…
3

Outline
Robust Text Correction for Grammar and Fluency
1. Character-level
2. Word-level
3. Sentence (phrase)-level
8

Outline
1. Character-level
2. Word-level
9

1. Character-level robust processing
Robsut Wrod Reocginiton
via semi-Character Recurrent Neural Network.
(AAAI 2017)
Keisuke Sakaguchi, Kevin Duh, Matt Post, Benjamin Van Durme
10

Aoccdrnig to a rscheearch at Cmabrigde Uinervtisy, it deosn’t
mttaer in waht oredr the ltteers in a wrod are, the olny
iprmoetnt tihng is taht the frist and lsat ltteer be at the rghit
pclae. The rset can be a toatl mses and you can sitll raed it
wouthit porbelm. Tihs is bcuseae the huamn mnid deos not
raed ervey lteter by istlef, but the wrod as a wlohe.
(Cambridge University Effect: Davis 2003)
11

Human (brain) is good at dealing with noisy input robustly.
13

Question: Can we build a computational model which
replicates this robust mechanism?
14

Masked priming study
Swap (Forster et al. 1987) gadren-GARDEN
Shuffle (Perea and Lupker. 2004) caniso-CASINO
Delete (Humphreys et al. 1990) blck-BLACK
Insert (Van Assche and Grainger. 2006) juastice-JUSTICE
15

Question: Can we build a computational model which
replicates this robust mechanism?
16

semi-Character RNN (scRNN)
Simple RNN except …
xn =
2
4
bn
in
en
3
5
e.g., “University” is represented as
bn = {U = 1}
en = {y = 1}
in = {e = 1, i = 2, n = 1, s = 1, t = 1, v = 1}
17

Exp1: Spelling Correction
Corpus: Penn TreeBank, (10k vocabulary)
Parameters:
- hidden unit size: 650
- mini-batch size 20
Comparison:
- Enchant
- 2 commercial products
- char aware neural LM
(Kim et al., 2016, AAAI)
18

Three conditions in test time
- Jumble (Cambridge à Cmbarigde)
- Delete (Cambridge à Camridge)
- Insert (Cambridge à Cambpridge)
Results (accuracy):
Jumble Delete Insert
CharCNN 16.2 19.8 35.5
Enchant 57.6 35.4 89.6
Commercial A 54.8 60.2 93.5
Commercial B 54.3 71.7 73.5
scRNN 99.4 85.6 97.0
19

Three conditions in test time
- Jumble (Cambridge à Cmbarigde)
- Delete (Cambridge à Camridge)
- Insert (Cambridge à Cambpridge)
Results (accuracy):
Jumble Delete Insert
CharCNN 16.2 19.8 35.5
Enchant 57.6 35.4 89.6
Commercial A 54.8 60.2 93.5
Commercial B 54.3 71.7 73.5
scRNN 99.4 85.6 97.0
place
à pace
miss, mass, mess
à mss
20

Exp2: Comparison with eye tracking 21

Eye tracking studyCondition
Example
#fixation
Regression
(%)
Avg.Fixation
(ms)
N The boy could not solve the problem so he asked for help.
INT The boy cuold not slove the probelm so he aksed for help.
END The boy coudl not solev the problme so he askde for help.
BEG The boy oculd not oslve the rpoblem so he saked for help.
Rayner et al. (2006)
23

Example
#fixation
Regression
(%)
Avg.Fixation
(ms)
N The boy could not solve the problem so he asked for help. 10.4 15.0 236
INT The boy cuold not slove the probelm so he aksed for help. 11.4* 17.6* 244*
END The boy coudl not solev the problme so he askde for help. 12.6† 17.5* 246†
BEG The boy oculd not oslve the rpoblem so he saked for help. 13.0‡ 21.5† 259‡
p<0.01 respectively
24

Example
#fixation
Regression
(%)
Avg.Fixation
(ms)
N The boy could not solve the problem so he asked for help. 10.4 15.0 236
INT The boy cuold not slove the probelm so he aksed for help. 11.4* 17.6* 244*
END The boy coudl not solev the problme so he askde for help. 12.6† 17.5* 246†
BEG The boy oculd not oslve the rpoblem so he saked for help. 13.0‡ 21.5† 259‡
Reading difficulty: N < INT ≤ END < BEG
p<0.01 respectively
25

Exp2: Comparison with eye tracking
Reading difficulty (human) : N < INT ≤ END < BEG
Trained and tested with 4 conditions:
INT: same as the exp.1
BEG: last char is fixed
END: first char is fixed
ALL: bag of characters
26

Condition
Example
accuracy
INT As a relust , the lnik beewetn the fureuts and sctok mretkas rpiped arapt . 98.96
END As a rtelus , the lkni betwene the feturus and soctk msatrek rpepid atarp . 98.68*
BEG As a lesurt , the lnik bweteen the utufers and tocsk makrtes pipred arpat . 98.12†
ALL As a strule , the lnik eewtneb the eftusur and okcst msretak ipdepr prtaa . 96.79‡
*: p = 0.07, †.‡: p<0.01 respectively
27

Condition
Example
accuracy
INT As a relust , the lnik beewetn the fureuts and sctok mretkas rpiped arapt . 98.96
END As a rtelus , the lkni betwene the feturus and soctk msatrek rpepid atarp . 98.68*
BEG As a lesurt , the lnik bweteen the utufers and tocsk makrtes pipred arpat . 98.12†
ALL As a strule , the lnik eewtneb the eftusur and okcst msretak ipdepr prtaa . 96.79‡
Reading difficulty (scRNN) : INT ≤ END < BEG < ALL
*: p = 0.07, †.‡: p<0.01 respectively
28

Summary so far …
1. Huamn mnid deos not raed
ervey lteter by istlef, but the
wrod as a wlohe.
2. scRNN recognizes noisy
words robustly.
3. There is a similarity
between scRNN and human
word recognition mechanism.
Forward Mask
(500 milliseconds)
GARDEN
gadren
########
Prime
(60 milliseconds)
Target
29

Outline
1. Character-level
2. Word-level
30

2. Word-level robust processing
Error-repair Dependency Parsing for Ungrammatical
Texts (ACL 2017)
Keisuke Sakaguchi, Matt Post, Benjamin Van Durme
31

Dependency Parsing
Text à Tree (with labels)
Economic news had little effect on financial markets .
32

Background & Motivation
I look in forward hear from you.
I look forward to hearing from you.
Error correction
↓
Parsing
Pipeline
Error-repair
parsing
Joint training
33

Error-repair Dependency Parsing
1. Non-directional Easy-first parsing
(Goldberg and Elhadad, 2010)
2. Three new actions to repair errors
34

Non-directional Easy-first Parsing
a brown fox jumped with joy
a brown joywith
joy
fox
a brown
35

a brown joywith
joy
fox
a brown
Pending List
36

a brown joywith
joy
fox
a brown
ATTACHRIGHT(𝑖)
ATTACHLEFT(𝑖)
Iteratively take actions until a complete tree is built.
37

a brown joywith
joy
fox
a brown
38

ATTACHRIGHT
a brown joywith
joy
fox
a brown
39

a a fox jumped with joy
a brown joywith
joy
fox
a brown
40

ATTACHRIGHT
a a fox jumped with joy
a brown joywith
joy
fox
a brown
41

a brown joywith
joy
fox
a brown
42

ATTACHLEFT
a brown joywith
joy
fox
a brown
43

a brown joywith
joy
fox
a brown
44

ATTACHLEFT
a brown joywith
joy
fox
a brown
45

a brown joywith
joy
fox
a brown
46

ATTACHRIGHT
a brown joywith
joy
fox
a brown
47

a brown joywith
joy
fox
a brown
48

a brown joywith
joy
fox
a brown
root
49

Three new actions to repair errors
SUBSTITUTE (𝑤%) replaces a token to another
(grammatically more probable) token
DELETE (𝑤%) removes an unnecessary token
INSERT (𝑤%) inserts a new token at an index i.
50

I look in forward xhearx from you
I youyou
51

I youyou
ATTACHRIGHT
ATTACHLEFT
52

I youyou
SUBSTITUTE / DELETE / INSERT
53

ATTACHRIGHT
I youyou
Three new actions to repair errors 54

I youyou

ATTACHLEFT
I youyou

I youyou
57

SUBSTITUTE
I youyou
58

I look in forward hearing from you
I youyou
59

DELETE
I look in forward hearing from you
I youyou
60

I look forward hearing from from you
I youyou
61

INSERT
I look forward hearing from from you
I youyou
62

I look forward to hearing from you
I youyou
63

ATTACHLEFT
I look forward to hearing from you
I youyou
64

I look look to hearing from you
I youyouI forward
65

We are ready to parse noisy texts … ?
Wait!! The new actions may cause infinite loops.
SUB à SUB à SUB à …
INS à DEL à INS à DEL à ...
66

We are ready to parse noisy texts … ?
Wait!! The new actions may cause infinite loops.
SUB à SUB à SUB à …
INS à DEL à INS à DEL à ...
Heuristic constraints to avoid infinite loops
1. Limiting the number of new action operations
2. Substituted token cannot be substituted again
67

Training the parser
Model learns which action to take at each time step.
structured perceptron + learning with exploration
(Goldberg and Nivre, 2013)
features: basic linguistic features
(Goldberg and Elhadad 2010)
68

Training the parser
How to know which action is good (i.e., oracle, valid)?
ATTACHLEFT & ATTACHRIGHT (Goldberg and Elhadad, 2010)
1. proposed edge is in the gold parse and
2. the child (to be attached) already has all its children
SUBSTITUTE, DELETE, & INSERT
3. proposed action decreases the (word) edit distance
to the gold (grammatical) sentence.
69

Experiment 1 (simulated data)
Dependency parsing on noisy Penn Treebank
Errors injected similarly to Foster and Andersen (2009)
5 most frequent grammatical errors (CoNLL13)
• Determiner (substitution, deletion, insertion)
• Preposition (substitution, deletion, insertion)
• Noun number (singular vs. plural)
• Verb form (tense and aspect)
• Subject verb agreement
Eval: UAS by SParseval (Roark et al., 2006, Favre et al., 2010)
Baseline: pipeline approach (error correction à parsing)
70

Experiment 2 (real data)
Grammaticality improvement on real ESL corpus
Treebank of Learner English (Berzak et al., 2016)
Grammaticality score (Heilman et al., 2014)
Regression model with linguistic features
1 (incomprehensible) ~ 4 (perfect)
72

Result (Grammaticality on learner corpus)
*
*
73

Summary so far
Error-repair Dependency Parsing
1. Non-directional Easy-first Parsing
2. Three new actions to repair errors
Experimental results
1. more robust against grammatical errors
2. improves grammaticality
I youyou
74

Outline
1. Character-level
2. Word-level
75

3. Sentence-level robust processing
3.3. Building a GEC model
Grammatical Error Correction with Neural
Reinforcement Learning (IJCNLP 2017)
Keisuke Sakaguchi, Matt Post, Benjamin Van Durme
76

Grammatical Error Correction (GEC)
Ungrammatical
sentence
Grammatical
& Fluent
sentence
GEC algorithms
77

Ungrammatical
sentence
Grammatical
& Fluent
sentence
o Rule based model
o Classifiers
o Phrase-based MT
o Neural MT
78

Ungrammatical
sentence
Grammatical
& Fluent
sentence
o Rule based model
o Classifiers
o Phrase-based MT
o Neural MT
79

Neural MT for GEC (Encoder-decoder with attention)
・・・
x2 xS-1 xSx1
Encoder
80

・・・
x2 xS-1 xSx1
NULL
y1
Encoder
Decoder
81

・・・
x2 xS-1 xSx1
+
NULL
y1 y2
Encoder
Decoder
82

・・・
x2 xS-1 xSx1
+
NULL
・・・
y1 y2 yT-1 yT
Encoder
Decoder
83

Training objective: Maximum Likelihood Estimation
・・・
log 𝑝(𝑦,)
log 𝑝(𝑦-./)
log 𝑝(𝑦-)
gold label
log 𝑝(𝑦/)
NULL
Decoder
84

Two Drawbacks in MLE
#1 Word level optimization (not sentence-level)
・・・
log 𝑝(𝑦,)
log 𝑝(𝑦-./)
log 𝑝(𝑦-)
gold label
log 𝑝(𝑦/)
NULL
Decoder
85

Two Drawbacks in MLE
#2 Exposure Bias (gold in training, argmax in test)
・・・
gold label
NULL
Predicted word (might be erroneous) is fed during test time.
y’1 = y1
y’2
y2
y’T-1
yT-1
yT
y’T
Decoder
86

Sentence level (direct) optimization
Decode a sentence and compute the score
Decoder
87

Sentence level (direct) optimization
. . .
. . .
Maximize the expected reward (metric score)
Decoder
88

REINFORCE (Williams, 1992)
Learning Rate (arbitrary) Baseline
89

REINFORCE (Williams, 1992)
Learning Rate
Relevance to Minimum Risk Training in NMT:
Learning rate 𝜶 in REINFORCE corresponds to
the smoothing parameter in MRT.
See the appendix.
90

Experiment
Data:
Training: Cambridge Learner Corpus (FCE)
NUCLE Corpus
Lang8 Corpus
Dev & Test: JFLEG Corpus
Model (hyper-)parameters:
Embedding: 512, Hidden: 1000, Dropout: 0.2,
(for NRL)
Sample size: 20, warm start: after 600k updates in MLE
Metric (= score, reward):
GLEU (Napoles et al., 2015)
91

Results
40
45
50
55
60
65
SRC CAMB14 NUS AMU CAMB16 MLE NRL Human
SRC
40.5
92

Results
40
45
50
55
60
65
SRC
40.5
PBMT
46.0~51.4
93

Results
40
45
50
55
60
65
SRC
40.5
PBMT
46.0~51.4
NMT (MLE)
52.0~52.7
94

Results
40
45
50
55
60
65
PBMT
46.0~51.4
NMT (MLE)
52.0~52.7
SRC
40.5
NMT
(NRL)
53.9
95

Results
40
45
50
55
60
65
PBMT
46.0~51.4
NMT (MLE)
52.0~52.7
SRC
40.5
NMT
(NRL)
53.9
Human
62.3
96

Summary so far…
Grammatical Error Correction with NRL
ü Sentence-level objective.
ü Direct optimization toward the metric.
ü NRL > Maximum Likelihood Estimation
97

Conclusions
1. Character-level
2. Word-level
I youyou
Fluency
98

文法および流暢性を考慮した頑健なテキスト誤り訂正 (第15回ステアラボ人工知能セミナー)

Recomendados

Recomendados

Más contenido relacionado

Similar a 文法および流暢性を考慮した頑健なテキスト誤り訂正 (第15回ステアラボ人工知能セミナー)

Similar a 文法および流暢性を考慮した頑健なテキスト誤り訂正 (第15回ステアラボ人工知能セミナー) (20)

Más de STAIR Lab, Chiba Institute of Technology

Más de STAIR Lab, Chiba Institute of Technology (20)

Último

Último (20)

文法および流暢性を考慮した頑健なテキスト誤り訂正 (第15回ステアラボ人工知能セミナー)