A characteristic of information processing performed by humans is that it consists of both System 1, which performs fast automatic processing, and System 2, which performs slow conscious processing. In this lecture, I will introduce computational models for A) natural language understanding/generation and B) understanding the state of mind from the perspective of Systems 1 and 2, providing an opportunity to think about what intelligence is.
(The lecture is given in Japanese, but most slides are written in English.)
Theory of Mind and Language Processing, Fast and Slow
1. Theory of Mind and Language
Processing, Fast and Slow
Oka Natsuki
Faculty of Information and Human Sciences
Kyoto Institute of Technology
Cognitive Interaction Design
August 3, 2020
1
2. Summary
A characteristic of information processing performed by humans is that it consists of both System 1, which performs fast automatic processing, and System 2, which performs slow conscious processing. In this lecture, I will introduce computational models for A) natural language understanding/generation and B) understanding the state of mind from the perspective of Systems 1 and 2, providing an opportunity to think about what intelligence is.
(The lecture is given in Japanese, but most slides are written in English.)
2
4. Context-Free Grammar
Syntax tree
[Figure: two alternative syntax trees for the phrase "theory of mind and language processing", combining N, PREP, and CONJ leaves into NP and PP nodes.]
NP → NP PP
NP → N
NP → NP CONJ NP
PP → PREP NP
・・・
4
7. System 1 operates automatically and quickly, with little or no effort and no sense of voluntary control. System 2 allocates attention to the effortful mental activities that demand it, including complex computations. The operations of System 2 are often associated with the subjective experience of agency, choice, and concentration.
7
9. Report Assignment
Deadline: August 17; submit a report of about 1000 words in PDF format.
Answer the following questions 1 and 2 by choosing either A or B, where A is "Understanding of the state of mind" and B is "Natural language understanding/generation". You may choose both A and B.
1. In many cases, both System 1 and System 2 are likely to be involved in the execution of A and B. List specific situations in which both systems are likely to be involved, and describe your hypothesis in as much detail as possible about what each system does and how they interact.
2. Describe, as specifically and in as much detail as possible, how to check whether the hypothesis is correct.
9
11. Outline
Introducing computational models (in a broad sense)
1. Natural Language Processing
– Computational model of System 2
• Top-down parser
– Computational models of System 1
• RNN, LSTM, Transformer, BERT
– Integration
2. Theory of Mind
– Computational model of System 2
• “Rational quantitative attribution of beliefs, desires and percepts
in human mentalizing”
– Computational model of System 1
• “Machine Theory of Mind”
– Integration
11
14. Context-free grammar
S → NP VP
NP → D N|NP PP
VP → V NP|V NP PP
PP → P NP
D → a
N → boy|girl|telescope
V → saw
P → with
S: sentence, NP: noun phrase, VP: verb phrase, D: determiner, N: noun, PP: prepositional phrase, V: verb, P: preposition
The vertical bar means OR.
14
15.–39. Top-down parser
S → NP VP
NP → D N|NP PP
VP → V NP|V NP PP
PP → P NP
D → a
N → boy|girl|telescope
V → saw
P → with
[Slides 15–39 animate a top-down parse of "A boy saw a girl with a telescope": starting from S, the parser expands S → NP VP, matches "A boy" as D N, expands the VP, and backtracks through the alternative NP and VP rules until the whole input is consumed.]
40. [Figure: the two complete parse trees, showing the PP-attachment ambiguity.
PP attached inside the object NP, [S [NP A boy] [VP [V saw] [NP [NP a girl] [PP with a telescope]]]], means "The boy saw a girl who had a telescope."
PP attached to the VP, [S [NP A boy] [VP [V saw] [NP a girl] [PP with a telescope]]], means "The boy saw a girl by using a telescope."]
40
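The parse walked through on the preceding slides can be sketched as a backtracking top-down parser in plain Python. One flagged deviation: the left-recursive rule NP → NP PP would make naive recursive descent loop forever, so the sketch applies it iteratively (parse D N first, then attach zero or more PPs).

```python
# Backtracking top-down parser for the slide grammar (a sketch).
LEXICON = {"a": "D", "boy": "N", "girl": "N", "telescope": "N",
           "saw": "V", "with": "P"}

def parse_word(cat, words, i):
    """Match one terminal of category cat at position i."""
    if i < len(words) and LEXICON.get(words[i].lower()) == cat:
        return [((cat, words[i]), i + 1)]
    return []

def parse_np(words, i):
    results = []
    for d, j in parse_word("D", words, i):
        for n, k in parse_word("N", words, j):
            results.append((("NP", d, n), k))            # NP -> D N
            stack = [results[-1]]
            while stack:                                 # NP -> NP PP, iteratively
                np, m = stack.pop()
                for pp, m2 in parse_pp(words, m):
                    results.append((("NP", np, pp), m2))
                    stack.append(results[-1])
    return results

def parse_pp(words, i):                                  # PP -> P NP
    return [(("PP", p, np), k)
            for p, j in parse_word("P", words, i)
            for np, k in parse_np(words, j)]

def parse_vp(words, i):
    results = []
    for v, j in parse_word("V", words, i):
        for np, k in parse_np(words, j):
            results.append((("VP", v, np), k))           # VP -> V NP
            for pp, m in parse_pp(words, k):
                results.append((("VP", v, np, pp), m))   # VP -> V NP PP
    return results

def parse_s(words):                                      # S -> NP VP, full span only
    return [("S", np, vp)
            for np, j in parse_np(words, 0)
            for vp, k in parse_vp(words, j)
            if k == len(words)]

trees = parse_s("A boy saw a girl with a telescope".split())
print(len(trees))  # 2: the PP-attachment ambiguity shown on slide 40
```

Enumerating all complete parses makes the ambiguity explicit: one tree attaches the PP inside the object NP, the other attaches it to the VP.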
41. Parsing with grammar rules
• A finite number of rules can generate/parse trees of unbounded depth.
• You can generate/parse even an unknown language by following its rules.
• Effortful.
→ These are characteristics of processing by System 2.
41
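The first point above, a finite rule set licensing unboundedly deep trees, can be illustrated with a tiny random generator (a sketch: the depth cap only stops the random derivation from recursing forever, it is not a property of the grammar itself).

```python
import random

# The finite rule set from the slides, written as a dictionary.
RULES = {
    "S":  [["NP", "VP"]],
    "NP": [["D", "N"], ["NP", "PP"]],     # the recursive rule
    "VP": [["V", "NP"], ["V", "NP", "PP"]],
    "PP": [["P", "NP"]],
    "D":  [["a"]],
    "N":  [["boy"], ["girl"], ["telescope"]],
    "V":  [["saw"]],
    "P":  [["with"]],
}

def generate(symbol="S", depth=0, max_depth=6):
    if symbol not in RULES:               # terminal word
        return [symbol]
    options = RULES[symbol]
    if depth >= max_depth:                # force the non-recursive option
        options = [options[0]]
    rhs = random.choice(options)
    return [w for sym in rhs for w in generate(sym, depth + 1, max_depth)]

random.seed(0)
print(" ".join(generate()))
```

Raising `max_depth` lets the same eight rules emit arbitrarily deeply nested NPs, which is exactly the point the slide makes about System 2-style rule following.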
43. Semantic constraints
A boy saw a girl with a telescope.
A boy saw a girl with a book.
You cannot see with a book.
But if you roll it up, you can peek through it.
Writing down every such constraint is infeasible.
→ This was the wall that good old-fashioned AI ran into.
43
44. Outline
Introducing computational models (in a broad sense)
1. Natural Language Processing
– Computational model of System 2
• Top-down parser
– Computational models of System 1
• RNN, LSTM, Transformer, BERT
– Integration
2. Theory of Mind
– Computational model of System 2
• “Rational quantitative attribution of beliefs, desires and percepts
in human mentalizing”
– Computational model of System 1
• “Machine Theory of Mind”
– Integration
44
50. Long Short-Term Memory
[Figure: LSTM diagram highlighting the cell state and the recurrent information path.]
Yu+, A review of recurrent neural networks, Neural Computation 31(7), 1235-1270, 2019.
50
51. original LSTM
[Figure: the original LSTM cell, showing the input, the output, the cell state, and the recurrent information.]
51
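The gating structure in the figure can be sketched as a single-unit LSTM step (a toy sketch: the weight values below are arbitrary made-up scalars, not trained parameters).

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, w):
    # Each gate sees the current input x and the recurrent information h_prev.
    f = sigmoid(w["wf"] * x + w["uf"] * h_prev + w["bf"])    # forget gate
    i = sigmoid(w["wi"] * x + w["ui"] * h_prev + w["bi"])    # input gate
    o = sigmoid(w["wo"] * x + w["uo"] * h_prev + w["bo"])    # output gate
    g = math.tanh(w["wg"] * x + w["ug"] * h_prev + w["bg"])  # candidate value
    c = f * c_prev + i * g    # cell state: the long-term memory path
    h = o * math.tanh(c)      # recurrent information passed to the next step
    return h, c

# Arbitrary toy weights, just to run the recurrence.
weights = {k: 0.5 for k in
           ["wf", "uf", "bf", "wi", "ui", "bi",
            "wo", "uo", "bo", "wg", "ug", "bg"]}
h, c = 0.0, 0.0
for x in [1.0, -1.0, 1.0]:
    h, c = lstm_step(x, h, c, weights)
print(round(h, 3))
```

The additive update of the cell state `c` is what lets gradients flow over long spans, which is the property the review figure is illustrating.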
52. [Vinyals+, Show and Tell: A Neural Image Caption Generator, 2015]
Caption generation
[Figure: an image is encoded by the final hidden layer of a CNN pretrained for image classification; the CNN is kept fixed to prevent overfitting, since captioned data are scarce. Starting from a start symbol, an LSTM then outputs a probability distribution over the next word, with one-hot vectors mapped to 512-dimensional word embeddings. Training maximizes the sum of the log probabilities. Decoding uses a beam search of size 20 rather than sampling one word from the distribution.]
53. sequence-to-sequence learning framework with attention
stacked LSTM, residual connections, bidirectional LSTM, sub-word units, …
[Figure: translating the Chinese sentence 知识就是力量 into "Knowledge is power". A vector represents the meaning of all the words read so far; the decoder outputs one word at a time, shifting where it attends as it goes.]
Y. Wu et al., Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation, arXiv 2016
56. Transformer
The Batch: How did the idea of self-attention evolve?
Shazeer: I'd been working with LSTMs, the state-of-the-art language architecture before transformer. There were several frustrating things about them, especially computational problems. Arithmetic is cheap and moving data is expensive on today's hardware. If you multiply an activation vector by a weight matrix, you spend 99 percent of the time reading the weight matrix from memory. You need to process a whole lot of examples simultaneously to make that worthwhile. Filling up memory with all those activations limits the size of your model and the length of the sequences you can process. Transformers can solve those problems because you process the entire sequence simultaneously. I heard a few of my colleagues in the hallway saying, "Let's replace LSTMs with attention." I said, "Heck yeah!"
THE BATCH, June 17, 2020
56
57. Attention Is All You Need
arXiv:1706.03762
Figure 1: The Transformer - model architecture.
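The Transformer's core operation, scaled dot-product self-attention, can be sketched from scratch (single head and, as a simplifying assumption, no learned query/key/value projections: the input vectors serve directly as queries, keys, and values).

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(X):
    """X is a list of token vectors; every position attends to all positions."""
    d = len(X[0])
    out = []
    for q in X:
        # Scaled dot-product scores of this query against every key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in X]
        weights = softmax(scores)   # attention distribution over positions
        # Output is the attention-weighted average of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, X))
                    for j in range(d)])
    return out

X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three toy token vectors
Y = self_attention(X)
```

Because every position is computed independently of the others, the whole sequence can be processed in parallel, which is exactly the advantage over LSTMs described in the interview on slide 56.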
64. BERT: Pre-training of Deep Bidirectional
Transformers for Language Understanding
64
https://arxiv.org/abs/1810.04805
Task #1: Masked LM
Task #2: Next Sentence Prediction
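Task #1's data preparation can be sketched as follows, following the masking recipe described in the BERT paper: 15% of tokens are selected as prediction targets, of which 80% become [MASK], 10% are replaced by a random token, and 10% are left unchanged. The tiny vocabulary here is a made-up stand-in for WordPiece.

```python
import random

VOCAB = ["the", "boy", "saw", "girl", "telescope", "with", "a"]

def mask_tokens(tokens, rng):
    masked, targets = [], []
    for tok in tokens:
        if rng.random() < 0.15:        # selected as a prediction target
            targets.append(tok)
            r = rng.random()
            if r < 0.8:
                masked.append("[MASK]")        # 80%: mask it
            elif r < 0.9:
                masked.append(rng.choice(VOCAB))  # 10%: random token
            else:
                masked.append(tok)             # 10%: keep, still predicted
        else:
            targets.append(None)       # not a prediction target
            masked.append(tok)
    return masked, targets

rng = random.Random(0)
masked, targets = mask_tokens("a boy saw a girl with a telescope".split(), rng)
```

The model is then trained to recover the original token at every position where `targets` is not None, without ever seeing the true token at the masked positions.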
65. BERT Rediscovers the Classical NLP Pipeline (ACL 2019)
Ian Tenney, Dipanjan Das, Ellie Pavlick
Pre-trained text encoders have rapidly advanced the state of the art on many NLP tasks. We focus on one such model, BERT, and aim to quantify where linguistic information is captured within the network. We find that the model represents the steps of the traditional NLP pipeline in an interpretable and localizable way, and that the regions responsible for each step appear in the expected sequence: POS tagging, parsing, NER, semantic roles, then coreference. Qualitative analysis reveals that the model can and often does adjust this pipeline dynamically, revising lower-level decisions on the basis of disambiguating information from higher-level representations.
https://arxiv.org/abs/1905.05950
65
66. Summary statistics on BERT-large
66
part-of-speech (POS), constituents (Consts.), dependencies (Deps.), entities,
semantic role labeling (SRL), coreference (Coref.), semantic proto-roles (SPR;
Reisinger et al., 2015), and relation classification (SemEval).
https://arxiv.org/abs/1905.05950
67. Theoretical studies
• Merrill (2019) showed that—in the finite
precision setting—LSTMs recognize a subset of
the counter languages, whereas GRUs and
simple RNNs recognize regular languages.
• Korsky and Berwick (2019) showed that
arbitrary-precision RNNs can emulate
pushdown automata, and can therefore
recognize all deterministic context-free
languages.
67
Hahn, Theoretical Limitations of Self-Attention in Neural Sequence Models, Transactions of the
Association for Computational Linguistics 2020 Vol. 8, 156-171.
68. Theoretical studies
• Siegelmann and Sontag (1995) showed that, given unlimited computation time, recurrent networks can emulate the computation of Turing machines.
• Pérez et al. (2019) have shown the same result for both (argmax-attention) Transformers and Neural GPUs.
68
Hahn, Theoretical Limitations of Self-Attention in Neural Sequence Models, Transactions of the
Association for Computational Linguistics 2020 Vol. 8, 156-171.
69. Theoretical Limitations of Self-Attention in Neural Sequence Models
Hahn, Transactions of the Association for Computational Linguistics 2020 Vol. 8, 156-171
• Self-attention cannot model periodic finite-state languages, nor hierarchical structure, unless the number of layers or heads increases with input length.
• This stands in contrast to the practical success of self-attention.
→ Natural language can be approximated well with models that are too weak for the formal languages typically assumed in theoretical linguistics.
69
70. Summary so far
• RNN, LSTM, Transformer, and BERT capture syntactic information fairly accurately, by a method different from parsing with explicit grammar rules. With finite computational resources, however, they cannot handle infinitely deep recursive structure.
• Language processing by the human System 1 seems to have the same property (my view).
• Humans can also handle infinitely deep recursive structure via System 2. But what matters about System 2 in everyday language processing is probably not that; rather, it is things like one-shot learning and the points shown on the next page (my view).
70
73. Outline
Introducing computational models (in a broad sense)
1. Natural Language Processing
– Computational model of System 2
• Top-down parser
– Computational models of System 1
• RNN, LSTM, Transformer, BERT
– Integration (not exhaustive; only three examples)
2. Theory of Mind
– Computational model of System 2
• “Rational quantitative attribution of beliefs, desires and percepts
in human mentalizing”
– Computational model of System 1
• “Machine Theory of Mind”
– Integration
73
74. Building End-To-End Dialogue Systems Using
Generative Hierarchical Neural Network Models
74
Published in AAAI 2016 (Special Track on Cognitive Systems)
76. Systematicity in a Recurrent Neural Network by Factorizing Syntax and Semantics
Standard methods in deep learning fail to capture compositional or systematic structure in their training data, as shown by their inability to generalize outside of the training distribution. However, human learners readily generalize in this way, e.g. by applying known grammatical rules to novel words. …
https://cognitivesciencesociety.org/cogsci20/papers/0027/index.html
76
77. Topic of the second half: Theory of mind
Similarities to and differences from natural language processing
• Theory of mind also handles recursive structure ("you probably think that I don't know that you know that I like you, but...") (similar)
• System 1 and System 2 are probably running in parallel (similar)
• For theory of mind, System 2 explanation and conviction seem likely to occur in everyday life, whereas the only people who want to explain the syntax and grammar of natural language are grammarians and teachers (different)
77
78. Outline
Introducing computational models (in a broad sense)
1. Natural Language Processing
– Computational model of System 2
• Top-down parser
– Computational models of System 1
• RNN, LSTM, Transformer, BERT
– Integration
2. Theory of Mind
– Computational model of System 2
• “Rational quantitative attribution of beliefs, desires and percepts
in human mentalizing”
– Computational model of System 1
• “Machine Theory of Mind”
– Integration
78
82. Implicit or spontaneous ToM
• The Social Sense: Susceptibility to Others’ Beliefs in Human Infants and
Adults
– https://science.sciencemag.org/content/sci/330/6012/1830.full.pdf
– https://science.sciencemag.org/content/suppl/2010/12/20/330.6012.1830.DC1
• Do 18-Month-Olds Really Attribute Mental States to Others?: A Critical
Test
– http://brainmind.umin.jp/PDF/wt12/Senju2011PsycholSci.pdf
– https://journals.sagepub.com/doi/suppl/10.1177/0956797611411584
• Brain activation for spontaneous and explicit false belief tasks overlaps:
new fMRI evidence on belief processing and violation of expectation
– https://academic.oup.com/scan/article/12/3/391/2593935
• Measuring spontaneous mentalizing with a ball detection task: putting
the attention-check hypothesis by Phillips and colleagues (2015) to the
test
– https://link.springer.com/article/10.1007/s00426-019-01181-7
82
83. Outline
Introducing computational models (in a broad sense)
1. Natural Language Processing
– Computational model of System 2
• Top-down parser
– Computational models of System 1
• RNN, LSTM, Transformer, BERT
– Integration
2. Theory of Mind
– Computational model of System 2
• “Rational quantitative attribution of beliefs, desires and percepts
in human mentalizing”
– Computational model of System 1
• “Machine Theory of Mind”
– Integration
83
84. Baker, C., Jara-Ettinger, J., Saxe, R., & Tenenbaum, J. B. (2017). Rational quantitative attribution of beliefs, desires and percepts in human mentalizing. Nature Human Behaviour, 1, 0064. DOI: 10.1038/s41562-017-0064
mentalize: to understand the behavior of others as a product of their mental state
84
87. Outline
Introducing computational models (in a broad sense)
1. Natural Language Processing
– Computational model of System 2
• Top-down parser
– Computational models of System 1
• RNN, LSTM, Transformer, BERT
– Integration
2. Theory of Mind
– Computational model of System 2
• “Rational quantitative attribution of beliefs, desires and percepts
in human mentalizing”
– Computational model of System 1
• “Machine Theory of Mind”
– Integration
87
88. Machine Theory of Mind
Neil C. Rabinowitz, Frank Perbet, H. Francis
Song, Chiyuan Zhang, S.M. Ali Eslami, Matthew
Botvinick
arXiv:1802.07740v2
88
89. [Figure: ToMnet outputs. A character embedding and a mental state embedding feed three prediction heads: next-step action probabilities, probabilities of whether certain objects will be consumed, and predicted successor representations.]
89