Presentation for Ekaterina Vylomova's master's thesis "Neural Network Modeling of Verbal Consciousness", which was defended in English (#iu5, #bmstu).
Neural modeling of verbal consciousness based on the results of the associative experiment
1. Neural modeling of verbal consciousness based on the results of the associative experiment
Researcher: Katerina Vylomova
Scientific adviser: Yuri Philippovich
3/18/2013 1
2. Goals and tasks
Relevance of the topic:
From syntax to semantics: search engines, machine translation, natural-language text generation
Neural system modeling: CMU; University of California, Irvine
Goal:
Development of a neural network model of verbal consciousness
Tasks:
◦ Analysis of the associative verbal thesaurus
◦ Analysis of the associative verbal network
◦ Analysis of the formal models of the associative experiment
◦ Development of neural network models of verbal consciousness
◦ Study of the neural network
◦ Practical implementation of the research and visualization of the results
3. Source data
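Karaulov Yu.N., Tarasov E.F., Sorokin Yu.A., Ufimtseva N.V., Cherkasova G.A. (1999).
Associative Thesaurus of the Modern Russian Language (Ассоциативный тезаурус современного русского языка). Russian Academy of Sciences (РАН)
Example: серьезный (serious) → человек (person), frequency 25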
Main parameters:
Time period: 1988-1998
Participants: 11,000 first- to third-year students from 34 specialities
Number of stimuli: 6,624
Number of cue-reaction pairs: 1,032,522
Different pairs: 462,500
Different reactions: 102,926
Current subset:
Number of stimuli: 6,577
Number of reactions: 21,312
Different cue-reaction pairs: 102,516
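Conversion to relative frequency (weight; normalized over the reactions per cue):
$\mathrm{weight}_{ij} = \frac{\mathrm{freq}_{ij}}{\sum_{j=1}^{n} \mathrm{freq}_{ij}}$, where $\mathrm{freq}_{ij}$ is the frequency of the associative pair (cue $i$, reaction $j$) and the number of such pairs is $|\{\mathrm{freq}_{ij}\}| = 102516$.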
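The conversion above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `relative_weights` and the triple layout `(cue, reaction, freq)` are hypothetical, and only the first toy triple comes from the slide; the other two counts are invented.

```python
from collections import defaultdict


def relative_weights(pairs):
    """Turn raw (cue, reaction, freq) counts into per-cue relative frequencies."""
    totals = defaultdict(float)
    for cue, _reaction, freq in pairs:
        totals[cue] += freq  # denominator: sum of frequencies over all reactions of the cue
    return {(cue, reaction): freq / totals[cue] for cue, reaction, freq in pairs}


# Toy data: the first triple is the slide's example, the other two are invented.
pairs = [
    ("серьезный", "человек", 25),
    ("серьезный", "разговор", 15),
    ("серьезный", "вопрос", 10),
]
weights = relative_weights(pairs)
print(weights[("серьезный", "человек")])  # 25 / (25 + 15 + 10) = 0.5
```

With this normalization the weights of all reactions to one cue sum to 1, which makes cues with different numbers of respondents comparable.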
4. The thesaurus analysis
Analogous resources:
USA (Jenkins & Palermo, 1964; Deese, 1965; Cramer, 1968; Nelson, 1999),
Russia (Leontiev, 1977), Belgium (De Groot, 1988; De Deyne & Storms,
2008), Japan (Okamoto & Ishizaki, 2001; Joyce, 2005), South Korea (Jung et
al., 2010), Great Britain (Kiss et al., 1973)
Part-of-speech analysis of the reactions (lemmatization with the Mystem utility):
nouns ~77% (~55%), adjectives 16% (~25%), verbs 6% (~18%), adverbs 0.4% (~0.9%),
others 0.6% (~0.9%)*.
Comparison of the most frequent reactions:
RAT                  Sharov’s dict.**     Intersection   RAT full   KorWA
Друг (13154.93)      Год (2718.78)        Друг           Человек    Деньги
Вода (7402.37)       Человек (2369.34)    Вода           Дом        Любовь
Дурак (7309.50)      Время (1662.10)      Дело           Деньги     Друг
Дело (7062.12)       Дело (1175.12)       Язык           День       Человек
Язык (6409.32)       Жизнь (1155.78)      Ребенок        Друг       Вода
Ребенок (6373.7)     День (970.49)        Вопрос         Домой      Мечта
Вопрос (5261.97)     Рука (969.75)        Стол           Мужчина    Армия
Стол (5218.45)       Работа (904.43)      Время          Дурак      Разум
Время (5163.06)      Слово (817.80)       Море           Дело       Дом
Свет (4858.42)       Вопрос (751.74)      Ответ          Жизнь      Слеза
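* Values in brackets are computed over distinct words only
** Lyashevskaya & Sharov frequency dictionary based on the Russian National Corpus
English glosses for the table: друг = friend, вода = water, дурак = fool, дело = matter, язык = language, ребенок = child, вопрос = question, стол = table, время = time, свет = light, год = year, человек = person, жизнь = life, день = day, рука = hand, работа = work, слово = word, море = sea, ответ = answer, дом = house, деньги = money, домой = homeward, мужчина = man, любовь = love, мечта = dream, армия = army, разум = mind, слеза = tear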
5. The associative-verbal network analysis
Words as vertices: $|V| = 23195$; connections between them (associations) as edges: $|E| = 102516$.
3 types of vertices: outgoing edges only (stimuli), $|S| = 1883$; incoming edges only (reactions), $|R| = 16618$; both incoming and outgoing edges (stimuli-reactions), $|SR| = 4694$.
Graph parameters (Steyvers and Tenenbaum, 2005):
Sign   Description                                               Directed    Undirected
n      Number of vertices                                        23195       23195
|E|    Number of edges                                           102516      95518
L      Average shortest-path length between a pair of nodes      3.989461    3.836189
D      Graph diameter                                            9           8
γ      Exponent of the power-law node degree distribution        2.200       1.850
<k>    Average node degree                                       4.42        8.839
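As a rough illustration of how the vertex types and the average degree can be derived from an edge list, here is a small pure-Python sketch. The function names and the toy edges are invented for illustration. It assumes that the undirected graph is obtained by merging reciprocal pairs and that the directed average degree is edges divided by vertices (which agrees with 102516 / 23195 ≈ 4.42); the conventions actually used for the table may differ.

```python
def classify_vertices(edges):
    """Split vertices into stimuli-only, reactions-only and stimuli-reactions."""
    sources = {u for u, _v in edges}   # vertices with outgoing edges
    targets = {v for _u, v in edges}   # vertices with incoming edges
    return sources - targets, targets - sources, sources & targets


def average_degree(edges, directed=True):
    """Average degree; for the undirected case reciprocal pairs count once."""
    vertices = {x for edge in edges for x in edge}
    if directed:
        return len(edges) / len(vertices)
    undirected = {frozenset(edge) for edge in edges if edge[0] != edge[1]}
    return 2 * len(undirected) / len(vertices)


# Invented toy edges (cue -> reaction), not taken from the thesaurus.
edges = [("река", "вода"), ("вода", "река"), ("река", "рыба"), ("море", "река")]
stimuli, reactions, both = classify_vertices(edges)
print(stimuli, reactions, both)
print(average_degree(edges), average_degree(edges, directed=False))
```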
6. Characteristics of the associative graph
“Small-world” networks (Milgram, 1967)
6 degrees of separation: $L \propto \log N$
World Wide Web (WWW; Adamic, 1999; Albert, Jeong, &
Barabási, 1999), networks of scientific collaboration (Newman,
2001), metabolic networks in biology (Jeong, Tombor, Albert,
Oltvai, & Barabási, 2000)
Scale-free networks (Amaral, Scala, Barthélémy, & Stanley,
2000)
$P(k) \approx k^{-\gamma}$, where $\gamma \in (2, 4)$
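The slides do not say how the exponent was estimated. One common option, shown here purely as an assumption, is the continuous maximum-likelihood estimator $\hat{\gamma} = 1 + n / \sum_i \ln(k_i / k_{min})$, which is only approximate for integer degrees. The function names and the synthetic sample below are hypothetical.

```python
import math
import random


def estimate_gamma(degrees, k_min=1.0):
    """Continuous maximum-likelihood estimate of the power-law exponent."""
    tail = [k for k in degrees if k >= k_min]
    log_sum = sum(math.log(k / k_min) for k in tail)
    return 1.0 + len(tail) / log_sum


def sample_power_law(n, gamma, k_min=1.0, seed=0):
    """Draw n synthetic values with density proportional to k**(-gamma), k >= k_min."""
    rng = random.Random(seed)
    # Inverse-transform sampling; 1 - random() lies in (0, 1], so the base is never zero.
    return [k_min * (1.0 - rng.random()) ** (-1.0 / (gamma - 1.0)) for _ in range(n)]


# Synthetic check: the estimate should land close to the exponent used for sampling.
synthetic = sample_power_law(20000, gamma=2.2)
gamma_hat = estimate_gamma(synthetic)
print(round(gamma_hat, 2))
```

For real degree sequences the choice of `k_min` matters, since a power law usually describes only the tail of the distribution.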
8. Model based on vector space
Clustering (k-means):
$\min \sum_{i=1}^{k} \sum_{x_j \in S_i} (x_j - \mu_i)^2$, where $k$ is the number of clusters, $S_i$ are the current clusters, and $\mu_i$ are the centers of the clusters ($x_j \in S_i$).
Distance metric: $d_{ij} = \sqrt[r]{\sum_{k=1}^{n} |x_{ik} - x_{jk}|^r}$
Search for the closest center:
[Figure: cluster centers C1, C2, C3]
Model advantages:
◦ Possibility of creating a non-existent concept;
◦ Possibility to visualize and to change dimensionality.
Model disadvantages:
◦ The optimal number of clusters has to be found;
◦ Clustering has to be accurate;
◦ The optimal dimensionality has to be found.
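The clustering step described above can be sketched as follows. This is a minimal illustration, not the thesis implementation: the function names, the random initialization, the mean-based center update and the toy points are all assumptions. With r = 2 the distance is Euclidean and the mean update matches the stated objective; for other values of r the mean is only a heuristic.

```python
import random


def minkowski(a, b, r=2.0):
    """Minkowski distance of order r between two equal-length vectors."""
    return sum(abs(x - y) ** r for x, y in zip(a, b)) ** (1.0 / r)


def closest_center(point, centers, r=2.0):
    """Index of the center nearest to the point."""
    return min(range(len(centers)), key=lambda i: minkowski(point, centers[i], r))


def k_means(points, k, r=2.0, iterations=100, seed=0):
    """Lloyd-style k-means: assign to the closest center, then recompute means."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # assumption: initialize with k random points
    assignment = None
    for _ in range(iterations):
        new_assignment = [closest_center(p, centers, r) for p in points]
        if new_assignment == assignment:
            break  # assignments are stable, so the centers will not move either
        assignment = new_assignment
        for i in range(k):
            members = [p for p, a in zip(points, assignment) if a == i]
            if members:  # keep the old center if a cluster became empty
                centers[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, assignment


# Invented toy points forming two well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, assignment = k_means(points, k=2)
print(centers, assignment)
```

The number of clusters `k` and the dimensionality of the vectors are inputs here, which mirrors the disadvantages listed above: both have to be chosen before clustering.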
9. Model based on the concept space
Core of the network: overall number of concepts: 4,692; connections: 59,392
$S_i^t = S_i^{t-1} + \sum_j w_{ji} \cdot S_j^{t-1}$, where $S_i^t$ is the activity of the $i$-th neuron at the moment $t$ and $w_{ji}$ is the weight of the connection between neuron $j$ and neuron $i$.
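One step of the update rule above can be sketched like this. The dictionary layout, the function name `activation_step` and the toy weights are assumptions made for illustration; the weights are invented and are not values from the thesaurus.

```python
def activation_step(state, weights):
    """One synchronous update: S_i(t) = S_i(t-1) + sum_j w_ji * S_j(t-1).

    state maps a node to its activity; weights maps (j, i) to w_ji.
    """
    new_state = dict(state)
    for (j, i), w in weights.items():
        # Always read from the previous state so the update stays synchronous.
        new_state[i] = new_state.get(i, 0.0) + w * state.get(j, 0.0)
    return new_state


# Invented toy weights between three concepts.
weights = {("машина", "автомобиль"): 0.5, ("автомобиль", "двигатель"): 0.2}
state0 = {"машина": 1.0}
state1 = activation_step(state0, weights)
state2 = activation_step(state1, weights)
print(state1)
print(state2)
```

As written, the rule has no decay term, so with non-negative weights the activity can only accumulate from step to step.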
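[Figure: fragment of the concept network with the nodes РЫБА (fish), РЕКА (river), МАТЬ (mother), ВОЛГА (Volga), МОТОР (motor), АВТОМОБИЛЬ (automobile), МАШИНА (car), ДВИГАТЕЛЬ (engine), СКОРОСТЬ (speed), ТЕЛЕГА (cart), КРАСИВЫЙ (beautiful), ТАЧКА (wheelbarrow; colloquially, car), ЛЕГКОВОЙ (passenger, of a car)]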
17. Conclusion
Subject area analysis
Comparative analysis with the Russian frequency dictionary
Comparative analysis with the results of other experiments
Associative graph analysis
Data preprocessing
Models of the network: 2 vector-space + 1 concept-space
Neural Network experiments
Web application for working with the associative network and the model
18. Thank you for your attention!
Questions?