Dynamic Memory Networks for Dialogue Topic Tracking
Seokhwan Kim
Adobe Research, San Jose, CA, USA
Dialogue Topic Tracking
Categorizing the topic state at each time step:
$$
f(t) =
\begin{cases}
\text{B-}\{c \in C\} & \text{if } u_t \text{ is at the beginning of a segment that belongs to } c,\\
\text{I-}\{c \in C\} & \text{if } u_t \text{ is inside a segment that belongs to } c,\\
\text{O} & \text{otherwise.}
\end{cases}
$$
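A minimal Python sketch of this labeling scheme, assuming each topic segment is given as a (category, start, end) span over the utterance sequence (a hypothetical representation, chosen for illustration):

```python
# Minimal sketch of the B-/I-/O labeling above, assuming topic segments
# are given as (category, start_index, end_index) spans over utterances.
# The span representation is an assumption for illustration.

def label_utterances(num_utterances, segments):
    """Assign f(t) labels: B-c at a segment start, I-c inside it, O elsewhere."""
    labels = ["O"] * num_utterances
    for category, start, end in segments:  # end is inclusive
        labels[start] = f"B-{category}"
        for t in range(start + 1, end + 1):
            labels[t] = f"I-{category}"
    return labels

# Example mirroring the dialogue below: an OPEN turn, then an ATTR segment.
print(label_utterances(4, [("OPEN", 0, 0), ("ATTR", 1, 3)]))
# ['B-OPEN', 'B-ATTR', 'I-ATTR', 'I-ATTR']
```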
Examples of dialogue topic tracking
Speaker | Utterance (u_t)                                                                                                                                          | f(t)
--------|----------------------------------------------------------------------------------------------------------------------------------------------------------|-------
Guide   | How can I help you?                                                                                                                                      | B-OPEN
Tourist | Can you recommend some good places to visit in Singapore?                                                                                                | B-ATTR
Guide   | Well if you like to visit an icon of Singapore, Merlion will be a nice place to visit.                                                                   | I-ATTR
Tourist | Okay. But I'm particularly interested in amusement parks.                                                                                                | B-ATTR
Guide   | Then, what about Universal Studio?                                                                                                                       | I-ATTR
Tourist | Good! How can I get there from Orchard Road by public transportation?                                                                                    | B-TRSP
Guide   | You can take the red line train from Orchard and transfer to the purple line at Dhoby Ghaut. Then, you could reach HarbourFront where Sentosa Express departs. | I-TRSP
Tourist | How long does it take in total?                                                                                                                          | I-TRSP
Guide   | It'll take around half an hour.                                                                                                                          | I-TRSP
Tourist | Alright.                                                                                                                                                 | I-TRSP
Guide   | You could spend a whole afternoon at the park by its closing time at 6pm.                                                                                | B-ATTR
Tourist | Sounds good!                                                                                                                                             | I-ATTR
Guide   | Then, I recommend you enjoy dinner at the riverside on the way back.                                                                                     | B-FOOD
Tourist | What do you recommend there?                                                                                                                             | I-FOOD
Guide   | If you like spicy foods, you must try chili crab which is one of our favorite dishes.                                                                    | I-FOOD
Tourist | Great! I'll try that.                                                                                                                                    | I-FOOD
Baselines: CNN and RCNN (Kim et al., 2016)
[Figure: CNN and RCNN baseline architectures. Both feed the input utterances u_{t-w+1}, …, u_{t-1}, u_t through a convolutional layer followed by a max pooling layer; the RCNN additionally passes the pooled features through a recurrent layer with hidden states h_{t-w+1}, …, h_{t-1}, h_t before producing the prediction y_t.]
Convolutional Neural Network (CNN) for Dialogue Topic Tracking
Representing an utterance as a matrix with n rows of k-dimensional word vectors
Each convolutional filter has the same width k as the word vectors and a window size m as its height
The maximum value is selected from each feature map in the max pooling layer
The pooled values are forwarded to the fully-connected softmax layer
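A minimal PyTorch sketch of this encoder, using the filter sizes and feature map counts from the implementation details later; the word-vector dimension k = 300 is an assumption:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Minimal sketch of the CNN utterance encoder; filter sizes {3, 4, 5} and
# 100 feature maps follow the implementation details below, k=300 is assumed.
class UtteranceCNN(nn.Module):
    def __init__(self, k=300, num_filters=100, window_sizes=(3, 4, 5), num_classes=15):
        super().__init__()
        # Each filter spans the full word-vector width k, with height m.
        self.convs = nn.ModuleList(
            nn.Conv2d(1, num_filters, (m, k)) for m in window_sizes
        )
        self.fc = nn.Linear(num_filters * len(window_sizes), num_classes)

    def forward(self, x):          # x: (batch, n_words, k)
        x = x.unsqueeze(1)         # -> (batch, 1, n_words, k)
        pooled = [
            F.relu(conv(x)).squeeze(3).max(dim=2).values  # max over each feature map
            for conv in self.convs
        ]
        return self.fc(torch.cat(pooled, dim=1))  # logits for the softmax layer
```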
Recurrent CNN (RCNN) for Dialogue Topic Tracking
Each feature vector produced by the max pooling layer of the CNN encoder is fed into a recurrent layer over the utterance sequence
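Continuing the sketch above, a sketch of the recurrent extension under the same assumptions, reusing the CNN encoder with the GRU variant and hidden size 50 from the implementation details:

```python
# Sketch of the RCNN: each utterance in the history window is encoded by the
# UtteranceCNN above, and the sequence of pooled feature vectors drives a GRU.
class UtteranceRCNN(nn.Module):
    def __init__(self, k=300, num_filters=100, window_sizes=(3, 4, 5),
                 hidden_size=50, num_classes=15):
        super().__init__()
        self.encoder = UtteranceCNN(k, num_filters, window_sizes, num_classes)
        self.encoder.fc = nn.Identity()           # keep pooled features only
        self.rnn = nn.GRU(num_filters * len(window_sizes), hidden_size,
                          batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, utterances):                # (batch, w, n_words, k)
        b, w, n, k = utterances.shape
        feats = self.encoder(utterances.view(b * w, n, k)).view(b, w, -1)
        _, h_t = self.rnn(feats)                  # final hidden state h_t
        return self.fc(h_t.squeeze(0))            # prediction y_t
```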
Proposed Model: Dynamic Memory Network
Dynamic Memory Network (DMN) for Dialogue Topic Tracking
[Figure: DMN architecture. Each input utterance u_{t-w+1}, …, u_{t-1}, u_t is encoded by a convolutional layer with max pooling; the encoded features update the dynamic memory slots h^1, …, h^m at every step, and the memory state at time t produces the prediction y_t.]
Our models represent the latent dialogue state at each time step as a set of read-writable memory slots
Each memory slot is updated through the dialogue sequence by content-based operations in gated recurrent networks
Gating mechanisms
Single gate (Henaff et al., 2016):
$z^j_i = \sigma\left(u_i^\top w^j + u_i^\top h^j_{i-1}\right)$
$\tilde{h}^j_i = \tanh\left(U h^j_{i-1} + V w^j + W u_i\right)$

Update and reset gates:
$z^j_i = \sigma\left(u_i^\top w^j + u_i^\top h^j_{i-1}\right)$
$r^j_i = \sigma\left(u_i^\top W_r w^j + u_i^\top U_r h^j_{i-1}\right)$
$\tilde{h}^j_i = \tanh\left(U (r^j_i \circ h^j_{i-1}) + V w^j + W u_i\right)$

Cross-slot interactions:
$z^j_i = \sigma\left(\sum_k \alpha^{kj}_z u_i^\top w^k + \beta^{kj}_z u_i^\top h^k_{i-1}\right)$
$r^j_i = \sigma\left(\sum_k \alpha^{kj}_r u_i^\top w^k + \beta^{kj}_r u_i^\top h^k_{i-1}\right)$
$\tilde{h}^j_i = \tanh\left(U (r^j_i \circ h^j_{i-1}) + V w^j + W u_i\right)$

All variants update each slot by the gated interpolation:
$h^j_i = (1 - z^j_i) \circ h^j_{i-1} + z^j_i \circ \tilde{h}^j_i$
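As a concrete instance, a numpy sketch of the single-gate update for one dialogue step; the dimensions and parameter shapes are illustrative assumptions:

```python
import numpy as np

# Sketch of the single-gate dynamic memory update for one dialogue step.
# d is the shared dimension of input features, keys, and slot contents
# (an assumption for illustration); m is the number of memory slots.

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def update_memories(u, keys, h_prev, U, V, W):
    """One step of the single-gate update over all m slots.

    u:      (d,)   input features for utterance u_i (e.g. the CNN output)
    keys:   (m, d) slot keys w^j
    h_prev: (m, d) slot contents h^j_{i-1}
    U, V, W: (d, d) shared parameters
    """
    h_new = np.empty_like(h_prev)
    for j in range(len(keys)):
        z = sigmoid(u @ keys[j] + u @ h_prev[j])                # update gate z^j_i
        h_tilde = np.tanh(U @ h_prev[j] + V @ keys[j] + W @ u)  # candidate content
        h_new[j] = (1 - z) * h_prev[j] + z * h_tilde            # gated interpolation
    return h_new
```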
Evaluation: Data
TourSG corpus
Human-human mixed initiative dialogues
35 sessions, 21 hours, 31,034 utterances
Manually annotated with eight topic categories
‘attraction’, ‘transportation’, ‘food’, ‘accommodation’, ‘shopping’, ‘opening’, ‘closing’, ‘other’
15 classes: ({B-, I-} × {c : c ∈ C and c ≠ ‘other’}) ∪ {O}
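A quick check of this enumeration in Python, with the category names as listed above:

```python
# Sketch: enumerating the 15 topic-state classes from the eight categories.
categories = ['attraction', 'transportation', 'food', 'accommodation',
              'shopping', 'opening', 'closing', 'other']
classes = [f"{p}-{c}" for p in ("B", "I") for c in categories if c != 'other']
classes.append("O")
print(len(classes))  # 15
```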
Data Statistics
Set # sessions # segments # utterances
Train 14 2,104 12,759
Dev 6 700 4,812
Test 15 2,210 13,463
Total 35 5,014 31,034
Evaluation: Implementation Details
Word Embedding
Initialized with word2vec vectors pre-trained on 2.9M sentences from a travel forum
Fine-tuned while the whole model is trained
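A minimal PyTorch sketch of this setup; the vocabulary size and the random matrix stand in for the real word2vec vectors:

```python
import torch
import torch.nn as nn

# Sketch: initialize an embedding layer from pre-trained word2vec vectors and
# leave it trainable so it is fine-tuned with the rest of the model.
# `pretrained` is a placeholder for the real (vocab_size, k) word2vec matrix.
pretrained = torch.randn(20000, 300)
embedding = nn.Embedding.from_pretrained(pretrained, freeze=False)
```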
Convolutional Layer
Learned 100 feature maps for each of three different filter sizes {3, 4, 5}
For the CNN, applied over the current, previous, and history utterances (w = 10)
For the RCNN and DMN, applied to each single utterance
Recurrent Layer
RCNNs
We compared two variants: Vanilla RNNs and Gated Recurrent Units (GRUs)
The hidden layer dimensions were 150 for the vanilla RNN and 50 for the GRU
DMNs
Three dynamic memory networks were trained based on the proposed gating mechanisms
The number of memory slots was m = 5 for the first two models and m = 10 for the one with cross-slot interactions
Model Training
Adam optimizer, minimizing the categorical cross-entropy loss on the softmax output
Mini-batch size of 50; dropout after max pooling with rate 0.25
Training stopped after 150 epochs, by which point the CNN baseline had saturated
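A sketch of this training configuration, continuing the earlier sketches; random tensors stand in for the embedded utterances:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Sketch of the training setup: Adam + categorical cross-entropy,
# mini-batch size 50, 150 epochs (dropout after pooling would live inside
# the model). Random tensors stand in for the real embedded utterances.
model = UtteranceCNN()  # defined in the earlier sketch
data = TensorDataset(torch.randn(500, 20, 300), torch.randint(0, 15, (500,)))
loader = DataLoader(data, batch_size=50, shuffle=True)
optimizer = torch.optim.Adam(model.parameters())
criterion = nn.CrossEntropyLoss()  # categorical cross entropy on the logits

for epoch in range(150):
    for x, y in loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
```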
Evaluation: Results
Models               | Sequential Labelling       | Segmentation
                     | P       R       F          | Pk       WD
---------------------|----------------------------|-------------------
CNN                  | 0.6691  0.6861  0.6775     | 0.3799   0.4884
RCNN (RNN)           | 0.6825  0.6572  0.6696     | 0.3970   0.4634
RCNN (GRU)           | 0.6936  0.6826  0.6880     | 0.3888   0.4619
DMN (single)         | 0.6877  0.7105  0.6989     | 0.3782   0.4393
DMN (reset & update) | 0.6959  0.7035  0.6997     | 0.3781   0.4427
DMN (cross-slot)     | 0.7008  0.7090  0.7049†    | 0.3532‡  0.4223‡
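For reference, a minimal sketch of the Pk segmentation metric reported above (WindowDiff differs only in comparing boundary counts inside the window); the segment-id list representation is an assumption:

```python
# Minimal sketch of the Pk segmentation metric: slide a window of width k
# over reference and hypothesis sequences and count disagreements on
# whether the two window endpoints fall in the same segment.

def pk(reference, hypothesis, k):
    """reference/hypothesis: segment ids per utterance, e.g. [0, 0, 1, 1, 2]."""
    errors = 0
    n = len(reference)
    for i in range(n - k):
        same_ref = reference[i] == reference[i + k]
        same_hyp = hypothesis[i] == hypothesis[i + k]
        errors += same_ref != same_hyp
    return errors / (n - k)

# Example: the hypothesis misses one of the two reference boundaries.
print(pk([0, 0, 1, 1, 2, 2], [0, 0, 0, 0, 1, 1], k=2))  # 0.5
```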
[Figure: number of errors by type (missing, extraneous, wrong boundary, wrong category) for the CNN, RCNN, and Dynamic Memory models.]
[Figure: per-slot heatmap (Slot 0 to Slot 9) over the topic labels B-/I-ATTR, TRSP, FOOD, ACCO, SHOP; color scale from 0.6 to 0.9.]
345 Park Avenue, San Jose, CA 95110, USA Email: seokim@adobe.com