SlideShare a Scribd company logo
1 of 29
Download to read offline
Learning Discourse-level Diversity for
Neural Dialog Models Using Conditional
Variational Autoencoders
Tiancheng Zhao, Ran Zhao and Maxine Eskenazi
Language Technologies Institute
Carnegie Mellon University
Code&Data: https://github.com/snakeztc/NeuralDialog-CVAE
Introduction
● End-to-end dialog models based on encoder-decoder models have shown great promises for
modeling open-domain conversations, due to its flexibility and scalability.
System Response
Encoder Decoder
Dialog History/Context
Introduction
However, dull response problem! [Li et al 2015, Serban et al. 2016]. Current solutions
include:
● Add more info to the dialog context [Xing et al 2016, Li et al 2016]
● Improve decoding algorithm, e.g. beam search [Wiseman and Rush 2016]
YesI don’t knowsure
Encoder Decoder
User: I am feeling quite happy today.
… (previous utterances)
Our Key Insights
● Response generation in conversation is a ONE-TO-MANY mapping problem at the
discourse level.
● A similar dialog context can have many different yet valid responses.
● Learn a probabilistic distribution over the valid responses instead of only keep the
most likely one.
Our Key Insights
● Response generation in conversation is a ONE-TO-MANY mapping problem at the discourse
level.
○ A similar dialog context can have many different yet valid responses.
● Learn a probabilistic distribution over the valid responses instead of only keep the most likely
one.
Our Contributions
1. Present an E2E dialog model adapted from Conditional Variational Autoencoder
(CVAE).
2. Enable integration of expert knowledge via knowledge-guided CVAE.
3. Improve the training method of optimizing CVAE/VAE for text generation.
Conditional Variational Auto Encoder (CVAE)
● C is dialog context
○ B: Do you like cats? A: Yes I do
● Z is the latent variable (gaussian)
● X is the next response
○ B: So do I.
Conditional Variational Auto Encoder (CVAE)
● C is dialog context
○ B: Do you like cats? A: Yes I do
● Z is the latent variable (gaussian)
● X is the next response
○ B: So do I.
● Trained by Stochastic Gradient Variational
Bayes (SGVB) [Kingma and Welling 2013]
Knowledge-Guided CVAE (kgCVAE)
● Y is linguistic features extracted from responses
○ Dialog act: statement -> “So do I”.
● Use Y to guide the learning of latent Z
Training of (kg)CVAE
Reconstruction loss
KL-divergence loss
Testing of (kg)CVAE
Optimization Challenge
Training CVAE with RNN decoder is hard due to the vanishing latent variable problem
[Bowman et al., 2015]
● RNN decoder can cheat by using LM information and ignore Z!
Bowman et al. [2015] described two methods to alleviate the problem :
1. KL annealing (KLA): gradually increase the weight of KL term from 0 to 1 (need early stop).
2. Word drop decoding: setting a proportion of target words to 0 (need careful parameter
picking).
BOW Loss
● Predict the bag-of-words in the responses X at once (word counts in the response)
● Break the dependency between words and eliminate the chance of cheating based on LM.
z
c
RNN Lossx
BOW Loss
● Predict the bag-of-words in the responses X at once (word counts in the response)
● Break the dependency between words and eliminate the chance of cheating based on LM.
z
c
RNN Lossx
xwo
FF Bag-of-word Loss
Dataset
Data Name Switchboard Release 2
Number of dialogs 2,400 (2316/60/62 - train/valid/test)
Number of context-response pairs 207,833/5,225/5,481
Vocabulary Size Top 10K
Dialog Act Labels 42 types, tagged by SVM and human
Number of Topics 70 tagged by humans
Quantitative Metrics
Ref resp1
Ref resp Mc
Context
Hyp resp 1
Hyp resp N
ModelHuman ... ...
Quantitative Metrics
d(r, h) is a distance function [0, 1] to measure the similarity between a reference and a hypothesis.
Appropriateness
Diversity
Ref resp1
Ref resp Mc
Context
Hyp resp 1
Hyp resp N
ModelHuman ... ...
Distance Functions used for Evaluation
1. Smoothed Sentence-level BLEU (1/2/3/4): lexical similarity
2. Cosine distance of Bag-of-word Embeddings: distributed semantic similarity.
(pre-trained Glove embedding on twitter)
a. Average of embeddings (A-bow)
b. Extrema of embeddings (E-bow)
3. Dialog Act Match: illocutionary force-level similarity
a. (Use pre-trained dialog act tagger for tagging)
Models (trained with BOW loss)
Encoder Sampling Decoder
Encoder Greedy Decoder
Encoder Greedy Decoder
z
z
y
sampling
sampling
Baseline
CVAE
kgCVAE
Quantitative Analysis Results
Metrics Perplexi
ty (KL)
BLEU-1
(p/r)
BLEU-2
(p/r)
BLEU-3
(p/r)
BLEU-4
(p/r)
A-bow
(p/r)
E-bow
(p/r)
DA
(p/r)
Baseline
(sample)
35.4
(n/a)
0.405/
0.336
0.3/
0.281
0.272/
0.254
0.226/
0.215
0.387/
0.337
0.701/
0.684
0.736/
0.514
CVAE
(greedy)
20.2
(11.36)
0.372/
0.381
0.295/
0.322
0.265/
0.292
0.223/
0.248
0.389/
0.361
0.705/
0.709
0.704/
0.604
kgCVAE
(greedy)
16.02
(13.08)
0.412/
0.411
0.350/
0.356
0.310/
0.318
0.262/
0.272
0.373/
0.336
0.711/
0.712
0.721/
0.598
Note: BLEU are normalized into [0, 1] to be valid precision and recall distance function
Qualitative Analysis
Topic: Recycling Context: A: are they doing a lot of recycling out in Georgia?
Target (statement): well at my workplace we have places for aluminium cans
Baseline + Sampling kgCVAE + Greedy
1. well I’m a graduate student and have two
kids.
1. (non-understand) pardon.
2. well I was in last year and so we’ve had
lots of recycling.
2. (statement) oh you’re not going to have a
curbside pick up here.
3. I’m not sure. 3. (statement) okay I am sure about a recycling
center.
4. well I don’t know I just moved here in new
york.
4. (yes-answer) yeah so.
Latent Space Visualization
● Visualization of the posterior Z on the test
dataset in 2D space using t-SNE.
● Assign different colors to the top 8 frequent
dialog acts.
● The size of circle represents the response
length.
● Exhibit clear clusterings of responses w.r.t the
dialog act
The Effect of BOW Loss
Same setup on PennTree Bank for LM
[Bowman 2015]. Compare 4 setups:
1. Standard VAE
2. KL Annealing (KLA)
3. BOW
4. BOW + KLA
Goal: low reconstruction loss + small
but non-trivial KL cost
Model Perplexity KL Cost
Standard 122.0 0.05
KLA 111.5 2.02
BOW 97.72 7.41
BOW+KLA 73.04 15.94
KL Cost during Training
● Standard model suffers from vanishing
latent variable.
● KLA requires early stopping.
● BOW leads to stable convergence
with/without KLA.
● The same trend is observed on CVAE.
Conclusion and Future Work
● Identify the ONE-TO-MANY nature of open-domain dialog modeling
● Propose two novel models based on latent variables models for generating diverse yet
appropriate responses.
● Explore further in the direction of leveraging both past linguistic findings and deep models
for controllability and explainability.
● Utilize crowdsourcing to yield more robust evaluation.
Code available here! https://github.com/snakeztc/NeuralDialog-CVAE
Thank you!
Questions?
References
1. Jiwei Li, Michel Galley, Chris Brockett, Jianfeng Gao, and Bill Dolan. 2016a. A persona-based neural conversation model.
arXiv preprint arXiv:1603.06155
2. Jiwei Li, Michel Galley, Chris Brockett, Jianfeng Gao, and Bill Dolan. 2015. A diversity-promoting objective function for neural
conversation models. arXiv preprint arXiv:1510.03055 .
3. Samuel R Bowman, Luke Vilnis, Oriol Vinyals, An- drew M Dai, Rafal Jozefowicz, and Samy Bengio. 2015. Generating
sentences from a continuous space. arXiv preprint arXiv:1511.06349 .
4. Diederik P Kingma and Max Welling. 2013. Auto- encoding variational bayes. arXiv preprint arXiv:1312.6114 .
5. Jiwei Li, Michel Galley, Chris Brockett, Jianfeng Gao, and Bill Dolan. 2016a. A persona-based neural conversation model.
arXiv preprint arXiv:1603.06155
Training Details
Word Embedding 200 Glove pre-trained on Twitter
Utterance Encoder Hidden Size 300
Context Encoder Hidden Size 600
Response Decoder Hidden Size 400
Latent Z Size 200
Context Window Size 10 utterances
Optimizer Adam learning rate=0.001
Testset Creation
● Use 10-nearest neighbour to collect similar context in the training data
● Label a subset of the appropriateness of the 10 responses by 2 human
annotators
● bootstrap via SVM on the whole test set (5481 context/response)
● Resulting 6.79 Avg references responses/context
● Distinct reference dialog acts 4.2

More Related Content

What's hot

5. manuel arcedillo & juanjo arevalillo (hermes) translation memories
5. manuel arcedillo & juanjo arevalillo (hermes) translation memories5. manuel arcedillo & juanjo arevalillo (hermes) translation memories
5. manuel arcedillo & juanjo arevalillo (hermes) translation memories
RIILP
 
7. Trevor Cohn (usfd) Statistical Machine Translation
7. Trevor Cohn (usfd) Statistical Machine Translation7. Trevor Cohn (usfd) Statistical Machine Translation
7. Trevor Cohn (usfd) Statistical Machine Translation
RIILP
 
LEPOR: an augmented machine translation evaluation metric - Thesis PPT
LEPOR: an augmented machine translation evaluation metric - Thesis PPT LEPOR: an augmented machine translation evaluation metric - Thesis PPT
LEPOR: an augmented machine translation evaluation metric - Thesis PPT
Lifeng (Aaron) Han
 

What's hot (20)

NLP State of the Art | BERT
NLP State of the Art | BERTNLP State of the Art | BERT
NLP State of the Art | BERT
 
Natural Language Processing - Research and Application Trends
Natural Language Processing - Research and Application TrendsNatural Language Processing - Research and Application Trends
Natural Language Processing - Research and Application Trends
 
BERT - Part 1 Learning Notes of Senthil Kumar
BERT - Part 1 Learning Notes of Senthil KumarBERT - Part 1 Learning Notes of Senthil Kumar
BERT - Part 1 Learning Notes of Senthil Kumar
 
5. manuel arcedillo & juanjo arevalillo (hermes) translation memories
5. manuel arcedillo & juanjo arevalillo (hermes) translation memories5. manuel arcedillo & juanjo arevalillo (hermes) translation memories
5. manuel arcedillo & juanjo arevalillo (hermes) translation memories
 
From UML/OCL to natural language (using SBVR as pivot)
From UML/OCL to natural language (using SBVR as pivot)From UML/OCL to natural language (using SBVR as pivot)
From UML/OCL to natural language (using SBVR as pivot)
 
BERT Finetuning Webinar Presentation
BERT Finetuning Webinar PresentationBERT Finetuning Webinar Presentation
BERT Finetuning Webinar Presentation
 
An NLP-based architecture for the autocompletion of partial domain models
An NLP-based architecture for the autocompletion of partial domain modelsAn NLP-based architecture for the autocompletion of partial domain models
An NLP-based architecture for the autocompletion of partial domain models
 
Information Retrieval with Deep Learning
Information Retrieval with Deep LearningInformation Retrieval with Deep Learning
Information Retrieval with Deep Learning
 
BERT
BERTBERT
BERT
 
Reference Scope Identification in Citing Sentences
Reference Scope Identification in Citing SentencesReference Scope Identification in Citing Sentences
Reference Scope Identification in Citing Sentences
 
Deep Learning勉強会@小町研 "Learning Character-level Representations for Part-of-Sp...
Deep Learning勉強会@小町研 "Learning Character-level Representations for Part-of-Sp...Deep Learning勉強会@小町研 "Learning Character-level Representations for Part-of-Sp...
Deep Learning勉強会@小町研 "Learning Character-level Representations for Part-of-Sp...
 
Hyoung-Gyu Lee - 2015 - NAVER Machine Translation System for WAT 2015
Hyoung-Gyu Lee - 2015 - NAVER Machine Translation System for WAT 2015Hyoung-Gyu Lee - 2015 - NAVER Machine Translation System for WAT 2015
Hyoung-Gyu Lee - 2015 - NAVER Machine Translation System for WAT 2015
 
Machine Translation Introduction
Machine Translation IntroductionMachine Translation Introduction
Machine Translation Introduction
 
Plug play language_models
Plug play language_modelsPlug play language_models
Plug play language_models
 
[Paper Reading] Supervised Learning of Universal Sentence Representations fro...
[Paper Reading] Supervised Learning of Universal Sentence Representations fro...[Paper Reading] Supervised Learning of Universal Sentence Representations fro...
[Paper Reading] Supervised Learning of Universal Sentence Representations fro...
 
Polymorphism
PolymorphismPolymorphism
Polymorphism
 
7. Trevor Cohn (usfd) Statistical Machine Translation
7. Trevor Cohn (usfd) Statistical Machine Translation7. Trevor Cohn (usfd) Statistical Machine Translation
7. Trevor Cohn (usfd) Statistical Machine Translation
 
Transformers to Learn Hierarchical Contexts in Multiparty Dialogue
Transformers to Learn Hierarchical Contexts in Multiparty DialogueTransformers to Learn Hierarchical Contexts in Multiparty Dialogue
Transformers to Learn Hierarchical Contexts in Multiparty Dialogue
 
LEPOR: an augmented machine translation evaluation metric - Thesis PPT
LEPOR: an augmented machine translation evaluation metric - Thesis PPT LEPOR: an augmented machine translation evaluation metric - Thesis PPT
LEPOR: an augmented machine translation evaluation metric - Thesis PPT
 
BERT - Part 2 Learning Notes
BERT - Part 2 Learning NotesBERT - Part 2 Learning Notes
BERT - Part 2 Learning Notes
 

Similar to Tiancheng Zhao - 2017 - Learning Discourse-level Diversity for Neural Dialog Models Using Conditional Variational Autoencoders

Ara--CANINE: Character-Based Pre-Trained Language Model for Arabic Language U...
Ara--CANINE: Character-Based Pre-Trained Language Model for Arabic Language U...Ara--CANINE: Character-Based Pre-Trained Language Model for Arabic Language U...
Ara--CANINE: Character-Based Pre-Trained Language Model for Arabic Language U...
IJCI JOURNAL
 

Similar to Tiancheng Zhao - 2017 - Learning Discourse-level Diversity for Neural Dialog Models Using Conditional Variational Autoencoders (20)

[Paper Reading] Unsupervised Learning of Sentence Embeddings using Compositi...
[Paper Reading]  Unsupervised Learning of Sentence Embeddings using Compositi...[Paper Reading]  Unsupervised Learning of Sentence Embeddings using Compositi...
[Paper Reading] Unsupervised Learning of Sentence Embeddings using Compositi...
 
Word_Embedding.pptx
Word_Embedding.pptxWord_Embedding.pptx
Word_Embedding.pptx
 
Neural machine translation of rare words with subword units
Neural machine translation of rare words with subword unitsNeural machine translation of rare words with subword units
Neural machine translation of rare words with subword units
 
What is word2vec?
What is word2vec?What is word2vec?
What is word2vec?
 
Nlp research presentation
Nlp research presentationNlp research presentation
Nlp research presentation
 
Poster Tweet-Norm 2013
Poster Tweet-Norm 2013Poster Tweet-Norm 2013
Poster Tweet-Norm 2013
 
End-to-end sequence labeling via bi-directional LSTM-CNNs-CRF
End-to-end sequence labeling via bi-directional LSTM-CNNs-CRFEnd-to-end sequence labeling via bi-directional LSTM-CNNs-CRF
End-to-end sequence labeling via bi-directional LSTM-CNNs-CRF
 
Open vocabulary problem
Open vocabulary problemOpen vocabulary problem
Open vocabulary problem
 
Tomáš Mikolov - Distributed Representations for NLP
Tomáš Mikolov - Distributed Representations for NLPTomáš Mikolov - Distributed Representations for NLP
Tomáš Mikolov - Distributed Representations for NLP
 
Word_Embeddings.pptx
Word_Embeddings.pptxWord_Embeddings.pptx
Word_Embeddings.pptx
 
Transformer Seq2Sqe Models: Concepts, Trends & Limitations (DLI)
Transformer Seq2Sqe Models: Concepts, Trends & Limitations (DLI)Transformer Seq2Sqe Models: Concepts, Trends & Limitations (DLI)
Transformer Seq2Sqe Models: Concepts, Trends & Limitations (DLI)
 
Engineering Intelligent NLP Applications Using Deep Learning – Part 2
Engineering Intelligent NLP Applications Using Deep Learning – Part 2 Engineering Intelligent NLP Applications Using Deep Learning – Part 2
Engineering Intelligent NLP Applications Using Deep Learning – Part 2
 
[KDD 2018 tutorial] End to-end goal-oriented question answering systems
[KDD 2018 tutorial] End to-end goal-oriented question answering systems[KDD 2018 tutorial] End to-end goal-oriented question answering systems
[KDD 2018 tutorial] End to-end goal-oriented question answering systems
 
[AAAI 2019 tutorial] End-to-end goal-oriented question answering systems
[AAAI 2019 tutorial] End-to-end goal-oriented question answering systems[AAAI 2019 tutorial] End-to-end goal-oriented question answering systems
[AAAI 2019 tutorial] End-to-end goal-oriented question answering systems
 
Ara--CANINE: Character-Based Pre-Trained Language Model for Arabic Language U...
Ara--CANINE: Character-Based Pre-Trained Language Model for Arabic Language U...Ara--CANINE: Character-Based Pre-Trained Language Model for Arabic Language U...
Ara--CANINE: Character-Based Pre-Trained Language Model for Arabic Language U...
 
Enriching Word Vectors with Subword Information
Enriching Word Vectors with Subword InformationEnriching Word Vectors with Subword Information
Enriching Word Vectors with Subword Information
 
ورشة تضمين الكلمات في التعلم العميق Word embeddings workshop
ورشة تضمين الكلمات في التعلم العميق Word embeddings workshopورشة تضمين الكلمات في التعلم العميق Word embeddings workshop
ورشة تضمين الكلمات في التعلم العميق Word embeddings workshop
 
Generating sentences from a continuous space
Generating sentences from a continuous spaceGenerating sentences from a continuous space
Generating sentences from a continuous space
 
IRJET- Survey on Deep Learning Approaches for Phrase Structure Identification...
IRJET- Survey on Deep Learning Approaches for Phrase Structure Identification...IRJET- Survey on Deep Learning Approaches for Phrase Structure Identification...
IRJET- Survey on Deep Learning Approaches for Phrase Structure Identification...
 
How to expand your nlp solution to new languages using transfer learning
How to expand your nlp solution to new languages using transfer learningHow to expand your nlp solution to new languages using transfer learning
How to expand your nlp solution to new languages using transfer learning
 

More from Association for Computational Linguistics

More from Association for Computational Linguistics (20)

Muis - 2016 - Weak Semi-Markov CRFs for NP Chunking in Informal Text
Muis - 2016 - Weak Semi-Markov CRFs for NP Chunking in Informal TextMuis - 2016 - Weak Semi-Markov CRFs for NP Chunking in Informal Text
Muis - 2016 - Weak Semi-Markov CRFs for NP Chunking in Informal Text
 
Castro - 2018 - A High Coverage Method for Automatic False Friends Detection ...
Castro - 2018 - A High Coverage Method for Automatic False Friends Detection ...Castro - 2018 - A High Coverage Method for Automatic False Friends Detection ...
Castro - 2018 - A High Coverage Method for Automatic False Friends Detection ...
 
Castro - 2018 - A Crowd-Annotated Spanish Corpus for Humour Analysis
Castro - 2018 - A Crowd-Annotated Spanish Corpus for Humour AnalysisCastro - 2018 - A Crowd-Annotated Spanish Corpus for Humour Analysis
Castro - 2018 - A Crowd-Annotated Spanish Corpus for Humour Analysis
 
Muthu Kumar Chandrasekaran - 2018 - Countering Position Bias in Instructor In...
Muthu Kumar Chandrasekaran - 2018 - Countering Position Bias in Instructor In...Muthu Kumar Chandrasekaran - 2018 - Countering Position Bias in Instructor In...
Muthu Kumar Chandrasekaran - 2018 - Countering Position Bias in Instructor In...
 
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future DirectionsDaniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
 
Elior Sulem - 2018 - Semantic Structural Evaluation for Text Simplification
Elior Sulem - 2018 - Semantic Structural Evaluation for Text SimplificationElior Sulem - 2018 - Semantic Structural Evaluation for Text Simplification
Elior Sulem - 2018 - Semantic Structural Evaluation for Text Simplification
 
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future DirectionsDaniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
 
Wenqiang Lei - 2018 - Sequicity: Simplifying Task-oriented Dialogue Systems w...
Wenqiang Lei - 2018 - Sequicity: Simplifying Task-oriented Dialogue Systems w...Wenqiang Lei - 2018 - Sequicity: Simplifying Task-oriented Dialogue Systems w...
Wenqiang Lei - 2018 - Sequicity: Simplifying Task-oriented Dialogue Systems w...
 
Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...
Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...
Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...
 
Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...
Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...
Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...
 
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 WorkshopSatoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
 
Chenchen Ding - 2015 - NICT at WAT 2015
Chenchen Ding - 2015 - NICT at WAT 2015Chenchen Ding - 2015 - NICT at WAT 2015
Chenchen Ding - 2015 - NICT at WAT 2015
 
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
 
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
 
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
 
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
 
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 WorkshopSatoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
 
Chenchen Ding - 2015 - NICT at WAT 2015
Chenchen Ding - 2015 - NICT at WAT 2015Chenchen Ding - 2015 - NICT at WAT 2015
Chenchen Ding - 2015 - NICT at WAT 2015
 
Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...
Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...
Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...
 
Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...
Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...
Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...
 

Recently uploaded

Recently uploaded (20)

How to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSHow to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POS
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdf
 
Plant propagation: Sexual and Asexual propapagation.pptx
Plant propagation: Sexual and Asexual propapagation.pptxPlant propagation: Sexual and Asexual propapagation.pptx
Plant propagation: Sexual and Asexual propapagation.pptx
 
FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024
 
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxHMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
 
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17
 
Interdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptxInterdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptx
 
Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)
 
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptxHMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
 
Graduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - EnglishGraduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - English
 
COMMUNICATING NEGATIVE NEWS - APPROACHES .pptx
COMMUNICATING NEGATIVE NEWS - APPROACHES .pptxCOMMUNICATING NEGATIVE NEWS - APPROACHES .pptx
COMMUNICATING NEGATIVE NEWS - APPROACHES .pptx
 
Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
 
How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17
 
Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
 
Google Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxGoogle Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptx
 

Tiancheng Zhao - 2017 - Learning Discourse-level Diversity for Neural Dialog Models Using Conditional Variational Autoencoders

  • 1. Learning Discourse-level Diversity for Neural Dialog Models Using Conditional Variational Autoencoders Tiancheng Zhao, Ran Zhao and Maxine Eskenazi Language Technologies Institute Carnegie Mellon University Code&Data: https://github.com/snakeztc/NeuralDialog-CVAE
  • 2. Introduction ● End-to-end dialog models based on encoder-decoder models have shown great promises for modeling open-domain conversations, due to its flexibility and scalability. System Response Encoder Decoder Dialog History/Context
  • 3. Introduction However, dull response problem! [Li et al 2015, Serban et al. 2016]. Current solutions include: ● Add more info to the dialog context [Xing et al 2016, Li et al 2016] ● Improve decoding algorithm, e.g. beam search [Wiseman and Rush 2016] YesI don’t knowsure Encoder Decoder User: I am feeling quite happy today. … (previous utterances)
  • 4. Our Key Insights ● Response generation in conversation is a ONE-TO-MANY mapping problem at the discourse level. ● A similar dialog context can have many different yet valid responses. ● Learn a probabilistic distribution over the valid responses instead of only keep the most likely one.
  • 5. Our Key Insights ● Response generation in conversation is a ONE-TO-MANY mapping problem at the discourse level. ○ A similar dialog context can have many different yet valid responses. ● Learn a probabilistic distribution over the valid responses instead of only keep the most likely one.
  • 6. Our Contributions 1. Present an E2E dialog model adapted from Conditional Variational Autoencoder (CVAE). 2. Enable integration of expert knowledge via knowledge-guided CVAE. 3. Improve the training method of optimizing CVAE/VAE for text generation.
  • 7. Conditional Variational Auto Encoder (CVAE) ● C is dialog context ○ B: Do you like cats? A: Yes I do ● Z is the latent variable (gaussian) ● X is the next response ○ B: So do I.
  • 8. Conditional Variational Auto Encoder (CVAE) ● C is dialog context ○ B: Do you like cats? A: Yes I do ● Z is the latent variable (gaussian) ● X is the next response ○ B: So do I. ● Trained by Stochastic Gradient Variational Bayes (SGVB) [Kingma and Welling 2013]
  • 9. Knowledge-Guided CVAE (kgCVAE) ● Y is linguistic features extracted from responses ○ Dialog act: statement -> “So do I”. ● Use Y to guide the learning of latent Z
  • 10. Training of (kg)CVAE Reconstruction loss KL-divergence loss
  • 12. Optimization Challenge Training CVAE with RNN decoder is hard due to the vanishing latent variable problem [Bowman et al., 2015] ● RNN decoder can cheat by using LM information and ignore Z! Bowman et al. [2015] described two methods to alleviate the problem : 1. KL annealing (KLA): gradually increase the weight of KL term from 0 to 1 (need early stop). 2. Word drop decoding: setting a proportion of target words to 0 (need careful parameter picking).
  • 13. BOW Loss ● Predict the bag-of-words in the responses X at once (word counts in the response) ● Break the dependency between words and eliminate the chance of cheating based on LM. z c RNN Lossx
  • 14. BOW Loss ● Predict the bag-of-words in the responses X at once (word counts in the response) ● Break the dependency between words and eliminate the chance of cheating based on LM. z c RNN Lossx xwo FF Bag-of-word Loss
  • 15. Dataset Data Name Switchboard Release 2 Number of dialogs 2,400 (2316/60/62 - train/valid/test) Number of context-response pairs 207,833/5,225/5,481 Vocabulary Size Top 10K Dialog Act Labels 42 types, tagged by SVM and human Number of Topics 70 tagged by humans
  • 16. Quantitative Metrics Ref resp1 Ref resp Mc Context Hyp resp 1 Hyp resp N ModelHuman ... ...
  • 17. Quantitative Metrics d(r, h) is a distance function [0, 1] to measure the similarity between a reference and a hypothesis. Appropriateness Diversity Ref resp1 Ref resp Mc Context Hyp resp 1 Hyp resp N ModelHuman ... ...
  • 18. Distance Functions used for Evaluation 1. Smoothed Sentence-level BLEU (1/2/3/4): lexical similarity 2. Cosine distance of Bag-of-word Embeddings: distributed semantic similarity. (pre-trained Glove embedding on twitter) a. Average of embeddings (A-bow) b. Extrema of embeddings (E-bow) 3. Dialog Act Match: illocutionary force-level similarity a. (Use pre-trained dialog act tagger for tagging)
  • 19. Models (trained with BOW loss) Encoder Sampling Decoder Encoder Greedy Decoder Encoder Greedy Decoder z z y sampling sampling Baseline CVAE kgCVAE
  • 20. Quantitative Analysis Results Metrics Perplexi ty (KL) BLEU-1 (p/r) BLEU-2 (p/r) BLEU-3 (p/r) BLEU-4 (p/r) A-bow (p/r) E-bow (p/r) DA (p/r) Baseline (sample) 35.4 (n/a) 0.405/ 0.336 0.3/ 0.281 0.272/ 0.254 0.226/ 0.215 0.387/ 0.337 0.701/ 0.684 0.736/ 0.514 CVAE (greedy) 20.2 (11.36) 0.372/ 0.381 0.295/ 0.322 0.265/ 0.292 0.223/ 0.248 0.389/ 0.361 0.705/ 0.709 0.704/ 0.604 kgCVAE (greedy) 16.02 (13.08) 0.412/ 0.411 0.350/ 0.356 0.310/ 0.318 0.262/ 0.272 0.373/ 0.336 0.711/ 0.712 0.721/ 0.598 Note: BLEU are normalized into [0, 1] to be valid precision and recall distance function
  • 21. Qualitative Analysis Topic: Recycling Context: A: are they doing a lot of recycling out in Georgia? Target (statement): well at my workplace we have places for aluminium cans Baseline + Sampling kgCVAE + Greedy 1. well I’m a graduate student and have two kids. 1. (non-understand) pardon. 2. well I was in last year and so we’ve had lots of recycling. 2. (statement) oh you’re not going to have a curbside pick up here. 3. I’m not sure. 3. (statement) okay I am sure about a recycling center. 4. well I don’t know I just moved here in new york. 4. (yes-answer) yeah so.
  • 22. Latent Space Visualization ● Visualization of the posterior Z on the test dataset in 2D space using t-SNE. ● Assign different colors to the top 8 frequent dialog acts. ● The size of circle represents the response length. ● Exhibit clear clusterings of responses w.r.t the dialog act
  • 23. The Effect of BOW Loss Same setup on PennTree Bank for LM [Bowman 2015]. Compare 4 setups: 1. Standard VAE 2. KL Annealing (KLA) 3. BOW 4. BOW + KLA Goal: low reconstruction loss + small but non-trivial KL cost Model Perplexity KL Cost Standard 122.0 0.05 KLA 111.5 2.02 BOW 97.72 7.41 BOW+KLA 73.04 15.94
  • 24. KL Cost during Training ● Standard model suffers from vanishing latent variable. ● KLA requires early stopping. ● BOW leads to stable convergence with/without KLA. ● The same trend is observed on CVAE.
  • 25. Conclusion and Future Work ● Identify the ONE-TO-MANY nature of open-domain dialog modeling ● Propose two novel models based on latent variables models for generating diverse yet appropriate responses. ● Explore further in the direction of leveraging both past linguistic findings and deep models for controllability and explainability. ● Utilize crowdsourcing to yield more robust evaluation. Code available here! https://github.com/snakeztc/NeuralDialog-CVAE
  • 27. References 1. Jiwei Li, Michel Galley, Chris Brockett, Jianfeng Gao, and Bill Dolan. 2016a. A persona-based neural conversation model. arXiv preprint arXiv:1603.06155 2. Jiwei Li, Michel Galley, Chris Brockett, Jianfeng Gao, and Bill Dolan. 2015. A diversity-promoting objective function for neural conversation models. arXiv preprint arXiv:1510.03055 . 3. Samuel R Bowman, Luke Vilnis, Oriol Vinyals, An- drew M Dai, Rafal Jozefowicz, and Samy Bengio. 2015. Generating sentences from a continuous space. arXiv preprint arXiv:1511.06349 . 4. Diederik P Kingma and Max Welling. 2013. Auto- encoding variational bayes. arXiv preprint arXiv:1312.6114 . 5. Jiwei Li, Michel Galley, Chris Brockett, Jianfeng Gao, and Bill Dolan. 2016a. A persona-based neural conversation model. arXiv preprint arXiv:1603.06155
  • 28. Training Details Word Embedding 200 Glove pre-trained on Twitter Utterance Encoder Hidden Size 300 Context Encoder Hidden Size 600 Response Decoder Hidden Size 400 Latent Z Size 200 Context Window Size 10 utterances Optimizer Adam learning rate=0.001
  • 29. Testset Creation ● Use 10-nearest neighbour to collect similar context in the training data ● Label a subset of the appropriateness of the 10 responses by 2 human annotators ● bootstrap via SVM on the whole test set (5481 context/response) ● Resulting 6.79 Avg references responses/context ● Distinct reference dialog acts 4.2