38.
How NOT To Evaluate Your Dialogue System:
An Empirical Study of Unsupervised Evaluation Metrics
for Dialogue Response Generation
arxiv.org/abs/1603.08023
BLEU
Embedding Based
39.
Towards an Automatic Turing Test:
Learning to Evaluate Dialogue Responses
arxiv.org/abs/1708.07149
ADEM
RNN
hierarchical RNN [El Hihi and Bengio, 1995; Sordoni+, 2015]
[Shang+, 2016]
Human-like