2. Overview
Learning representations of natural language documents
A brief introduction to Generative Adversarial Networks
Energy-based Generative Adversarial Networks
An adversarial document model
Future work & conclusion
3. Representation learning
The ability to learn robust, reusable feature representations
from unlabelled data has potential applications in a wide
variety of machine learning tasks, such as data retrieval
and classification.
One way to create such representations is to train deep
generative models that can learn to capture the complex
distributions of real-world data.
5. Document representations: LDA
The traditional approach to learning document representations is a probabilistic topic model such as Latent Dirichlet Allocation (LDA).
In LDA documents consist of a mixture of topics, with each
topic defining a probability distribution over the words in
the vocabulary.
Each document is represented by a vector of mixture weights over its associated topics.
6. Document representations: LDA
[LDA plate diagram with variables α, β, θ, z, w and plates N, M]
α is the parameter of the Dirichlet prior on the per-document topic distributions, β is the parameter of the Dirichlet prior on the per-topic word distributions, θ_m is the topic distribution for document m, z_mn is the topic for the nth word in document m, and w_mn is the specific word.
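LDA's generative story can be sketched in a few lines of NumPy (a toy illustration; the sizes K, V, N and the symmetric Dirichlet priors are hypothetical choices, not values from the text):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: K topics, V vocabulary words, N words per document.
K, V, N = 3, 10, 8
alpha = np.full(K, 0.5)   # Dirichlet prior on per-document topic mixtures
beta = np.full(V, 0.1)    # Dirichlet prior on per-topic word distributions

# One word distribution per topic: phi_k ~ Dirichlet(beta)
phi = rng.dirichlet(beta, size=K)          # shape (K, V)

def generate_document(n_words=N):
    """Sample one document following LDA's generative process."""
    theta = rng.dirichlet(alpha)                   # topic mixture theta_m for this document
    topics = rng.choice(K, size=n_words, p=theta)  # z_mn ~ Categorical(theta_m)
    words = np.array([rng.choice(V, p=phi[z]) for z in topics])  # w_mn ~ Categorical(phi_{z_mn})
    return theta, topics, words

theta, topics, words = generate_document()
```

The mixture weights theta are exactly the vector that represents the document in this model.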
9. Generative Adversarial Networks
Generative Adversarial Networks (GANs) involve a
min-max adversarial game between a generative model G
and a discriminative model D.
G(z) is a neural network that is trained to map samples z from a prior noise distribution p(z) to the data space.
D(x) is another neural network that takes a data sample x
as input and outputs a single scalar value representing the
probability that x came from the data distribution instead of
G(z).
11. Generative Adversarial Networks
D is trained to maximise the probability of assigning the
correct label to the input x.
G is trained to maximally confuse D, using the gradient of
D(x) with respect to x to update its parameters.
min_G max_D  E_{x∼p_data(x)}[log D(x)] + E_{z∼p(z)}[log(1 − D(G(z)))]
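As a sanity check, the value function can be evaluated numerically on a toy 1-D problem (the Gaussian data distribution, the fixed linear-sigmoid D and the shift generator G here are illustrative assumptions, not part of any real GAN):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def D(x):
    """Toy fixed discriminator: estimated probability that x is real."""
    return sigmoid(x - 1.0)

def G(z, g_shift):
    """Toy generator: shifts prior noise z ~ N(0, 1) by g_shift."""
    return z + g_shift

def value_fn(x_real, z, g_shift):
    """V(D, G) = E_x[log D(x)] + E_z[log(1 - D(G(z)))], by Monte Carlo."""
    return np.mean(np.log(D(x_real))) + np.mean(np.log(1.0 - D(G(z, g_shift))))

x_real = rng.normal(2.0, 1.0, size=10_000)   # "data" distribution N(2, 1)
z = rng.normal(0.0, 1.0, size=10_000)        # prior noise

# G minimises V: moving generated samples onto the data (g_shift -> 2)
# should lower V compared with an untrained generator (g_shift = 0).
v_untrained = value_fn(x_real, z, g_shift=0.0)
v_trained = value_fn(x_real, z, g_shift=2.0)
```

A generator whose samples land on the data manifold drives log(1 − D(G(z))) down, which is exactly the direction of the min over G.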
12. GAN samples
Source: Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks
https://arxiv.org/abs/1511.06434v2
14. Energy-based Generative Adversarial Networks
Source: Yann LeCun's slides on energy-based GANs, NIPS 2016.
Energy function: outputs low values on the data manifold,
higher values everywhere else.
15. Energy-based Generative Adversarial Networks
Easy to push down energy of observed data via SGD.
How to choose where to push energy up?
16. Energy-based Generative Adversarial Networks
Generator learns to pick points where the energy should
be increased.
Can view D as a learned objective function.
17. Energy-based Generative Adversarial Networks
The energy function is trained to push down on the energy of real samples x, and to push up on the energy of generated samples x̂ (f_D is the value to be minimised at each iteration and m is a margin between positive and negative energies):
f_D(x, z) = D(x) + max(0, m − D(G(z)))
At each iteration, the generator G is trained adversarially against D to minimise f_G:
f_G(z) = D(G(z))
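A minimal sketch of these two objectives on scalar energies (the example energy values are made up):

```python
import numpy as np

def f_D(D_real, D_fake, m=1.0):
    """Energy-function objective: push real energies down, and push fake
    energies up until they reach the margin m (hinge term)."""
    return D_real + np.maximum(0.0, m - D_fake)

def f_G(D_fake):
    """Generator objective: lower the energy D assigns to generated samples."""
    return D_fake

# A fake sample inside the margin contributes m - D_fake to the loss;
# one already above the margin contributes nothing.
loss_inside = f_D(D_real=0.1, D_fake=0.3)   # 0.1 + (1.0 - 0.3)
loss_outside = f_D(D_real=0.1, D_fake=1.5)  # hinge term is zero
```

The hinge means D stops pushing up on generated samples once their energy clears the margin, which is one intuition for why this formulation trains more stably.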
18. Energy-based Generative Adversarial Networks
In practice, the energy-based GAN formulation seems to be easier to train.
Empirical results in "Energy-based Generative Adversarial Network" (https://arxiv.org/abs/1609.03126) with more than 6500 experiments.
19. An adversarial document model
Can we use the GAN formulation to learn representations
of natural language documents?
Questions:
1. How to represent documents? GANs require everything to
be differentiable, but need to deal with discrete text.
2. How to get a representation? No explicit mapping back to
latent (z) space.
20. An adversarial document model
[Architecture diagram: z → G; x → C → Enc → h → Dec, reconstruction scored by MSE; D is the discriminator]
Using an Energy-Based GAN to learn document representations. G is the generator, Enc and Dec are DAE encoder and decoder networks, C is a corruption process (bypassed at test time) and D is the discriminator.
Input to the discriminator is the binary bag-of-words representation of a document: x ∈ {0, 1}^V.
Energy-based GAN with Denoising Autoencoder
discriminator.
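One way such a discriminator might look, sketched with random NumPy weights (a toy illustration assuming an MSE reconstruction energy as described; the sizes, activations and word-dropout corruption rate are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

V, H = 10, 4   # hypothetical vocabulary and hidden-layer sizes
W_enc = rng.normal(0.0, 0.1, (V, H))
W_dec = rng.normal(0.0, 0.1, (H, V))

def corrupt(x, p=0.3):
    """C: randomly zero out words (bypassed at test time)."""
    return x * (rng.random(x.shape) > p)

def energy(x, train=True):
    """D(x): MSE reconstruction error of a denoising autoencoder.
    Low energy means x is well reconstructed, i.e. looks like real data."""
    x_in = corrupt(x) if train else x
    h = np.tanh(x_in @ W_enc)                 # Enc: h is the representation
    x_hat = 1.0 / (1.0 + np.exp(-(h @ W_dec)))  # Dec: sigmoid for binary BoW
    return np.mean((x_hat - x) ** 2)

# Binary bag-of-words input x in {0, 1}^V
x = rng.integers(0, 2, size=V).astype(float)
e = energy(x, train=False)
```

The hidden activation h doubles as the document representation used for retrieval once training is done.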
21. Document retrieval evaluation
[Precision-recall curves: precision (0.0 to 0.6) against recall (0.0001 to 1.0) for ADM, ADM (AE), DocNADE and DAE]
Precision-recall curves for the document retrieval task on the 20 Newsgroups dataset. DocNADE is described in (Larochelle and Lauly, 2012), ADM is the adversarial document model, ADM (AE) is the adversarial document model with a standard Autoencoder as the discriminator (and so is similar to the Energy-Based GAN), and DAE is a Denoising Autoencoder.
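A retrieval evaluation of this kind can be sketched as ranking held-out documents by cosine similarity between their representations (a toy illustration; the 2-D vectors and labels below are made up):

```python
import numpy as np

def retrieve(query, docs):
    """Rank documents by cosine similarity to the query representation."""
    q = query / np.linalg.norm(query)
    d = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    return np.argsort(-(d @ q))

def precision_at_k(query, query_label, docs, labels, k):
    """Fraction of the top-k retrieved documents sharing the query's label."""
    top = retrieve(query, docs)[:k]
    return float(np.mean(labels[top] == query_label))

# Hypothetical 2-D document representations with topic labels.
docs = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
labels = np.array([0, 0, 1, 1])
p = precision_at_k(np.array([1.0, 0.05]), 0, docs, labels, k=2)
```

Sweeping k over the corpus and averaging over queries yields precision at each recall level, which is how curves like the ones above are typically produced.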
22. Qualitative evaluation: t-SNE plot
t-SNE visualizations of the document representations learned by the adversarial document model on the held-out
test dataset of 20 Newsgroups. The documents belong to 20 different topics, which correspond to different coloured
points in the figure.
23. Future work
Understanding why the DAE in the GAN discriminator
appears to produce significantly better representations
than a standalone DAE.
Exploring the impact of applying additional constraints to
the representation layer.
24. Conclusion
Showed that a variation on the recently proposed
Energy-Based GAN can be used to learn document
representations in an unsupervised setting.
The current formulation still falls short of the state of the art, but it is very early days for this line of research, so it is likely that we can push this a lot further.
Suggested some interesting areas for future research.
25. More information
Introduction to GANs: http://blog.aylien.com/introduction-generative-adversarial-networks-code-tensorflow
Paper: https://sites.google.com/site/nips2016adversarial/home/accepted-papers