A Brief Introduction to Recurrent Neural Networks and Their Applications
1. A Brief Introduction to Recurrent Neural Networks and Their Applications
Qiang Gan
All contents are collected online and listed on the Reference page.
For Nanjing Deep Learning Meetup Only
2. Outline
1. RNN
o Model structure
o Parameters
o Learning algorithm
2. Long-Term Dependencies & Vanishing Gradient Problem
o LSTM / GRU
3. Neural Machine Translation
o Encoder-decoder framework
4. Attention Mechanism
o Extract information needed from source
5. Other RNN applications
o Image captioning
o Question Answering
3. Before we start …
4. Memory
• We are all familiar with the song "Two Tigers"
o Two tigers, two tigers …
• What is the 10th word?
• We learned it as a sequence, a kind of conditional memory.
• More examples: driving steps, movie scenes, …
5. “Memory” in Neural Network
• Traditional Neural Network
o Output relies only on current input
o input -> hidden -> output
• Network with “Memory”
o Output relies on current input and history information
o (input + prev_hidden) -> hidden -> output
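The "(input + prev_hidden) -> hidden -> output" recipe can be sketched in a few lines of NumPy; all dimensions and weight names here are illustrative, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes for input, hidden state, and output.
input_size, hidden_size, output_size = 3, 4, 2

# Weights: input-to-hidden, hidden-to-hidden (the "memory" path), hidden-to-output.
W_xh = rng.normal(size=(hidden_size, input_size))
W_hh = rng.normal(size=(hidden_size, hidden_size))
W_hy = rng.normal(size=(output_size, hidden_size))

def rnn_step(x, h_prev):
    """One step: (input + prev_hidden) -> hidden -> output."""
    h = np.tanh(W_xh @ x + W_hh @ h_prev)  # new hidden state mixes input with history
    y = W_hy @ h                           # output is read from the hidden state
    return h, y

x = rng.normal(size=input_size)
h0 = np.zeros(hidden_size)                 # empty memory before the first step
h1, y1 = rnn_step(x, h0)
```

A traditional feed-forward network is the special case where `W_hh` is absent, so the output depends only on the current input.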
6. “Memory” in Neural Network
• Four Steps in Network with “Memory”
1. (input + empty_hidden) -> hidden -> output
• Memory only contains blue information
2. (input + prev_hidden) -> hidden -> output
• Memory contains blue and red information
3. (input + prev_hidden) -> hidden -> output
• Memory contains blue, red and green information
4. (input + prev_hidden) -> hidden -> output
• Memory contains blue, red, green and purple information
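The four steps above are the same update applied repeatedly; a minimal sketch of the unrolled loop (weights and sizes illustrative), where the hidden state accumulates information from every input seen so far:

```python
import numpy as np

rng = np.random.default_rng(1)
hidden_size, input_size = 4, 3
W_xh = rng.normal(size=(hidden_size, input_size))
W_hh = rng.normal(size=(hidden_size, hidden_size))

xs = rng.normal(size=(4, input_size))   # four inputs ("blue, red, green, purple")
h = np.zeros(hidden_size)               # step 1 starts from an empty hidden state
history = []
for x in xs:
    h = np.tanh(W_xh @ x + W_hh @ h)    # memory now also contains this step's input
    history.append(h)
```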
10. Recurrent Neural Network
• Learning algorithm (Backpropagation Through Time)
o Unfold the RNN over time into a deep network (weights shared across time steps)
o (In the figure: black is the prediction, errors are bright yellow, and derivatives are mustard colored.)
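A toy sketch of Backpropagation Through Time, assuming a plain tanh RNN and a simple squared error on each hidden state (the loss choice and all names are illustrative). The key points from the slide show up directly: the network is unfolded over time in the forward pass, and because the weights are shared, their gradients are summed across steps in the backward pass:

```python
import numpy as np

rng = np.random.default_rng(2)
H, X, T = 3, 2, 4                          # hidden size, input size, sequence length
W_xh = rng.normal(scale=0.5, size=(H, X))
W_hh = rng.normal(scale=0.5, size=(H, H))
xs = rng.normal(size=(T, X))
targets = rng.normal(size=(T, H))

# Forward: unfold the RNN over time, storing every hidden state.
hs = [np.zeros(H)]
for x in xs:
    hs.append(np.tanh(W_xh @ x + W_hh @ hs[-1]))

# Backward (BPTT): walk the unfolded network in reverse.
dW_xh = np.zeros_like(W_xh)
dW_hh = np.zeros_like(W_hh)
dh_next = np.zeros(H)                       # gradient flowing back from step t+1
for t in reversed(range(T)):
    dh = (hs[t + 1] - targets[t]) + dh_next # local error plus error from the future
    dz = dh * (1.0 - hs[t + 1] ** 2)        # through tanh: d tanh(z)/dz = 1 - tanh(z)^2
    dW_xh += np.outer(dz, xs[t])            # shared weights: gradients accumulate
    dW_hh += np.outer(dz, hs[t])
    dh_next = W_hh.T @ dz                   # pass the error one step further back
```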
11. Long-Term Dependencies Problem
• Consider trying to predict the last word in the text “I
grew up in France… I speak fluent French.”
• We need the context of France, from further back.
12. Vanishing Gradient Problem
w_1, w_2, … are the weights, b_1, b_2, … are the biases, and C is some cost function.
a_j = σ(z_j), where σ is the activation function, and
z_j = w_j · a_{j−1} + b_j is the weighted input to the neuron.
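Using these definitions, the gradient with respect to an early bias is a long product of factors σ'(z_j) · w_{j+1}. Since the sigmoid's derivative never exceeds 1/4, that product typically shrinks geometrically with depth, which is the vanishing gradient problem. A toy chain of one-neuron layers (depth and weights illustrative) makes this concrete:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)        # peaks at 0.25, at z = 0

# A chain of n one-neuron layers: a_j = sigmoid(w_j * a_{j-1} + b_j).
rng = np.random.default_rng(3)
n = 30
w = rng.normal(size=n)          # typical random weights
b = np.zeros(n)

# Forward pass, storing each weighted input z_j.
a = 1.0
zs = []
for j in range(n):
    z = w[j] * a + b[j]
    zs.append(z)
    a = sigmoid(z)

# Gradient of the final activation wrt the first bias: a product of
# sigmoid'(z_j) terms (each <= 0.25) interleaved with the weights.
grad = 1.0
for j in reversed(range(n)):
    grad *= sigmoid_prime(zs[j])
    if j > 0:
        grad *= w[j]
```

The final `grad` (which would still be multiplied by ∂C/∂a_n in a real cost) is vanishingly small after 30 layers; an unfolded RNN has one such factor per time step, so long-range dependencies are hard to learn.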
14. Long Short-Term Memory
• Standard RNN
• LSTM
o Forget gate, input gate, output gate, cell state
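The four ingredients listed above can be sketched as a single LSTM step in NumPy; stacking the gate weights into one matrix is a common convention, but the shapes and names here are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step with forget, input, and output gates plus a cell state.

    W: (4*H, X+H) stacked gate weights, b: (4*H,) biases (illustrative layout).
    """
    H = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b
    f = sigmoid(z[0:H])           # forget gate: how much old cell state to keep
    i = sigmoid(z[H:2 * H])       # input gate: how much new content to write
    o = sigmoid(z[2 * H:3 * H])   # output gate: what to expose as hidden state
    g = np.tanh(z[3 * H:4 * H])   # candidate cell content
    c = f * c_prev + i * g        # cell state: additive update eases gradient flow
    h = o * np.tanh(c)            # hidden state read through the output gate
    return h, c

rng = np.random.default_rng(4)
X, H = 3, 5
W = rng.normal(scale=0.1, size=(4 * H, X + H))
b = np.zeros(4 * H)
h, c = lstm_step(rng.normal(size=X), np.zeros(H), np.zeros(H), W, b)
```

The additive cell-state update `c = f * c_prev + i * g` is the key difference from the standard RNN's `h = tanh(...)`: gradients flow through it without being squashed at every step.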
17. LSTM / GRU
• LSTM vs. GRU (the GRU has fewer parameters)
[1] An Empirical Exploration of Recurrent Network Architectures
[2]Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling
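The "fewer parameters" point follows from counting gate blocks: an LSTM has four weight blocks (forget, input, output, candidate) and a GRU has three (reset, update, candidate), so at the same sizes the GRU needs exactly 3/4 as many parameters. A quick sketch, ignoring implementation-specific extras such as duplicated biases:

```python
def lstm_params(input_size, hidden_size):
    # 4 gate blocks, each: hidden_size x (input_size + hidden_size) weights + biases.
    return 4 * (hidden_size * (input_size + hidden_size) + hidden_size)

def gru_params(input_size, hidden_size):
    # 3 gate blocks: one block fewer than the LSTM.
    return 3 * (hidden_size * (input_size + hidden_size) + hidden_size)

# Example (illustrative sizes): GRU uses exactly 3/4 of the LSTM's parameters.
lstm_n = lstm_params(256, 512)
gru_n = gru_params(256, 512)
```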
18. Neural Machine Translation
• Encoder-decoder framework
o Input reversing: "Sequence to Sequence Learning with Neural Networks"
o Input doubling: "Learning to Execute"
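The two preprocessing tricks mentioned above are simple sequence transformations; a toy sketch (token lists stand in for real tokenized sentences). Reversing the source puts early target words closer to their corresponding source words, shortening the paths gradients must travel:

```python
def reverse_source(src_tokens):
    """Input reversing, as used in Sequence to Sequence Learning with Neural Networks."""
    return list(reversed(src_tokens))

def double_source(src_tokens):
    """Input doubling, as explored in Learning to Execute."""
    return list(src_tokens) + list(src_tokens)

reverse_source(["I", "speak", "French"])  # -> ["French", "speak", "I"]
```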
19. Attention Mechanism in NMT
Neural Machine Translation by Jointly Learning to Align and Translate. ICLR 2015
20. Visualization of Attention Matrix
• Translating from English to French
• Elements in each row add up to 1
• in grayscale (0: black, 1: white)
• Alignments found, e.g.
o La Syrie -> Syria
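The rows-sum-to-1 property comes from applying a softmax to the alignment scores: row t of the attention matrix is a probability distribution over source positions for target word t, which is why it can be rendered directly as a grayscale image. A minimal sketch with random scores (shapes illustrative):

```python
import numpy as np

def softmax(scores):
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))  # numerically stable
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(5)
tgt_len, src_len = 3, 5
scores = rng.normal(size=(tgt_len, src_len))  # alignment scores (illustrative)
A = softmax(scores)  # attention matrix: row t weights the source for target word t
```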
Neural Machine Translation by Jointly Learning to Align and Translate. ICLR 2015
21. RNN Applications
• Image captioning
o Encode the image with a CNN, and decode the embedded information into a description with an RNN.
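A toy sketch of the decode side, assuming the CNN encoder's output is already a feature vector (every weight, size, and token id here is illustrative, with random untrained weights): the image embedding initializes the RNN's hidden state, which then greedily emits one token per step.

```python
import numpy as np

rng = np.random.default_rng(6)
feat, H, vocab = 8, 6, 5                # CNN feature size, hidden size, toy vocabulary

W_init = rng.normal(size=(H, feat))     # projects the CNN image feature into h0
W_hh = rng.normal(size=(H, H))
W_emb = rng.normal(size=(H, vocab))     # stands in for a word-embedding table
W_out = rng.normal(size=(vocab, H))

image_feature = rng.normal(size=feat)   # stand-in for a CNN encoder's output

# Greedy decoding: the RNN turns the image embedding into a token sequence.
h = np.tanh(W_init @ image_feature)
token = 0                               # assume id 0 is a <start> token
caption = []
for _ in range(4):
    x = W_emb[:, token]
    h = np.tanh(x + W_hh @ h)
    token = int(np.argmax(W_out @ h))   # pick the most likely next word
    caption.append(token)
```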
Fei-Fei Li, Stanford Vision Lab
22. RNN Applications
• Question answering
o Encode the document and query with RNNs, and predict the answer token.
Teaching Machines to Read and Comprehend. NIPS 2015
Attentive Reader
23. Summary
1. RNN
o Model structure
o Parameters
o Learning algorithm
2. Long-Term Dependencies & Vanishing Gradient Problem
o LSTM / GRU
3. Neural Machine Translation
o Encoder-decoder framework
4. Attention Mechanism
o Extract information needed from source
5. Other RNN applications
o Image captioning
o Question Answering
24. Reference
1. Anyone Can Learn To Code an LSTM-RNN in Python
2. Recurrent Neural Network Tutorial, WILDML
3. Attention and Memory in Deep Learning and NLP, WILDML
4. Neural Networks and Deep Learning
5. Understanding LSTM Networks
6. Sequence to Sequence Learning with Neural Networks. NIPS 2014
7. Teaching Machines to Read and Comprehend. NIPS 2015