How to handle sequence data?
• Text, Stock prices, Sensor signals, DNA, Customer purchase behaviour, Sound signals
• Bag of words doesn’t preserve the order of items in the sequence (see the sketch below)
• Modelling sequential data requires a ‘temporal’ architecture to simulate ‘memory’
• The idea is to encode the sequence into itself iteratively (recurrently), one ‘time step’ at a time
• Applications include predictive models, natural language understanding, POS tagging, machine translation, natural language generation, etc.
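As a quick illustration of the order problem, here is a minimal sketch using scikit-learn's CountVectorizer (the sentences and library choice are mine, not from the slides): two sentences with opposite meanings map to exactly the same bag-of-words vector.

    from sklearn.feature_extraction.text import CountVectorizer

    docs = ["dog bites man", "man bites dog"]
    bow = CountVectorizer().fit_transform(docs).toarray()
    print(bow[0])  # [1 1 1]
    print(bow[1])  # [1 1 1] -- identical vectors: word order is lost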
What is an RNN?
An RNN (Recurrent Neural Network) can be seen as a layer in a neural network that encodes sequential data into a vector representation, which can then be used for various tasks such as classification, or kept simply as an encoding. In other words, it's a way to perform feature engineering in an automated way for sequential data.
[Figure: an RNN applied to the example sequence “What time is ___?”]
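As a minimal sketch of the idea (a vanilla RNN step in NumPy; the dimensions and weight names are illustrative assumptions, not taken from the slides):

    import numpy as np

    def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
        """One recurrent update: fold the current input into the running state."""
        return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

    # Encode a sequence by applying the same step at every time step
    rng = np.random.default_rng(0)
    input_dim, hidden_dim, seq_len = 4, 8, 5
    W_xh = rng.normal(size=(input_dim, hidden_dim))
    W_hh = rng.normal(size=(hidden_dim, hidden_dim))
    b_h = np.zeros(hidden_dim)

    h = np.zeros(hidden_dim)
    for x_t in rng.normal(size=(seq_len, input_dim)):
        h = rnn_step(x_t, h, W_xh, W_hh, b_h)
    # h is now a fixed-size vector encoding of the whole sequence

The final h can be fed to a classifier or kept as the sequence's learned representation.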
Why don’t RNNs work in practice?
• Long-term dependencies are not captured: as the number of time steps increases, the RNN becomes unable to connect distant pieces of information
• The vanishing gradient problem causes loss of long-term memory while emphasising short-term memory (see the sketch below)
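A rough numerical illustration of the vanishing-gradient point (assuming, purely for illustration, that each backward step scales the gradient by a constant factor of 0.9):

    # Backpropagating through t time steps multiplies the gradient by ~0.9
    # each step, so its magnitude decays exponentially with distance.
    for t in (1, 10, 50, 100):
        print(t, 0.9 ** t)
    # 1: 0.9 | 10: ~0.35 | 50: ~0.005 | 100: ~0.00003 -> distant steps barely learn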
How do LSTMs work?
• LSTMs add long-term memory by learning to retain certain hidden states more than others. This allows them to retain knowledge over longer sequences.
• They have 2 outputs instead of 1: the hidden state and the cell state. Their computation is a bit more complex than that of RNNs.
[Figure: the repeating module of an RNN chain vs. the repeating module of an LSTM chain]
LSTM cell architecture
• An LSTM's architecture consists of 3 gates: the forget gate, the input gate and the output gate
• Tanh acts as a squashing function, while Sigmoid acts as a decision function (gate)
• The cell state is a channel that runs along the LSTM chain, freely carrying information from one time step to the next
The Cell state
The cell state is a conveyor belt that carries information from one time step to the next. The three gates control what is removed from, added to, and read out of the cell state. Each of these decisions is made by a sigmoid function, whose output lies between 0 and 1: 0 means let nothing through, 1 means let everything through (see the toy example below).
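A toy example of this gating idea (the numbers are made up for illustration): the sigmoid output multiplies the cell state elementwise, so values near 0 erase entries and values near 1 preserve them.

    import numpy as np

    cell_state = np.array([2.0, -1.0, 0.5])
    gate = np.array([0.0, 1.0, 0.9])  # sigmoid outputs lie between 0 and 1
    print(cell_state * gate)  # [ 0.   -1.    0.45] -- 0 erases, 1 keeps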
The Forget Gate
Let's say that the previous few time steps encode information about the gender of the subject. This is useful for predicting the next few words while the subject stays the same. But when a new subject enters, we no longer want to retain the old subject's gender information. This is what the forget gate learns to do.
It concatenates the previous hidden state with the current input, multiplies the result by a weight matrix and adds a bias, then applies a sigmoid function before multiplying the result elementwise into the cell state.
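In the standard LSTM notation (the symbols follow the common textbook formulation; W_f and b_f are the forget gate's weights and bias):

    f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)

The cell state is then scaled elementwise as f_t \odot C_{t-1}.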
The Input Gate
The input gate decides what information should be saved to the cell state. It performs the same operation as the forget gate, but instead of writing the result onto the cell state, it combines (multiplies) it with the tanh-squashed candidate computed from the concatenated vector of the hidden state and input (plus bias). This product is then added to the cell state, which the forget gate has already updated.
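In the same notation (W_i, b_i for the input gate; W_C, b_C for the tanh candidate):

    i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)
    \tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)
    C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t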
The Output Gate
Finally, we decide the output of the LSTM cell. The cell state itself is passed straight through to the next time step; the output computed here becomes the hidden state for the next LSTM cell. It is obtained by applying a sigmoid function to the concatenation of the previous hidden state and the current input, then multiplying the result with the tanh-squashed version of the cell state, which already encodes what to remember and what to forget.
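Completing the notation (W_o and b_o are the output gate's weights and bias):

    o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)
    h_t = o_t \odot \tanh(C_t)

Putting the three gates together, here is a minimal NumPy sketch of one LSTM time step (the fused weight layout and the dimensions are my own illustrative choices, not from the slides):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def lstm_step(x_t, h_prev, c_prev, W, b):
        """One LSTM time step. W maps the concatenated [h_prev, x_t] to the
        four gate pre-activations stacked side by side; b is the matching bias."""
        z = np.concatenate([h_prev, x_t]) @ W + b
        f, i, o, g = np.split(z, 4)
        f, i, o = sigmoid(f), sigmoid(i), sigmoid(o)  # decision functions
        g = np.tanh(g)                                # squashed candidate values
        c = f * c_prev + i * g                        # forget old, add new
        h = o * np.tanh(c)                            # expose part of the cell
        return h, c

    # Tiny usage example with random weights (hypothetical sizes)
    rng = np.random.default_rng(0)
    input_dim, hidden_dim = 3, 4
    W = rng.normal(size=(hidden_dim + input_dim, 4 * hidden_dim))
    b = np.zeros(4 * hidden_dim)
    h = c = np.zeros(hidden_dim)
    for x_t in rng.normal(size=(5, input_dim)):
        h, c = lstm_step(x_t, h, c, W, b)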