2. About Me
Andre Pemmelaar
• 5 yrs Financial System Solutions
• 12 yrs Buy-Side Finance
• 7 yrs Japanese Gov’t Bond Options Market Maker (JGBs)
• 5 yrs Statistical Arbitrage (Global Equities)
• Low-latency & quantitative algorithms
• Primarily use a mixture of basic statistics and machine learning (Java, F#, Python, R)
• Using Julia for most of my real work (90%) since July 2014
• Can be reached at @QuantixResearch
3. Why my interest in LSTMs & RNNs
• In my field, finance, so much of the work involves sequence models.
• Most deep learning models are not built for use with sequences; you have to jury-rig them to make them work.
• RNNs and LSTMs are specifically designed to work with sequence data.
• Sequence models can be combined with Reinforcement Learning to produce some very nice results (more on this, and a demo, later).
• They have begun producing amazing results, thanks to:
  • Better initialization procedures
  • Use of Rectified Linear Units for RNNs and “memory cells” in LSTMs
6. What are Recurrent Neural Networks?
1. In their simplest form (RNNs), they are just neural networks with a feedback loop.
2. The previous time step’s hidden layer and final outputs are fed back into the network as part of the input to the next time step’s hidden layers.
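The feedback loop is easy to see in code: the hidden state from step t-1 comes back as an extra input at step t. Here is a minimal vanilla-RNN sketch in plain Julia (illustrative names, not this package’s API):

```julia
# Minimal vanilla RNN step: the previous hidden state h_prev is fed back
# in alongside the new input x. Matrix names are illustrative.
function rnn_step(Wxh, Whh, Why, bh, by, x, h_prev)
    h = tanh.(Wxh * x .+ Whh * h_prev .+ bh)   # new hidden state
    y = Why * h .+ by                          # output for this step
    return h, y
end

# Unroll over a sequence, threading the hidden state through time.
function rnn_forward(Wxh, Whh, Why, bh, by, xs, h0)
    h, ys = h0, []
    for x in xs
        h, y = rnn_step(Wxh, Whh, Why, bh, by, x, h)
        push!(ys, y)
    end
    return ys, h
end
```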
7. Why Generate Sequences?
• To improve classification?
• To create synthetic training data?
• Practical tasks like speech synthesis?
• To simulate situations?
• To understand the data
This slide is from “Generating Sequences with Recurrent Neural Networks” - Alex Graves
8.–11. [Figure slides from “Generating Sequences with Recurrent Neural Networks” - Alex Graves]
12. Some great examples
Alex Graves
Formerly at University of Toronto, now part of the Google DeepMind team
Has a great example of generating handwriting using an LSTM:
• 3 inputs: Δx, Δy, pen up/down
• 121 output units
• 20 two-dimensional Gaussians for x,y = 40 means (linear) + 40 std. devs (exp) + 20 correlations (tanh) + 20 weights (softmax)
• 1 sigmoid for up/down
• 3 hidden layers, 400 LSTM cells in each
• 3.6M weights total
• Trained with RMSprop, learning rate 0.0001, momentum 0.9
• Error clipped during backward pass (lots of numerical problems)
• Trained overnight on fast multicore CPU
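The 121 outputs are just a flat vector that gets sliced into mixture-density parameters. A hedged sketch of that bookkeeping in Julia (names and slice layout are mine, not Graves’ code):

```julia
# Split a 121-unit output vector into handwriting mixture parameters:
# 20 mixture weights, 20 2-D Gaussians (40 means, 40 std devs,
# 20 correlations), plus 1 pen up/down probability.
function split_outputs(y::Vector{Float64})
    @assert length(y) == 121
    w  = exp.(y[1:20]);  w ./= sum(w)   # softmax -> mixture weights
    mu = y[21:60]                       # 40 means (linear)
    sd = exp.(y[61:100])                # 40 std devs (exp keeps them positive)
    ρ  = tanh.(y[101:120])              # 20 correlations in (-1, 1)
    pen = 1 / (1 + exp(-y[121]))        # sigmoid -> pen up/down probability
    return w, mu, sd, ρ, pen
end
```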
14. Some great examples
Andrej Karpathy
Now at Stanford University
Has a great example of generating characters using an LSTM:
• 51 inputs (unique characters)
• 2 hidden layers, 20 LSTM cells in each
• Trained with RMSprop, learning rate 0.0001, momentum 0.9
• Error clipped during backward pass
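With 51 unique characters, each input to the network is just a one-hot vector over the vocabulary. A minimal sketch of that encoding (illustrative, not Karpathy’s code):

```julia
# Build a character vocabulary and one-hot encode a character.
function build_vocab(text::AbstractString)
    chars = sort(collect(Set(text)))
    Dict(c => i for (i, c) in enumerate(chars))
end

function onehot(c::Char, vocab::Dict{Char,Int})
    v = zeros(Float64, length(vocab))
    v[vocab[c]] = 1.0
    return v
end

vocab = build_vocab("hello world")   # tiny example corpus
x = onehot('h', vocab)               # input vector for the RNN
```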
16. Some great examples
@hardmaru
Tokyo, Japan
Has a great example of an RNN + Reinforcement Learning on the pole-balancing task:
• Uses a recurrent neural network
• Uses genetic algorithms to train the network (see the sketch below)
• The demo does the inverted double pendulum balancing task, which I suspect is quite hard even for humans
• All done in JavaScript, which makes for some great demos
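Evolving the weights with a genetic algorithm sidesteps backprop entirely: treat the flattened weight vector as a genome and select on episode reward. A toy Julia sketch of that loop (a generic GA, not @hardmaru’s implementation; `fitness` is a stand-in for running a pole-balancing episode):

```julia
# Toy genetic algorithm over a flat weight vector. `fitness` stands in
# for "run one episode with these weights, return total reward".
function evolve(fitness, nweights; pop=50, gens=100, σ=0.1, elite=10)
    population = [randn(nweights) for _ in 1:pop]
    for _ in 1:gens
        scored = sort(population, by=fitness, rev=true)   # best first
        parents = scored[1:elite]                         # truncation selection
        population = [parents[rand(1:elite)] .+ σ .* randn(nweights)
                      for _ in 1:pop]                     # mutated offspring
        population[1:elite] = parents                     # keep the elite as-is
    end
    return sort(population, by=fitness, rev=true)[1]
end

# Example with a dummy fitness (real use: reward from the simulator).
best = evolve(w -> -sum(abs2, w), 5)
```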
19. RecurrentNN.jl
• My first public package (Yay!!)
• Based on Andrej Karpathy’s implementation in recurrentjs
• https://github.com/Andy-P/RecurrentNN.jl
• Implements both Recurrent Neural Networks and Long Short-Term Memory (LSTM) networks
• Allows one to compose arbitrary network architectures using graph.jl
• Makes use of RMSprop (a variant of stochastic gradient descent)
20. graph.jl
• Has functionality to construct arbitrary expression graphs over which the library can perform automatic differentiation
• Similar to what you may find in Theano for Python, or in Torch
• Basic idea is to allow the user to compose neural networks, then call backprop() and have it all work with the solver
• https://github.com/Andy-P/RecurrentNN.jl/src/graph.jl
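The core idea behind such an expression graph: every operation both computes its forward value and records a closure that will propagate gradients, and backprop() simply replays those closures in reverse. A stripped-down sketch of the pattern (illustrative, not graph.jl’s actual types or API):

```julia
# Minimal tape-based autodiff in the spirit of an expression graph:
# each op computes its value and pushes a backward closure onto a tape.
mutable struct Node
    w::Matrix{Float64}    # value
    dw::Matrix{Float64}   # gradient, filled in during backprop
end
Node(w) = Node(w, zeros(size(w)))

const tape = Function[]

function mul(a::Node, b::Node)
    out = Node(a.w * b.w)
    push!(tape, () -> begin
        a.dw .+= out.dw * b.w'   # dL/da = dL/dout * b'
        b.dw .+= a.w' * out.dw   # dL/db = a' * dL/dout
    end)
    return out
end

function backprop(out::Node)
    fill!(out.dw, 1.0)           # seed gradient at the output
    for f in reverse(tape)
        f()
    end
end

a, b = Node(randn(2, 3)), Node(randn(3, 2))
c = mul(a, b)
backprop(c)   # a.dw and b.dw now hold gradients
```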
23. solver.jl
function step(solver::Solver, model::Model, …)
    …
    for k = 1:length(modelMatrices)
        @inbounds m = modelMatrices[k]      # matrix ref
        @inbounds s = solver.stepcache[k]   # per-matrix RMSprop cache
        for i = 1:m.n
            for j = 1:m.d

                # rmsprop adaptive learning rate
                @inbounds mdwi = m.dw[i,j]
                @inbounds s.w[i,j] = s.w[i,j] * solver.decayrate + (1.0 - solver.decayrate) * mdwi^2

                # gradient clip
                …

                # update and regularize
                @inbounds m.w[i,j] +=
                    -stepsize * mdwi / sqrt(s.w[i,j] + solver.smootheps) - regc * m.w[i,j]
            end
        end
    end
    …
end
Now that we have calculated each of the gradients, we can call the solver to loop through and update each of the weights based on the gradients we stored during the backprop pass. RMSprop uses an adaptive learning rate for each individual parameter.
25. example.jl
• Based on I. Sutskever et al., “Generating Text with Recurrent Neural Networks”, ICML 2011
• Closely follows Andrej Karpathy’s example
• Reads in about 1400 English sentences from Paul Graham’s essays on what makes a successful start-up
• Learns to predict the next character from the previous character
• Uses perplexity as the cost function
• Takes about 8-12 hrs to get a good model (need to anneal the learning rate)
• letter embedding = 6, hidden units = 100 (note: example default is set to 5 & [20,20])
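Perplexity here is just the exponentiated average negative log-likelihood of the probabilities the model assigned to the true next character. A minimal sketch (my formulation, not necessarily example.jl’s exact code):

```julia
# Perplexity from the probabilities the model assigned to the true next
# character at each step: exp(mean negative log-likelihood).
# Lower is better; uniform guessing over 51 characters gives ≈ 51.
perplexity(probs::Vector{Float64}) = exp(-sum(log, probs) / length(probs))

perplexity(fill(1/51, 100))   # ≈ 51.0 for uniform predictions
```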
26. Sample output - 1 hr
• be bet sroud thir an
• the to be startups dalle a boticast that co thas as tame goudtent wist
• the dase mede dosle on astasing sandiry if the the op
• that the dor slous seof the pos to they wame mace thas theming obs and secofcagires morlillers dure t
• you i it stark to fon'te nallof the they coulker imn to suof imas to ge thas int thals le withe the t
27. Sample output - 5 hrs
• you dire prefor reple take stane to of conwe that there cimh the don't than high breads them one gro
• but startups you month
• work of have not end a will araing thec sow about startup maunost matate thinkij the show that's but
• you dire prefor reple take stane to of conwe that there cimh the don't than high breads them one gro
• but cashe the sowe the mont pecipest fitlid just
• Argmax: it's the startups the the seem the startups the the seem the startups the the seem the startups the
28. Sample output - 10 hrs
• and if will be dismiss we can all they have to be a demo every looking
• you stall the right take to grow fast, you won't back
• new rectionally not a lot of that the initial single of optimizing money you don't prosperity don't pl
• when you she have to probably as one there are on the startup ideas week
• the startup need of to a company is the doesn't raise in startups who confident is that doesn't usual
30. What’s not yet so great about this package?
Garbage Collection
• Tried to keep close to the original implementation to make regression testing easier
• Karpathy’s version frequently uses JavaScript’s push to build arrays of matrices
• This is appropriate in JavaScript but creates a lot of GC pressure in Julia
• The likely fix is to create the arrays only once and then update them in place on each pass (version 0.2!) - see the sketch below
Model Types
• Models need some kind of interface that the solver can call to get the collection of matrices
• At the moment that is implemented in the collectNNMat() function
• Could be tightened up by making this part of the initialization of the models
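The sketch referenced above: instead of rebuilding the matrix arrays with push! on every pass, allocate them once and overwrite in place, which keeps the garbage collector out of the inner loop (illustrative, not the package’s actual fix):

```julia
using Random   # for randn!

# Allocation-heavy pattern (close to the JS-style translation):
function forward_alloc(n)
    outs = Matrix{Float64}[]
    for _ in 1:n
        push!(outs, randn(100, 100))   # fresh matrix every pass -> GC churn
    end
    return outs
end

# Preallocate once, then update in place on each pass:
function forward_inplace!(outs::Vector{Matrix{Float64}})
    for m in outs
        randn!(m)   # overwrite existing storage, no new allocation
    end
    return outs
end

outs = [zeros(100, 100) for _ in 1:10]   # one-time allocation
forward_inplace!(outs)
```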