2. About Me
Andre Pemmelaar
• 5 yrs Financial System Solutions
• 12 yrs Buy-Side Finance
• 7 yrs Japanese Gov’t Bond Options Market Maker (JGBs)
• 5 yrs Statistical Arbitrage (Global Equities)
• Low-latency & quantitative algorithms
• Primarily use a mixture of basic statistics and machine learning (Java, F#, Python, R)
• Using Julia for most of my real work (90%) since July 2014
• Can be reached at @QuantixResearch
3. Why my interest in LSTMs & RNNs
• In my field, finance, so much of the work involves sequence models.
• Most deep learning models are not built for use with sequences; you have to jury-rig them to make them work.
• RNNs and LSTMs are specifically designed to work with sequence data.
• Sequence models can be combined with Reinforcement Learning to produce some very nice results (more on this, and a demo, later).
• They have begun producing amazing results, thanks to:
  • Better initialization procedures
  • Use of Rectified Linear Units for RNNs and “memory cells” in LSTMs
6. What are Recurrent Neural Networks?
1. In their simplest form (RNNs), they are just neural networks with a feedback loop.
2. The previous time step’s hidden layer and final outputs are fed back into the network as part of the input to the next time step’s hidden layers.
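The feedback loop is easy to see in code: the hidden state from step t-1 comes back as an extra input at step t. Here is a minimal vanilla-RNN sketch in plain Julia (illustrative names, not this package’s API):

```julia
# Minimal vanilla RNN step: the previous hidden state h_prev is fed back
# in alongside the new input x. Matrix names are illustrative.
function rnn_step(Wxh, Whh, Why, bh, by, x, h_prev)
    h = tanh.(Wxh * x .+ Whh * h_prev .+ bh)   # new hidden state
    y = Why * h .+ by                          # output for this step
    return h, y
end

# Unroll over a sequence, threading the hidden state through time.
function rnn_forward(Wxh, Whh, Why, bh, by, xs, h0)
    h, ys = h0, []
    for x in xs
        h, y = rnn_step(Wxh, Whh, Why, bh, by, x, h)
        push!(ys, y)
    end
    return ys, h
end
```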
7. Why Generate Sequences?
• To improve classification?
• To create synthetic training data?
• Practical tasks like speech synthesis?
• To simulate situations?
• To understand the data
This slide is from “Generating Sequences with Recurrent Neural Networks” - Alex Graves
8.–11. [Figure slides from “Generating Sequences with Recurrent Neural Networks” - Alex Graves]
12. Some great examples
Alex Graves
Formerly at University of Toronto, now part of the Google DeepMind team
Has a great example of generating handwriting using an LSTM:
• 3 inputs: Δx, Δy, pen up/down
• 121 output units
• 20 two-dimensional Gaussians for x,y = 40 means (linear) + 40 std. devs (exp) + 20 correlations (tanh) + 20 weights (softmax)
• 1 sigmoid for up/down
• 3 hidden layers, 400 LSTM cells in each
• 3.6M weights total
• Trained with RMSprop, learning rate 0.0001, momentum 0.9
• Error clipped during backward pass (lots of numerical problems)
• Trained overnight on fast multicore CPU
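The 121 outputs are just a flat vector that gets sliced into mixture-density parameters. A hedged sketch of that bookkeeping in Julia (names and slice layout are mine, not Graves’ code):

```julia
# Split a 121-unit output vector into handwriting mixture parameters:
# 20 mixture weights, 20 2-D Gaussians (40 means, 40 std devs,
# 20 correlations), plus 1 pen up/down probability.
function split_outputs(y::Vector{Float64})
    @assert length(y) == 121
    w  = exp.(y[1:20]);  w ./= sum(w)   # softmax -> mixture weights
    mu = y[21:60]                       # 40 means (linear)
    sd = exp.(y[61:100])                # 40 std devs (exp keeps them positive)
    ρ  = tanh.(y[101:120])              # 20 correlations in (-1, 1)
    pen = 1 / (1 + exp(-y[121]))        # sigmoid -> pen up/down probability
    return w, mu, sd, ρ, pen
end
```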
14. Some great examples
Andrej Karpathy
Now at Stanford University
Has a great example of generating characters using an LSTM:
• 51 inputs (unique characters)
• 2 hidden layers, 20 LSTM cells in each
• Trained with RMSprop, learning rate 0.0001, momentum 0.9
• Error clipped during backward pass
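With 51 unique characters, each input to the network is just a one-hot vector over the vocabulary. A minimal sketch of that encoding (illustrative, not Karpathy’s code):

```julia
# Build a character vocabulary and one-hot encode a character.
function build_vocab(text::AbstractString)
    chars = sort(collect(Set(text)))
    Dict(c => i for (i, c) in enumerate(chars))
end

function onehot(c::Char, vocab::Dict{Char,Int})
    v = zeros(Float64, length(vocab))
    v[vocab[c]] = 1.0
    return v
end

vocab = build_vocab("hello world")   # tiny example corpus
x = onehot('h', vocab)               # input vector for the RNN
```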
16. Some great examples
@hardmaru
Tokyo, Japan
Has a great example of an RNN + Reinforcement Learning on the pole-balancing task:
• Uses a recurrent neural network
• Uses genetic algorithms to train the network (see the sketch below)
• The demo does the inverted double pendulum balancing task, which I suspect is quite hard even for humans
• All done in JavaScript, which makes for some great demos
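Evolving the weights with a genetic algorithm sidesteps backprop entirely: treat the flattened weight vector as a genome and select on episode reward. A toy Julia sketch of that loop (a generic GA, not @hardmaru’s implementation; `fitness` is a stand-in for running a pole-balancing episode):

```julia
# Toy genetic algorithm over a flat weight vector. `fitness` stands in
# for "run one episode with these weights, return total reward".
function evolve(fitness, nweights; pop=50, gens=100, σ=0.1, elite=10)
    population = [randn(nweights) for _ in 1:pop]
    for _ in 1:gens
        scored = sort(population, by=fitness, rev=true)   # best first
        parents = scored[1:elite]                         # truncation selection
        population = [parents[rand(1:elite)] .+ σ .* randn(nweights)
                      for _ in 1:pop]                     # mutated offspring
        population[1:elite] = parents                     # keep the elite as-is
    end
    return sort(population, by=fitness, rev=true)[1]
end

# Example with a dummy fitness (real use: reward from the simulator).
best = evolve(w -> -sum(abs2, w), 5)
```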
19. RecurrentNN.jl
• My first public package (Yay!!)
• Based on Andrej Karpathy’s implementation in recurrentjs
• https://github.com/Andy-P/RecurrentNN.jl
• Implements both Recurrent Neural Networks and Long Short-Term Memory (LSTM) networks
• Allows one to compose arbitrary network architectures using graph.jl
• Makes use of RMSprop (a variant of stochastic gradient descent)
20. graph.jl
• Has functionality to construct arbitrary expression graphs over which the library can perform automatic differentiation
• Similar to what you may find in Theano for Python, or in Torch
• Basic idea is to allow the user to compose neural networks, then call backprop() and have it all work with the solver
• https://github.com/Andy-P/RecurrentNN.jl/src/graph.jl
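The core idea behind such an expression graph: every operation both computes its forward value and records a closure that will propagate gradients, and backprop() simply replays those closures in reverse. A stripped-down sketch of the pattern (illustrative, not graph.jl’s actual types or API):

```julia
# Minimal tape-based autodiff in the spirit of an expression graph:
# each op computes its value and pushes a backward closure onto a tape.
mutable struct Node
    w::Matrix{Float64}    # value
    dw::Matrix{Float64}   # gradient, filled in during backprop
end
Node(w) = Node(w, zeros(size(w)))

const tape = Function[]

function mul(a::Node, b::Node)
    out = Node(a.w * b.w)
    push!(tape, () -> begin
        a.dw .+= out.dw * b.w'   # dL/da = dL/dout * b'
        b.dw .+= a.w' * out.dw   # dL/db = a' * dL/dout
    end)
    return out
end

function backprop(out::Node)
    fill!(out.dw, 1.0)           # seed gradient at the output
    for f in reverse(tape)
        f()
    end
end

a, b = Node(randn(2, 3)), Node(randn(3, 2))
c = mul(a, b)
backprop(c)   # a.dw and b.dw now hold gradients
```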
23. solver.jl
function step(solver::Solver, model::Model, …)
    …
    for k = 1:length(modelMatrices)
        @inbounds m = modelMatrices[k]      # matrix ref
        @inbounds s = solver.stepcache[k]   # per-matrix RMSprop cache
        for i = 1:m.n
            for j = 1:m.d

                # rmsprop adaptive learning rate
                @inbounds mdwi = m.dw[i,j]
                @inbounds s.w[i,j] = s.w[i,j] * solver.decayrate + (1.0 - solver.decayrate) * mdwi^2

                # gradient clip
                …

                # update and regularize
                @inbounds m.w[i,j] +=
                    -stepsize * mdwi / sqrt(s.w[i,j] + solver.smootheps) - regc * m.w[i,j]
            end
        end
    end
    …
end
Now that we have calculated each of the gradients, we can call the solver to loop through and update each of the weights based on the gradients we stored during the backprop pass. RMSprop uses an adaptive learning rate for each individual parameter.
25. example.jl
• Based on I. Sutskever et al., “Generating Text with Recurrent Neural Networks”, ICML 2011
• Closely follows Andrej Karpathy’s example
• Reads in about 1400 English sentences from Paul Graham’s essays on what makes a successful start-up
• Learns to predict the next character from the previous character
• Uses perplexity as the cost function
• Takes about 8-12 hrs to get a good model (need to anneal the learning rate)
• letter embedding = 6, hidden units = 100 (note: example default is set to 5 & [20,20])
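Perplexity here is just the exponentiated average negative log-likelihood of the probabilities the model assigned to the true next character. A minimal sketch (my formulation, not necessarily example.jl’s exact code):

```julia
# Perplexity from the probabilities the model assigned to the true next
# character at each step: exp(mean negative log-likelihood).
# Lower is better; uniform guessing over 51 characters gives ≈ 51.
perplexity(probs::Vector{Float64}) = exp(-sum(log, probs) / length(probs))

perplexity(fill(1/51, 100))   # ≈ 51.0 for uniform predictions
```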
26. Sample output - 1 hr
• be bet sroud thir an
• the to be startups dalle a boticast that co thas as tame goudtent wist
• the dase mede dosle on astasing sandiry if the the op
• that the dor slous seof the pos to they wame mace thas theming obs and secofcagires morlillers dure t
• you i it stark to fon'te nallof the they coulker imn to suof imas to ge thas int thals le withe the t
27. Sample output - 5 hrs
• you dire prefor reple take stane to of conwe that there cimh the don't than high breads them one gro
• but startups you month
• work of have not end a will araing thec sow about startup maunost matate thinkij the show that's but
• you dire prefor reple take stane to of conwe that there cimh the don't than high breads them one gro
• but cashe the sowe the mont pecipest fitlid just
• Argmax: it's the startups the the seem the startups the the seem the startups the the seem the startups the
28. Sample output - 10 hrs
• and if will be dismiss we can all they have to be a demo every looking
• you stall the right take to grow fast, you won't back
• new rectionally not a lot of that the initial single of optimizing money you don't prosperity don't pl
• when you she have to probably as one there are on the startup ideas week
• the startup need of to a company is the doesn't raise in startups who confident is that doesn't usual
30. What’s not yet so great about this package?
Garbage Collection
• Tried to keep close to the original implementation to make regression testing easier
• Karpathy’s version frequently uses JavaScript’s push to build arrays of matrices
• This is appropriate in JavaScript but creates a lot of GC pressure in Julia
• The likely fix is to create the arrays only once and then update them in place on each pass (version 0.2!) - see the sketch below
Model Types
• Models need some kind of interface that the solver can call to get the collection of matrices
• At the moment that is implemented in the collectNNMat() function
• Could be tightened up by making this part of the initialization of the models
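The sketch referenced above: instead of rebuilding the matrix arrays with push! on every pass, allocate them once and overwrite in place, which keeps the garbage collector out of the inner loop (illustrative, not the package’s actual fix):

```julia
using Random   # for randn!

# Allocation-heavy pattern (close to the JS-style translation):
function forward_alloc(n)
    outs = Matrix{Float64}[]
    for _ in 1:n
        push!(outs, randn(100, 100))   # fresh matrix every pass -> GC churn
    end
    return outs
end

# Preallocate once, then update in place on each pass:
function forward_inplace!(outs::Vector{Matrix{Float64}})
    for m in outs
        randn!(m)   # overwrite existing storage, no new allocation
    end
    return outs
end

outs = [zeros(100, 100) for _ in 1:10]   # one-time allocation
forward_inplace!(outs)
```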