A technical talk discussing how to use Markov Chain Monte Carlo methods in PyMC3 to deliver novel Bayesian statistical models. Our case study is how to infer the strengths of Rugby teams from the Six Nations. This talk was delivered at the University of Cambridge in 2015.
2. Who am I?
I work as a Data Scientist for a large Telecommunications Company
Masters in Mathematics
Interned at Amazon
Was a consultant for a while
Occasional contributor to Pandas and other projects
Co-organizer of the Data Science Meetup in Luxembourg
Member of Royal Statistical Society and NumFOCUS
@springcoil
3. What is Probabilistic Programming
Basically using random variables instead of variables
Allows you to create a generative story rather than a black box
A different tool to Machine Learning
A different paradigm to frequentist statistics
Forces you to be explicit about your 'subjective' assumptions
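A minimal sketch of the "random variables instead of variables" idea, using only the Python standard library. The goal-kicker scenario, the function name `generative_story` and the Beta(2, 2) prior are assumptions made for illustration; they are not from the talk.

```python
import random

def generative_story(kicks, rng):
    """Tell the story forwards: first draw the unknown, then draw the data."""
    # An ordinary variable would be a fixed number, e.g. success_rate = 0.7.
    # Here the kicker's success rate is a random variable with an explicit
    # (subjective) prior: Beta(2, 2), i.e. "probably somewhere around 50%".
    success_rate = rng.betavariate(2, 2)
    # Given that rate, each kick is an independent success or failure.
    made = sum(rng.random() < success_rate for _ in range(kicks))
    return success_rate, made

rng = random.Random(42)
for _ in range(3):
    rate, made = generative_story(kicks=10, rng=rng)
    print(f"success rate {rate:.2f} -> {made} of 10 kicks made")
```

Every assumption (the prior, the independence of kicks) is written down in the code, which is what makes the model a generative story rather than a black box.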
6. Bayesian Statistics
I studied Mathematics, and encountered Bayesian methods in textbooks
This is a hard area to do by pen and paper, and most integrals can't be solved in exact form
Thankfully, Monte Carlo simulation was invented
These simulations are used to approximate your posterior distribution
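The Monte Carlo idea can be sketched with a tiny random-walk Metropolis sampler, written here in plain Python rather than PyMC3. The data (7 successes in 10 trials), the flat Beta(1, 1) prior and all function names are assumptions chosen so that the exact answer is known and the approximation can be checked against it.

```python
import math
import random

def log_posterior(p, successes, trials, prior_a, prior_b):
    """Unnormalised log posterior: Beta(prior_a, prior_b) prior, binomial likelihood."""
    if not 0.0 < p < 1.0:
        return -math.inf
    log_prior = (prior_a - 1) * math.log(p) + (prior_b - 1) * math.log(1 - p)
    log_lik = successes * math.log(p) + (trials - successes) * math.log(1 - p)
    return log_prior + log_lik

def metropolis(successes, trials, prior_a=1.0, prior_b=1.0,
               draws=20000, step=0.2, seed=0):
    """Random-walk Metropolis: propose a nearby value, accept it with
    probability min(1, posterior ratio), otherwise stay where we are."""
    rng = random.Random(seed)
    current = 0.5
    current_lp = log_posterior(current, successes, trials, prior_a, prior_b)
    samples = []
    for _ in range(draws):
        proposal = current + rng.gauss(0.0, step)
        proposal_lp = log_posterior(proposal, successes, trials, prior_a, prior_b)
        accept_prob = math.exp(min(0.0, proposal_lp - current_lp))
        if rng.random() < accept_prob:
            current, current_lp = proposal, proposal_lp
        samples.append(current)
    return samples[draws // 4:]  # drop the first quarter as burn-in

samples = metropolis(successes=7, trials=10)
mcmc_mean = sum(samples) / len(samples)
exact_mean = (1 + 7) / (1 + 1 + 10)  # mean of the conjugate Beta(8, 4) posterior
print(f"MCMC estimate {mcmc_mean:.3f}, exact {exact_mean:.3f}")
```

In this toy case the integral has a closed form, so the sampler is unnecessary; the point is that the same loop still works when no closed form exists.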
10. How do you pick your prior?
This is a bit of an art
You generally base the prior on experience
As you add more data this matters less and less
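A small worked example of the prior mattering less as data accumulates, assuming a Beta prior on a success rate with binomial data (a conjugate pair, so the posterior mean is a simple formula). The two priors and the 60% observed success rate are hypothetical.

```python
def posterior_mean(prior_a, prior_b, successes, trials):
    """Posterior mean of a Beta(prior_a, prior_b) prior after binomial data."""
    return (prior_a + successes) / (prior_a + prior_b + trials)

# Two analysts with opposite prior beliefs see the same 60% success rate.
for trials in (10, 100, 1000):
    successes = int(0.6 * trials)
    sceptic = posterior_mean(2, 8, successes, trials)    # prior mean 0.2
    optimist = posterior_mean(8, 2, successes, trials)   # prior mean 0.8
    print(f"n={trials:4d}: sceptic {sceptic:.3f}, optimist {optimist:.3f}")
```

With 10 observations the two posterior means are 0.40 and 0.70; with 1000 they are about 0.596 and 0.602, so the choice of prior has almost washed out.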
12. Huh, but isn't Probabilistic Programming just Stan and BUGS?
13. No, in Python you have PyMC3
A complete rewrite of PyMC2 now in 'Beta' status
Based upon Theano
Computational techniques for handling gradients
Automatic Differentiation and GPU speedup
Theano is also used in deep learning!
Currently there is a project to port 'BMH' from PyMC2 to PyMC3
I gave a thorough tutorial on this - my github
Key authors: John Salvatier, Thomas Wiecki, Chris Fonnesbeck
14. Case study: Rugby Analytics
I wanted to do a model of the Six Nations last year.
I wanted to build an understandable model to predict the winner
Key Info: Inferring the 'strength' of each team.
We only have scoring data, which is noisy, hence Bayesian Stats
15. What did I do?
1. I picked Gamma as a prior for all teams
2. I used a Hierarchical Model because I wanted home advantage to be stronger for stronger teams
3. From this I was able to create a novel model based only on historical
results and scoring intensity
4. I simulated the posterior distribution using MCMC
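The generative side of these steps can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the model from the talk: the Gamma shape and scale, the multiplicative `home_advantage` factor, the Poisson scoring counts and every function name are hypothetical, and the scores are counts of scoring events rather than rugby points.

```python
import math
import random

TEAMS = ["England", "France", "Ireland", "Italy", "Scotland", "Wales"]

def sample_poisson(rate, rng):
    """Knuth's Poisson sampler; adequate for the small rates used here."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def draw_strengths(teams, rng, shape=2.0, scale=1.0):
    """Gamma prior on each team's scoring intensity (assumed parameters)."""
    return {team: rng.gammavariate(shape, scale) for team in teams}

def simulate_match(home, away, strengths, home_advantage, rng):
    """Scores are Poisson counts; the home side's intensity is scaled up.
    A multiplicative factor gives stronger teams a larger absolute boost."""
    home_rate = strengths[home] * home_advantage
    away_rate = strengths[away]
    return sample_poisson(home_rate, rng), sample_poisson(away_rate, rng)

rng = random.Random(2015)
strengths = draw_strengths(TEAMS, rng)
home_score, away_score = simulate_match(
    "Ireland", "England", strengths, home_advantage=1.2, rng=rng)
print(f"Ireland {home_score} - {away_score} England")
```

Inference runs this story in reverse: given observed scores, MCMC searches for the strengths and home advantage that make those scores plausible.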
21. What actually happened
The model incorrectly predicted that England would come out on top.
Ireland actually won on points difference, by 6 points.
It really came down to the wire!
"Prediction is difficult especially about the future"
One of the problems is what we call 'over-shrinkage'. You can delve into the results to see what the errors are; my model was within the errors.
Hat tip: Thanks to Abraham Flaxman and the PyMC3 team for helping me port this from PyMC2 to PyMC3
22. Lessons learned
I can build an explainable model using PyMC2 and PyMC3
Generative stories help you build up interest with your colleagues
Communication is the 'last mile' problem of Data Science
PyMC3 is cool: please use it and please contribute