ITERATING OVER STATISTICAL MODELS
NCAA TOURNAMENT EDITION
Daniel Lee
WHY ARE YOU LISTENING TO ME?
▸ Stan developer

http://mc-stan.org

▸ Researcher at Columbia

▸ Co-founder of Stan Group

training / statistical support /
consulting






▸ email: bearlee@alum.mit.edu

twitter: @djsyclik / @mcmc_stan

web: http://syclik.com
MARCH MADNESS
WHAT IS MARCH MADNESS?
CHAMPIONSHIP GAME: 

VILLANOVA VS NORTH CAROLINA
P(VILLANOVA WIN)?
VILLANOVA WILDCATS VS. NORTH CAROLINA TAR HEELS
▸ 1:00. 70 - 69
▸ 0:35. 72 - 69
▸ 0:23. 72 - 71
▸ 0:13. 74 - 71
▸ 0:06. 74 - 74
▸ 0:00. 77 - 74. Villanova wins.
TREAT STATISTICAL MODELS AS CODE
WRITING CODE IS A DISCIPLINE
WRITING CODE CAN BE A DISCIPLINE
▸ design patterns
▸ testing
▸ code review
▸ maintenance
▸ modularity
▸ collaboration
IS STATISTICAL MODELING A DISCIPLINE?
▸ Art or science?

▸ Models have names

▸ Statistical model vs implementation

▸ Collaboration on the statistical model?
TREAT STATISTICAL MODELS AS CODE
WHAT DO WE NEED TO DO
▸ Elevate statistical models: first class entity

▸ Modularize

▸ Language

▸ Discuss subtle details

▸ Collaborate
TREAT STATISTICAL MODELS AS CODE
STAN GETS US CLOSE
▸ statistical modeling language
▸ domain-specific language

has its own grammar; not R or BUGS!
▸ rstan

shinystan, RStudio integration, rstanarm
▸ open-source

core libraries are new BSD
▸ Stan program
▸ plain text
▸ plays nicely with source repositories
▸ imperative language
Note: Stan isn’t the only thing you can do this with
MODELING
BASKETBALL
MARCH MADNESS
BASKETBALL HISTORY
▸ 1891
▸ Dr. James Naismith. Springfield, MA
▸ Non-contact conditioning
▸ 13 simple rules
▸ Peach basket

10. The umpire shall be judge of the men and shall note the fouls and notify the
referee when three consecutive fouls have been made. He shall have power to
disqualify men according to Rule 5.
MARCH MADNESS
BASKETBALL NOW
▸ 2 x 20 min halves
▸ Increasing score
▸ Points increment by 2, 3, and 1
▸ 5 players, unlimited substitutions
▸ player DQ: 5th foul
▸ Bonus: 7 team fouls

Double bonus: 10 team fouls
MARCH MADNESS
DATA
▸ 351 NCAA Division I men’s basketball teams
▸ 33 conferences
▸ 5421 games
▸ 24 - 35 games per team
▸ Max 3 observations
IS THIS BIG DATA?
TALL DATA VS WIDE DATA
▸ Tall data

lots of replications



▸ Wide data

lots of fields



day, home, score, ot, fgm, fga, 3pm, 3pa, 3a, fta, ftm, or, dr, ast, to, stl, blk, pf
THREE STEPS OF
BAYESIAN DATA ANALYSIS
ANDREW GELMAN IN BDA
THE THREE STEPS OF BAYESIAN DATA ANALYSIS
1. Set up full probability model

Write a Stan program

2. Condition on observed data

Run RStan

3. Evaluate the fit of the model

R, ShinyStan, posterior predictive checks
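
The three steps can be walked through on a toy problem. The sketch below is a stand-in built on assumptions, not the Stan / RStan workflow named on the slide: it uses a made-up record of 7 wins in 10 games, a binomial likelihood with a uniform prior on the win probability, and a plain-Python grid approximation in place of Stan's sampler. All function and variable names are hypothetical.

```python
import math

# Step 1: set up a full probability model.
# Toy assumption: wins ~ binomial(n_games, p) with a uniform prior on p.
def log_joint(p, wins, n_games):
    # The uniform prior and the binomial coefficient only add constants,
    # so they are dropped from the unnormalized log density.
    return wins * math.log(p) + (n_games - wins) * math.log(1.0 - p)

# Step 2: condition on observed data.
# A grid approximation stands in for running a sampler.
def posterior_on_grid(wins, n_games, n_grid=999):
    grid = [(i + 1) / (n_grid + 1) for i in range(n_grid)]
    log_w = [log_joint(p, wins, n_games) for p in grid]
    m = max(log_w)  # subtract the max before exponentiating for stability
    w = [math.exp(v - m) for v in log_w]
    total = sum(w)
    return grid, [v / total for v in w]

# Made-up data for illustration only.
observed_wins, n_games = 7, 10
grid, post = posterior_on_grid(observed_wins, n_games)

# Step 3: evaluate the fit of the model.
# A very small posterior predictive check: compare the expected number of
# wins in a replicated set of games with the observed number.
post_mean = sum(p * w for p, w in zip(grid, post))
expected_rep_wins = n_games * post_mean
print(f"observed wins: {observed_wins}, expected replicated wins: {expected_rep_wins:.2f}")
```

With real data the same three steps apply, but the model, the inference engine, and the checks would all be richer than this toy.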

ITERATING OVER
MODELS
ITERATING OVER STATISTICAL MODELS
STATISTICAL MODEL #1
▸ Only 2015-2016 matters
▸ Teams have a latent ability
▸ “logistic regression”
▸ “Bradley-Terry model”
$y \sim \operatorname{bernoulli}(\operatorname{logit}^{-1}(\theta_1 - \theta_2))$
P(VILLANOVA > UNC | DATA) = 0.73
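
To make the Bradley-Terry formula concrete, here is a minimal Python sketch of how two latent abilities turn into a win probability. The ability values are made up for illustration; in the talk's setup they would be estimated from the season's games, and the reported probability would be averaged over the posterior rather than computed from single point values. The function names are hypothetical.

```python
import math

def inv_logit(x: float) -> float:
    """Inverse logit, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def p_team1_wins(theta1: float, theta2: float) -> float:
    """Win probability for team 1 given the two latent abilities."""
    return inv_logit(theta1 - theta2)

# Hypothetical abilities: equal teams, then team 1 stronger by 0.5.
print(p_team1_wins(0.0, 0.0))
print(p_team1_wins(0.5, 0.0))
```

Only the difference in abilities matters, so adding the same constant to every team's ability leaves all predictions unchanged.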
ITERATING OVER STATISTICAL MODELS
TREAT STATISTICAL MODELS AS CODE
▸ Statistical model in a separate file
▸ Git
▸ Testing
▸ Inspection of fit
▸ Backtest on historical data
▸ Priors
▸ “Model #1”
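
One way to make "Testing" and "Backtest on historical data" concrete is to score predicted win probabilities against known outcomes. The sketch below uses mean log loss as the score. That choice is an assumption, since the slides do not say which metric was used, and the predictions and outcomes are made up.

```python
import math

def log_loss(probs, outcomes, eps=1e-15):
    """Mean log loss of predicted win probabilities against 0/1 outcomes."""
    if len(probs) != len(outcomes) or not probs:
        raise ValueError("probs and outcomes must be non-empty and the same length")
    total = 0.0
    for p, y in zip(probs, outcomes):
        p = min(max(p, eps), 1.0 - eps)  # clip so log(0) never occurs
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(probs)

# Hypothetical backtest: two games, the first won and the second lost.
outcomes = [1, 0]
coin_flip = log_loss([0.5, 0.5], outcomes)
confident_right = log_loss([0.9, 0.1], outcomes)
confident_wrong = log_loss([0.1, 0.9], outcomes)
print(coin_flip, confident_right, confident_wrong)
```

Lower is better. A score like this can be tracked in the repository alongside each version of the model, so a change to the model shows up as a change in a number.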
IN WRITING, YOU MUST KILL ALL YOUR
DARLINGS.
William Faulkner
ITERATING OVER STATISTICAL MODELS
STATISTICAL MODEL #2
▸ Home court advantage!

▸ Teams have a latent ability
▸ “logistic regression”
▸ “Bradley-Terry model”
$y \sim \operatorname{bernoulli}(\operatorname{logit}^{-1}(\alpha + \theta_1 - \theta_2))$
P(VILLANOVA > UNC | DATA) = 0.52
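
The home court term can be sketched in Python as an additive shift on the logit scale. This assumes team 1 is the home team whenever the term applies. The `team1_at_home` flag for neutral-site games is a hypothetical addition, since the slides do not say how neutral courts were handled, and all numeric values are made up.

```python
import math

def inv_logit(x: float) -> float:
    """Inverse logit, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def p_team1_wins(alpha: float, theta1: float, theta2: float,
                 team1_at_home: bool = True) -> float:
    """Win probability for team 1 with a home court effect alpha.

    Assumption: alpha is dropped when the game is on a neutral court.
    """
    home_effect = alpha if team1_at_home else 0.0
    return inv_logit(home_effect + theta1 - theta2)

# Hypothetical values: evenly matched teams and a positive home effect.
print(p_team1_wins(0.4, 0.3, 0.3))                       # home game
print(p_team1_wins(0.4, 0.3, 0.3, team1_at_home=False))  # neutral court
```

With `alpha` set to zero this reduces to the model without a home court term.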
ITERATING OVER STATISTICAL MODELS
STATISTICAL MODEL #3
▸ Assumptions:
▸ Only 2015-2016 matters
▸ Teams have a latent ability
▸ Model points
▸ Add a home court advantage
ITERATING OVER STATISTICAL MODELS
HOW DID WE DO?
▸ Kaggle: 87 / 608
▸ What went wrong?
▸ What’s next?
TREAT STATISTICAL MODELS AS CODE
THANKS
▸ Collaborative effort
▸ NCAA modeling:

Rob Trangucci
▸ Stan team

Andrew Gelman, Bob Carpenter, Matt
Hoffman, Ben Goodrich, Michael Betancourt,
Marcus Brubaker, Jiqiang Guo, Peter Li, Allen
Riddell, Marco Ignacio, Jeff Arnold, Mitzi
Morris, Rob Goedman, Brian Lau, Jonah
Gabry, Alp Kucukelbir, Robert Grant, Dustin
Tran, Krzysztof Sakrejda, Aki Vehtari, Rayleigh
Lei, Sebastian Weber
HELP
▸ http://mc-stan.org
▸ stan-users mailing list

▸ Stan Group Inc.

http://stan.fit

training / statistical support / consulting



▸ bearlee@alum.mit.edu / @djsyclik / @mcmc_stan
