1
Doing Data Science on the NFL Play by Play Dataset
Jesse Anderson | Curriculum Developer and Instructor
July 2013 v2
Plays
2
• Advanced NFL Stats released all play-by-play data since the 2002 season
• 2,898 total games
• 471,392 plays
Full Play Entry
3
20121119_CHI@SF,3,17,48,SF,CHI,3,2,76,20,0,(2:48) C.Kaepernick pass short right to M.Crabtree to SF 25 for 1 yard (C.Tillman). Caught at SF 25. 0-yds YAC,0,3,0,27,7,2012
Play Description
4
(2:48) C.Kaepernick pass short right to M.Crabtree to SF 25 for 1 yard (C.Tillman). Caught at SF 25. 0-yds YAC
There's A Chart for That
5
There's A Custom MapReduce Behind That
6
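import java.io.IOException;
import java.util.regex.Matcher;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// PassWritable and the incompletePass Pattern are defined elsewhere in the project (not shown on the slide).
public class IncompletesMapper extends Mapper<LongWritable, Text, Text, PassWritable> {
    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        if (line.contains("incomplete")) { // cheap substring filter before running the regex
            Matcher matcher = incompletePass.matcher(line);
            if (matcher.find()) {
                // Key is group(1) + "-" + group(2); value carries a count of 1 and the integer in group(3).
                context.write(new Text(matcher.group(1) + "-" + matcher.group(2)),
                        new PassWritable(1, Integer.parseInt(matcher.group(3))));
            }
        }
    }
}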
7
The Hive Story
Enter the Query
Queryable Data
8
Give me every run play by New Orleans in the 2010 season
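The request above can be read two ways, and the difference matters once it becomes a filter. The sketch below is a minimal illustration in plain Java over an in-memory list, not the talk's code: the `Play` class, its field names, and the sample rows are all hypothetical. It contrasts a substring match on the game ID (what a filter such as `LIKE "%NO%"` does) with a check on which team had the ball.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class RunPlayFilter {
    /** Hypothetical row shape; the real table has many more columns. */
    static final class Play {
        final String game;     // e.g. "20100909_MIN@NO"
        final String offense;  // team with the ball
        final String playType; // e.g. "RUN" or "PASS"
        final int year;

        Play(String game, String offense, String playType, int year) {
            this.game = game;
            this.offense = offense;
            this.playType = playType;
            this.year = year;
        }
    }

    /** Run plays from any game whose ID contains the team code (either side's runs). */
    static List<Play> runsInTeamGames(List<Play> plays, String team, int year) {
        return plays.stream()
                .filter(p -> p.playType.equals("RUN"))
                .filter(p -> p.year == year)
                .filter(p -> p.game.contains(team))
                .collect(Collectors.toList());
    }

    /** Stricter reading: run plays where the team itself was on offense. */
    static List<Play> runsByOffense(List<Play> plays, String team, int year) {
        return runsInTeamGames(plays, team, year).stream()
                .filter(p -> p.offense.equals(team))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Illustrative rows only, not taken from the dataset.
        List<Play> plays = Arrays.asList(
                new Play("20100909_MIN@NO", "NO", "RUN", 2010),
                new Play("20100909_MIN@NO", "MIN", "RUN", 2010),
                new Play("20100909_MIN@NO", "NO", "PASS", 2010),
                new Play("20110908_NO@GB", "NO", "RUN", 2011));
        System.out.println("Runs in NO games: " + runsInTeamGames(plays, "NO", 2010).size());
        System.out.println("Runs by NO offense: " + runsByOffense(plays, "NO", 2010).size());
    }
}
```

Matching on the game ID alone also returns the opponent's run plays, so the stricter variant is the one to use when only New Orleans carries are wanted.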
From the Data: Fourth Downs
9
15% of 4th down plays weren't kicks
Play by Play Pieces
10
(2:48) C.Kaepernick pass short right to M.Crabtree to SF 25 for 1 yard (C.Tillman). Caught at SF 25. 0-yds YAC
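Breaking a description like this into pieces can be sketched with a regular expression. The example below is a minimal sketch, assuming a single hypothetical pattern that covers only completed passes phrased like the sample play. It is not the regex from the talk's code, and the class and field names are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PassPlayParser {
    // Hypothetical pattern for a completed pass (an assumption, not the talk's regex).
    // Groups: minutes, seconds, passer, depth, direction, receiver, yards gained.
    private static final Pattern COMPLETED_PASS = Pattern.compile(
            "^\\((\\d{1,2}):(\\d{2})\\)\\s+(\\S+) pass (short|deep) (left|middle|right)"
            + " to (\\S+) to [A-Z]{2,3} \\d{1,2} for (-?\\d+) yards?");

    /** One field per piece pulled out of the free-text description. */
    static final class PassPlay {
        final int secondsLeftInQuarter;
        final String passer;
        final String depth;
        final String direction;
        final String receiver;
        final int yardsGained;

        PassPlay(int secondsLeftInQuarter, String passer, String depth,
                 String direction, String receiver, int yardsGained) {
            this.secondsLeftInQuarter = secondsLeftInQuarter;
            this.passer = passer;
            this.depth = depth;
            this.direction = direction;
            this.receiver = receiver;
            this.yardsGained = yardsGained;
        }
    }

    /** Returns the parsed play, or null when the text does not match the pattern. */
    static PassPlay parse(String description) {
        Matcher m = COMPLETED_PASS.matcher(description);
        if (!m.find()) {
            return null;
        }
        int seconds = Integer.parseInt(m.group(1)) * 60 + Integer.parseInt(m.group(2));
        return new PassPlay(seconds, m.group(3), m.group(4), m.group(5),
                m.group(6), Integer.parseInt(m.group(7)));
    }

    public static void main(String[] args) {
        PassPlay play = parse("(2:48) C.Kaepernick pass short right to M.Crabtree"
                + " to SF 25 for 1 yard (C.Tillman). Caught at SF 25. 0-yds YAC");
        System.out.println(play.passer + " -> " + play.receiver + ", "
                + play.yardsGained + " yd, " + play.depth + " " + play.direction);
    }
}
```

Real descriptions vary far more than this one pattern allows (runs, sacks, penalties, incompletions), so a full breakdown would need one pattern per play type.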
From the Data: Sacks
11
QB sacks and scrambles double on 3rd downs
Hive
• Abstraction on top of MapReduce
• Allows queries using a SQL-like language
12
Hive Query
13
Give me every run by New Orleans in the 2010 season:
SELECT * FROM playbyplay
WHERE playtype = "RUN"
  AND year = 2010
  AND game LIKE "%NO%";
From the Data: Yards to Go
14
With 1 yard to go, 65% of plays are runs
15
Lost in data
Algorithm Alone
Data Janitorial
16
From the Data: Number of Plays By Yard Line
17
Direction of Offense
Stadium
18
Figuring Out Stadium
19
20121119_CHI@SF
20121119: Date Played
CHI: Away Team
SF: Home Team
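The game ID layout above (date, underscore, away team, `@`, home team) can be split mechanically. This is a minimal sketch assuming every ID follows the pattern of the sample and that team codes are two or three capital letters. The `GameId` class is hypothetical and not part of the talk's code.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GameId {
    // Assumed layout: yyyyMMdd_AWAY@HOME, with 2-3 letter team codes.
    private static final Pattern FORMAT =
            Pattern.compile("^(\\d{8})_([A-Z]{2,3})@([A-Z]{2,3})$");

    final LocalDate datePlayed;
    final String awayTeam;
    final String homeTeam;

    private GameId(LocalDate datePlayed, String awayTeam, String homeTeam) {
        this.datePlayed = datePlayed;
        this.awayTeam = awayTeam;
        this.homeTeam = homeTeam;
    }

    /** Splits a game ID into its three parts; rejects anything that does not fit the layout. */
    static GameId parse(String raw) {
        Matcher m = FORMAT.matcher(raw);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognized game ID: " + raw);
        }
        // BASIC_ISO_DATE parses dates written as yyyyMMdd.
        LocalDate date = LocalDate.parse(m.group(1), DateTimeFormatter.BASIC_ISO_DATE);
        return new GameId(date, m.group(2), m.group(3));
    }

    public static void main(String[] args) {
        GameId id = parse("20121119_CHI@SF");
        System.out.println(id.awayTeam + " at " + id.homeTeam + " on " + id.datePlayed);
    }
}
```

The home team is the piece that leads to a stadium, and the date is what a weather reading for that stadium would be looked up by.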
From the Data: Stadium Attendance
20
Stadiums with the smallest capacities average the best scores: 20.55-17.79
Stadium Data
21
Stadium: The capacity of the stadium
Expanded Capacity: The expanded capacity of the stadium
Location: The location of the stadium
Playing Surface: The type of grass, etc. that the stadium has
Is Artificial: Whether the playing surface is artificial
Team: The name of the team that plays at the stadium
Roof Type: The type of roof in the stadium (None, Retractable, Dome)
Elevation: The elevation of the stadium
From the Data: Stadium Elevation
22
There is a 1% increase in passes at Mile High versus sea-level stadiums
Weather
23
1,015 games had weather
From the Data: Fumble
24
Games with weather have a fumble 93% of the time compared to 56% without
Weather Data
25
STATION: Station identifier
STATION NAME: Station location name
READING DATE: Date of reading
PRCP: Precipitation
AWND: Average daily wind speed
WV20: Fog, ice fog, or freezing fog (may include heavy fog)
TMAX: Maximum temperature
TMIN: Minimum temperature
From the Data: Home Field Advantage
26
Baltimore has the biggest weather advantage, 22-14
Arrests
27
Arrest Data
28
Season: Season the player was arrested in (February to February)
Team: Team the player played on
Player: Name of the player arrested
Player Arrested: Whether a player in the play was arrested that season
Offense Player Arrested: Whether the offense had a player arrested that season
Defense Player Arrested: Whether the defense had a player arrested that season
Home Team Player Arrested: Whether the home team had a player arrested that season
Away Team Player Arrested: Whether the away team had a player arrested that season
Whenever there are arrests either in the home team, away team or both, the home team
29
From 2002 to 2012, each team had many arrests.
From a low in 2002 of 56% to a HIGH OF
WINS
Arrest = Win?
30
31
32
The Low Downs
• /me - http://www.jesse-anderson.com
• @jessetanderson
• Code - https://github.com/eljefe6a/nfldata
*I am not in any way affiliated with the NFL or any Team
33
From the Data: Weather
34
Wind had the most effect on games
At calm winds 41% pass and 37% run
At >30 MPH 34% pass and 46% run
From the Data: Field Goals
35
Weather only increases misses by 1%
14% of Field Goals are missed
21% of Field Goals are missed in 30-39 MPH average winds

Editor's notes

  1. Extract value and insight. http://www.flickr.com/photos/billlublin/3972999678/sizes/o/
  2. http://www.flickr.com/photos/nathaninsandiego/5159833527/sizes/o/
  3. Unstructured data. Human generated. http://www.flickr.com/photos/nathaninsandiego/5159833527/sizes/o/
  4. Incomplete passes to a receiver, averaged over all seasons together: A.Luck to R.Wayne, G.Ferotte to C.Chambers, J.Freeman to V.Jackson, T.Brady to R.Moss, A.Luck to D.Avery
  5. This breakup creates 96 different queryable columns. http://www.flickr.com/photos/modenadude/6150263821/sizes/o/in/photostream/
  6. 1st downs are 52% runs and 42% pass. 2nd downs are 45% runs and 49% pass. 3rd downs are 26% runs and 66% pass. http://www.flickr.com/photos/crackerbunny/3215652008/sizes/l/
  7. Easy for humans to parse data, hard for computers. Natural language processing. While breaking down the data, we need to know what questions we want to answer. Look back at my commits to see what I've added. http://www.flickr.com/photos/nathaninsandiego/5159833527/sizes/o/
  8. http://www.flickr.com/photos/modenadude/6150820962/sizes/o/
  9. This breakup creates 96 different queryable columns. Limited to data about plays. http://www.flickr.com/photos/modenadude/6150263821/sizes/o/in/photostream/
  10. 1 yard is 65% run. X and 24 has the highest chance of a sack at 4.6%. X and 21 has the highest chance of a QB scramble at 1.7%. X and 10 is about even between pass and run, in the high 40s. http://www.flickr.com/photos/crackerbunny/3215652008/sizes/l/
  11. 6% of plays lack weather data. Hours spent diagnosing missing or bad data. Hours spent downloading data. http://www.flickr.com/photos/37611179@N00/2295452969/in/photolist-4uQNck-5SRuWS-5WYBDL-677pYM-7cscT7-7vyC7G-7XRk46-84U1Ft-ayVaRS-7ReJrS-dpXi1U-8cTwQ1-7Pq9iE-bEo82F-98LeR5-9Ue2aF-b3vtrz-7YWv62
  12. 100-81: 9%; 80: 3%; 79-50: 41%; 49-21: 28%; 20-0: 18%. http://en.wikipedia.org/wiki/File:Acre_over_football_field.svg http://www.flickr.com/photos/10792703@N07/5753103429/in/photolist-9Loadr-cFWwK5-7EF4kv-d8HppU-aWhuw6-8HBrik-9X7RqK-9XaR7f-e81wbX-89PW2o-8u8GKc-dCM1x1-9bbf31-8Mco3M-ck72kf-bmuLcL-dPUGbG-8HEzxY-bSMizz-92FLxy-7LCu9g-8qcDik-81ASaj-81ASas-81ASam-81ASad-dqfGpZ-9X81MM-ck73Q3-dgnu17-dgnsVy-dgntA5-dgnrba-a85BMW-aBZgcM-beiJi2-boaW1F-7CbZ6C-a9FcCw-8nEGtU-8JwV5X-dAgFZu-doXFTj
  13. Georgia Dome. http://www.flickr.com/photos/ucumari/481430551/sizes/o/
  14. Date of game is important later on
  15. http://www.flickr.com/photos/aneebaba/5154335641/sizes/o/
  16. http://www.flickr.com/photos/aneebaba/5154335641/sizes/o/
  17. http://www.flickr.com/photos/zruda/1807289958/in/photolist-3KGQkG-44bNJx-4js8Cg-4pQ1bg-4sNLUK-4wBzkz-4wFGmh-559J6y-5nxQVm-5qnF14-5r9AyS-5r9AJq-5KGLMR-5KGQNx-5W2oxe-5W2oKZ-5W6Gt9-5W6Gvs-6k6HX8-6k6J2B-6k6Jcn-6kaUuC-6kaUQE-6wffW7-7chpaN-dFfSAs-8RsNT8-9Pzgh1-9PwrNF-812vNy-a6s3Ec-8NFpHL-bpjMZq-bpjRu1-bnv3gS-8qemwV-dFfSuG-aKju4r-9gin1L/ http://www.flickr.com/photos/17251027@N00/2190657211/in/photolist-4kzG4V-4qfDjD-5e3UP6-5k4eSa-5m73Pf-5mR3nR-5nSv8u-5qnF14-5rGWN8-5rM4m3-5rM58f-5rMcT7-5rMdB3-5rMeko-5rMeZs-5rMhBN-5rMEqb-5rNvKb-5vrbfb-5zUrSt-5C3LQs-5CcaoK-5Cgq7N-5Cgtko-643317-6433ym-649s84-6EBd5T-6LwGEX-6XnJXg-6Y6D6D-71kkp7-741GVR-741H1z-741H5r-741Hcg-741HfM-741Hja-741Hoa-741HyT-741HBx-741HF6-741HJn-741HMR-741J5p-741J9r-741JdM-741Jiz-741JnM-741Jtv-741JxP http://www.flickr.com/photos/kevharb/3124008816/
  18. http://www.flickr.com/photos/keithallison/2310794054/sizes/o/
  19. No direct key between stadium and weather station. The average for weather scoring is 21-18 and without weather is 21-19.
  20. Miami has the worst, 14-18. Pittsburgh has the biggest non-weather advantage, 24-14. http://www.flickr.com/photos/37611179@N00/2295452969/in/photolist-4uQNck-5SRuWS-5WYBDL-677pYM-7cscT7-7vyC7G-7XRk46-84U1Ft-ayVaRS-7ReJrS-dpXi1U-8cTwQ1-7Pq9iE-bEo82F-98LeR5-9Ue2aF-b3vtrz-7YWv62
  21. Used by permission of Lego Police Force https://www.facebook.com/LegoPD
  22. 2008 was the peak, with 29 of 32 teams having an arrest. Commissioner Goodell implemented a personal conduct policy in 2007 for the 2008 season. http://www.thebiglead.com/index.php/2013/07/01/nfl-offseason-arrests-are-up-61-since-roger-goodell-implemented-personal-conduct-policy-in-2007/
  23. Weather is not as big an issue. Arrests are not a big issue. We need to use data to make decisions.
  24. Learn more at the screencast. Use the QuickStart VM.
  25. http://www.flickr.com/photos/paolo_rosa/5062025369/sizes/o/
  26. http://www.flickr.com/photos/billlublin/3973002210/sizes/o/