What to do when you can’t get all the data
VVOJ 2013
Jennifer LaFleur, CIR
Sometimes data is not in rows and columns
But there are ways to gather data…
Sampling
Building from documents
Physical surveys
Testing
Questionnaires, polls and surveys
Sampling Considerations
What’s the universe?
How will you draw the sample?
How will you get the items, docs or data?
How far will you want to break it down?
What sort of accuracy do you need?
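One rough way to put a number on "what sort of accuracy do you need": the sketch below uses the standard textbook margin-of-error formula for a proportion from a simple random sample, at roughly 95 percent confidence. The function name and parameters are illustrative only; nothing here comes from the slides, so check the approach with a statistician before relying on it.

```python
import math

def margin_of_error(sample_size, population_size=None, proportion=0.5, z=1.96):
    """Approximate margin of error for a proportion from a simple random sample.

    proportion=0.5 is the most conservative assumption; z=1.96 is ~95% confidence.
    """
    moe = z * math.sqrt(proportion * (1 - proportion) / sample_size)
    if population_size is not None and population_size > 1:
        # Finite population correction: matters when the sample is a large
        # share of the whole universe.
        moe *= math.sqrt((population_size - sample_size) / (population_size - 1))
    return moe

# How the margin shrinks as the sample grows (printed in percentage points).
for n in (100, 400, 1000):
    print(n, round(margin_of_error(n) * 100, 1))
```

The same logic applies to every subgroup you break out: if you plan to report results by county, it is the number of records in each county that sets the margin, not the overall sample size.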
Sampling Considerations
Random sample – Every item has an equal
chance of being included.
Rand() function in Excel, then sort
SPSS and other statistics programs will pull a
random sample
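The "Rand(), then sort" trick can be sketched outside Excel as well. This is a minimal Python version assuming the records are already in a list; the case numbers and the seed are made up for illustration.

```python
import random

def random_sample(records, sample_size, seed=2013):
    """Give every record a random number, sort on it, keep the first N."""
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced later
    keyed = [(rng.random(), record) for record in records]
    keyed.sort(key=lambda pair: pair[0])
    return [record for _, record in keyed[:sample_size]]

# Hypothetical universe: 500 case numbers.
case_numbers = [f"case-{i:04d}" for i in range(1, 501)]
sample = random_sample(case_numbers, 25)
print(len(sample), sample[:3])
```

Recording the seed along with the date of the draw makes it possible to show later exactly how the sample was pulled.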
Sampling Considerations
Systematic sample – Pulling every Nth record.
Stratified sample – pulling your sample based on
another underlying number – such as population.
Rather than pouring all the records for four counties in
a pot and pulling randomly – you pull a random
sample from each county.

Oversampling – pulling more of a particular group
in order to do further research with that group
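A sketch of the three approaches above, assuming records are dictionaries with a county field. The function names, the four counties and the quotas are all hypothetical.

```python
import random
from collections import defaultdict

def systematic_sample(records, step, seed=2013):
    """Pull every Nth record, starting from a random position within the first N."""
    start = random.Random(seed).randrange(step)
    return records[start::step]

def stratified_sample(records, stratum_of, quotas, seed=2013):
    """Pull a separate random sample from each stratum (for example, each county)."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for record in records:
        groups[stratum_of(record)].append(record)
    sample = []
    for name in sorted(groups):
        wanted = min(quotas[name], len(groups[name]))
        sample.extend(rng.sample(groups[name], wanted))
    return sample

# Hypothetical universe: 100 filings in each of four counties.
filings = [{"county": c, "id": i} for c in "ABCD" for i in range(100)]

every_20th = systematic_sample(filings, step=20)

# Oversampling: county D gets a bigger quota because we want a closer look at it.
quotas = {"A": 10, "B": 10, "C": 10, "D": 30}
by_county = stratified_sample(filings, lambda r: r["county"], quotas)
print(len(every_20th), len(by_county))
```

If an oversampled group is later folded back into an overall estimate, it generally needs to be weighted down to its true share, otherwise it will count for too much.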
It might even help in these situations
White criminals seeking presidential pardons were nearly four times as
likely to succeed as minorities.
ProPublica and The Seattle
Times wanted to study
foreclosure patterns in three
U.S. cities, but much of the
information we needed was
scattered in various paper and
electronic files. So we pulled a
random sample of foreclosure
filings from each city.
The Dallas Morning News
used a random sample of noncapital murder cases to build a
database based on hard-copy
juror questionnaires and
coding of voir dire transcripts.
To examine food safety, the Center for Investigative Reporting in
Bosnia sampled food – literally – and had it tested in labs.
To study whether bus drivers in St. Louis were complying with the
ADA, reporters rode a sample of bus routes. The stories prompted
a federal investigation into the transit system.
To provide a look into who we’re driving with during a typical
commute, Dallas Morning News reporters drove the major routes
and recorded the trucks on the road.
Other examples
Counting vehicles in HOV lanes from official
count points
Build a database of murders from news articles
Physically checking work sites, bridges, dams
and just about anything else
What are NOT scientific surveys
Web polls
Radio or TV call-in polls
Man or woman on the street
American Idol
Beyond the basics
Check with experts
Research others’ methodologies
We did a study to audit accessibility – so we worked
with a company that does such audits to help us develop a
survey to use on a sample of facilities.

Oh, and make sure actual data really doesn’t
exist
SHOULD HAVE USED SAMPLING? Dozens of St. Louis voters are
being wrongly accused of casting ballots from fraudulent addresses in
last year's Nov. 7 election.
Other uses of sampling
To test a theory before diving into a massive
records hunt
To double-check your analysis – you should get
roughly the same results from a sample of your
data
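The double-check idea can be sketched like this: compute the same figure on the full data and on a random sample, then look at the gap. The data here is simulated purely for illustration; the field name and the 30 percent rate are assumptions.

```python
import random

def share(records, predicate):
    """Fraction of records for which the predicate is true."""
    return sum(1 for r in records if predicate(r)) / len(records)

rng = random.Random(7)

# Simulated data set: 10,000 records, roughly 30% of them flagged.
records = [{"flagged": rng.random() < 0.30} for _ in range(10_000)]

full_rate = share(records, lambda r: r["flagged"])
sample_rate = share(rng.sample(records, 1_000), lambda r: r["flagged"])

print(f"full: {full_rate:.3f}  sample: {sample_rate:.3f}  "
      f"gap: {abs(full_rate - sample_rate):.3f}")
```

A gap much larger than the sample's margin of error is a signal to recheck the analysis, for example for a bad join or a filter applied in one place and not the other.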
Sensor journalism
Using tests or sensors to gather data
Find the right methodology
Read research reports
Find an existing model
Find an expert
Duplicate or do spot checks
Keep detailed notes so you can explain what you
did
If you’re doing a survey or poll – test run it on a
few folk before full launch
Dealing with documents
Data entry
Data entry firms – but use verification if you can
afford it
Intern entry – definitely double enter
Mechanical Turk – Amazon service for small-task
data gathering
(http://www.propublica.org/article/propublicas-guide-to-mechanical-turk)
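Double entry only pays off if the two versions are actually compared. A minimal sketch, assuming each pass is a dictionary keyed by a document ID; the IDs, field names and values are invented for the example.

```python
def compare_entries(first_pass, second_pass):
    """List every (record_id, field, first_value, second_value) that disagrees."""
    mismatches = []
    for record_id in sorted(set(first_pass) | set(second_pass)):
        a = first_pass.get(record_id, {})
        b = second_pass.get(record_id, {})
        for field in sorted(set(a) | set(b)):
            if a.get(field) != b.get(field):
                mismatches.append((record_id, field, a.get(field), b.get(field)))
    return mismatches

# Two people keyed the same two questionnaires independently.
first = {
    "Q-001": {"age": "34", "verdict": "guilty"},
    "Q-002": {"age": "51", "verdict": "not guilty"},
}
second = {
    "Q-001": {"age": "34", "verdict": "guilty"},
    "Q-002": {"age": "15", "verdict": "not guilty"},  # transposed digits
}

for row in compare_entries(first, second):
    print(row)
```

Each mismatch goes back to the paper document to settle which version is right; records missing from one pass show up as fields with a value of None.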
Dealing with documents
Scanning and scraping
Need to have good quality documents – be cautious
of documents with lots of tiny numbers
Do physical spot checks of your results
Check totals, counts
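Checking totals and counts can be automated when the document prints its own totals. A sketch, assuming the scraped rows are dictionaries with a whole-number amount field; all figures are made up.

```python
def check_table(rows, printed_total, printed_row_count, value_field="amount"):
    """Compare scraped rows against the total and row count printed on the page."""
    problems = []
    if len(rows) != printed_row_count:
        problems.append(f"row count {len(rows)} != printed {printed_row_count}")
    total = sum(row[value_field] for row in rows)
    if total != printed_total:
        problems.append(f"sum {total} != printed total {printed_total}")
    return problems

# The page says: 3 rows, total 515.
clean_rows = [{"amount": 120}, {"amount": 85}, {"amount": 310}]
ocr_rows = [{"amount": 120}, {"amount": 35}, {"amount": 310}]  # 85 misread as 35

print(check_table(clean_rows, printed_total=515, printed_row_count=3))
print(check_table(ocr_rows, printed_total=515, printed_row_count=3))
```

A matching total does not prove every cell is right, since two errors can cancel out, which is why the physical spot checks still matter.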
Sweden
From Helena Bengtsson, SVT
World’s oldest FOIA law, from 1766
Doesn’t support electronic format
What does that mean?
There could be a database that is a public
record – but the authorities can choose to
disclose it as docs
What to do?
Scan and OCR documents
Create your own panel
Crowd sourcing
Surveys
Who gets childcare allowance?
5000 kr ($700) a month to stay home
Children 1-3 years old
Could also be used for nanny and/or
grandparents to stay with kids
Apply with local municipalities
Building our own panel
• TV-show about education
• Created a teachers panel – 900 teachers
900 teachers to represent all teachers in Sweden
Asked for participants on TV and web
Matched that group against known statistics
about teachers
Parameters: Sex, Age, Geography, Grade and
Public/Private school
Checked the group with a question we knew
the answer to
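Matching a volunteer panel against known statistics can be sketched as a simple comparison of shares. The panel breakdown and the "known" shares below are invented for illustration and are not SVT's figures or real Swedish statistics.

```python
from collections import Counter

def compare_shares(panel, field, known_shares):
    """For each category, return (panel share, known share, difference)."""
    counts = Counter(member[field] for member in panel)
    total = len(panel)
    report = {}
    for category, known in known_shares.items():
        panel_share = counts.get(category, 0) / total
        report[category] = (panel_share, known, panel_share - known)
    return report

# Hypothetical panel of 900 teachers, checked on one parameter.
panel = [{"school": "public"}] * 720 + [{"school": "private"}] * 180
known = {"public": 0.85, "private": 0.15}  # assumed figures, not official ones

for category, (got, expected, gap) in compare_shares(panel, "school", known).items():
    print(f"{category}: panel {got:.2f}, known {expected:.2f}, gap {gap:+.2f}")
```

The same check would be repeated for each parameter on the slide (sex, age, geography, grade). Large gaps suggest recruiting more participants from the under-represented group before drawing conclusions.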
• Drug control at counties and cities
• Surveyed 355 counties and districts – all replied
Other examples
1800 vicars about same-sex marriages
Unions about on-going negotiations
21 regions about…
Female representation in publicly owned companies
Public contributions to political parties

290 counties about…
Night time care for old people that still live at home
Will you raise the tax next year?
Alcohol licences for restaurants
Public contributions to local organizations
The candidates
Public information about all 54 673:
Age
Place of residence
Declared income from two years back
Company interest – member of the board
Self-owned company
Bulletproofing your project
Bounce your methodology off experts
Disclose all details of your methodology
Never draw conclusions about a whole
group unless you got all or almost all to
answer –Helena B.

Data journalism without data
