Data Can Only Dance with Its Music:
Understanding the ecology of public data
Tom Johnson - Inst. for Analytic Journalism
Carli Brosseau - The Oregonian
Andy Lehren - The New York Times
Presentation @ https://goo.gl/pMq5ec
NICAR - March 2017 - Jacksonville, FL
Topic(s)
• FOIA strategies
– The data?
– The metadata?
• Separate requests?
• Bundled requests?
• Pros and cons?
• National action plan?
The Data
♪ ♩ ♫ ♬ ♩♩ ♪ ♩ ♫ ♪
♩ ♪ ♬
♪ ♩ ♫ ♪
You can…
• Count it
• Categorize it
• Measure category proportions
• Measure space between
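As a rough illustration of those four operations, here is a minimal Python sketch that treats the note symbols from the slide as data. The category labels and the choice to read "space between" as the gap in positions between repeats of one symbol are assumptions made for the example, not something stated in the talk.

```python
from collections import Counter

# Note symbols from the slide, treated as a sequence of observations.
notes = list("♪♩♫♬♩♩♪♩♫♪")

# Category labels chosen for illustration (an assumption, not from the talk).
CATEGORY = {
    "♪": "eighth",
    "♩": "quarter",
    "♫": "beamed eighths",
    "♬": "beamed sixteenths",
}

def summarize(seq):
    """Count observations per category and compute each category's share."""
    counts = Counter(CATEGORY[n] for n in seq)            # count it, categorize it
    total = sum(counts.values())
    proportions = {k: v / total for k, v in counts.items()}  # category proportions
    return counts, proportions

def gaps(seq, symbol):
    """Distance in positions between successive occurrences of one symbol."""
    positions = [i for i, n in enumerate(seq) if n == symbol]
    return [b - a for a, b in zip(positions, positions[1:])]

counts, proportions = summarize(notes)
quarter_gaps = gaps(notes, "♩")   # space between quarter notes
```

None of these measurements says anything about what the notes mean, which is where the surrounding ecosystem comes in.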
Ecosystem for that data
Ecosystem for that data
♩
♫
♬
♩♩
♪
♫
♫
So…
Do we request only the DATA?
• A statement of the rationale and laws
• A subject
• Contact information
• Depending on your request, you may also need to include an argument for release and/or supporting documents and information
• Request fee waiver.
• Or…
Do we request the METADATA?
• Code schema for entire file, not just specific fields
• Blank data collection paper forms (if used)
• Shots of data entry screen(s)
• Description of software and versions used to enter data
• Software training manuals
• All emails related to training AND THEIR ATTACHMENTS
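To see why the code schema matters, consider a minimal Python sketch of what a code sheet does: it maps the coded values in a data file to their meanings. The field names, codes and rows below are invented for illustration and do not come from any agency's records.

```python
# Hypothetical code sheet (data dictionary): field name -> {code: meaning}.
CODEBOOK = {
    "fac": {"01": "Facility A", "02": "Facility B"},
    "rx_type": {"G": "generic", "B": "brand name"},
}

# Hypothetical raw rows, as an agency export might deliver them.
RAW_ROWS = [
    {"fac": "01", "rx_type": "G", "cost": "12.50"},
    {"fac": "02", "rx_type": "B", "cost": "88.00"},
    {"fac": "02", "rx_type": "X", "cost": "5.25"},  # "X" is not in the code sheet
]

def decode(row, codebook):
    """Replace coded values with their meanings; flag codes the sheet does not define."""
    out = {}
    for field, value in row.items():
        mapping = codebook.get(field)
        if mapping is None:
            out[field] = value  # uncoded field, keep as-is
        else:
            out[field] = mapping.get(value, f"UNDEFINED({value})")
    return out

decoded = [decode(r, CODEBOOK) for r in RAW_ROWS]

# Rows containing at least one code the sheet does not explain.
undefined = [
    r for r in decoded
    if any(str(v).startswith("UNDEFINED") for v in r.values())
]
```

Undefined codes like the one flagged here are the kind of follow-up question that only a schema for the entire file lets you ask.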
Challenges/problems with metadata requests
• Laws vary – federal, state, local jurisdictions
• Multiple exemptions
• Agencies often/usually not required to…
– create new data or reports
– produce drafts or “working papers”
• Agencies only deliver PDFs
– Claim equipment can only produce PDFs
– Claim PDFs are used to prevent modification of data
• Data Huggers
Request for NM prison medication data
Please note: I am NOT seeking any information related to individuals in custody.
• Actual size of 101 pages on 8.5x11 page
• Third column is “patient name,” redacted by printing out the PDF, laying a strip of paper down the patient’s name column, rescanning and saving as PDF
• Many pages cockeyed on printout
• Essentially, totally useless. Impossible to scan and extract data at a reasonable cost
• Why? Passive-aggressive behavior against these damn journalists? Incompetence in terms of software mgmt skills? ¿Quién sabe?
Request for NM prison medication data
• Obviously has problems:
– Can’t extract data, ergo useless data/document
• OK. Let’s try to find out how the database works: both products and process
– The PDF surely originated in a DB program, was entered into Excel, or came from a form someone filled in
– Can a report be generated with the named-person field redacted? Most likely.
• We turn to a metadata request
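If the records live in a database or spreadsheet, producing a copy without the name field is usually a small job. A minimal sketch in Python, assuming a CSV export with a hypothetical `patient_name` column (the column names and sample rows are invented, not taken from the Department of Corrections files):

```python
import csv
import io

def redact_columns(csv_text, drop):
    """Return CSV text with the named columns removed from every row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    keep = [name for name in reader.fieldnames if name not in drop]
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=keep, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)  # extrasaction="ignore" drops the redacted fields
    return out.getvalue()

# Hypothetical export; field names and values are assumptions for illustration.
SAMPLE = (
    "date,medication,patient_name,cost\n"
    "2016-01-04,Drug A,Jane Doe,12.50\n"
    "2016-01-05,Drug B,John Roe,88.00\n"
)

redacted = redact_columns(SAMPLE, drop={"patient_name"})
```

The output stays machine-readable, unlike a printed, taped-over and rescanned PDF.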
Refiled request
1. Please supply copies of any contracts, purchase orders or letters of understanding with vendors related to the purchase, installation and training of the data base and spreadsheet program(s) used to account for the purchases of all inmate medication, pharmacy costs and back up pharmacy costs used by the Department of Corrections.
2. Please supply the documents describing and/or naming the digital files used and saved when entering the data pertaining to the costs referenced in #1 above.
Back to the well for metadata
• Why? If we know what program(s) are being used to enter the data, we can possibly determine what types of reports can be generated
• Also looking for training protocols for hints about what the DB administrators or clerks “should” know
• Asking for emails and their attachments and calendars of training.
1. What are the names and versions of the data base and spreadsheet program(s) used to account for the purchases of all inmate medication, pharmacy costs and back up pharmacy costs used by the Department of Corrections?
2. Please supply copies of any contracts or purchase orders with vendors related to the installation of the programs referenced in #1 above.
3. Please supply the names of the digital files used and saved when entering the data pertaining to the costs referenced in #1.
4. Please supply the code sheets (sometimes called the “code schema” or “meta data”) describing all of the fields and defining the variables, i.e. headers of the columns and/or rows, used when entering the data in this record system. (I pledge not to use external data to attempt to identify specific patients.)
5. Please supply any flowcharts and instructions describing data entry, data analysis and production of reports related to these records along with the coding for any and all variables that are not specifically included in the metadata, e.g. SPSS logfiles, SQL programs, key fields specified, etc.
6. Please supply copies of the materials used when training the data-entry and analytic personnel. (These would typically include ink-on-paper materials for in-class or self-study, video tutorials or URL links to such tutorials, PowerPoint presentations, etc.)
7. Please provide copies of all email and scheduling calendars related to training personnel in the use of the above programs.
8. Please supply copies of any reports reflecting your department’s analysis of the data described in #1 above.
What we received
• We have unusable PDFs of spreadsheets
• We get a copy of the contract with what was called Corrections Corporation of America and Corizon Health Inc. [Also tied to MHM Services, Inc. or, sometimes, CoreCivic. See also Muckrock: “This is why private prisons shouldn’t control access to their records”]
JTJ: NM contract with Centurion Correctional Health Care
SP????
What we received
• We have unusable PDFs of spreadsheets
• We get a copy of the contract with what was called Corrections Corporation of America and Corizon Health Inc. [Also tied to MHM Services, Inc. or, sometimes, CoreCivic. The Public-Private Partner for Healthcare. See also “This is why private prisons shouldn’t control access to their records”]
• Next steps:
– Back to IPRA, but preparing to sue if necessary.
Carli Brosseau, The Oregonian
• Local and state efforts to get data and its metadata
Asking for the record layout first -- v. 1
Asking for the record layout first -- v. 2
Asking for the record layout first -- v. 3
Some takeaways
• Consider the size and professionalism of the agency.
• Ask to look at the record layouts in person.
• Understand that you will have to ask follow-up questions no matter what you get.
• Emphasize early and often that the documentation helps make requests more efficient.
• Do your legislators understand these issues? They should.
Transparency by design -- a dream
1. Data can be exported to a non-proprietary, open format.
2. This functionality should be built in and not require programming.
3. The vendor should make it simple for the public body to redact.
4. The vendor will provide a detailed description of all tables and fields in the database that will be a public record, not subject to the exemptions for trade secrets or copyright protections.
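Item 4 is less exotic than it may sound: most database systems can already list their own tables and fields. A minimal sketch using Python's built-in `sqlite3` module, with an invented `purchases` table standing in for an agency system (real systems will run other database engines with their own catalog queries):

```python
import sqlite3

def record_layout(conn):
    """Return {table: [(column_name, declared_type), ...]} for every table."""
    layout = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        cols = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        layout[table] = [(c[1], c[2]) for c in cols]
    return layout

# Hypothetical schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE purchases (id INTEGER PRIMARY KEY, purchased_on TEXT, "
    "medication TEXT, cost REAL)"
)

layout = record_layout(conn)
```

A listing like this contains field names and types but no records, which is the argument for treating it as a public record in its own right.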
FOIA panel
Andy Lehren, The New York Times
NICAR 2017
Jacksonville, Fla.
• Look for databases of databases
• GIS files
• Even old ones can help you know what is being collected
• Read the specs of ‘proprietary’ programs. This can help you learn not only what is collected, but how easy it may be to export.
• Look for reports. This will tell you what is collected
• Look at other places. If another place with easier access has similar data, you can learn how things are collected
• Follow the data trail. Local > County > Regional > State > Federal
• See if you can create your own version.
• Data entry, surveys, reader feedback
The Death of Timothy Thomas
Resources:
• FOIA Wiki - foia.wiki/wiki/Main_Page
• David Cuillier’s “Pro Se Power! Acquiring Public Records by Filing Suit”
• TJ letter to Dept. of Corrections RE medications
• Muckrock: “The site provides a repository of hundreds of thousands of pages of original government materials, information on how to file requests, and tools to make the requesting process easier.”
• FOIA.Gov - Explore the FOIA data that makes up an agency’s annual FOIA report. Search for data from a single agency or compare data from multiple agencies.
Discussion
• Building FOIA strategies
• Best practices?
– Laws vary by jurisdiction
– Wordsmithing the request
– File type specifics
– Appeals
– Suits – by institution or individual
Data Can Only Dance with Its Music:
Understanding the ecology of public data
Tom Johnson - Inst. for Analytic Journalism
Carli Brosseau - The Oregonian
Andy Lehren - The New York Times
Presentation @ http://tiny.cc/iz4djy
• EX JOHNSON: I tend to use these slides below the closing title as a scratch pad, someplace for possible slides to bring up depending on the discussion.
• “It’s held separately by N different organizations and we can’t join it up.”
• “It will make people angry and scared without helping them.”
• “It is technically impossible.”
• “We do not own the data.”
• “The data is just too large to be published and used.”
• “Our website cannot hold files this large.”
• “We know the data is wrong.”
• “We know the data is wrong, and people will tell us where it is wrong.”
• “We know the data is wrong, and we will waste valuable resources inputting the corrections people send us.”
• “People will draw superficial conclusions from the data without understanding the wider picture.”
• “People will construct [football] league tables from it.” [?]
• “It will generate more Freedom of Information requests.”
• “It might be combined with other data to identify individuals/sensitive information.”
• “It will cost too much to put it into a standard format.”
• “Our IT suppliers will charge us a fortune to do an ad hoc extract.”
“Data Hugging!”
How MuckRock Works
MuckRock helps anyone file, track and share public records requests, using a mix of software and hands-on help to make the process as easy and transparent as possible.
Originally made possible by a grant from the Sunlight Foundation, the service is now funded primarily by its users, including journalists, researchers, activists, and people who just want to better understand what their government is up to.
www.muckrock.com/
Value of Getting Vendor Contracts
• Named parties
– Contacts for interviews; insights
• Definitions
– Sometimes similar to the field names in reports, Excel or TK
• Also see original RFPs for insights into what the required data might be
“…helped reveal aspects of the NSA’s government data dragnets, uncovering the use of “zombie cookies” and canvas fingerprinting to secretly acquire user information, and detailing how companies work with data brokers to find personal information about their customers.”

Más contenido relacionado

La actualidad más candente

Dare To Do Docs
Dare To Do DocsDare To Do Docs
Dare To Do Docs
cmorse
 

La actualidad más candente (14)

A crash course in data for information graphics
A crash course in data for information graphicsA crash course in data for information graphics
A crash course in data for information graphics
 
Uic Summer 2008
Uic Summer 2008Uic Summer 2008
Uic Summer 2008
 
Everything Except Taxes
Everything Except TaxesEverything Except Taxes
Everything Except Taxes
 
Help Getting Public Records by Manuel Torres - Monroe, La., NewsTrain - Oct. ...
Help Getting Public Records by Manuel Torres - Monroe, La., NewsTrain - Oct. ...Help Getting Public Records by Manuel Torres - Monroe, La., NewsTrain - Oct. ...
Help Getting Public Records by Manuel Torres - Monroe, La., NewsTrain - Oct. ...
 
National latina researchers network supercharge your search 2015 webinar
National latina researchers network supercharge your search 2015 webinarNational latina researchers network supercharge your search 2015 webinar
National latina researchers network supercharge your search 2015 webinar
 
Finding and using government and legal resources - Spring 2014
Finding and using government and legal resources - Spring 2014Finding and using government and legal resources - Spring 2014
Finding and using government and legal resources - Spring 2014
 
Open records resources - seven habits of highly effective open-records users ...
Open records resources - seven habits of highly effective open-records users ...Open records resources - seven habits of highly effective open-records users ...
Open records resources - seven habits of highly effective open-records users ...
 
Open Data and Library Services
Open Data and Library Services  Open Data and Library Services
Open Data and Library Services
 
Unstructured Data in BI
Unstructured Data in BIUnstructured Data in BI
Unstructured Data in BI
 
3 pillars of big data : structured data, semi structured data and unstructure...
3 pillars of big data : structured data, semi structured data and unstructure...3 pillars of big data : structured data, semi structured data and unstructure...
3 pillars of big data : structured data, semi structured data and unstructure...
 
RPI Research in Linked Open Government Systems
RPI Research in Linked Open Government SystemsRPI Research in Linked Open Government Systems
RPI Research in Linked Open Government Systems
 
Dare To Do Docs
Dare To Do DocsDare To Do Docs
Dare To Do Docs
 
The Lock to the Safe Has Been Tampered With: Why FERPA and IRB Aren't Enough
The Lock to the Safe Has Been Tampered With: Why FERPA and IRB Aren't EnoughThe Lock to the Safe Has Been Tampered With: Why FERPA and IRB Aren't Enough
The Lock to the Safe Has Been Tampered With: Why FERPA and IRB Aren't Enough
 
CSIA 360 PAPER #1 CAN WE ENSURE THAT OPEN DATA IS USEFUL AND SECURE? (UMUC)
CSIA 360 PAPER #1 CAN WE ENSURE THAT OPEN DATA IS USEFUL AND SECURE? (UMUC)CSIA 360 PAPER #1 CAN WE ENSURE THAT OPEN DATA IS USEFUL AND SECURE? (UMUC)
CSIA 360 PAPER #1 CAN WE ENSURE THAT OPEN DATA IS USEFUL AND SECURE? (UMUC)
 

Similar a Data can only dance with its music NICAR17

Data For Dummies
Data For DummiesData For Dummies
Data For Dummies
C.A.F.C.A.
 
Final ppt sec.data.coll
Final ppt sec.data.collFinal ppt sec.data.coll
Final ppt sec.data.coll
Ram Sonawane
 
Bmgt 311 chapter_5
Bmgt 311 chapter_5Bmgt 311 chapter_5
Bmgt 311 chapter_5
Chris Lovett
 
Topic 5 ReviewThis topic review is a tool designed to prepare st.docx
Topic 5 ReviewThis topic review is a tool designed to prepare st.docxTopic 5 ReviewThis topic review is a tool designed to prepare st.docx
Topic 5 ReviewThis topic review is a tool designed to prepare st.docx
juliennehar
 
SEEMA KUMARI BPT 4th year secondary data.pptx
SEEMA KUMARI  BPT 4th year secondary data.pptxSEEMA KUMARI  BPT 4th year secondary data.pptx
SEEMA KUMARI BPT 4th year secondary data.pptx
AlkaKumari74
 

Similar a Data can only dance with its music NICAR17 (20)

Data science unit1
Data science unit1Data science unit1
Data science unit1
 
Data science.chapter-1,2,3
Data science.chapter-1,2,3Data science.chapter-1,2,3
Data science.chapter-1,2,3
 
Data For Dummies
Data For DummiesData For Dummies
Data For Dummies
 
Bringing a data mindset to your reporting - Brant Houston - Illinois NewsTrai...
Bringing a data mindset to your reporting - Brant Houston - Illinois NewsTrai...Bringing a data mindset to your reporting - Brant Houston - Illinois NewsTrai...
Bringing a data mindset to your reporting - Brant Houston - Illinois NewsTrai...
 
Bmgt 311 chapter_5
Bmgt 311 chapter_5Bmgt 311 chapter_5
Bmgt 311 chapter_5
 
BoyarMiller – What Every Attorney Needs to Know Regarding Document Retention,...
BoyarMiller – What Every Attorney Needs to Know Regarding Document Retention,...BoyarMiller – What Every Attorney Needs to Know Regarding Document Retention,...
BoyarMiller – What Every Attorney Needs to Know Regarding Document Retention,...
 
Final ppt sec.data.coll
Final ppt sec.data.collFinal ppt sec.data.coll
Final ppt sec.data.coll
 
S2
S2S2
S2
 
Five creative search solutions using text analytics
Five creative search solutions using text analyticsFive creative search solutions using text analytics
Five creative search solutions using text analytics
 
Examining the Big Data Frontier
Examining the Big Data FrontierExamining the Big Data Frontier
Examining the Big Data Frontier
 
Data Governance in two different data archives: When is a federal data reposi...
Data Governance in two different data archives: When is a federal data reposi...Data Governance in two different data archives: When is a federal data reposi...
Data Governance in two different data archives: When is a federal data reposi...
 
Exploration, visualization and querying of linked open data sources
Exploration, visualization and querying of linked open data sourcesExploration, visualization and querying of linked open data sources
Exploration, visualization and querying of linked open data sources
 
Bmgt 311 chapter_5
Bmgt 311 chapter_5Bmgt 311 chapter_5
Bmgt 311 chapter_5
 
Practical Data Management Plans
Practical Data Management PlansPractical Data Management Plans
Practical Data Management Plans
 
Topic 5 ReviewThis topic review is a tool designed to prepare st.docx
Topic 5 ReviewThis topic review is a tool designed to prepare st.docxTopic 5 ReviewThis topic review is a tool designed to prepare st.docx
Topic 5 ReviewThis topic review is a tool designed to prepare st.docx
 
What Every Attorney Needs to Know
What Every Attorney Needs to KnowWhat Every Attorney Needs to Know
What Every Attorney Needs to Know
 
OECD workshop on measuring the link between public procurement, R&D and innov...
OECD workshop on measuring the link between public procurement, R&D and innov...OECD workshop on measuring the link between public procurement, R&D and innov...
OECD workshop on measuring the link between public procurement, R&D and innov...
 
Citi Global T4I Accelerator Data and Analytics Presentation
Citi Global T4I Accelerator Data and Analytics PresentationCiti Global T4I Accelerator Data and Analytics Presentation
Citi Global T4I Accelerator Data and Analytics Presentation
 
SEEMA KUMARI BPT 4th year secondary data.pptx
SEEMA KUMARI  BPT 4th year secondary data.pptxSEEMA KUMARI  BPT 4th year secondary data.pptx
SEEMA KUMARI BPT 4th year secondary data.pptx
 
Ona 2012
Ona 2012Ona 2012
Ona 2012
 

Más de J T "Tom" Johnson

Gold rushwriterspresentation 2013
Gold rushwriterspresentation 2013Gold rushwriterspresentation 2013
Gold rushwriterspresentation 2013
J T "Tom" Johnson
 
Esp #001-no son los documentos; son los datos-traducido
 Esp #001-no son los documentos; son los datos-traducido Esp #001-no son los documentos; son los datos-traducido
Esp #001-no son los documentos; son los datos-traducido
J T "Tom" Johnson
 
Esp #002-validación de datos en la era digital-traducido
 Esp #002-validación de datos en la era digital-traducido Esp #002-validación de datos en la era digital-traducido
Esp #002-validación de datos en la era digital-traducido
J T "Tom" Johnson
 
Esp #003-open-datamovement-traducido
 Esp #003-open-datamovement-traducido Esp #003-open-datamovement-traducido
Esp #003-open-datamovement-traducido
J T "Tom" Johnson
 
The Global Open Data Movement
The Global Open Data MovementThe Global Open Data Movement
The Global Open Data Movement
J T "Tom" Johnson
 

Más de J T "Tom" Johnson (20)

Doing Journalism in The Digital Age.
Doing Journalism in The Digital Age.  Doing Journalism in The Digital Age.
Doing Journalism in The Digital Age.
 
Death (or Live?) of American Journalism-Part 2
 Death (or Live?) of American Journalism-Part 2 Death (or Live?) of American Journalism-Part 2
Death (or Live?) of American Journalism-Part 2
 
Death (or Live?) of American Journalism-Part 1
 Death (or Live?) of American Journalism-Part 1 Death (or Live?) of American Journalism-Part 1
Death (or Live?) of American Journalism-Part 1
 
Dominican republic journos cir 31 jan 2020
Dominican republic journos   cir 31 jan 2020Dominican republic journos   cir 31 jan 2020
Dominican republic journos cir 31 jan 2020
 
Presentation to Journalists from the Dominican Republic
Presentation to Journalists from the Dominican RepublicPresentation to Journalists from the Dominican Republic
Presentation to Journalists from the Dominican Republic
 
It’s the people’s data presentation april 2015
It’s the people’s data presentation april 2015It’s the people’s data presentation april 2015
It’s the people’s data presentation april 2015
 
Dancing faster in the datasphere
Dancing faster in the datasphereDancing faster in the datasphere
Dancing faster in the datasphere
 
Building Data-centric Media Organizations
Building Data-centric Media OrganizationsBuilding Data-centric Media Organizations
Building Data-centric Media Organizations
 
Gold rushwriterspresentation 2013
Gold rushwriterspresentation 2013Gold rushwriterspresentation 2013
Gold rushwriterspresentation 2013
 
Tom johnson datavalidity-eng-nov21-arbol
Tom johnson datavalidity-eng-nov21-arbolTom johnson datavalidity-eng-nov21-arbol
Tom johnson datavalidity-eng-nov21-arbol
 
Maps and data esri health care 2012
Maps and data   esri health care 2012Maps and data   esri health care 2012
Maps and data esri health care 2012
 
Esp #001-no son los documentos; son los datos-traducido
 Esp #001-no son los documentos; son los datos-traducido Esp #001-no son los documentos; son los datos-traducido
Esp #001-no son los documentos; son los datos-traducido
 
Esp #002-validación de datos en la era digital-traducido
 Esp #002-validación de datos en la era digital-traducido Esp #002-validación de datos en la era digital-traducido
Esp #002-validación de datos en la era digital-traducido
 
Esp #003-open-datamovement-traducido
 Esp #003-open-datamovement-traducido Esp #003-open-datamovement-traducido
Esp #003-open-datamovement-traducido
 
Esp #004-proceso de periodismo en el nuevo datosfera-traducido
 Esp #004-proceso de periodismo en el nuevo datosfera-traducido Esp #004-proceso de periodismo en el nuevo datosfera-traducido
Esp #004-proceso de periodismo en el nuevo datosfera-traducido
 
Data validation in the Digital Age
Data validation in the Digital AgeData validation in the Digital Age
Data validation in the Digital Age
 
The Global Open Data Movement
The Global Open Data MovementThe Global Open Data Movement
The Global Open Data Movement
 
It's the people's data
It's the people's dataIt's the people's data
It's the people's data
 
The s+a3 project: leveraging analytic resources
The s+a3 project: leveraging analytic resourcesThe s+a3 project: leveraging analytic resources
The s+a3 project: leveraging analytic resources
 
It's not the documents; it's the DATA
It's not the documents; it's the DATAIt's not the documents; it's the DATA
It's not the documents; it's the DATA
 

Último

Russian🍌Dazzling Hottie Get☎️ 9053900678 ☎️call girl In Chandigarh By Chandig...
Russian🍌Dazzling Hottie Get☎️ 9053900678 ☎️call girl In Chandigarh By Chandig...Russian🍌Dazzling Hottie Get☎️ 9053900678 ☎️call girl In Chandigarh By Chandig...
Russian🍌Dazzling Hottie Get☎️ 9053900678 ☎️call girl In Chandigarh By Chandig...
Chandigarh Call girls 9053900678 Call girls in Chandigarh
 
VIP Call Girls Agra 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Agra 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Agra 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Agra 7001035870 Whatsapp Number, 24/07 Booking
dharasingh5698
 

Último (20)

WORLD DEVELOPMENT REPORT 2024 - Economic Growth in Middle-Income Countries.
WORLD DEVELOPMENT REPORT 2024 - Economic Growth in Middle-Income Countries.WORLD DEVELOPMENT REPORT 2024 - Economic Growth in Middle-Income Countries.
WORLD DEVELOPMENT REPORT 2024 - Economic Growth in Middle-Income Countries.
 
Pimpri Chinchwad ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi R...
Pimpri Chinchwad ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi R...Pimpri Chinchwad ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi R...
Pimpri Chinchwad ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi R...
 
(NEHA) Call Girls Nagpur Call Now 8250077686 Nagpur Escorts 24x7
(NEHA) Call Girls Nagpur Call Now 8250077686 Nagpur Escorts 24x7(NEHA) Call Girls Nagpur Call Now 8250077686 Nagpur Escorts 24x7
(NEHA) Call Girls Nagpur Call Now 8250077686 Nagpur Escorts 24x7
 
VIP Model Call Girls Lohegaon ( Pune ) Call ON 8005736733 Starting From 5K to...
VIP Model Call Girls Lohegaon ( Pune ) Call ON 8005736733 Starting From 5K to...VIP Model Call Girls Lohegaon ( Pune ) Call ON 8005736733 Starting From 5K to...
VIP Model Call Girls Lohegaon ( Pune ) Call ON 8005736733 Starting From 5K to...
 
The NAP process & South-South peer learning
The NAP process & South-South peer learningThe NAP process & South-South peer learning
The NAP process & South-South peer learning
 
Government e Marketplace GeM Presentation
Government e Marketplace GeM PresentationGovernment e Marketplace GeM Presentation
Government e Marketplace GeM Presentation
 
Financing strategies for adaptation. Presentation for CANCC
Financing strategies for adaptation. Presentation for CANCCFinancing strategies for adaptation. Presentation for CANCC
Financing strategies for adaptation. Presentation for CANCC
 
Russian🍌Dazzling Hottie Get☎️ 9053900678 ☎️call girl In Chandigarh By Chandig...
Russian🍌Dazzling Hottie Get☎️ 9053900678 ☎️call girl In Chandigarh By Chandig...Russian🍌Dazzling Hottie Get☎️ 9053900678 ☎️call girl In Chandigarh By Chandig...
Russian🍌Dazzling Hottie Get☎️ 9053900678 ☎️call girl In Chandigarh By Chandig...
 
Coastal Protection Measures in Hulhumale'
Coastal Protection Measures in Hulhumale'Coastal Protection Measures in Hulhumale'
Coastal Protection Measures in Hulhumale'
 
Regional Snapshot Atlanta Aging Trends 2024
Regional Snapshot Atlanta Aging Trends 2024Regional Snapshot Atlanta Aging Trends 2024
Regional Snapshot Atlanta Aging Trends 2024
 
Call Girls Nanded City Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Nanded City Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Nanded City Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Nanded City Call Me 7737669865 Budget Friendly No Advance Booking
 
1935 CONSTITUTION REPORT IN RIPH FINALLS
1935 CONSTITUTION REPORT IN RIPH FINALLS1935 CONSTITUTION REPORT IN RIPH FINALLS
1935 CONSTITUTION REPORT IN RIPH FINALLS
 
best call girls in Pune - 450+ Call Girl Cash Payment 8005736733 Neha Thakur
best call girls in Pune - 450+ Call Girl Cash Payment 8005736733 Neha Thakurbest call girls in Pune - 450+ Call Girl Cash Payment 8005736733 Neha Thakur
best call girls in Pune - 450+ Call Girl Cash Payment 8005736733 Neha Thakur
 
Call Girls Chakan Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Chakan Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Chakan Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Chakan Call Me 7737669865 Budget Friendly No Advance Booking
 
An Atoll Futures Research Institute? Presentation for CANCC
An Atoll Futures Research Institute? Presentation for CANCCAn Atoll Futures Research Institute? Presentation for CANCC
An Atoll Futures Research Institute? Presentation for CANCC
 
The Economic and Organised Crime Office (EOCO) has been advised by the Office...
The Economic and Organised Crime Office (EOCO) has been advised by the Office...The Economic and Organised Crime Office (EOCO) has been advised by the Office...
The Economic and Organised Crime Office (EOCO) has been advised by the Office...
 
VIP Call Girls Agra 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Agra 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Agra 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Agra 7001035870 Whatsapp Number, 24/07 Booking
 
Scaling up coastal adaptation in Maldives through the NAP process
Scaling up coastal adaptation in Maldives through the NAP processScaling up coastal adaptation in Maldives through the NAP process
Scaling up coastal adaptation in Maldives through the NAP process
 
Night 7k to 12k Call Girls Service In Navi Mumbai 👉 BOOK NOW 9833363713 👈 ♀️...
Night 7k to 12k  Call Girls Service In Navi Mumbai 👉 BOOK NOW 9833363713 👈 ♀️...Night 7k to 12k  Call Girls Service In Navi Mumbai 👉 BOOK NOW 9833363713 👈 ♀️...
Night 7k to 12k Call Girls Service In Navi Mumbai 👉 BOOK NOW 9833363713 👈 ♀️...
 
The Most Attractive Pune Call Girls Handewadi Road 8250192130 Will You Miss T...
The Most Attractive Pune Call Girls Handewadi Road 8250192130 Will You Miss T...The Most Attractive Pune Call Girls Handewadi Road 8250192130 Will You Miss T...
The Most Attractive Pune Call Girls Handewadi Road 8250192130 Will You Miss T...
 

Data can only dance with its music NICAR17

  • 1. Data Can Only Dance with Its Music: Understanding the ecology of public data Tom Johnson - Inst. for Analytic Journalism Carli Brosseau - The Oregonian Andy Lehren - The New York Times Presentation @ https://goo.gl/pMq5ec NICAR - March 2017 - Jacksonville, FL 1
  • 2. Topic(s) • FOIA strategies – The data? – The metadata? • Separate requests? • Bundled requests? • Pros and cons? • National action plan? 2NICAR - March 2017 - Jacksonville, FL
  • 3. The Data ♪ ♩ ♫ ♬ ♩♩ ♪ ♩ ♫ ♪ ♩ ♪ ♬ ♪ ♩ ♫ ♪ 3 You can… • Count it • Categorize it • Measure category proportions • Measure space between NICAR - March 2017 - Jacksonville, FL
  • 4. Ecosystem for that data 4NICAR - March 2017 - Jacksonville, FL
  • 5. Ecosystem for that data 5 ♩ ♫ ♬ ♩♩ ♪ ♫ ♫ NICAR - March 2017 - Jacksonville, FL
  • 6. So…… Do we request only the DATA? • A statement of the rationale and laws • A subject • Contact information • Depending on your request, you may also need to include an argument for release and/or supporting documents and information • Request fee waiver. • Or…. Do we request the METADATA? • Code schema for entire file, not just specific fields • Blank data collection paper forms (if used) • Shots of data entry screen(s) • Description of software and versions used to enter data • Software training manuals • All emails related to training AND THEIR ATTACHMENT5 6NICAR - March 2017 - Jacksonville, FL
  • 7. Challenges/problems with metadata requests • Laws vary – federal, state, local jurisdictions • Multiple exemptions • Agencies often/usually not required to… – create new data or reports – Produce drafts or “working papers” • Agencies only deliver PDFs – Claim equipment can only produce PDFs – Claim PDFs are used to prevent modification of data • Data Huggers -- 7NICAR - March 2017 - Jacksonville, FL
  • 8. Request for NM prison medication data 8 Please note: I am NOT seeking any information related to individuals in custody.
  • 9. 9 • Actual size of 101 pages on 8.5x11 page • Third column is “patient name” redacted by printing out PDF, laying a strip of paper down the patient’s name column, rescanning and saving as PDF • Many pages cockeyed on printout • Essentially, totally useless. Impossible to scan and extract data at a reasonable cost • Why? Passive-aggressive behavior against these damn journalists? Incompetence in terms of software mgmt skills? ¿Quién sabe?
  • 10. Request for NM prison medication data • Obviously has problems: – Can’t extract data, ergo useless data/document • OK. Let’s try to find out how the database works: Both products and process – PDF surely originated by either a DB program or entered into Excel or someone filled in a form – Can a report be generated with named person field redacted? Most likely. • We turn to metadata request
  • 11. Refiled request 1. Please supply copies of any contracts, purchase orders or letters of understanding with vendors related to the purchase, installation and training of the data base and spreadsheet program(s) used to account for the purchases of all inmate medication, pharmacy costs and back up pharmacy costs used by the Department of Corrections. 2. Please supply the documents describing and/or naming the digital files used and saved when entering the data pertaining to the costs referenced in #1 above.
  • 12. Back to the well for metadata • Why? If we know what program(s) are being used to enter the data we can possibly determine what types of reports can be generated • Also looking for training protocols for hints about what the DB administrators or clerks “should” know • Asking for emails and their attachments and calendars of training.
  • 13. 1. What are the names and versions of the data base and spreadsheet program(s) used to account for the purchases of all inmate medication, pharmacy costs and back up pharmacy costs used by the Department of Corrections? 2. Please supply copies of any contracts or purchase orders with vendors related to the installation of the programs referenced in #1 above. 3. Please supply the names of the digital files used and saved when entering the data pertaining to the costs referenced in #1. 4. Please supply the code sheets (sometimes called the “code schema” or “meta data”) describing all of the fields and defining the variables, i.e. headers of the columns and/or rows, used when entering the data in this record system. (I pledge not to use external data to attempt to identify specific patients.)
  • 14. 5. Please supply any flowcharts and instructions describing data entry, data analysis and production of reports related to these records along with the coding for any and all variables that are not specifically included in the metadata, e.g. SPSS logfiles, SQL programs, key fields specified, etc. 6. Please supply copies of the materials used when training the data-entry and analytic personnel. (These would typically include ink-on-paper materials for in-class or self-study, video tutorials or URL links to such tutorials, PowerPoint presentations, etc.) 7. Please provide copies of all email and scheduling calendars related to training personnel in the use of the above programs. 8. Please supply copies of any reports reflecting your department’s analysis of the data described in #1 above.
  • 15. What we received • We have unusable PDFs of spreadsheets • We get a copy of the contract with what was called Corrections Corporation of America and Corizon Health Inc. [Also tied to MHM Services, Inc. or, sometimes, CoreCivic. See also Muckrock: “This is why private prisons shouldn’t control access to their records”]
  • 16. JTJ: NM contract with Centurion Correctional Health Care SP????
  • 17. What we received • We have unusable PDFs of spreadsheets • We get a copy of the contract with what was called Corrections Corporation of America and Corizon Health Inc. [Also tied to MHM Services, Inc. or, sometimes, CoreCivic. The Public-Private Partner for Healthcare. See also “This is why private prisons shouldn’t control access to their records”] • Next steps: – Back to IPRA, but preparing to sue if necessary.
  • 18. Carli Brosseau, The Oregonian • Getting local and state efforts to get data and its metadata
  • 19. Asking for the record layout first -- v. 1
  • 20. Asking for the record layout first -- v. 2
  • 21. Asking for the record layout first -- v. 3
  • 22. Some takeaways • Consider the size and professionalism of the agency. • Ask to look at the record layouts in person. • Understand that you will have to ask follow-up questions no matter what you get. • Emphasize early and often that the documentation helps make requests more efficient. • Do your legislators understand these issues? They should.
  • 23. Transparency by design -- a dream 1. Data can be exported to a non-proprietary, open format. 2. This functionality should be built in and not require programming. 3. The vendor should make it simple for the public body to redact. 4. The vendor will provide a detailed description of all tables and fields in the database that will be a public record, not subject to the exemptions for trade secrets or copyright protections.
  • 24. FOIA panel Andy Lehren, The New York Times NICAR 2017 Jacksonville, Fla.
  • 25. • Look for databases of databases • GIS files • Even old ones can help you know what is being collected • Read the specs of ‘proprietary’ programs. This can help you learn not only what is collected, but how easy it may be to export. • Look for reports. This will tell you what is collected
  • 26. • Look at other places. If another place with easier access has similar data, you can learn how things are collected • Follow the data trail. Local > County > Regional > State > Federal • See if you can create your own version. • Data entry, surveys, reader feedback
  • 36. The Death of Timothy Thomas
  • 39. Resources: • FOIA Wiki - foia.wiki/wiki/Main_Page • David Cuillier’s “Pro Se Power! Acquiring Public Records by Filing Suit” • TJ letter to Dept. of Corrections RE medications • Muckrock: “The site provides a repository of hundreds of thousands of pages of original government materials, information on how to file requests, and tools to make the requesting process easier.” • FOIA.Gov - Explore the FOIA data that makes up an agency’s annual FOIA report. Search for data from a single agency or compare data from multiple agencies.
  • 40. Discussion • Building FOIA strategies • Best practices? – Laws vary by jurisdiction – Wordsmithing the request – File type specifics – Appeals – Suits – by institution or individual
  • 41. Data Can Only Dance with Its Music: Understanding the ecology of public data Tom Johnson - Inst. for Analytic Journalism Carli Brosseau - The Oregonian Andy Lehren - The New York Times Presentation @ http://tiny.cc/iz4djy
  • 42. • EX JOHNSON: I tend to use these slides below the closing title as a scratch pad, someplace for possible slides to bring up depending on the discussion.
  • 43. “Data Hugging!” • “It’s held separately by N different organizations and we can’t join it up.” • “It will make people angry and scared without helping them.” • “It is technically impossible.” • “We do not own the data.” • “The data is just too large to be published and used.” • “Our website cannot hold files this large.” • “We know the data is wrong.” • “We know the data is wrong, and people will tell us where it is wrong.” • “We know the data is wrong, and we will waste valuable resources inputting the corrections people send us.” • “People will draw superficial conclusions from the data without understanding the wider picture.” • “People will construct [football] league tables from it.” [?] • “It will generate more Freedom of Information requests.” • “It might be combined with other data to identify individuals/sensitive information.” • “It will cost too much to put it into a standard format.” • “Our IT suppliers will charge us a fortune to do an ad hoc extract.”
  • 44. How MuckRock Works MuckRock helps anyone file, track and share public records requests, using a mix of software and hands-on help to make the process as easy and transparent as possible. Originally made possible by a grant from the Sunlight Foundation, the service is now funded primarily by its users, including journalists, researchers, activists, and people who just want to better understand what their government is up to. www.muckrock.com/
  • 46. Value of Getting Vendor Contracts • Named parties – Contacts for interviews; insights • Definitions – Sometimes similar to the field names in reports, Excel or TK • Also see original RFPs for insights into what the required data might be
  • 47. “…helped reveal aspects of the NSA’s government data dragnets, uncovering the use of “zombie cookies” and canvas fingerprinting to secretly acquire user information, and detailing how companies work with data brokers to find personal information about their customers.”
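
The refiled request on slides 13 and 14 asks for the code sheets ("code schema") that describe every field and define its variables. A minimal sketch of why that metadata matters: the same rows are opaque on their own and readable once a code sheet is applied. Every field name, code, and label below is hypothetical and invented for illustration; none of it comes from the NM Department of Corrections records.

```python
# Hypothetical raw export: coded values that mean little without the code sheet.
raw_rows = [
    {"FAC": "03", "RX_CLS": "A", "COST": "125.50"},
    {"FAC": "07", "RX_CLS": "C", "COST": "40.00"},
]

# Hypothetical code sheet, the kind of metadata the request asks for.
code_sheet = {
    "FAC": {"label": "facility",
            "values": {"03": "Facility Three", "07": "Facility Seven"}},
    "RX_CLS": {"label": "medication_class",
               "values": {"A": "antibiotic", "C": "cardiac"}},
    "COST": {"label": "cost_usd", "values": None},  # numeric field, no lookup
}


def decode_row(row, schema):
    """Translate one coded row into labeled fields using the code sheet."""
    decoded = {}
    for field, value in row.items():
        spec = schema.get(field)
        if spec is None:
            # Field missing from the code sheet: keep it raw so it can be
            # raised in a follow-up question to the agency.
            decoded[field] = value
            continue
        lookup = spec["values"]
        decoded[spec["label"]] = lookup.get(value, value) if lookup else value
    return decoded


decoded_rows = [decode_row(r, code_sheet) for r in raw_rows]
```

Fields that the code sheet does not document pass through unchanged, which mirrors the takeaway on slide 22 that follow-up questions are needed no matter what is received.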

Editor's notes

  1. Data Can Only Dance with Its Music: Understanding the ecology of public data. Since the original FOIA legislation 50 years ago, we have sought first documents and, more recently, data: the people's data stored as 1s and 0s. Such requests are often rejected for various reasons, including, "Our technology can only produce what you want as PDF files. Take it or leave it." Today, however, changes are being implemented that recognize that the broad management metadata provides the context for our FOIA requests. An important breakthrough was the passage of California legislation in 2016 requiring retention and disclosure of multiple variables related to the collection, management and reporting of data. Such legislation is an early indicator of an impending shift in our FOIA process. Today it is important to understand not only who is collecting and keeping the data, but how they are doing so. In this way, if an agency says, "Our technology won't let us give you that data," we can come back with very specific arguments: "Yes, here is how you can get the data for us in its original, fine-grained file format." The question still to be worked out is whether to make a separate request for the metadata first and then come back with a specific request for the data (explaining how and why the agency can produce it), or to bundle the metadata request with the request for the data. I suspect that, right now, bundling the requests at all levels of government might give an agency a too-easy rationale to reject the request as too burdensome, but we can only encourage people to try both until some pattern emerges. I could do a presentation on this topic. Tom
  2. So let’s say we requested some data, and we got it. But only the “data”. We can do SOME analysis on it. [CLICK] We can COUNT the cases or records. We can CATEGORIZE the types. We can MEASURE category proportions. For certain data types, we might want to MEASURE DISTANCE/SPACE between records, for example, when doing network or GIS analysis. We really can’t play the song. We need other information. [CLICK]
  3. We need to know the ecosystem in which that “data” lives. In this case: Where does the data fall on the treble or bass clef? In what key, in THIS arrangement, will the “data” be played? What is the TIME signature? And as we see more of the musical data, will the time and key remain constant? [CLICK]
  4. NOW we can start to play the music: the equivalent of in-depth analysis of a data set.
  5. Insert image of 11-page PDFs received. Obviously has problems: can’t extract data. OK. Let’s try to find out how the database works. PDF surely originated by either a DB program or entered into Excel, or someone filled in a form. Can a report be generated with named person field redacted?
  7. Show recast IPRA request, going for metadata. Why? If we know what program(s) are being used to enter the data, we can possibly determine what types of reports can be generated. Also looking for training protocols for hints about what the DB administrators or clerks “should” know. Asking for emails to find attachments and possibly calendars of training.
  8. #1 – #2: Named parties: contacts for insights and INTERVIEWS. Definitions: sometimes similar to the field names in reports, Excel or ???. Also see original RFPs for CLUES into what the required data might be.
  9. “7. Please provide copies of all email and scheduling calendars related to training personnel in the use of the above programs.” For an interesting angle on the value of asking for e-mails, see Jansen, Koos. “US Mint Releases New Fort Knox ‘Audit Documentation’. The First Critical Observations.” https://goo.gl/6iazzs
  10. JTJ: Contract with the NM Corrections and Centurion Correctional Health Care.
  11. The story
  12. Part of the nut
  13. Graphic …
  14. Show video
  15. Free and collaborative resource on the U.S. Freedom of Information Act, 5 U.S.C. § 552, provided by the Reporters Committee for Freedom of the Press, with contributions from The FOIA Project at TRAC, MuckRock, FOIA Mapper, and users like you. Need information about a particular department, agency, or component? Visit the Agencies Landing Page for FOIA regulations, statistics, record systems, lawsuits, and practical tips. Have a question about FOIA, want to discuss something an agency is doing, or have some news? Visit the FOIA Wiki Forum. Want to get involved in the development of the FOIA Wiki? See the help wanted categ
  17. Some silly, some tragic, all questionable in democratic societies. European in this case, but typical in North America. Source: “Excuses for Data Hugging” by Martin Kliehm, posted on Saturday, July 23rd, 2011 at 9:07 PM. Martin Kliehm (@kliehm) has been a web developer since 1996. He works as a Senior Frontend Engineer in Frankfurt, Germany.
  18. Free and collaborative resource on the U.S. Freedom of Information Act, 5 U.S.C. § 552, provided by the Reporters Committee for Freedom of the Press, with contributions from The FOIA Project at TRAC, MuckRock, FOIA Mapper, and users like you. Need information about a particular department, agency, or component? Visit the Agencies Landing Page for FOIA regulations, statistics, record systems, lawsuits, and practical tips. Includes a page for just about every federal agency. Have a question about FOIA, want to discuss something an agency is doing, or have some news? Visit the FOIA Wiki Forum. Want to get involved in the development of the FOIA Wiki? See the help wanted category.
  19. See definitions in “Centurion Correctional Healthcare of NM Medical Services 16-770-1300-0097.PDF” (Google Drive-IAJ Presentations – NICAR Jacksonville)
  20. Screen shot from email – 20 Oct. 2016
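
Note 2 above lists what can be done with the data alone: count it, categorize it, measure category proportions, and measure the space between records. A minimal sketch of those four operations, assuming a tiny made-up record set (the `category` and `position` fields are hypothetical stand-ins for the notes on the slide):

```python
from collections import Counter

# Hypothetical records: a category and a position, with no metadata at all.
records = [
    {"category": "quarter", "position": 0.0},
    {"category": "eighth", "position": 1.0},
    {"category": "quarter", "position": 1.5},
    {"category": "eighth", "position": 3.5},
]

# Count it.
total = len(records)

# Categorize it.
by_category = Counter(r["category"] for r in records)

# Measure category proportions.
proportions = {cat: n / total for cat, n in by_category.items()}

# Measure the space between consecutive records, ordered by position.
positions = sorted(r["position"] for r in records)
gaps = [later - earlier for earlier, later in zip(positions, positions[1:])]
```

None of these results say what key or time signature the notes belong to, which is the point of the metaphor: the summary statistics are available without metadata, but the interpretation is not.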