Data can only dance with its music NICAR17
1. Data Can Only Dance with Its Music:
Understanding the ecology of public data
Tom Johnson - Inst. for Analytic Journalism
Carli Brosseau - The Oregonian
Andy Lehren - The New York Times
Presentation @ https://goo.gl/pMq5ec
NICAR - March 2017 - Jacksonville, FL
2. Topic(s)
• FOIA strategies
– The data?
– The metadata?
• Separate requests?
• Bundled requests?
• Pros and cons?
• National action plan?
3. The Data
♪ ♩ ♫ ♬ ♩♩ ♪ ♩ ♫ ♪
♩ ♪ ♬
♪ ♩ ♫ ♪
You can…
• Count it
• Categorize it
• Measure category proportions
• Measure space between
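As a rough illustration of how far the bare data can take you, the sketch below counts records, tallies one categorical field and computes each category's share. The records, the field names (`drug_class`, `cost`) and the values are invented for illustration; they do not come from any request described in this talk.

```python
from collections import Counter

# Hypothetical records: field names and values are made up for illustration.
records = [
    {"drug_class": "antibiotic", "cost": 12.50},
    {"drug_class": "analgesic", "cost": 4.00},
    {"drug_class": "antibiotic", "cost": 30.00},
    {"drug_class": "antiviral", "cost": 950.00},
]


def summarize(rows, field):
    """Count rows, tally one categorical field, and compute each category's share."""
    total = len(rows)
    counts = Counter(row[field] for row in rows)
    proportions = {cat: n / total for cat, n in counts.items()} if total else {}
    return total, counts, proportions


total, counts, proportions = summarize(records, "drug_class")
print(total)
print(counts.most_common())
print(proportions)
```

None of this says what `cost` actually measures (a unit price? a monthly total?) or how `drug_class` was assigned. Those answers live in the metadata.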
5. Ecosystem for that data
♩
♫
♬
♩♩
♪
♫
♫
6. So……
Do we request only the DATA?
• A statement of the rationale and laws
• A subject
• Contact information
• Depending on your request, you may also need to include an argument for release and/or supporting documents and information
• Request fee waiver.
• Or….
Do we request the METADATA?
• Code schema for entire file, not just specific fields
• Blank data collection paper forms (if used)
• Shots of data entry screen(s)
• Description of software and versions used to enter data
• Software training manuals
• All emails related to training AND THEIR ATTACHMENTS
7. Challenges/problems with metadata requests
• Laws vary – federal, state, local jurisdictions
• Multiple exemptions
• Agencies often/usually not required to…
– create new data or reports
– produce drafts or “working papers”
• Agencies only deliver PDFs
– Claim equipment can only produce PDFs
– Claim PDFs are used to prevent modification of data
• Data Huggers
8. Request for NM prison medication data
Please note: I am NOT seeking any information related to individuals in custody.
9.
• Actual size of 101 pages on 8.5x11 page
• Third column is “patient name,” redacted by printing out the PDF, laying a strip of paper down the patient’s name column, rescanning and saving as PDF
• Many pages cockeyed on printout
• Essentially, totally useless. Impossible to scan and extract data at a reasonable cost
• Why? Passive-aggressive behavior against these damn journalists? Incompetence in terms of software mgmt skills? Who knows?
10. Request for NM prison medication data
• Obviously has problems:
– Can’t extract data, ergo useless data/document
• OK. Let’s try to find out how the database works: Both products and process
– The PDF surely originated in a DB program, or the data was entered into Excel, or someone filled in a form
– Can a report be generated with named person field redacted? Most likely.
• We turn to metadata request
11. Refiled request
1. Please supply copies of any contracts, purchase orders or letters of understanding with vendors related to the purchase, installation and training of the data base and spreadsheet program(s) used to account for the purchases of all inmate medication, pharmacy costs and back up pharmacy costs used by the Department of Corrections.
2. Please supply the documents describing and/or naming the digital files used and saved when entering the data pertaining to the costs referenced in #1 above.
12. Back to the well for metadata
• Why? If we know what program(s) are being used to enter the data we can possibly determine what types of reports can be generated
• Also looking for training protocols for hints about what the DB administrators or clerks “should” know
• Asking for emails and their attachments and calendars of training.
13.
1. What are the names and versions of the data base and spreadsheet program(s) used to account for the purchases of all inmate medication, pharmacy costs and back up pharmacy costs used by the Department of Corrections?
2. Please supply copies of any contracts or purchase orders with vendors related to the installation of the programs referenced in #1 above.
3. Please supply the names of the digital files used and saved when entering the data pertaining to the costs referenced in #1.
4. Please supply the code sheets (sometimes called the “code schema” or “meta data”) describing all of the fields and defining the variables, i.e. headers of the columns and/or rows, used when entering the data in this record system. (I pledge not to use external data to attempt to identify specific patients.)
14.
5. Please supply any flowcharts and instructions describing data entry, data analysis and production of reports related to these records along with the coding for any and all variables that are not specifically included in the metadata, e.g. SPSS logfiles, SQL programs, key fields specified, etc.
6. Please supply copies of the materials used when training the data-entry and analytic personnel. (These would typically include ink-on-paper materials for in-class or self-study, video tutorials or URL links to such tutorials, PowerPoint presentations, etc.)
7. Please provide copies of all email and scheduling calendars related to training personnel in the use of the above programs.
8. Please supply copies of any reports reflecting your department’s analysis of the data described in #1 above.
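To see why the code sheets in item 4 matter, here is a minimal sketch of how a code sheet gets used once it arrives. The schema (`FAC`, `RX_TYPE`, their codes and labels) and the sample row are hypothetical; they are not the Department of Corrections' actual record layout.

```python
# Hypothetical code sheet: maps each coded field to a readable label
# and a lookup table for its value codes. Not a real agency schema.
CODE_SHEET = {
    "FAC": {"label": "facility", "codes": {"01": "Facility A", "02": "Facility B"}},
    "RX_TYPE": {"label": "prescription type", "codes": {"G": "generic", "B": "brand"}},
}


def decode(row, code_sheet):
    """Translate a raw coded row into readable field names and values.

    Fields or codes missing from the code sheet are kept as-is, so gaps in
    the documentation stay visible and can go into a follow-up request.
    """
    decoded = {}
    for field, value in row.items():
        entry = code_sheet.get(field)
        if entry is None:
            decoded[field] = value
        else:
            decoded[entry["label"]] = entry["codes"].get(value, value)
    return decoded


raw = {"FAC": "02", "RX_TYPE": "G", "AMT": "118.40"}
print(decode(raw, CODE_SHEET))
```

In this sketch the `AMT` field passes through undecoded because the code sheet says nothing about it, which is exactly the kind of gap a follow-up question should target.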
15. What we received
• We have unusable PDFs of spreadsheets
• We get a copy of the contract with what was called Corrections Corporation of America and Corizon Health Inc. [Also tied to MHM Services, Inc. or, sometimes, CoreCivic. See also MuckRock: “This is why private prisons shouldn’t control access to their records”]
16. JTJ: NM contract with Centurion
Correctional Health Care
SP????
17. What we received
• We have unusable PDFs of spreadsheets
• We get a copy of the contract with what was called Corrections Corporation of America and Corizon Health Inc. [Also tied to MHM Services, Inc. or, sometimes, CoreCivic. The Public-Private Partner for Healthcare. See also “This is why private prisons shouldn’t control access to their records”]
• Next steps:
– Back to IPRA, but preparing to sue if necessary.
18. Carli Brosseau, The Oregonian
• Local and state efforts to get data and its metadata
21. Asking for the record layout first -- v. 3
22. Some takeaways
• Consider the size and professionalism of the agency.
• Ask to look at the record layouts in person.
• Understand that you will have to ask follow-up questions no matter what you get.
• Emphasize early and often that the documentation helps make requests more efficient.
• Do your legislators understand these issues? They should.
23. Transparency by design -- a dream
1. Data can be exported to a non-proprietary, open format.
2. This functionality should be built in and not require programming.
3. The vendor should make it simple for the public body to redact.
4. The vendor will provide a detailed description of all tables and fields in the database that will be a public record, not subject to the exemptions for trade secrets or copyright protections.
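Points 1 and 3 are not technically demanding. As a minimal sketch, assuming the records are already available as rows with named fields, the function below writes them to CSV (an open format) and simply leaves out the redacted column. The field names and sample rows are hypothetical, and a real system would export from its own database rather than from a Python list.

```python
import csv
import io


def export_redacted(rows, fieldnames, redact, out):
    """Write rows as CSV, leaving out every column named in `redact`."""
    kept = [name for name in fieldnames if name not in redact]
    # extrasaction="ignore" drops the redacted keys instead of raising an error.
    writer = csv.DictWriter(out, fieldnames=kept, extrasaction="ignore")
    writer.writeheader()
    for row in rows:
        writer.writerow(row)
    return kept


# Hypothetical rows; names and values are placeholders, not real records.
rows = [
    {"date": "2016-01-05", "patient_name": "DOE, J", "drug": "amoxicillin", "cost": "12.50"},
    {"date": "2016-01-06", "patient_name": "ROE, R", "drug": "ibuprofen", "cost": "4.00"},
]
buffer = io.StringIO()
export_redacted(rows, ["date", "patient_name", "drug", "cost"], {"patient_name"}, buffer)
print(buffer.getvalue())
```

The redacted column never reaches the output file, so there is nothing to cover with a strip of paper and nothing to rescan.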
24. FOIA panel
Andy Lehren, The New York Times
NICAR 2017
Jacksonville, Fla.
25.
• Look for databases of databases
• GIS files
• Even old ones can help you know what is being collected
• Read the specs of ‘proprietary’ programs. This can help you learn not only what is collected, but how easy it may be to export.
• Look for reports. This will tell you what is collected
26.
• Look at other places. If another place with easier access has similar data, you can learn how things are collected
• Follow the data trail. Local > County > Regional > State > Federal
• See if you can create your own version.
• Data entry, surveys, reader feedback
39. Resources:
• FOIA Wiki - foia.wiki/wiki/Main_Page
• David Cuillier’s “Pro Se Power! Acquiring Public Records by Filing Suit”
• TJ letter to Dept. of Corrections RE medications
• MuckRock: “The site provides a repository of hundreds of thousands of pages of original government materials, information on how to file requests, and tools to make the requesting process easier.”
• FOIA.gov - Explore the FOIA data that makes up an agency’s annual FOIA report. Search for data from a single agency or compare data from multiple agencies.
40. Discussion
• Building FOIA strategies
• Best practices?
– Laws vary by jurisdiction
– Wordsmithing the request
– File type specifics
– Appeals
– Suits – by institution or individual
41. Data Can Only Dance with Its Music:
Understanding the ecology of public data
Tom Johnson - Inst. for Analytic Journalism
Carli Brosseau - The Oregonian
Andy Lehren - The New York Times
Presentation @ http://tiny.cc/iz4djy
42.
• EX JOHNSON: I tend to use these slides below the closing title as a scratch pad, someplace for possible slides to bring up depending on the discussion.
43. “Data Hugging!”
• “It’s held separately by N different organizations and we can’t join it up.”
• “It will make people angry and scared without helping them.”
• “It is technically impossible.”
• “We do not own the data.”
• “The data is just too large to be published and used.”
• “Our website cannot hold files this large.”
• “We know the data is wrong.”
• “We know the data is wrong, and people will tell us where it is wrong.”
• “We know the data is wrong, and we will waste valuable resources inputting the corrections people send us.”
• “People will draw superficial conclusions from the data without understanding the wider picture.”
• “People will construct [football] league tables from it.” [?]
• “It will generate more Freedom of Information requests.”
• “It might be combined with other data to identify individuals/sensitive information.”
• “It will cost too much to put it into a standard format.”
• “Our IT suppliers will charge us a fortune to do an ad hoc extract.”
44. How MuckRock Works
MuckRock helps anyone file, track and share public records requests, using a mix of software and hands-on help to make the process as easy and transparent as possible.
Originally made possible by a grant from the Sunlight Foundation, the service is now funded primarily by its users, including journalists, researchers, activists, and people who just want to better understand what their government is up to.
www.muckrock.com/
46. Value of Getting Vendor Contracts
• Named parties
– Contacts for interviews; insights
• Definitions
– Sometimes similar to the field names in reports, Excel or TK
• Also see original RFPs for insights into what the required data might be
47. “…helped reveal aspects of the NSA’s government data dragnets, uncovering the use of “zombie cookies” and canvas fingerprinting to secretly acquire user information, and detailing how companies work with data brokers to find personal information about their customers.”
Editor's notes
Data Can Only Dance with Its Music: Understanding the ecology of public data
Since the original FOIA legislation 50 years ago, we have sought first documents and, more recently, data: the people's data stored as 1s and 0s. Such requests are often rejected for various reasons, including, "Our technology can only produce what you want as PDF files. Take it or leave it." Today, however, changes are being implemented that recognize that it is the broad management metadata that provides the context for our FOIA requests. An important breakthrough was the passage of California legislation in 2016 requiring retention and disclosure of multiple variables related to the collection, management and reporting of data. See here. Such legislation is an early indicator of an impending shift in our FOIA process. Today it is important to understand not only who is collecting and keeping the data, but how they are doing so. In this way, if an agency says, "Our technology won't let us give you that data," we can come back with very specific arguments that, "Yes, here is how you can get the data for us in its original, fine-grained file format." The question that is still to be worked out is whether to make a separate request for the metadata first and then come back with a specific request for the data (and this is how and why you can get it), or to bundle the metadata request with the request for the data. I suspect that right now, bundling the requests to all levels of government might provide a too-easy rationale for an agency to reject the request as too burdensome, but we can only encourage people to try both until some pattern emerges.
I could do a presentation on this topic.
Tom
So let’s say we requested some data, and we got it. But only the “data”.
We can do SOME analysis on it. [CLICK]
We can COUNT the cases or records
We can CATEGORIZE the types
We can MEASURE category proportions
For certain data types, we might want to MEASURE DISTANCE/SPACE between records, for example when doing network or GIS analysis.
We really can’t play the song. We need other information [CLICK]
We need to know the ecosystem in which that “data” lives. In this case,
Where the data falls on the treble or bass clef
In what key, in THIS arrangement, will the “data” be played?
What is the TIME signature? And as we see more of the musical data, will the time and key remain constant? [CLICK]
NOW we can start to play the music: the equivalent of in-depth analysis of a data set.
Insert image of 11-page PDFs received
Obviously has problems:
Can’t extract data
OK. Let’s try to find out how the database works
The PDF surely originated in a DB program, or the data was entered into Excel, or someone filled in a form
Can a report be generated with named person field redacted?
Show recast IPRA request, going for metadata
Why? If we know what program(s) are being used to enter the data we can possibly determine what type of reports can be generated
Also looking for training protocols for hints about what the DB administrators or clerks “should” know
Asking for emails to find attachments and possibly calendars of training.
#1 –
#2 -- Named parties
Contacts for insights and INTERVIEWS
Definitions
Sometimes similar to the field names in reports, Excel or ???
Also see original RFPs for CLUES into what the required data might be
“7. Please provide copies of all email and scheduling calendars related to training personnel in the use of the above programs.”
For an interesting angle on the value of asking for e-mails, see Jansen, Koos. “US Mint Releases New Fort Knox ‘Audit Documentation’. The First Critical Observations.” https://goo.gl/6iazzs
JTJ: Contract with the NM Corrections and Centurion Correctional Health Care.
The story
Part of the nut
Graphic …
Show video
Some silly, some tragic, all questionable in democratic societies – European in this case, but typical in North America
Source: “Excuses for Data Hugging” by Martin Kliehm, posted Saturday, July 23, 2011. Martin Kliehm (@kliehm) has been a web developer since 1996. He works as a Senior Frontend Engineer in Frankfurt, Germany.
Free and collaborative resource on the U.S. Freedom of Information Act, 5 U.S.C. § 552, is provided by the Reporters Committee for Freedom of the Press, with contributions from The FOIA Project at TRAC, MuckRock, FOIA Mapper, and users like you.
Need information about a particular department, agency, or component? Visit the Agencies Landing Page for FOIA regulations, statistics, record systems, lawsuits, and practical tips.
Includes a page for just about every federal agency
Have a question about FOIA, want to discuss something an agency is doing, or have some news? Visit the FOIA Wiki Forum.
Want to get involved in the development of the FOIA Wiki? See the help wanted category
See definitions in “Centurion Correctional Healthcare of NM Medical Services 16-770-1300-0097.PDF” (Google Drive-IAJ Presentations – NICAR Jacksonville)