2. Document describing data (and/or digital
materials) that have been or will be gathered in
a study or project.
Often includes details on how data will be
organized, preserved, and accessed
Facilitates re-use of data sets by the PI or by
other researchers
Required component of grants for MANY
agencies (e.g., NSF and NIH)
3. Starting January 2011 for NEW, non-
collaborative proposals
Not voluntary – “integral part” of proposal
Data Management Plans for all data resulting
from any level of NSF funding
Supplementary 2-page document (max)
Optional: Also part of 15-page (max) Project
Description
4. Must address both physical and digital data
“Efficiency and effectiveness” of the DMP will
be considered by NSF and disciplinary division
or directorate
Must include enough information for peer
reviewers and project monitors to assess both the
present proposal and past performance
As of January 2011, proposals will not be
accepted without an accompanying data plan!
5. “Such dissemination of data is necessary for the
community to stimulate new advances as quickly as
possible and to allow prompt evaluation of the results
by the scientific community.” – NSF (italics mine)
Part of an openness trend in the federal government
(data.gov, Open Government Initiative)
NIH Public Access Policy (2008)
Hearings on public access to federally funded research,
Information Policy, Census and National Archives
Subcommittee of the U.S. Congress (July 2010)
6. It makes your research easier!
Data available in case you need it later
Helps avoid accusations of fraud or bad science
To share it for others to use and learn from
To get credit for producing it
To keep from drowning in irrelevant stuff
... especially at grant/project end
7. Gene expression microarray data: “Publicly
available data was significantly (p=0.006)
associated with a 69% increase in citations,
independently of journal impact factor, date of
publication, and author country of origin.”
Piwowar, Heather et al. “Sharing detailed research
data is associated with increased citation rate.” PLoS
ONE 2007. DOI: 10.1371/journal.pone.0000308
Maybe there’s an advantage here!
8. Discuss specific requirements for NSF
Data Management plans
Suggest ways to manage, share, and
archive data more effectively
Provide resources for more information
10. What data are you collecting or making?
Can it be recreated? How much would that cost?
How much of it? How fast is it growing? Does it
change?
What file format(s)?
What’s your infrastructure for data collection and
storage like?
How do you find it, or find what you’re looking
for in it?
How easy is it to get new people up to speed? Or
share data with others?
11. Who are the audiences for your data?
You (including Future You), your lab colleagues
(including future ones), your PIs
Disciplinary colleagues, at your institution or at others
Colleagues in allied disciplines
The world!
What are your obligations to others?
Funder requirements
Confidentiality issues
IP questions
Security
12. How do you and your lab get from where you
are to where you need to be?
Document, document, document all decisions and
all processes!
Secret sauce: the more you strategize upfront,
the less angst and panic later.
“Make it up as you go along” is very bad practice!
But the best-laid plans go agley... so be flexible.
And watch your field! Best practices are still in flux.
13. Four kinds of data defined by OMB:
Observational
Examples: Sensor data, telemetry, survey data, sample
data, neuroimages.
Experimental
Examples: gene sequences, chromatograms, toroid
magnetic field data.
Simulation
Examples: climate models, economic models.
Derived or compiled
Examples: text and data mining, compiled database, 3D
models, data gathered from public documents.
14. Not included in OMB’s definition of research data:
Preliminary analyses
(Raw data, however, is included in this definition)
Drafts of scientific papers
Plans for future research
Peer reviews or communications with
colleagues
Physical objects, such as gel samples
15. As early as possible, but no later than the
deadline set by the relevant directorate
Engineering Section: “no later than the acceptance
for publication of the main findings of the final data”
Earth Sciences: “No later than two (2) years after the
data were collected.”
Social and Economic Sciences: “within one year after
the expiration of an award”
Be aware of concerns that may require earlier
or later disclosure
FERPA? Human Subjects data? HIPAA?
16. Again, specific retention periods will depend
on the type of data and the grant program
Example: NSF Engineering Section suggests
retention period of “three years after either
completion of the grant project or public release of
research data, whichever is later”
Certain types of data will need to be retained
longer
Patent data, longitudinal data sets, etc.
Ask: is your data of permanent value?
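The Engineering rule quoted above (keep data for three years after completion of the grant or public release of the data, whichever is later) is a small date calculation. Below is a minimal Python sketch, assuming the three-year period from that example and hypothetical project dates. Other programs and data types will set different periods, so check your own program's guidance.

```python
from datetime import date


def add_years(d: date, years: int) -> date:
    """Shift a date by whole calendar years.

    Feb 29 has no counterpart in non-leap years, so it falls back to Feb 28.
    """
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def retention_end(project_completion: date, public_release: date, years: int = 3) -> date:
    """End of retention: `years` after completion or public release, whichever is later."""
    return add_years(max(project_completion, public_release), years)


# Hypothetical dates: the project ends before the data are publicly released
print(retention_end(date(2013, 8, 31), date(2014, 3, 15)))  # 2017-03-15
```

Because the later of the two dates drives the result, a delayed public release also extends the retention period.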
17. Analyzed data (incl. images, tables and tables of
numbers used for making graphs)
Metadata that defines how data was generated,
such as experiment descriptions, computer code,
and computer-calculation input
18. Investigators are expected to preserve/share
primary data, samples, physical collections, &
supporting materials
Provide easily accessible information about data
holdings, including quality assessments and
guidance/finding aids
Data may be made available through submission to
a national data center, publication in a journal or book,
or the accessible website of an institutional archive
20. All submitted plans must include, at
minimum:
1. Expected Data: types, physical/electronic collections,
materials to be produced
2. Standards for data and metadata format and content
3. Policies for access and sharing, including provisions for
appropriate protection of privacy, confidentiality,
security, intellectual property, etc.
4. Policies and provisions for re-use, re-distribution, and
the production of derivatives
5. Plans for archiving data, samples, and other research
products, and for preservation of access to them
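These five minimum components can double as a completeness checklist for a draft plan. The Python sketch below uses hypothetical section keys that paraphrase the list. It only confirms that each topic is addressed at all. It cannot tell you whether an answer will satisfy reviewers or meet any extra guidance from your directorate.

```python
# Hypothetical keys; the descriptions paraphrase NSF's five minimum components
REQUIRED_SECTIONS = {
    "expected_data": "types of data, samples, physical collections, other materials",
    "standards": "standards for data and metadata format and content",
    "access_and_sharing": "access/sharing policies, incl. privacy, confidentiality, security, IP",
    "reuse": "policies for re-use, re-distribution, and derivatives",
    "archiving": "plans for archiving and preserving access",
}


def missing_sections(plan: dict[str, str]) -> list[str]:
    """Return required sections that are absent or blank in a draft plan."""
    return [key for key in REQUIRED_SECTIONS if not plan.get(key, "").strip()]


# A hypothetical, incomplete draft
draft = {
    "expected_data": "Survey responses (CSV) and interview audio (FLAC).",
    "standards": "Column definitions documented in a data dictionary.",
    "access_and_sharing": "   ",
}
for key in missing_sections(draft):
    print(f"Missing: {key} ({REQUIRED_SECTIONS[key]})")
```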
21. In short: What kind of data will be produced by
your research processes?
Keep in mind:
File formats of complete data sets
Any software or code that will be needed/produced
Physical samples or other individual data points
Some divisions require retention of physical samples;
consult your Program Officer
22. In short: how will you organize your data within
datasets to make it widely accessible, and how
will you make data sets identifiable?
Keep in mind:
Any data formatting standards for your particular
discipline
Any metadata (author, date, subject, etc.) that your
software attaches automatically, and what you will
need to attach manually
How will you find your data for later consultation?
How will others find it?
23. In short: How will you allow other researchers
to find and use your data?
Keep in mind:
How will other researchers find your data? (i.e., how
will you publicize its existence?)
How will you provide access to your data? (CD-RW?
Data repository? Download via pantherFILE?)
How will you prepare your data for sharing?
Do you need to depersonalize or declassify anything?
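One common step in depersonalizing data is to replace direct identifiers with codes before sharing. The Python sketch below uses a keyed hash (HMAC-SHA256). It assumes hypothetical records and a secret key that is stored apart from the shared files. This is pseudonymization, not full anonymization: indirect identifiers such as dates or locations can still re-identify people. Check FERPA, HIPAA, or IRB requirements before you release anything.

```python
import hashlib
import hmac


def pseudonymize(identifier: str, secret_key: bytes) -> str:
    """Replace a direct identifier with a keyed hash (HMAC-SHA256).

    The same identifier always maps to the same code, so records can still be
    linked across files, but the code cannot practically be reversed without the key.
    """
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]  # shortened for readability; keep the full digest if collisions matter


# Hypothetical records; the key must be kept separate from the shared data
key = b"replace-with-a-long-random-secret"
records = [{"name": "Jane Doe", "score": 87}, {"name": "John Roe", "score": 92}]
shared = [{"participant": pseudonymize(r["name"], key), "score": r["score"]} for r in records]
print(shared)
```

A keyed hash is used here instead of a plain hash. With a plain hash, anyone could hash a list of likely names and match the results against the shared codes.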
24. Data Management Plans are required even if a
project is not expected to generate data that
requires sharing
DMP should clearly explain non-sharing in
light of COI standards (peer review)
Between the lines: Not sharing will require
justification and close scrutiny by NSF
Sharing is preferred
25. In short: How will researchers obtain the
appropriate permissions to use your data?
Keep in mind:
Is a blanket permissions statement or a case-by-case
policy more efficient/practical?
What responsibilities will users of your data have re:
privacy, intellectual property, etc.?
How will you deal with users who violate these
provisions?
26. In short: How will you make sure your data
stays intact and available once you are done
using it?
Keep in mind:
What are your retention requirements? Is this a
permanent data set?
What storage media will you use? Are you prepared
to migrate/emulate as needed?
Do you have a data backup plan?
28. Think about where you will put your data
Local? Network drive? Online data management
system?
Think about how you (or others) will find your
data
Think about how others may use your data, when
found
Think about how to store your data in the long
term (or if to store it long-term at all)
29. Will anybody be able to read these files at the
end of your time horizon?
Where possible, prefer file formats that are:
Open, standardized
Documented
In wide use
Easy to data-mine, transform, recast
If you need to transform data for durability,
do it now, not later.
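Before deciding what to transform, it helps to know which formats you actually hold. The Python sketch below sorts files into "preferred" and "review" groups by file extension. The preferred list is a hypothetical, non-exhaustive example of open, documented, widely used formats, so adjust it to your discipline. The sketch checks extensions only and does not inspect file contents.

```python
from pathlib import Path

# Hypothetical, non-exhaustive list of open, documented, widely used formats;
# adjust it to the standards of your own discipline
PREFERRED = {".csv", ".txt", ".json", ".xml", ".pdf", ".tif", ".tiff", ".png", ".nc", ".h5"}


def classify_formats(filenames: list[str]) -> dict[str, list[str]]:
    """Split filenames into preferred formats and ones to review for conversion."""
    report: dict[str, list[str]] = {"preferred": [], "review": []}
    for name in sorted(filenames):
        bucket = "preferred" if Path(name).suffix.lower() in PREFERRED else "review"
        report[bucket].append(name)
    return report


def audit_directory(root: Path) -> dict[str, list[str]]:
    """Classify every file under `root` (by extension only; contents are not inspected)."""
    files = [p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()]
    return classify_formats(files)


print(classify_formats(["survey.csv", "analysis.sav", "notes.TXT", "fig1.png"]))
# {'preferred': ['fig1.png', 'notes.TXT', 'survey.csv'], 'review': ['analysis.sav']}
```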
30. Fundamental question: What would someone
unfamiliar with your data need in order to
find, evaluate, understand, and reuse them?
Consider the differences between someone
inside your lab, someone outside your lab but
in your field, and someone outside your field.
Two parts: metadata and methods
31. About the project
Title, people, key dates, funders and grants
About the data
Title, key dates, creator(s), subjects, rights, included
files, format(s), versions, checksums
Interpretive aids: codebooks, data dictionaries,
algorithms, code
Keep this with the data; think of it as a
README file
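Part of such a README can be generated automatically: the list of included files, their sizes, and their checksums. The Python sketch below assumes a plain-text `README.txt` layout and placeholder metadata fields, both hypothetical. Descriptive fields such as subjects, rights, and interpretive aids still have to be written by hand.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 checksum of a file, read in chunks so large files fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def write_readme(data_dir: Path, metadata: dict[str, str]) -> Path:
    """Write README.txt: metadata fields, then one line per file with size and checksum."""
    lines = [f"{field}: {value}" for field, value in metadata.items()]
    lines += ["", "Files (path, bytes, SHA-256):"]
    for path in sorted(data_dir.rglob("*")):
        if path.is_file() and path.name != "README.txt":  # don't list the README itself
            rel = path.relative_to(data_dir).as_posix()
            lines.append(f"{rel}\t{path.stat().st_size}\t{sha256_of(path)}")
    readme = data_dir / "README.txt"
    readme.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return readme


if __name__ == "__main__":
    # Demo in a throwaway folder; all metadata values are hypothetical placeholders
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "temps.csv").write_text("site,temp_degC\nA1,14.2\n", encoding="utf-8")
        readme = write_readme(root, {
            "Title": "Example stream temperature survey",
            "Creator(s)": "A. Researcher",
            "Funder/grant": "(award number here)",
            "Format(s)": "CSV, UTF-8",
        })
        print(readme.read_text(encoding="utf-8"))
```

If you rerun the checksum step later, any file whose checksum has changed has been altered or corrupted since the README was written.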
32. Reason #1 for not reusing someone else’s data: “I
don’t know enough about how it was gathered to
trust it.”
Document what you did. (A published article may
or may not be enough.)
Document any limitations of what you did.
If you ran code on the data, document the code and
keep it with the data.
Need a codebook? Or a data dictionary?
If I can’t identify at sight what each bit of your dataset
means, yes, you do need a codebook or data dictionary.
DO NOT FORGET UNITS!
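A data dictionary can also be kept in machine-readable form and checked against the data file itself. The Python sketch below assumes a hypothetical three-variable dictionary and a small CSV sample. It flags columns that the dictionary does not describe and variables recorded without a unit.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Variable:
    name: str          # column header exactly as it appears in the data file
    description: str   # what the value means
    unit: str          # physical unit, or "none" for unitless/categorical values


# Hypothetical data dictionary for a small sensor file
DICTIONARY = [
    Variable("site_id", "Sampling site code (see site list)", "none"),
    Variable("timestamp", "Measurement time, ISO 8601, UTC", "none"),
    Variable("water_temp", "Water temperature at 0.5 m depth", "degC"),
]


def check_against_dictionary(csv_text: str, dictionary: list[Variable]) -> list[str]:
    """Report columns that are undocumented, or documented variables lacking a unit."""
    header = next(csv.reader(io.StringIO(csv_text)))
    documented = {v.name: v for v in dictionary}
    problems = [f"undocumented column: {col}" for col in header if col not in documented]
    problems += [f"no unit given for: {v.name}" for v in dictionary if not v.unit.strip()]
    return problems


sample = "site_id,timestamp,water_temp,cond\nA1,2011-06-01T12:00Z,14.2,312\n"
print(check_against_dictionary(sample, DICTIONARY))  # ['undocumented column: cond']
```

Writing "none" explicitly for unitless values is a deliberate choice. It separates "this value has no unit" from "someone forgot to record the unit."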
33. Your own drive (PC, server, flash drive, etc.)
And if you lose it? Or it breaks?
Somebody else’s drive
Departmental or campus drive
“Cloud” drive
Do they care as much about your data as you do?
What about versioning?
Library motto: Lots Of Copies Keeps Stuff Safe.
Two onsite copies, one offsite copy.
Keep confidentiality and security requirements in
mind, of course
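The "two onsite copies, one offsite copy" rule is easy to state and easy to let slip. The Python sketch below checks a hypothetical inventory of copies against that rule. It also compares checksums, because a copy that is stale or corrupted gives no real protection. For the example to be self-contained, contents are passed in as bytes; in real use you would hash the files at each location.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Copy:
    location: str   # e.g. "lab workstation", "department server", "cloud storage"
    offsite: bool   # True if stored at a different site from the other copies
    data: bytes     # file contents (in real use, read from or hashed at each location)


def check_copies(copies: list[Copy]) -> list[str]:
    """Check the 'two onsite, one offsite' rule and that all copies are identical."""
    problems = []
    onsite = sum(not c.offsite for c in copies)
    offsite = sum(c.offsite for c in copies)
    if onsite < 2:
        problems.append(f"onsite copies: {onsite} (want at least 2)")
    if offsite < 1:
        problems.append("no offsite copy")
    digests = {hashlib.sha256(c.data).hexdigest() for c in copies}
    if len(digests) > 1:
        problems.append("copies differ: at least one is corrupted or out of date")
    return problems


# Hypothetical inventory: the offsite copy was never refreshed after an update
copies = [
    Copy("lab workstation", False, b"v2"),
    Copy("department server", False, b"v2"),
    Copy("cloud storage", True, b"v1"),
]
print(check_copies(copies))
```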
34. If data need to persist beyond project end, you have to
deal with a new kind of risk: organizational risk.
Servers come and go. So do labs. So do entire departments.
This is especially important if you share data! Don’t let it 404!
You need to find a trustworthy partner.
On campus: try the library or Research and Sponsored
Programs. (UITS has a role but can’t do it alone!)
Off campus: look for a disciplinary data repository, or a journal
that accepts data. (It’s a good idea to do this as part of your
planning process.)
Let somebody else worry! You have new projects to get
on with.
36. Informational websites
UW-Madison: http://researchdata.wisc.edu/
UW-Milwaukee: http://dataplan.uwm.edu
Don’t just use the site for your own campus!
Data experts
IT cyberinfrastructure experts
Archivists/records managers
MINDS@UW: minds.wisconsin.edu
Data in final form that make sense as discrete files
37. For Information:
NSF Grant Proposal Guide
http://www.nsf.gov/pubs/policydocs/pappguide/nsf11001/gpg_index.jsp
MIT Data Management and Publishing
http://libraries.mit.edu/guides/subjects/data-management/index.html
For storage/management (non-exhaustive):
A partial list of potential repositories:
http://databib.org
Ask: can my home institution provide better service?
38. For assistance with writing your plan:
California Digital Library DMP Creation Tool
https://dmp.cdlib.org/ (Select “UWM” as institution)
Data Conservancy DMP Template/Questionnaire
http://dataconservancy.org/dataManagementPlans
DataONE Best Practices Examples
http://www.dataone.org/plans
Data Curation Profiles (Purdue University)
http://datacurationprofiles.org/
Digital Curation Centre Tools Catalog
http://www.dcc.ac.uk/resources/external/tools-services
39. Make sure your data plan covers at least the
minimum requirements set out by NSF
Create appropriate metadata to help you
manage and find data
Use open, universal standards and file formats
Be prepared to preserve access tools along with
data itself
Be aware of time periods for data sharing and
retention
40. Contact the presenter
Brad Houston, UW-Milwaukee
houstobn@uwm.edu (414) 229-6979
This presentation available online at:
http://www.slideshare.net/herodotusjr/data-management-plans-dmp-for-nsf