Research Data Access and Preservation Summit, 2014
San Diego, CA
March 26-28, 2014
Lightning Talks
William Mischo, University of Illinois at Urbana-Champaign
RDAP14: An analysis and characterization of DMPs in NSF proposals from the University of Illinois
1. An Analysis and Characterization of
DMPs in NSF Proposals from the
University of Illinois
RDAP14 Research Data Access & Preservation
Summit
March 26, 2014
William H. Mischo, Mary C. Schlembach, Megan A.
O’Donnell
University of Illinois at Urbana-Champaign
Iowa State University
2. NSF Data Management Plans
• Data Management Plans (DMPs): required
element in NSF proposals since January 2011
• July 2011: the Library, working with the campus
Office of Sponsored Programs and Research
Administration (OSPRA), began an analysis of
DMPs in submitted NSF grant proposals
• So far, 1,600 grant proposals reviewed, with
1,260 included in the analysis.
3. Reasons for DMPs
• Make key research data available and sharable
• Allow the use of data for verification of results
and reproducibility of research work
• Agency can show significant return on
investment to justify funding
• We want to know which storage venues and
mechanisms are proposed for sharing and reuse
• We also want to know how often proposals use
local templates and campus resources such as
IDEALS
4. Follow-on
• Develop campus-wide infrastructure (Research
Data Service - RDS) to support UIUC researchers
in managing their data
• Assist in compliance with federal agency
requirements
• Develop important partnerships with campus
units (CITES, NCSA, Colleges) and national
entities
• Develop best practices and standard approaches
5. Analysis
• Analysis attempts to characterize and classify
DMPs into categories
• A DMP can be assigned multiple categories
• 1,260 DMPs from July 2011 to November 2013
6. Categories
• PI Server – Servers and workstations that the PIs
(and their students/staff) use to store project
data. Examples: laboratory server, external hard
drive, and group computer.
• PI Website – Websites edited or administered
by the PI or a group they belong to. If a
departmental URL was given, the DMP was also
given the “Department” label. Examples: lab
website, project website, wiki, PI’s website
7. Categories
• Campus – Services located at, operated by, or
endorsed by UIUC. This includes IDEALS,
netfiles and Box.net, NCSA, and Beckman.
• Department – Used when a department was
specifically mentioned as providing a storage or
hosting resource. Examples: Departmental
website, departmental server, departmental
backup service or a web address traced back to
an academic department. Also given the
“campus” label.
8. Categories
• Remote – Services and sites not located on the
UIUC campus. Examples: NASA, other campuses,
collaborative projects, non-UIUC institutes
• Disciplinary – Disciplinary repositories. Many are
open access but not all. Examples: GenBank,
arXiv, ICPSR, SEAD, Nanohub, and Dryad
• Cloud – Storage services using cloud technology.
Examples: Google Documents, Google Code,
Box.net, Amazon, Microsoft, Dropbox
9. Categories
• Publication – Scholarly outputs including journal
articles, workshops, and conference
presentations or posters. Very few DMPs were
explicit as to how their “publications” and data
were related or separated.
• Analog – Physical records including lab
notebooks, photographs, and files. Does not
include specimens or artifacts.
• Specimens – Physical specimens, usually
biological specimens or artifacts
10. Categories
• Optical Disc – DVD, CD, and Blu-ray discs. Often
used as a backup mechanism
• Not Specified – The DMP was not specific
enough for us to record details
• No Data – Indicated the proposal will produce
no data products. Many were theoretical studies
(math), travel grants, or workshop planning
sessions.
• Local Template Used
11. All DMPs (including “no data”)
n = 1260

Category         Number   Percent
PI Server           503     39.9%
PI Website          529     41.9%
Campus              667     52.9%
Department          142     11.2%
Remote              353     28.0%
Disciplinary        275     21.8%
Publication         556     44.1%
Cloud                63      5.0%
Optical Disc         56      4.0%
Analog              131     10.4%
Specimens           111      8.8%
Not Specified        66      5.2%
Collaborative       164     13.0%
No Data             103      8.2%
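Because a DMP can carry several categories, each percentage in the table is a share of all 1,260 DMPs, and the column does not sum to 100%. A minimal sketch of that kind of tally follows; the four coded plans and the function name `category_shares` are hypothetical and only illustrate the counting, not the authors' actual workflow.

```python
from collections import Counter

# Hypothetical coded DMPs: each plan carries a set of category labels.
coded_dmps = [
    {"PI Server", "Campus"},
    {"Campus", "Department", "Publication"},
    {"Disciplinary", "Publication"},
    {"No Data"},
]

def category_shares(dmps):
    """Return {category: (count, percent of all DMPs)}."""
    counts = Counter(label for labels in dmps for label in labels)
    total = len(dmps)
    return {cat: (n, round(100 * n / total, 1)) for cat, n in counts.items()}

shares = category_shares(coded_dmps)
for cat, (n, pct) in sorted(shares.items()):
    print(f"{cat}: {n} ({pct}%)")

# The same arithmetic applied to two counts from the table above:
print(round(100 * 503 / 1260, 1))  # PI Server: 39.9
print(round(100 * 667 / 1260, 1))  # Campus: 52.9
```

In the hypothetical sample the shares add up to 200%, which is expected whenever plans are coded with more than one label.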
12. Data Venue and Risk
Proposals since July 2011

Data Location                      Submitted    Risk of Loss,        Funded      Risk of Loss,
                                   (n = 1260)   Corruption, Breach   (n = 298)   Corruption, Breach
PI Server/Website                  64%          High                 61%         High
Departmental Server/Website        11.2%        Medium to High       7%          Medium to High
Campus-Wide Resource               52.9%        Low                  45%         Low
  IDEALS Institutional Repository  21.9%                             19.8%
  NCSA                             4.3%                              16.4%
Disciplinary Repository/Cloud      25.8%        Medium to Low        21.4%       Medium to Low
Remote Repository                  28%          Medium to High       22.8%       Medium to High
Optical Disc, Specimens, Analog    19.4%        Out of Scope         11%         Out of Scope
13. Notables
• Funded: 298
• Used locally developed template: 254
• IDEALS: 275
• NCSA/XSEDE: 55
• Dryad: 22
• ICPSR: 17
• GenBank/Genetics Repository: 55
• arXiv: 61
• Only 87 DMPs contained information about file
types
14. Analysis
• Any differences in storage venue or technologies
between the unfunded proposals and the funded
proposals?
• Any differences between the proposals from the
first year and the more current proposals?
• We can compare any of the proposal categories
between funded and unfunded proposals
• 734 active NSF awards, $861.8 million
15. Analysis
• Use of IDEALS institutional repository: 62
funded, 197 not funded: chi-square: 0.17
• Storing data on PI server or website: 183
funded, 569 not funded: chi-square: 0.7
• Disciplinary or Cloud: 67 funded, 241 not
funded: chi-square: 0.85
• Remote storage: 68 funded, 267 not funded:
chi-square: 3.01
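For readers unfamiliar with the statistic, the sketch below shows one common way to compute a chi-square value for a 2x2 table such as "funded / not funded" by "venue named / venue not named". It is an assumption that the analysis used an uncorrected Pearson test of this form; the slides do not give the full contingency tables, so the counts in the example are hypothetical and the function name `chi_square_2x2` is illustrative.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be positive")
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)

# Hypothetical counts, for illustration only (not taken from the slides):
# rows = funded / not funded, columns = venue named / venue not named.
stat = chi_square_2x2(10, 20, 20, 10)
print(round(stat, 2))  # 6.67
```

A value near zero means the venue is named at about the same rate in both groups; larger values indicate a larger difference relative to what chance would produce.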
16. Analysis
• Use of IDEALS before August 2012 = 108, after
(through November 2013) = 166, chi-square: 4.59,
p < .05
• Use of disciplinary or Cloud before August 2012 =
121, after = 182, chi-square: 4.33, p < .05
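The reported statistics can be checked against the .05 threshold with a short calculation. The sketch below assumes each test has 1 degree of freedom (a 2x2 comparison), which the slides do not state; under that assumption the upper-tail p-value has a closed form using the complementary error function. The helper name `p_value_df1` is illustrative.

```python
import math

def p_value_df1(chi_square):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi_square / 2.0))

# Chi-square values as reported on the two preceding Analysis slides.
reported = {
    "IDEALS, funded vs. not funded": 0.17,
    "PI server or website, funded vs. not funded": 0.7,
    "Disciplinary or cloud, funded vs. not funded": 0.85,
    "Remote storage, funded vs. not funded": 3.01,
    "IDEALS, before vs. after August 2012": 4.59,
    "Disciplinary or cloud, before vs. after August 2012": 4.33,
}

for label, stat in reported.items():
    p = p_value_df1(stat)
    print(f"{label}: chi-square = {stat}, p = {p:.3f}, p < .05: {p < 0.05}")
```

Under the 1-degree-of-freedom assumption, only the two before/after comparisons fall below .05, which matches the significance levels given on the slides.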
17. Implications
• Conclusions: (1) no significant differences
between funded and unfunded proposals in storage
venues, so no apparent funding advantage from
naming IDEALS or a disciplinary repository;
(2) more recent proposals include IDEALS and
disciplinary repositories at a significantly
higher rate
• What is the role of the library? The campus? The
subject discipline?
• Connecting data to the literature is important