NSF Data Management Plans

DATA MANAGEMENT PLANS & PLANNING:
MEETING THE NSF REQUIREMENT
March 7, 2012

WHO ARE WE?

Heather Coates
Digital Scholarship & Data Management Librarian
University Library

Kristi Palmer
Digital Libraries Team Leader
University Library

LEARNING OBJECTIVES

After attending this workshop:

 You will understand the NSF data policies.
  You will be aware of the relevant data-related services at IUPUI.
 You will have resources to develop a data management plan
(DMP) for your NSF proposal(s).
 You will be able to write a comprehensive DMP for your NSF
proposal(s).
 You will send your DMP draft to the Data Services Program for
review and assistance as needed.

OVERVIEW

 Context for the NSF data policies

 Meeting the NSF DMP requirement
 The requirement: 5 elements
 Developing a Data Management Plan
 Implementing your plan

 Workshop Evaluation

CONTEXT: SCHOLARLY COMMUNICATIONS

 Funding agency requirements
 Scholarly Impact
  Exposure → increased citation
 More equal access (especially for students)
 Facilitates reproducibility
  Facilitate new discoveries via secondary analysis/data re-use
 Foster productive collaborations
 Lead to new computational techniques
 Planning for the future
 If we can’t find it, it doesn’t exist
 Persistent access
 Long-term preservation

CONTEXT: WHY THE LIBRARY?

preservation, curation, access
 Trusted member of the institution
 Organizational structure lends itself to collaboration with
researchers
 Interdisciplinary by nature
 Existing infrastructure for digital information
 Existing expertise in preserving and providing access to
information
 Program of Digital Scholarship
 Archives

CONTEXT: DATA SERVICES PROGRAM

 Part of the Program of Digital Scholarship
 Mission
 Identifying data issues and connecting you to the solutions
 Services
 Workshops
 Individual consultations
 Data repository
 Resources
 Guide to NSF Data Management Plan Requirement
 Website

CONTEXT: TERMINOLOGY

 Cyberinfrastructure: computing resources & networks, services,
& people (see Empowering People, 2009 for more)
 Data management: technical processing and preparation of data
for analysis
 Data curation: selection of data for preservation and adding
value for current and future use
 Data citation: mechanisms to enable easy reuse and verification,
track impact of data, and create structures to recognize and
reward researchers (DataCite)
 Data sharing: must take into account ethical and legal issues; a
spectrum with many options

CONTEXT: FEDERAL POLICIES

 Issues in scholarly communication
 Open access
 Open data & data citation
 Data management & curation
 Federal policies (incremental steps towards openness)
 National Research Council, 1985
 Office of Management & Budget, 1999: Circular A-110
 NIH Data Sharing Policy, 2003
 NIH Public Access Policy, 2008
 NSF DMP Requirement, 2011
 Other policies: Wellcome Trust, Howard Hughes Medical Institute, NOAA,
NEH

CONTEXT: IU STRATEGIC PLAN

IU Empowering People Strategic Plan for IT (2009) Action 33:

“IU should provision a data utility service for research data that
affords abundant near- and long-term storage, ease of use, and
preservation capabilities. This data utility will need to offer a range
of services for securing data, providing authorized access within
and beyond IU; ensuring metadata description, annotation, and
provenance; and providing backup/recovery services.”

CONTEXT: OPEN ACCESS

 What is Open Access?
 Freely available, online, and free of most copyright restrictions
 Why should you care?
 Right thing to do?
 Increase your citations
 “We analysed 119,924 conference articles in computer science and related
disciplines. The mean number of citations to offline articles is 2.74, and the
mean number of citations to online articles is 7.03, an increase of 157%.”
(Lawrence, 2001)
 Publisher functions need not reside in for profit hands
  “Between 1975 and 2005 the average cost of journals in chemistry and
physics rose from $76.84 to $1,879.56. In the same period, the cost of a
gallon of unleaded regular gasoline rose from 55 cents to $1.82. If the gallon
of gas had increased in price at the same rate as chemistry and physics
journals over this period it would have reached $12.43 in 2005, and would
be over $14.50 today.” (Lewis, 2008)

CONTEXT: OPEN ACCESS @ IUPUI

 IUPUI University Library Program of Digital Scholarship
 http://www.ulib.iupui.edu/digitalscholarship
 Open Journals
 IUPUIScholarWorks-Faculty Scholarship
 Electronic Theses and Dissertations
 Cultural Heritage Collections
 Data
 eArchives

CONTEXT: RESEARCH LIFE CYCLE

Source: DDI Structural Reform Group. “DDI Version 3.0 Conceptual Model." DDI Alliance. 2004. Accessed on 11 August 2008.
<http://www.icpsr.umich.edu/DDI/committee-info/Concept-Model-WD.pdf>.

CONTEXT: BENEFITS OF PLANNING

 Saves time
 Less reorganization down the road
 Increases efficiency
 Gathers necessary information for analysis and writing
 Prevents problems in understanding data and metadata
 Makes it easier to preserve your data
 Requirements from some funding agencies and institutions

DMP: THE REQUIREMENT

 Why?
 Increased impact of research money
 Reduce redundant data collection
 Enhance use and value of existing data
 Further scientific research
 Language is broad to allow input from research communities
 Implementation costs of the DMP CAN be included in direct costs

DMP: PRACTICAL TIPS

 The gist of it…
 Describe what you will do with your data during and after the proposed
project
 Ensures data is safe now and in the future
 DMP should reflect…
 Awareness of data management and curation in your discipline
 Feasible plan to utilize available cyberinfrastructure
 Try to…
 Explain the rationale for your choices
 Identify roles for data management and curation activities

DMP: ELEMENTS

 Types of data
 Standards and metadata
 Access and sharing
 Re-use, re-distribution, and the production of derivatives
 Long-term preservation
 [Budget]

DMP: TYPES OF DATA [1]

Use standards common in your research community

 Characterize the data to be generated or used
 Types of data?
 experimental, observational, raw or derived, models, simulations, curriculum
materials, software, images, audio, video, etc.
 What file formats will be used?
 Text, spreadsheet, database, etc.
 How will it be collected? (describe the process)
 How much data?
 Will the data be reproducible?
 How does the project relate to existing data?
  If datasets will be combined, how will you ensure interoperability?


 How will data be collected?
 How? (tools, instruments, measurements, etc.)
 When? (timeframe, series)
 Where?

 How will data be processed?
 Workflows
 Software packages

 How will the data be stored and managed?
 File naming conventions
 Version control
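
A file naming convention is easiest to follow when file names are generated rather than typed. The sketch below is one hypothetical convention (project, data type, collection date, zero-padded version, no spaces); the function names and the pattern itself are illustrative assumptions, not an NSF or IUPUI requirement.

```python
import re
from datetime import date


def slug(text: str) -> str:
    """Lower-case the text and replace runs of other characters with '-'."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def build_filename(project: str, data_type: str, collected: date,
                   version: int, ext: str = "csv") -> str:
    """Build a name of the form project_datatype_YYYYMMDD_vNN.ext (hypothetical pattern)."""
    return f"{slug(project)}_{slug(data_type)}_{collected:%Y%m%d}_v{version:02d}.{ext}"


print(build_filename("Soil Survey", "raw", date(2012, 3, 7), 2))
# soil-survey_raw_20120307_v02.csv
```

Putting the date in YYYYMMDD order and zero-padding the version keeps files in chronological order when sorted by name.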


 What QA & QC measures will be used?
 Identify steps during processing and analysis to eliminate bad data
points
 Examples: double data entry, data screening tests

 What is the backup and security plan?
 Plan for particular security or confidentiality issues
 Location & frequency

 Roles & responsibilities
 Who will carry out data collection, processing, and backup activities?
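
Double data entry, one of the QA/QC examples above, means keying the same records twice and reconciling the differences. A minimal sketch of the comparison step follows, assuming both entries are CSV text sharing a `record_id` column; the column names and sample values are made up for illustration.

```python
import csv
import io


def compare_entries(first_csv: str, second_csv: str, key: str = "record_id"):
    """Return (record id, field, first value, second value) for every disagreement."""
    first = {row[key]: row for row in csv.DictReader(io.StringIO(first_csv))}
    second = {row[key]: row for row in csv.DictReader(io.StringIO(second_csv))}
    mismatches = []
    for rec_id in sorted(first.keys() | second.keys()):
        a, b = first.get(rec_id), second.get(rec_id)
        if a is None or b is None:
            # The record was keyed in only one of the two passes.
            mismatches.append((rec_id, "<record>",
                               "present" if a else "missing",
                               "present" if b else "missing"))
            continue
        for field, value in a.items():
            if value != b.get(field):
                mismatches.append((rec_id, field, value, b.get(field)))
    return mismatches


ENTRY_1 = "record_id,height_cm,weight_kg\n1,170,65\n2,182,80\n3,165,58\n"
ENTRY_2 = "record_id,height_cm,weight_kg\n1,170,65\n2,128,80\n"

for problem in compare_entries(ENTRY_1, ENTRY_2):
    print(problem)
```

Each reported mismatch is then checked against the original paper form or instrument output; the script only flags disagreements and does not decide which entry is right.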

EXAMPLE: TYPES OF DATA

Atmospheric Concentrations of CO2, Mauna Loa Observatory,
Hawaii, 2011-2013
https://www.dataone.org/sites/all/documents/DMP_MaunaLoa_Formatted.pdf

Arthropod responses to grassland nutrient limitation
https://www.dataone.org/sites/all/documents/DMP_NutNet_Formatted.pdf

DMP: STANDARDS & METADATA [1]

 Metadata describes the who, what, when, where, how, why of
the data

 Purpose of metadata is to enable finding, organization,
interoperability, identification, archiving & preservation

 Standards are commonly agreed upon terms and definitions in a
structured format

DMP: STANDARDS & METADATA [2]

  Will your datasets be self-explanatory or understandable in
isolation?

 Decisions to make about metadata
 Relevant standard(s)
 Format
 Content
  What information is needed to use and interpret the data in 5 years? In 25 years?
 Ask your fellow researchers and check with data centers or repositories

 How are metadata created?
 Automatically generated
 Manually created
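
As an illustration of automatically generated metadata, the sketch below derives a small descriptive record directly from a CSV file's contents. The field names (`file_name`, `columns`, and so on) are illustrative assumptions and do not follow any particular metadata standard; a real project would map them onto the standard used in its discipline and add the manually created context (who, why, how).

```python
import csv
import io
import json


def describe_csv(file_name: str, raw: bytes) -> dict:
    """Derive basic technical metadata from the bytes of a CSV file."""
    rows = list(csv.reader(io.StringIO(raw.decode("utf-8"))))
    header = rows[0] if rows else []
    return {
        "file_name": file_name,
        "size_bytes": len(raw),
        "columns": header,
        "row_count": max(len(rows) - 1, 0),  # data rows, excluding the header
    }


sample = b"site,date,co2_ppm\nMLO,2011-01-01,391.2\nMLO,2011-01-02,391.4\n"
print(json.dumps(describe_csv("co2_daily.csv", sample), indent=2))
```

For a file on disk, the bytes could be read with `open(path, "rb").read()` and passed in unchanged.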

EXAMPLE: STANDARDS & METADATA [1]

Atmospheric Concentrations of CO2, Mauna Loa Observatory,
Hawaii, 2011-2013
https://www.dataone.org/sites/all/documents/DMP_MaunaLoa_Formatted.pdf

Metadata will be comprised of two formats: contextual
information about the data in a text based document and ISO
19115 standard metadata in an xml file. These two formats for
metadata were chosen to provide a full explanation of the data
(text format) and to ensure compatibility with international
standards (xml format). The standard XML file will be more
complete; the document file will be a human-readable summary of
the XML file.

EXAMPLE: STANDARDS & METADATA [2]

Rio Grande Hydrologic Geodatabase Compendium
https://www.dataone.org/sites/all/documents/DMP_Hydrologic_Formatted.pdf

Microsoft Access Database format will be used since it is readily
accessible and it is compatible with ESRI ArcGIS
(http://www.esri.com/software/arcgis/index.html), a Geographic
Information System software package used by the stakeholders. Naming
conventions will be consistent: no spaces will be used in table names
or field names. The file naming convention will consist of the
datasource_datatype format for raw data files. Data reporting
functionality will be built into the VBA processing programs to
provide output in .txt file format for number of records per source
when updatable data sources are refreshed.
Every effort will be made to go back to the authoritative source for
an identified dataset. Quality control of the database will be
performed using SQL statements that capitalize on the database
structure to ensure relational database integrity. Appropriate primary
keys will be assigned to manage possible data duplicates. Potential
duplicate site IDs will be handled through automated procedures and
the creation of alternate ID tables.
A data dictionary will be created that defines the table definition,
table fields, and table field data types. An entity-relationship
diagram will be created that defines the relational structure of the
database.
A metadata record will be produced using the FGDC standard that
describes the entire geodatabase. The FGDC standard was chosen due to
required Federal government standards.
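
The example plan above relies on SQL statements to catch duplicate site IDs. A minimal sketch of that kind of check follows, using Python's built-in `sqlite3` module as a stand-in for Microsoft Access; the table name `raw_sites`, its columns, and the sample IDs are hypothetical and not taken from the plan.

```python
import sqlite3


def find_duplicate_site_ids(rows):
    """Return (site_id, count) for every site ID that appears more than once."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE raw_sites (site_id TEXT, source TEXT)")
    conn.executemany("INSERT INTO raw_sites VALUES (?, ?)", rows)
    duplicates = conn.execute(
        "SELECT site_id, COUNT(*) FROM raw_sites "
        "GROUP BY site_id HAVING COUNT(*) > 1 ORDER BY site_id"
    ).fetchall()
    conn.close()
    return duplicates


rows = [("RG001", "usgs"), ("RG002", "usgs"), ("RG001", "state")]
print(find_duplicate_site_ids(rows))
```

Running the check on a staging table before declaring `site_id` a primary key shows which records need an alternate ID before the constraint is enforced.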

DMP: ACCESS & SHARING

 What are your obligations for sharing?
 Funding agency, institution, other organization, legal, etc.
  What are the ethical or legal issues? (e.g., privacy,
confidentiality, security, intellectual property, or other rights)

  How will the data be made available?
  What is the process for gaining access?
  When will the data be made available?
  For how long will the data be available?
  Who will have access to the data?

DMP: RE-USE, RE-DISTRIBUTION, ETC.

 What rights will you retain before data is made available?
 Will permission restrictions be necessary?
 Limits or conditions for political, commercial, or patent reasons?
 Is there an embargo period? Why?

 Future users and uses
 Who might be interested in the data?
 How might you anticipate this data being used?
 What value might the data have for these people?

EXAMPLE: ACCESS, SHARING, RE-USE

Development of a NanoKlein Calorimeter
http://libguides.unm.edu/content.php?pid=137795&sid=1422879
We expect to apply for a patent for this instrument. All of the
materials submitted as part of the patent process will be a matter
of public record. We will also make technical drawings, test data
and calibration data available through our institutional repository.

Cave Microbiology
http://libguides.unm.edu/content.php?pid=137795&sid=1422879

DMP: LONG-TERM PRESERVATION

 What data will be preserved?
 What transformations are necessary to prepare the data?
 How long do you think the data will be useful? How long will the
data be preserved?
 Contextual information needed to make the data reusable
 metadata, references, reports, manuscripts, grant proposal, etc.
 Where will it be preserved?
 Links to published materials and other outcomes? Use of persistent
citation?
 Procedures for preservation and back-up?
 Who will be the contact for the dataset?
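
One common back-up procedure is a checksum manifest: record a hash for each file when it is deposited, then recompute the hashes later to confirm that nothing has changed or gone missing. The sketch below assumes file contents are available as bytes and uses SHA-256 from Python's standard library; the function names are illustrative.

```python
import hashlib


def build_manifest(files: dict) -> dict:
    """Map each file name to the SHA-256 hex digest of its contents."""
    return {name: hashlib.sha256(data).hexdigest() for name, data in files.items()}


def verify_manifest(manifest: dict, files: dict) -> list:
    """Return the names of files that are missing or whose contents changed."""
    problems = []
    for name, expected in sorted(manifest.items()):
        data = files.get(name)
        if data is None or hashlib.sha256(data).hexdigest() != expected:
            problems.append(name)
    return problems


original = {"abundance.csv": b"plot,count\n1,14\n", "readme.txt": b"Units: individuals\n"}
manifest = build_manifest(original)

backup = dict(original, **{"abundance.csv": b"plot,count\n1,41\n"})  # simulated corruption
print(verify_manifest(manifest, backup))
```

Storing the manifest alongside the metadata lets whoever is the contact for the dataset re-run the check after each migration or restore.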

EXAMPLE: LONG-TERM PRESERVATION [1]

Arthropod responses to grassland nutrient limitation
https://www.dataone.org/sites/all/documents/DMP_NutNet_Formatted.pdf

We will preserve both arthropod datasets generated during this
project (abundance and stoichiometry) for the long term in the
Digital Conservancy at the U of M. We will include the .csv files,
along with the associated metadata files. We will also submit an
abstract with the datasets that describe their original context and
any potentially relevant project information. Borer will be
responsible for preparing data for long-term preservation and for
updating contact information for investigators.

EXAMPLE: LONG-TERM PRESERVATION [2]

Improving the long-term preservability of HDF-formatted data by
creating maps to file contents
https://www.dataone.org/sites/all/documents/DMP_HDFMap_Formatted.pdf

The writer software will be preserved by the HDF Group for the life
of the HDF libraries. The HDF Group uses industry-standard best
practices to ensure the integrity of their software and systems.
Once the map writer has been used to generate maps for every
HDF file in existence, the continued existence of the writer
software is not required. The reader software will be preserved at
SourceForge.org for as long as there is community interest. The
collection of HDF files will be preserved at NSIDC as long as utility
is deemed high.

IMPLEMENTING YOUR PLAN [1]

 The DMP is a working document
 NSF expects progress to be reported
 Incorporate implementation into the project startup process
 C&G, IRB, IACUC all have to be in place before data collection can begin
 Review, revise, and set up your system during startup
 Good documentation ensures…
 A shared understanding of the data throughout a project
 That future researchers will be able to understand data within the
relevant context
 That re-users of data are able to interpret the data appropriately
 Resources for backing up data during a project
 Research File System: http://pti.iu.edu/storage/rfs
 Scholarly Data Archive: http://pti.iu.edu/storage/sda

IMPLEMENTING YOUR PLAN [2]

Program of Digital Scholarship: http://ulib.iupui.edu/digitalscholarship
Center for Research & Learning: http://crl.iupui.edu/
OVCR: http://research.iupui.edu/development/
Office of Academic Affairs: http://www.academicaffairs.iupui.edu
Intellectual Property Policy: https://www.indiana.edu/~vpfaa/academicguide/index.php/Policy_I-11

Research File System: http://pti.iu.edu/storage/rfs
Scholarly Data Archive: http://pti.iu.edu/storage/sda
Research Technologies, UITS: http://uits.iu.edu/page/avel
Core Services, UITS: http://pti.iu.edu/cs
Scholarly Cyberinfrastructure, UITS: http://uits.iu.edu/page/amee
CTSI Tools: http://www.indianactsi.org/rct (Alfresco Share, REDCap)

IUWare: https://iuware.iu.edu
IUanyWare: https://iuanyware.iu.edu/vpn/index.html
StatMath: http://www.indiana.edu/~statmath/
Statistics Consulting Center: http://www.math.iupui.edu/asci/

RESOURCES [1]

Data Services Program site:
http://ulib.iupui.edu/digitalscholarship/dataservices.html
National Science Board, Digital Research Data Sharing &
Management, 2012 (pre-publication):
http://www.nsf.gov/nsb/publications/2011/nsb1124.pdf
National Institutes of Health, Data Sharing Policy
http://grants.nih.gov/grants/policy/data_sharing/data_sharing_guidance.htm
NIH Public Access Policy Implications
http://publicaccess.nih.gov/public_access_policy_implications_2012.pdf
IU New Employee Compliance Orientation (NECO)
http://researchadmin.iu.edu/EO/eo_sessions.html

RESOURCES [2]

UK Data Archive: Managing & Sharing Data Brochure:
http://www.data-archive.ac.uk/media/2894/managingsharing.pdf
UK Data Archive Costing Tool:
http://www.data-archive.ac.uk/media/257647/ukda_jiscdmcosting.pdf
Creative Commons Licenses & Data:
http://wiki.creativecommons.org/Data
Licensing Research Data, Digital Curation Centre
http://www.dcc.ac.uk/resources/how-guides/license-research-data
CIC Author Addendum
http://www.cic.net/authors
DMPTool: https://dmp.cdlib.org/
DMPOnline: https://dmponline.dcc.ac.uk/

COMPELLING CASES FOR OPEN DATA

Tim Berners-Lee:
http://www.ted.com/talks/tim_berners_lee_on_the_next_web.html

Open-source cancer research:
http://www.ted.com/talks/jay_bradner_open_source_cancer_research.html

Polymath problem blogs:
http://polymathprojects.org/about/
http://stevekochscience.blogspot.com/2011/02/open-data-success-story.html
http://eaves.ca/2011/09/07/the-economics-of-open-data-mini-case-transit-data-translink/

REFERENCES

1. Higgins, S. (n.d.). What are metadata standards. Retrieved from
http://www.dcc.ac.uk/resources/briefing-papers/standards-watch-papers/what-are-metadata-standards
2. Digital Curation Centre. (n.d.). DCC Charter and Statement of Principles.
Retrieved from http://www.dcc.ac.uk/about-us/dcc-charter.
3. Indiana University. (2011). Indiana University’s Advanced
Cyberinfrastructure. Retrieved from
http://pti.iu.edu/cyberinfrastructure.pdf.
4. Indiana University. (2009). Empowering People: Indiana University’s
Strategic Plan for Information Technology. Retrieved from
http://ovpit.iu.edu/strategic2/.
5. National Science Foundation. (2011). Award and Administration Guide:
Chapter IV C.4., Dissemination and Sharing of Research Results. Retrieved
from
http://www.nsf.gov/pubs/policydocs/pappguide/nsf11001/aag_6.jsp#VID4.
6. Lawrence, S. Free online availability substantially increases a paper’s
impact. Nature, 31 May 2001.
http://www.nature.com/nature/debates/e-access/Articles/lawrence.html
(accessed November 5, 2008).
7. Lewis, David W. "Library budgets, open access, and the future of scholarly
communication: Transformations in academic publishing." C&RL News, May
2008, Vol. 69, No. 5. [Available at:
http://www.ala.org/ala/mgrps/divs/acrl/publications/crlnews/2008/may/ALA_print_layout_1_471139_471139.cfm]

THANK YOU

Tell us what you think: take a brief survey.

Find us @
http://ulib.iupui.edu/digitalscholarship
Heather Coates, hcoates@iupui.edu, 317-278-7125
Kristi Palmer, klpalmer@iupui.edu, 317-274-8230

EXTRA: NIH DATA SHARING POLICY

  Applies to applications requesting $500,000 or more in direct
costs in any year of the proposed research
  Covers final research data: not summary statistics or tables, and
not underlying pathology reports or other clinical source documents;
may include both raw data and derived variables
  If an application describes a data-sharing plan, NIH expects that
plan to be enacted.
 NIH expects the timely release and sharing of data to be no later
than the acceptance for publication of the main findings from the
final dataset.
 It is the responsibility of the investigators, their Institutional
Review Board (IRB), and their institution to protect the rights of
subjects and the confidentiality of the data. Prior to sharing, data
should be redacted to strip all identifiers, and effective strategies
should be adopted to minimize risks of unauthorized disclosure of
personal identifiers.

EXTRA: NIH DATA SHARING PLAN

 describe briefly the expected schedule for data sharing
 the format of the final dataset
 the documentation to be provided
 whether or not any analytic tools also will be provided
  whether or not a data-sharing agreement will be required
 if so, a brief description of such an agreement (including the criteria for
deciding who can receive the data and whether or not any conditions
will be placed on their use)
 mode of data sharing (e.g., under their own auspices by mailing
a disk or posting data on their institutional or personal website,
through a data archive or enclave)

 Applicants may request funds in their application for data
sharing.
