March 7 version of the IUPUI workshop Meeting the NSF Data Management Plan Requirement: What you need to know. This workshop is co-sponsored by the Office of the Vice Chancellor for Research and the University Library.
2. WHO ARE WE?
Heather Coates
Digital Scholarship & Data Management Librarian
University Library
Kristi Palmer
Digital Libraries Team Leader
University Library
3. LEARNING OBJECTIVES
After attending this workshop:
You will understand the NSF data policies.
You will be aware of the relevant data-related services at IUPUI.
You will have resources to develop a data management plan
(DMP) for your NSF proposal(s).
You will be able to write a comprehensive DMP for your NSF
proposal(s).
You will send your DMP draft to the Data Services Program for
review and assistance as needed.
4. OVERVIEW
Context for the NSF data policies
Meeting the NSF DMP requirement
The requirement: 5 elements
Developing a Data Management Plan
Implementing your plan
Workshop Evaluation
5. CONTEXT: SCHOLARLY COMMUNICATIONS
Funding agency requirements
Scholarly Impact
Exposure leads to increased citation
More equal access (especially for students)
Facilitates reproducibility
Facilitate new discoveries via secondary analysis/data re-use
Foster productive collaborations
Lead to new computational techniques
Planning for the future
If we can’t find it, it doesn’t exist
Persistent access
Long-term preservation
6. CONTEXT: WHY THE LIBRARY?
preservation, curation, access
Trusted member of the institution
Organizational structure lends itself to collaboration with
researchers
Interdisciplinary by nature
Existing infrastructure for digital information
Existing expertise in preserving and providing access to
information
Program of Digital Scholarship
Archives
7. CONTEXT: DATA SERVICES PROGRAM
Part of the Program of Digital Scholarship
Mission
Identifying data issues and connecting you to the solutions
Services
Workshops
Individual consultations
Data repository
Resources
Guide to NSF Data Management Plan Requirement
Website
8. CONTEXT: TERMINOLOGY
Cyberinfrastructure: computing resources & networks, services,
& people (see Empowering People, 2009 for more)
Data management: technical processing and preparation of data
for analysis
Data curation: selection of data for preservation and adding
value for current and future use
Data citation: mechanisms to enable easy reuse and verification,
track impact of data, and create structures to recognize and
reward researchers (DataCite)
Data sharing: must take into account ethical and legal issues; a
spectrum with many options
9. CONTEXT: FEDERAL POLICIES
Issues in scholarly communication
Open access
Open data & data citation
Data management & curation
Federal policies (incremental steps towards openness)
National Research Council, 1985
Office of Management & Budget, 1999: Circular A-110
NIH Data Sharing Policy, 2003
NIH Public Access Policy, 2008
NSF DMP Requirement, 2011
Other policies: Wellcome Trust, Howard Hughes Medical Institute, NOAA,
NEH
10. CONTEXT: IU STRATEGIC PLAN
IU Empowering People Strategic Plan for IT (2009) Action 33:
“IU should provision a data utility service for research data that
affords abundant near- and long-term storage, ease of use, and
preservation capabilities. This data utility will need to offer a range
of services for securing data, providing authorized access within
and beyond IU; ensuring metadata description, annotation, and
provenance; and providing backup/recovery services.”
11. CONTEXT: OPEN ACCESS
What is Open Access?
Freely available, online, and free of most copyright restrictions
Why should you care?
Right thing to do?
Increase your citations
“We analysed 119,924 conference articles in computer science and related
disciplines. The mean number of citations to offline articles is 2.74, and the
mean number of citations to online articles is 7.03, an increase of 157%.”
(Lawrence, 2001)
Publisher functions need not reside in for-profit hands
"Between 1975 and 2005 the average cost of journals in chemistry and
physics rose from $76.84 to $1,879.56. In the same period, the cost of a
gallon of unleaded regular gasoline rose from 55 cents to $1.82. If the gallon
of gas had increased in price at the same rate as chemistry and physics
journals over this period it would have reached $12.43 in 2005, and would
be over $14.50 today.” (Lewis, 2008)
12. CONTEXT: OPEN ACCESS @ IUPUI
IUPUI University Library Program of Digital Scholarship
http://www.ulib.iupui.edu/digitalscholarship
Open Journals
IUPUIScholarWorks-Faculty Scholarship
Electronic Theses and Dissertations
Cultural Heritage Collections
Data
eArchives
13. CONTEXT: RESEARCH LIFE CYCLE
Source: DDI Structural Reform Group. “DDI Version 3.0 Conceptual Model." DDI Alliance. 2004. Accessed on 11 August 2008.
<http://www.icpsr.umich.edu/DDI/committee-info/Concept-Model-WD.pdf>.
14. CONTEXT: BENEFITS OF PLANNING
Saves time
Less reorganization down the road
Increases efficiency
Gathers necessary information for analysis and writing
Prevents problems in understanding data and metadata
Makes it easier to preserve your data
Requirements from some funding agencies and institutions
15. DMP: THE REQUIREMENT
Why?
Increased impact of research money
Reduce redundant data collection
Enhance use and value of existing data
Further scientific research
Language is broad to allow input from research communities
Implementation costs of the DMP CAN be included in direct costs
16. DMP: PRACTICAL TIPS
The gist of it…
Describe what you will do with your data during and after the proposed
project
Ensures data is safe now and in the future
DMP should reflect…
Awareness of data management and curation in your discipline
Feasible plan to utilize available cyberinfrastructure
Try to…
Explain the rationale for your choices
Identify roles for data management and curation activities
17. DMP: ELEMENTS
Types of data
Standards and metadata
Access and sharing
Re-use, re-distribution, and the production of derivatives
Long-term preservation
[Budget]
18. DMP: TYPES OF DATA [1]
Use standards common in your research community
Characterize the data to be generated or used
Types of data?
experimental, observational, raw or derived, models, simulations, curriculum
materials, software, images, audio, video, etc.
What file formats will be used?
Text, spreadsheet, database, etc.
How will it be collected? (describe the process)
How much data?
Will the data be reproducible?
How does the project relate to existing data?
If datasets will be combined, how will you ensure interoperability?
19. DMP: TYPES OF DATA [2]
How will data be collected?
How? (tools, instruments, measurements, etc.)
When? (timeframe, series)
Where?
How will data be processed?
Workflows
Software packages
How will the data be stored and managed?
File naming conventions
Version control
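As one illustration of the file naming and version control questions above, a small helper can generate consistent names. This is a minimal sketch, not a recommended standard: the function name `build_filename`, the field order (project, data type, date, version), and the example values are all assumptions made for illustration.

```python
import re
from datetime import date


def build_filename(project: str, datatype: str, collected: date,
                   version: int, ext: str = "csv") -> str:
    """Build a name like 'soilsurvey_temperature_20130307_v02.csv'.

    Hypothetical convention: lowercase, no spaces, YYYYMMDD date,
    zero-padded version number so files sort in order.
    """
    def clean(part: str) -> str:
        # Drop everything except lowercase letters and digits, so spaces
        # and punctuation never end up in a file name.
        return re.sub(r"[^a-z0-9]", "", part.lower())

    return (f"{clean(project)}_{clean(datatype)}_"
            f"{collected.strftime('%Y%m%d')}_v{version:02d}.{ext}")


print(build_filename("Soil Survey", "temperature", date(2013, 3, 7), 2))
# prints: soilsurvey_temperature_20130307_v02.csv
```

Whatever convention you choose, describing it in the DMP in one sentence is usually enough; the point is that everyone on the project applies the same rule.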
20. DMP: TYPES OF DATA [3]
What QA & QC measures will be used?
Identify steps during processing and analysis to eliminate bad data
points
Examples: double data entry, data screening tests
What is the backup and security plan?
Plan for particular security or confidentiality issues
Location & frequency
Roles & responsibilities
Who will carry out data collection, processing, and backup activities?
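Double data entry, one of the QA examples above, can be checked with a short script. The sketch below assumes two people keyed the same records into two lists of dictionaries that share a record ID; the function name `compare_entries`, the field names (`id`, `ph`, `temp_c`), and the values are hypothetical.

```python
def compare_entries(first, second, key="id"):
    """Return (record id, field, value1, value2) for every mismatch.

    Records that appear only in `second` are not reported in this sketch.
    """
    second_by_id = {row[key]: row for row in second}
    mismatches = []
    for row in first:
        other = second_by_id.get(row[key])
        if other is None:
            # Record was keyed only once; flag the whole record.
            mismatches.append((row[key], None, "present", "missing"))
            continue
        for field, value in row.items():
            if other.get(field) != value:
                mismatches.append((row[key], field, value, other.get(field)))
    return mismatches


entry_a = [{"id": 1, "ph": 6.8, "temp_c": 21.5},
           {"id": 2, "ph": 7.1, "temp_c": 22.0}]
entry_b = [{"id": 1, "ph": 6.8, "temp_c": 21.5},
           {"id": 2, "ph": 7.7, "temp_c": 22.0}]

for problem in compare_entries(entry_a, entry_b):
    print(problem)
# prints: (2, 'ph', 7.1, 7.7)
```

Each reported mismatch would then be resolved against the original paper form or instrument output by whoever holds that role in the plan.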
21. EXAMPLE: TYPES OF DATA
Atmospheric Concentrations of CO2, Mauna Loa Observatory,
Hawaii, 2011-2013
https://www.dataone.org/sites/all/documents/DMP_MaunaLoa_Formatted.pdf
Arthropod responses to grassland nutrient limitation
https://www.dataone.org/sites/all/documents/DMP_NutNet_Formatted.pdf
22. DMP: STANDARDS & METADATA [1]
Metadata describes the who, what, when, where, how, why of
the data
Purpose of metadata is to enable finding, organization,
interoperability, identification, archiving & preservation
Standards are commonly agreed upon terms and definitions in a
structured format
23. DMP: STANDARDS & METADATA [2]
Will your datasets be self-explanatory or understandable in
isolation?
Decisions to make about metadata
Relevant standard(s)
Format
Content
What information is needed to use and interpret in 5 years, 25 years?
Ask your fellow researchers and check with data centers or repositories
How are metadata created?
Automatically generated
Manually created
24. EXAMPLE: STANDARDS & METADATA [1]
Atmospheric Concentrations of CO2, Mauna Loa Observatory,
Hawaii, 2011-2013
https://www.dataone.org/sites/all/documents/DMP_MaunaLoa_Formatted.pdf
Metadata will be comprised of two formats: contextual
information about the data in a text-based document and ISO
19115 standard metadata in an XML file. These two formats for
metadata were chosen to provide a full explanation of the data
(text format) and to ensure compatibility with international
standards (XML format). The standard XML file will be more
complete; the document file will be a human-readable summary of
the XML file.
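The two-format idea in this example (one machine-readable XML file, one human-readable summary) can be sketched in a few lines. This is only an illustration under stated assumptions: the element names below are hypothetical placeholders and are NOT the ISO 19115 schema, and the record values are invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical descriptive fields; NOT the ISO 19115 schema.
record = {
    "title": "Example dataset",
    "creator": "Example Researcher",
    "date_collected": "2013-03-07",
    "description": "Placeholder description of what was measured.",
}


def to_xml(fields: dict) -> str:
    """Serialize the fields as a small XML document (machine-readable)."""
    root = ET.Element("metadata")
    for name, value in fields.items():
        ET.SubElement(root, name).text = value
    return ET.tostring(root, encoding="unicode")


def to_text(fields: dict) -> str:
    """Render the same fields as a plain-text summary (human-readable)."""
    return "\n".join(f"{name}: {value}" for name, value in fields.items())


xml_doc = to_xml(record)
print(xml_doc)
print(to_text(record))
```

Generating both files from one source record keeps the summary and the XML from drifting apart. A real plan would name the actual standard and the tool used to produce it.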
25. EXAMPLE: STANDARDS & METADATA [2]
Rio Grande Hydrologic Geodatabase Compendium
https://www.dataone.org/sites/all/documents/DMP_Hydrologic_Formatted.pdf
Microsoft Access Database format will be used since it is readily
accessible and it is compatible with ESRI ArcGIS
(http://www.esri.com/software/arcgis/index.html), a Geographic
Information System software package used by the stakeholders. Naming
conventions will be consistent: no spaces will be used in table names
or field names. The file naming convention will consist of the
datasource_datatype format for raw data files. Data reporting
functionality will be built into the VBA processing programs to
provide output in .txt file format for number of records per source
when updatable data sources are refreshed.
Every effort will be made to go back to the authoritative source for
an identified dataset. Quality control of the database will be
performed using SQL statements that capitalize on the database
structure to ensure relational database integrity. Appropriate primary
keys will be assigned to manage possible data duplicates. Potential
duplicate site IDs will be handled through automated procedures and
the creation of alternate ID tables.
A data dictionary will be created that defines the table definition,
table fields, and table field data types. An entity-relationship
diagram will be created that defines the relational structure of the
database.
A metadata record will be produced using the FGDC standard that
describes the entire geodatabase. The FGDC standard was chosen due to
required Federal government standards.
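Two ideas from this example, primary keys that block duplicate records and a data dictionary listing fields and types, can be sketched with a small database. The example plan uses Microsoft Access; the sketch below swaps in Python's built-in `sqlite3` module purely for convenience, and the table name, field names, site IDs, and values are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; no spaces in table or field names.
conn.execute(
    "CREATE TABLE site_measurements ("
    " site_id TEXT NOT NULL,"
    " measured_on TEXT NOT NULL,"
    " flow_cfs REAL,"
    " PRIMARY KEY (site_id, measured_on))"
)

rows = [("RG001", "2012-05-01", 410.0),
        ("RG002", "2012-05-01", 95.5),
        ("RG001", "2012-05-01", 410.0)]  # duplicate of the first row

rejected = []
for row in rows:
    try:
        conn.execute("INSERT INTO site_measurements VALUES (?, ?, ?)", row)
    except sqlite3.IntegrityError:
        # The primary key blocks a second row for the same site and date.
        rejected.append(row)

# A minimal data dictionary: (field name, declared type) for each column.
# PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
data_dictionary = [(info[1], info[2]) for info in
                   conn.execute("PRAGMA table_info(site_measurements)")]

print(rejected)
print(data_dictionary)
```

Rejected rows are collected rather than silently dropped so that someone can review them, which matches the plan's intent of handling duplicates through a defined procedure.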
26. DMP: ACCESS & SHARING
What are your obligations for sharing?
Funding agency, institution, other organization, legal, etc.
What are the ethical or legal issues? (e.g., privacy,
confidentiality, security, intellectual property, or other rights)
How will the data be made available?
What is the process for gaining access?
When will the data be made available?
When will the data become available?
For how long will the data be available?
Who will have access to the data?
27. DMP: RE-USE, RE-DISTRIBUTION, ETC.
What rights will you retain before data is made available?
Will permission restrictions be necessary?
Limits or conditions for political, commercial, or patent reasons?
Is there an embargo period? Why?
Future users and uses
Who might be interested in the data?
How might you anticipate this data being used?
What value might the data have for these people?
28. EXAMPLE: ACCESS, SHARING, RE-USE
Development of a NanoKlein Calorimeter
http://libguides.unm.edu/content.php?pid=137795&sid=1422879
We expect to apply for a patent for this instrument. All of the
materials submitted as part of the patent process will be a matter
of public record. We will also make technical drawings, test data
and calibration data available through our institutional repository.
Cave Microbiology
http://libguides.unm.edu/content.php?pid=137795&sid=1422879
29. DMP: LONG-TERM PRESERVATION
What data will be preserved?
What transformations are necessary to prepare the data?
How long do you think the data will be useful? How long will the
data be preserved?
Contextual information needed to make the data reusable
metadata, references, reports, manuscripts, grant proposal, etc.
Where will it be preserved?
Links to published materials and other outcomes? Use of persistent
citation?
Procedures for preservation and back-up?
Who will be the contact for the dataset?
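For the question about preservation and back-up procedures, one common check is to confirm that a backup copy still matches the original by comparing checksums. The sketch below is an illustration only: the function names, file names, and file contents are hypothetical, and a repository or campus storage service may already perform this kind of check for you.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_matches(original: Path, backup: Path) -> bool:
    """True when the backup copy is byte-for-byte identical."""
    return sha256_of(original) == sha256_of(backup)


# Demonstration with throwaway files (hypothetical names and contents).
with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "abundance.csv"
    backup = Path(tmp) / "abundance_backup.csv"
    original.write_text("site,count\nA,12\nB,7\n")
    shutil.copy(original, backup)
    intact = backup_matches(original, backup)      # copy is identical
    backup.write_text("site,count\nA,12\nB,70\n")  # simulate corruption
    damaged = backup_matches(original, backup)

print(intact, damaged)
# prints: True False
```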
30. EXAMPLE: LONG-TERM PRESERVATION [1]
Arthropod responses to grassland nutrient limitation
https://www.dataone.org/sites/all/documents/DMP_NutNet_Formatted.pdf
We will preserve both arthropod datasets generated during this
project (abundance and stoichiometry) for the long term in the
Digital Conservancy at the U of M. We will include the .csv files,
along with the associated metadata files. We will also submit an
abstract with the datasets that describe their original context and
any potentially relevant project information. Borer will be
responsible for preparing data for long-term preservation and for
updating contact information for investigators.
31. EXAMPLE: LONG-TERM PRESERVATION [2]
Improving the long-term preservability of HDF-formatted data by
creating maps to file contents
https://www.dataone.org/sites/all/documents/DMP_HDFMap_Formatted.pdf
The writer software will be preserved by the HDF Group for the life
of the HDF libraries. The HDF Group uses industry-standard best
practices to ensure the integrity of their software and systems.
Once the map writer has been used to generate maps for every
HDF file in existence, the continued existence of the writer
software is not required. The reader software will be preserved at
SourceForge.org for as long as there is community interest. The
collection of HDF files will be preserved at NSIDC as long as utility
is deemed high.
32. IMPLEMENTING YOUR PLAN [1]
The DMP is a working document
NSF expects progress to be reported
Incorporate implementation into the project startup process
C&G, IRB, IACUC all have to be in place before data collection can begin
Review, revise, and set up your system during startup
Good documentation ensures…
A shared understanding of the data throughout a project
That future researchers will be able to understand data within the
relevant context
That re-users of data are able to interpret the data appropriately
Resources for backing up data during a project
Research File System: http://pti.iu.edu/storage/rfs
Scholarly Data Archive: http://pti.iu.edu/storage/sda
33. IMPLEMENTING YOUR PLAN [2]
Program of Digital Scholarship: http://ulib.iupui.edu/digitalscholarship
Center for Research & Learning: http://crl.iupui.edu/
OVCR: http://research.iupui.edu/development/
Office of Academic Affairs: http://www.academicaffairs.iupui.edu
Intellectual Property Policy: https://www.indiana.edu/~vpfaa/academicguide/index.php/Policy_I-11
Research File System: http://pti.iu.edu/storage/rfs
Scholarly Data Archive: http://pti.iu.edu/storage/sda
Research Technologies, UITS: http://uits.iu.edu/page/avel
Core Services, UITS: http://pti.iu.edu/cs
Scholarly Cyberinfrastructure, UITS: http://uits.iu.edu/page/amee
CTSI Tools: http://www.indianactsi.org/rct (Alfresco Share, REDCap)
IUWare: https://iuware.iu.edu
IUanyWare: https://iuanyware.iu.edu/vpn/index.html
StatMath: http://www.indiana.edu/~statmath/
Statistics Consulting Center: http://www.math.iupui.edu/asci/
34. RESOURCES [1]
Data Services Program site:
http://ulib.iupui.edu/digitalscholarship/dataservices.html
National Science Board, Digital Research Data Sharing &
Management, 2012 (pre-publication):
http://www.nsf.gov/nsb/publications/2011/nsb1124.pdf
National Institutes of Health, Data Sharing Policy
http://grants.nih.gov/grants/policy/data_sharing/data_sharing_guidance.htm
NIH Public Access Policy Implications
http://publicaccess.nih.gov/public_access_policy_implications_2012.pdf
IU New Employee Compliance Orientation (NECO)
http://researchadmin.iu.edu/EO/eo_sessions.html
35. RESOURCES [2]
UK Data Archive: Managing & Sharing Data Brochure:
http://www.data-archive.ac.uk/media/2894/managingsharing.pdf
UK Data Archive Costing Tool:
http://www.data-archive.ac.uk/media/257647/ukda_jiscdmcosting.pdf
Creative Commons Licenses & Data:
http://wiki.creativecommons.org/Data
Licensing Research Data, Digital Curation Centre
http://www.dcc.ac.uk/resources/how-guides/license-research-data
CIC Author Addendum
http://www.cic.net/authors
DMPTool: https://dmp.cdlib.org/
DMPOnline: https://dmponline.dcc.ac.uk/
36. COMPELLING CASES FOR OPEN DATA
Tim Berners-Lee: http://www.ted.com/talks/tim_berners_lee_on_the_next_web.html
Open-source cancer research: http://www.ted.com/talks/jay_bradner_open_source_cancer_research.html
Polymath problem blogs:
http://polymathprojects.org/about/
http://stevekochscience.blogspot.com/2011/02/open-data-success-story.html
http://eaves.ca/2011/09/07/the-economics-of-open-data-mini-case-transit-data-translink/
37. REFERENCES
1. Higgins, S. (n.d.). What are metadata standards.
http://www.dcc.ac.uk/resources/briefing-papers/standards-watch-papers/what-are-metadata-standards
2. Digital Curation Centre. (n.d.). DCC Charter and Statement of Principles.
Retrieved from http://www.dcc.ac.uk/about-us/dcc-charter.
3. Indiana University. (2011). Indiana University’s Advanced
Cyberinfrastructure. Retrieved from
http://pti.iu.edu/cyberinfrastructure.pdf.
4. Indiana University. (2009). Empowering People: Indiana University’s
Strategic Plan for Information Technology. Retrieved from
http://ovpit.iu.edu/strategic2/.
5. National Science Foundation. (2011). Award and Administration Guide:
Chapter IV C.4., Dissemination and Sharing of Research Results. Retrieved
from
http://www.nsf.gov/pubs/policydocs/pappguide/nsf11001/aag_6.jsp#VID4.
6. Lawrence, S. Free online availability substantially increases a paper’s
impact. Nature, 31 May 2001.
http://www.nature.com/nature/debates/e-access/Articles/lawrence.html
(accessed November 5, 2008).
7. Lewis, David W. "Library budgets, open access, and the future of scholarly
communication: Transformations in academic publishing." C&RL News, May
2008, Vol. 69, No. 5. [Available at:
http://www.ala.org/ala/mgrps/divs/acrl/publications/crlnews/2008/may/ALA_print_layout_1_471139_471139.cfm]
38. THANK YOU
Tell us what you think, take a brief survey.
Find us @
http://ulib.iupui.edu/digitalscholarship
Heather Coates, hcoates@iupui.edu, 317-278-7125
Kristi Palmer, klpalmer@iupui.edu, 317-274-8230
39. EXTRA: NIH DATA SHARING POLICY
$500,000 or more in direct costs in any year of the proposed
research
Final research data, not summary statistics or tables, not underlying
pathology reports and other clinical source documents, might
include both raw data and derived variables
If an application describes a data-sharing plan, NIH expects that
plan to be enacted.
NIH expects the timely release and sharing of data to be no later
than the acceptance for publication of the main findings from the
final dataset.
It is the responsibility of the investigators, their Institutional
Review Board (IRB), and their institution to protect the rights of
subjects and the confidentiality of the data. Prior to sharing, data
should be redacted to strip all identifiers, and effective strategies
should be adopted to minimize risks of unauthorized disclosure of
personal identifiers.
40. EXTRA: NIH DATA SHARING PLAN
describe briefly the expected schedule for data sharing
the format of the final dataset
the documentation to be provided
whether or not any analytic tools also will be provided
whether or not a data-sharing agreement will be required
if so, a brief description of such an agreement (including the criteria for
deciding who can receive the data and whether or not any conditions
will be placed on their use)
mode of data sharing (e.g., under their own auspices by mailing
a disk or posting data on their institutional or personal website,
through a data archive or enclave)
Applicants may request funds in their application for data
sharing.
Editor's notes
Housekeeping: hold questions until the end, make sure everyone has handouts.
Resources: slides, DSP Guide to NSF DMP, NSF policy language handout, CIC Author Addendum.
We’re going to spend the majority of our time today walking through each section of the DMP, but there are some basic things you need to know first.
The main reason people are talking about data management and curation right now is the funding agency requirements. These came about within the context of broader discussions about scholarly communication in the sciences, so we’ll quickly review that discussion before getting into the practical steps of developing a DMP.
1) We want to prepare you to engage in discussions about scholarly communication, specifically open access and data management and sharing.
2) We will provide information so that you are making informed decisions regarding copyright, IP, patent, and other issues when it comes to choosing how your research is disseminated, who has rights to it, and who can access it.
There are many other compelling reasons to plan for curating and/or sharing your data. The good news is that data sharing can boost the scholarly impact of your data and research in general, which is always good for promotion and tenure.
- Exposure: increased citation for scholarly works; people are working out ways to cite datasets as well.
- Collaborations: funders are increasingly looking for interdisciplinary and multi-institution collaborations.
However, the benefits of digital data come with costs. Unlike with physical specimens or paper-based data, we can’t assume that we’ll be able to access and use digital data in 5, 10, or 50 years. We need to plan, manage, and preserve valuable digital data so that the scientific record isn’t lost. Essentially, if we can’t find it, it didn’t happen. These issues of persistent access and long-term preservation are challenges that libraries have been solving for a very long time.
Some people wonder why the library is taking on this challenge of helping researchers to manage and preserve their data. There are several good reasons.
- Every college or university has a library.
- Our place within IUPUI facilitates collaboration; we have existing relationships with each department; these collaborations are another way to build capacity for data curation, making use of resources that are already available.
- Libraries and librarians have been caring for information in many formats for thousands of years; while the formats change more rapidly these days, our core principles remain the same.
Librarians and libraries have been preserving information in various forms for a very long time. Other campus units can help you with your research, but have a different focus, such as compliance with human subjects or animal use guidelines, contracts and grants, bioethics, etc. I’ll provide information about these resources at the end.
The Data Services Program is part of the University Library’s Program of Digital Scholarship. The Data Services Program offers workshops and consultations for developing an NSF data management plan as well as data management and curation in general. In addition, we have established a data repository for IUPUI research. The repository is one of many tools available to you for preserving and sharing your research data. On our website, we’ve provided links to:
- Sample NSF DMPs from other institutions
- Various tools
- Guidance from institutions like the ICPSR and Digital Curation Centre (UK)
- Significant publications discussing data management and curation
I want to clarify some terms so we’re all on the same page. Data management is largely seen as the purview of scientists and biostatisticians since it varies by research community and discipline. Data sharing is not an all-or-none proposition. It encompasses a wide spectrum of activities ranging from open data publishing on the internet without restriction to controlled access by pre-defined partners or collaborators. Data citation is a concept similar to citation of scholarly publications and refers to mechanisms that (DataCite):
- allow easy reuse and verification of data;
- allow the impact of data to be tracked;
- create a scholarly structure that recognizes and rewards data producers.
As I said earlier, these policies came about as a result of broader conversations about scholarly communication. In case you aren’t familiar with the term, it refers to the processes by which we produce and disseminate information relating to teaching, research, and other scholarly activities. Our goal is to provide you with the information necessary to engage in these discussions within IU and your research communities so that you are making informed decisions about how your research gets out there, who retains rights, and who can access it. The NSF policy is not a radical change, and it is not likely to go away. These policies illustrate slow progress towards increasing public access to research results and products and greater awareness of data management challenges.
http://www.nsf.gov/bfa/dias/policy/dmpfaqs.jsp
IU as an institution has been engaged in discussions about scholarly communication for several years and has voiced its commitment to these issues by including data management and curation in the IT Strategic Plan.
The open access conversation is focused on the dissemination of research products like peer-reviewed articles and books at the end of the research life cycle, whereas data management planning is most effective when it’s initiated before data collection begins and implemented throughout the research life cycle.
It is important to know that the language was crafted to:
- Allow the research community to shape the implementation
- Give communities of practice a role in developing relevant best practices
The budget allocations and narrative should tell a cohesive story; if you identify big challenges in data storage and preservation, but do not allocate funds to address these challenges, it will likely raise a red flag for the review committee.
Ultimately, this document should demonstrate that you are aware of data management and preservation issues in general, more specifically the relevant practices within your research community or discipline, and that you have thought through how these affect your proposed project. As you develop each section of your DMP, it’s important to do two things:
- Explain your reasoning: it could just be that it’s a standard practice in your field or community.
- Identify roles for data management and curation activities: think about who on your team or in another campus unit will carry out the activities described; this section should identify who will be carrying out the major elements of your plan. This may include the PI, staff, students, external contractors, institutional IT, the library, and external data repositories.
The plans proposed should be feasible, both for you and for us.
If you look at the guide I’ve provided, you’ll see these topics are broken down into a variety of specific questions to address. We’ll go through each section in more detail. It may be helpful to begin your DMP with a few sentences describing the research project in general, to provide context for the detailed information in each section.
In this first section, you want to describe two things: the data you will generate or use and the documentation you will create to facilitate data management and curation.
In addition to describing your plan in the DMP, data collection and processing activities should be described in working documents throughout the life of the project. Research methods, even within a single lab, change over time. Creating data documentation is easiest and most efficient at the beginning of a project. Good documentation ensures 3 things: a shared understanding of the data throughout a project; that future researchers will be able to understand data within the context they were created; that re-users of data are able to interpret the data appropriately. You don’t need to spend a lot of time or space describing the planned documentation, but it is worthwhile to mention what format it will take and who will be responsible for creating and maintaining it. This documentation is often deposited along with the data for preservation and sharing. Good documentation can facilitate efficient data collection and processing and preserve data integrity.
Data screening tests: histograms, boxplots, Z-scores, etc.
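A Z-score screen, one of the tests named above, can be sketched with the standard library. This is a minimal illustration, not a recommended procedure: the function name `flag_outliers`, the threshold, and the readings are assumptions, and flagged points should be reviewed rather than deleted automatically.

```python
from statistics import mean, stdev


def flag_outliers(values, threshold=3.0):
    """Return (index, value, z) for points with |z| above the threshold."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation
    if sigma == 0:
        return []  # all values identical; nothing to flag
    flagged = []
    for i, v in enumerate(values):
        z = (v - mu) / sigma
        if abs(z) > threshold:
            flagged.append((i, v, round(z, 2)))
    return flagged


# Hypothetical readings; the last value looks like a keying error.
readings = [20.1, 19.8, 20.3, 20.0, 19.9, 20.2, 20.1, 19.7, 20.0, 35.0]

# A lower threshold is used here because the example has only ten points.
for index, value, z in flag_outliers(readings, threshold=2.5):
    print(index, value, z)
```

With very small samples a single extreme value also inflates the standard deviation, so the threshold should be chosen with the sample size in mind.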
Who created the data? What is the content of the data? When were the data created? Where is it geographically? How were the data developed? Why were the data developed?
Ask yourself: are your data self-explanatory? Consider it from the perspective of a typical reader of a journal you publish in or a colleague who might be interested in collaborating. The answer to this question is no; the solution is good documentation and metadata. Increasingly, the people analyzing the data are not those who collected it. Metadata and good data documentation facilitate a stronger understanding of the data: its quality and appropriate use. There are a lot of standards out there; the best approach to determining which to use is to see what others in your discipline or research community are doing. Another option, if you know you will be depositing your data in a particular repository, is to ask them what their requirements or recommendations are. Researchers conducting interdisciplinary and longitudinal studies should think carefully about how their data will be used across multiple disciplines and the potential for re-use. You may want to consider standards that are well-supported and established over specialized standards that may complicate re-use and analysis in future.
Let’s take a look at the handout with the NSF policy language. Again, the language is broad and allows for practices to vary by research community. As you can see from the policy, data dissemination and sharing does not refer to publishing in scholarly journals. Also, the requirements vary by Directorate, so be sure to check to see if your Directorate has different expectations.
In this section, you should define what you will share, how, and the procedures for access. If you plan to use a specific data repository, they can help you develop this section; likely, they will have standard processes in place.
Acceptable practices for data sharing vary by discipline; some have very mature data repositories while others rely on informal channels. Best practices for persistent access indicate more permanent and secure mechanisms than a faculty or department website. The solution at IUPUI is our data repository (IUPUIDataWorks).
In terms of the access procedures, you want to think about what mechanism will be used for requests, whether registration and authentication are necessary, and what information you want to keep for your own records about those who request and receive your data. This can be useful information to demonstrate the value and impact of your research.
Data sharing encompasses a wide spectrum of activities. Even if you are part of a community in which data sharing is not common practice, I urge you to think about what data might be shared or re-used without compromising your intellectual property or competitiveness. You may have older data on which you’ve completed analysis. This data might be useful to students or beginning researchers in your field or here at IUPUI. We can help you figure out how to share your data securely.
This section relates back to the access and sharing information, but should focus on policies and permissions for re-use, re-distribution, and production of derivative works as opposed to the mechanisms described in the previous section. It’s possible to protect your ability to use the data for ongoing analysis while sharing as much of it as possible with your research community and the general public. While you can’t plan for every case, it is useful to imagine who might be interested in the data, how it might be used, and set up a process for handling those cases. Depending on where you decide to deposit your data, this could be very formalized or relatively informal. If you decide to share your data through a repository, often there are mechanisms built in for applying Creative Commons licenses. This is true for our data repository as well.
Here, you should build on the information you’ve outlined in previous sections to describe your long-term preservation strategy. A key component of your plan is the description of the cyberinfrastructure available to you and how you will use it to carry out your plan as a responsible data steward. Although your lab may be equipped to store and maintain the data for a project while it’s active, you may not have the capacity to make sure the data is preserved once the project is complete and your lab resources are dedicated to new endeavors. Neither IU nor NSF wants to see scientific data lost; both are investing significant effort and resources in maintaining the scientific record.
This is an opportunity for you to discuss the long-term plan for keeping your data safe with us or with an external data repository in your discipline. If you are completely unsure how to approach this, feel free to contact the DSP for support. We can help you develop a feasible and appropriate preservation strategy that relies on existing services and infrastructure, whether at IU or elsewhere. These are activities that the Library specifically is invested in and equipped to do; our focus is on long-term preservation, curation, and access. What this means will likely vary by dataset, project, and lab; we’re happy to think this through with you to develop a plan that will meet your needs.
There is a wealth of resources at IU to help you with your research. These are just a few of those relevant to data management and curation.