This document provides guidance on how to write a data management plan (DMP). It discusses what a DMP is, why researchers should care about data management, and where data management fits into the research cycle. It also covers the key components of a successful DMP, including a data inventory, a strategy for describing the data, a plan for long-term data preservation, and methods for making the data accessible. The document provides examples and exercises to help researchers develop the sections of a DMP for their own research projects.
Data and donuts: how to write a data management plan
1. How to write
a data
management
plan
C. Tobin Magle, PhD
Sept 25, 2017
10:00-11:30 a.m.
Morgan Library Computer
Classroom 175
*inspired by content from CU
Boulder research computing
2. What is data
management?
The policies, practices and procedures needed to
manage the storage, access and preservation of data
produced from a research project
4. Why should I care about data management?
Rinehart, AK. “Getting emotional about data.” College & Research Libraries News, September 2015, vol. 76, no. 8, 437-440
28. What is research data?
• “The recorded factual material
commonly accepted in the
scientific community as
necessary to validate research
findings”
- White House Office of
Management and Budget
• Reality: anything that is a
(digital) product of your
research
29. What is a data
management plan?
A description of how you plan to describe, preserve
and share your research data.
Often required by funding agencies
30. Successful DMPs include
• A data inventory, including type(s) and size
• A strategy for describing the data
• A plan for preserving the data long term
• A method for access to the data
Always make sure to follow funder requirements
31. Data inventory
• What type of data are you going to collect?
• What file type will be produced?
• What size will these files be? How many files?
• What other research outputs will be produced?
• Code/Software?
• Templates/protocols?
32. Example
miRNA sequences
FASTQ files
1 GB per file
x 64 strains
x 3 replicates
-------------------
~200 GB
R scripts for
analysis and
visualization
Data use tutorials
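The size estimate in this example can be checked with a few lines of Python. This is only a sketch: it assumes one FASTQ file per strain and replicate, which the slide implies but does not state, and the function name is made up for illustration.

```python
# Back-of-the-envelope storage estimate for the miRNA example.
# Assumption: one FASTQ file per strain/replicate combination.
GB_PER_FILE = 1    # approximate size of one FASTQ file
STRAINS = 64
REPLICATES = 3


def estimate_storage_gb(gb_per_file, strains, replicates):
    """Return (number of files, total size in GB)."""
    n_files = strains * replicates
    return n_files, n_files * gb_per_file


n_files, total_gb = estimate_storage_gb(GB_PER_FILE, STRAINS, REPLICATES)
print(f"{n_files} files, about {total_gb} GB")  # prints: 192 files, about 192 GB
```

The exact product is 192 GB, which the slide rounds up to ~200 GB. Rounding up is a reasonable habit in a DMP, since scripts and tutorials add to the total.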
33. Data formats
• Avoid proprietary formats
• Know what software can read your data
Proprietary format              Alternative format
Excel (.xls, .xlsx)             Comma Separated Values (.csv)
Word (.doc, .docx)              Plain text (.txt)
PowerPoint (.ppt, .pptx)        PDF/A (.pdf)
Photoshop (.psd)                TIFF (.tif, .tiff)
QuickTime (.mov)                MPEG-4 (.mp4)
MPEG-4 Protected audio (.m4p)   MP3 (.mp3)
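One way to apply the table is to scan a list of project files and flag anything saved in a proprietary format. The sketch below only encodes the extensions listed above; the function names are hypothetical, and it suggests a target format without converting anything.

```python
from pathlib import Path

# Mapping taken from the table above: proprietary extension -> open alternative.
OPEN_ALTERNATIVES = {
    ".xls": ".csv", ".xlsx": ".csv",
    ".doc": ".txt", ".docx": ".txt",
    ".ppt": ".pdf", ".pptx": ".pdf",
    ".psd": ".tif",
    ".mov": ".mp4",
    ".m4p": ".mp3",
}


def suggest_open_format(filename):
    """Return the suggested open extension, or None if the file is not in the table."""
    return OPEN_ALTERNATIVES.get(Path(filename).suffix.lower())


def flag_proprietary(filenames):
    """Return {filename: suggested extension} for files in proprietary formats."""
    flagged = {}
    for name in filenames:
        alternative = suggest_open_format(name)
        if alternative is not None:
            flagged[name] = alternative
    return flagged


print(flag_proprietary(["counts.xlsx", "protocol.docx", "reads.fastq"]))
```

Files that are not in the table, such as FASTQ, are simply left out of the result.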
34. Exercise: Data Inventory
What kind of data are you going to collect?
What file type will be produced?
What size will these files be? How many files?
What other research outputs will be produced?
35. A strategy for describing the data
• Metadata: Relevant information
for re-creation and re-use
• Contact info
• How data was collected
• Details about collection
• Date, location of collection
• Units
• Can be as simple as a text file
36. Genomics example (README)
This project contains next-generation miRNA sequencing data from 64 mouse strains.
Brain tissue from 10-week-old male mice was harvested and stored in RNAlater. RNA was
extracted using an RNeasy kit, and miRNA libraries were produced using an Illumina kit.
They were run on an Illumina MiSeq sequencer. The FASTQ files produced were analyzed
in R using Bioconductor.
The data and descriptive metadata will be made available on NCBI in the BioProject (PRJXXXX). The
scripts used to analyze the data are available on GitHub (URL). Tutorials for data use will
be made available in the Digital Collections of Colorado (handle).
Contact Tobin Magle (tobin.magle@colostate.edu) for more information.
http://orcid.org/0000-0003-3185-7034
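A README like this one can also be generated from a small set of labeled fields, which helps keep the same fields across datasets. The sketch below is one possible layout, not a standard: the field labels and the build_readme helper are assumptions, and the values are taken from the example above.

```python
def build_readme(fields):
    """Format metadata fields as 'Label: value' lines for a plain-text README."""
    lines = [f"{label}: {value}" for label, value in fields.items()]
    return "\n".join(lines) + "\n"


# Field labels are illustrative; use whatever your discipline expects.
readme_fields = {
    "Contact": "Tobin Magle (tobin.magle@colostate.edu)",
    "Description": "Next-generation miRNA sequencing data from 64 mouse strains",
    "Collection": "Brain tissue from 10-week-old male mice, RNeasy kit, Illumina sequencing",
    "File format": "FASTQ",
    "Analysis": "R using Bioconductor",
}

readme_text = build_readme(readme_fields)
print(readme_text)
```

The resulting text can be saved next to the data, for example as README.txt.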
37. Metadata standards
• Dublin Core: http://dublincore.org/documents/dcmi-terms/
• Can be applied to anything
• Many discipline specific metadata standards
• EML: https://knb.ecoinformatics.org/#external//emlparser/docs/index.html
• MIAME: http://fged.org/projects/miame/
• Search for other standards:
• http://www.dcc.ac.uk/resources/metadata-standards
• https://fairsharing.org/standards/
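As a rough illustration of what a Dublin Core description can look like, the sketch below writes a few Dublin Core elements as XML using Python's standard library. The wrapper element name "metadata", the choice of fields, and the example values are assumptions for illustration; check the standard and your repository's requirements for the fields and encodings it expects.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core element set.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Return an XML string with one dc:<name> element per field."""
    root = ET.Element("metadata")  # wrapper name is an assumption
    for name, value in fields.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


record = dublin_core_record({
    "title": "miRNA sequencing data from 64 mouse strains",
    "creator": "Magle, C. Tobin",
    "format": "FASTQ",
    "identifier": "PRJXXXX",  # placeholder, as in the README example
})
print(record)
```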
39. Exercise: Describe your data
What do people need to know to reuse your data?
Are there any discipline-specific metadata standards?
What format will you describe your data in (text, XML, tabular)?
What fields will you include (author, date, format, identifier?)
40. A plan for preserving the data long term
• What will you do to ensure
data are properly stored and
preserved?
• Include metadata and other
products needed for reuse
• Short vs long term
41. Recommendations for backing up data
• Store in geographically distinct
locations
• Automation: Will you remember to do it
manually?
• Security: Are you working with PHI?
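To make the automation point concrete, here is a minimal sketch of a backup step that copies a file to several destinations and verifies each copy with a SHA-256 checksum. The function names are hypothetical. The destinations are plain directories here; in practice they would be paths to storage in different physical locations, and the script would be run by a scheduler rather than by hand.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in 1 MB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source, backup_dirs):
    """Copy source into each backup directory and verify each copy by checksum."""
    source = Path(source)
    expected = sha256_of(source)
    copies = []
    for directory in backup_dirs:
        directory = Path(directory)
        directory.mkdir(parents=True, exist_ok=True)
        target = directory / source.name
        shutil.copy2(source, target)  # copies contents and file metadata
        if sha256_of(target) != expected:
            raise IOError(f"Checksum mismatch for {target}")
        copies.append(target)
    return copies
```

This sketch does not address security. Data containing PHI needs storage and transfer methods approved by your institution.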
42. Preservation questions
• What will you store?
• Who will be in charge?
• How long will you store it?
• Where will you store it?
• Multiple copies
43. Exercise: Preservation plan
What will you store?
Who will be responsible for the data (person or position)?
How long will you store it?
Where will you store it?
How will you back it up?
44. A method to access the data
• Important to funding agencies
• Reproduce existing research
• Promote further research
• Must be easily available:
• No “by request only”
• Embargoes are “ok”
• Data security: consider privacy
and IP issues before sharing
45. Data access and sharing best practices
• Non-proprietary formats
• Include metadata
• Proper storage
• Stable identifier
• Licensing: conditions for reuse
46. Trusted Repositories: store and share
• Discipline specific repositories
• Search:
http://service.re3data.org/browse/by-subject/
• Generic:
• Figshare - https://figshare.com/
• Dryad - http://datadryad.org/
• CSU Digital Repository:
• http://lib.colostate.edu/digital-collections/
47. Data archiving service
• Finished products for
sharing
• CSU Digital Repository
• Over 100 Datasets
• Satisfy requirements for
manuscripts and grants
• At no cost <1 TB
• $150/TB for 5 years
• $300/TB for >5 years
48. Stable identifiers
• URLs break
• Stable identifiers are
permanent in a database
• Some provide linking
capabilities
• DOI –
https://doi.org/10.1109/5.771073
• Handle-
http://hdl.handle.net/10217/177356
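Both examples follow the same pattern: a resolver address followed by the identifier itself. The sketch below builds the link from a bare identifier using the two resolver addresses shown on this slide; the function name and the "doi"/"handle" labels are made up for illustration.

```python
# Resolver prefixes as they appear in the examples above.
RESOLVERS = {
    "doi": "https://doi.org/",
    "handle": "http://hdl.handle.net/",
}


def resolver_url(identifier, scheme):
    """Return the resolver link for a bare DOI or Handle."""
    if scheme not in RESOLVERS:
        raise ValueError(f"Unknown identifier scheme: {scheme}")
    return RESOLVERS[scheme] + identifier.strip()


print(resolver_url("10.1109/5.771073", "doi"))
print(resolver_url("10217/177356", "handle"))
```

Citing the identifier rather than a repository page URL means the link keeps working if the repository reorganizes its site.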
49. Licensing
• State your conditions for reuse
• Paper citation?
• Disclaimers
• Must justify limitations, describe
how you’ll advertise them
• Creative Commons licenses are a
good starting point
50. Exercise: Access methods
Where will people be able to access the data?
Does your discipline have a repository?
What kind of stable identifier will it have?
What are the conditions for reuse?
Are there any limitations to use of these data? Why?
51. DMPTool
• Review requirements from
different agencies
• https://dmptool.org/guidance
• Create new DMPs based on
funding agency templates
• Search public DMPs
52. Need help?
• Email: tobin.magle@colostate.edu
• DMPTool: http://dmptool.org/
• Data Management Services website:
http://lib.colostate.edu/services/data-management
Editor's notes
You already care deeply about your data
It’s your IP
But…
There are external pressures that make thinking about how to preserve research data more pressing
The number of PhDs is growing, hence….
Despite a steady increase in the number of PhDs, research funding is more or less flat