2. The Digital Curation Centre (DCC)
• UK national centre of expertise in digital preservation and data
management, established 2004
• Principal audience is the UK higher education sector, but we
increasingly work further afield (continental Europe, North
America, South Africa, Asia…)
• Provide guidance, training, tools (e.g. DMPonline) and other
services on all aspects of research data management and Open
Science
• Now offering tailored consultancy/training
• Organise national and international events and webinars
(International Digital Curation Conference, Research Data
Management Forum)
6. The old way of doing research
1. Researcher collects data (information)
2. Researcher interprets/synthesises data
3. Researcher writes paper based on data
4. Paper is published (and preserved)
5. Data is left to benign neglect, and eventually ceases to be accessible
7. Without intervention, data + time = no data
Vines et al. “examined the availability of data from 516 studies between 2 and 22
years old”
- The odds of a data set being reported as extant fell by 17% per year
- Broken e-mails and obsolete storage devices were the main obstacles to data sharing
- Policies mandating data archiving at publication are clearly needed
“The current system of leaving data with authors means that almost all of it is lost over time,
unavailable for validation of the original results or to use for entirely new purposes”
according to Timothy Vines, one of the researchers. This underscores the need for intentional
management of data from all disciplines and opened our conversation on potential roles for
librarians in this arena. (“80 Percent of Scientific Data Gone in 20 Years”, HNGN, Dec. 20, 2013,
http://www.hngn.com/articles/20083/20131220/80-percent-of-scientific-data-gone-in-20-years.htm)
Vines et al., The Availability of Research Data Declines Rapidly with Article Age,
Current Biology (2014), http://dx.doi.org/10.1016/j.cub.2013.11.014
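To make the 17% figure concrete, the sketch below shows how a constant yearly fall in the odds compounds over time. It assumes the decline is multiplicative and identical every year, which is a simplification. The starting odds are a hypothetical value chosen for illustration, not a number reported by Vines et al.

```python
def odds_after(initial_odds: float, years: int, annual_decline: float = 0.17) -> float:
    """Odds that a data set is still extant after `years`, assuming the odds
    shrink by the same fraction every year (a simplifying assumption)."""
    return initial_odds * (1.0 - annual_decline) ** years


def odds_to_probability(odds: float) -> float:
    """Convert odds, defined as p / (1 - p), back to a probability p."""
    return odds / (1.0 + odds)


if __name__ == "__main__":
    start_odds = 9.0  # hypothetical: a 90% chance the data set is extant at year 0
    for years in (0, 5, 10, 20):
        odds = odds_after(start_odds, years)
        prob = odds_to_probability(odds)
        print(f"after {years:2d} years: odds {odds:.2f}, probability {prob:.2f}")
```

Under these assumptions the odds after 20 years are roughly 2.4% of their starting value, whatever that starting value is. The resulting probability depends on the starting odds chosen.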
8. Baker, M. (2016) “1,500 scientists lift the lid on reproducibility”, Nature, 533:7604,
http://www.nature.com/news/1-500-scientists-lift-the-lid-on-reproducibility-1.19970
9. (Aside: from data to research objects?)
• ‘Research object’ is a term that is gaining in popularity, not
least in the humanities where the relevance of the term ‘data’
is not always recognised…
• Research objects can comprise any supporting material which
underpins or otherwise enriches the (written) outputs of
research
• Data (numeric, written, audiovisual…)
• Software code and algorithms
• Workflows and methodologies
• Slides, logs, lab books, sketchbooks, notebooks, etc
• See http://www.researchobject.org/ for more info
10. The new way of doing research
[Figure: the DataONE lifecycle model. Stages: Plan, Collect, Assure, Describe, Preserve, Discover, Integrate, Analyze. Overlaid labels: DEPOSIT …and RE-USE]
11. N.B. other models are available…
Ellyn Montgomery, US Geological Survey
12. Data sharing isn’t entirely new…
from Philosophical Transactions of the Royal Society (MDCCCLXI) (or 1861 if you’d prefer)
13. …but what’s “normal” is shifting
Data management is a part of good research practice.
- RCUK Policy and Code of Conduct on the Governance of Good Research Conduct
14. The benefits of Open / managed data
• SPEED: The research process becomes faster
• EFFICIENCY: Data collection can be funded once, and used many
times for a variety of purposes
• ACCESSIBILITY: Interested third parties can (where appropriate)
access and build upon publicly-funded research resources with
minimal barriers to access
• IMPACT and LONGEVITY: Open publications and data receive more
citations, over longer periods (see, for example, the recent DCC/SPARC
Europe paper, “The Open Data Citation Advantage”)
• TRANSPARENCY and QUALITY: The evidence that underpins research
can be made open for anyone to scrutinise and to attempt to replicate
the findings. This leads to a more robust scholarly record.
• SECURITY: Not all data should be made available to everyone. Careful
management reduces the risk of inappropriate disclosure.
16. Open and/or Managed?
• Taking a managed and planned approach to research is not the same as
making everything open to everyone
• Research data management serves one of two purposes:
• To ensure that data remains accessible and understandable; or
• To ensure that data is not accessible or understandable (in its raw state, by the
wrong people, or at the wrong time)
• Which of these pertains will depend on the nature of the research. It is
increasingly expected that publications and data (and software,
algorithms, workflows etc) will be made Open by default, unless…
• There is an ethical reason to restrict access
• There is a public safety reason to restrict access
• There is a commercial or contractual reason to restrict access
• In some cases, data can be made partially-open (i.e. anonymised,
aggregated or redacted) in order to protect these interests
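As an illustration of a partially-open release, the sketch below redacts direct identifiers from a few records and publishes only counts per age band. The records, field names and band width are all hypothetical. Dropping direct identifiers alone does not guarantee anonymity, so treat this as a starting point rather than a disclosure-control procedure.

```python
from collections import Counter

# Hypothetical raw records; every value here is invented for illustration.
RAW_RECORDS = [
    {"name": "Participant A", "postcode": "AB1 2CD", "age": 23, "score": 71},
    {"name": "Participant B", "postcode": "AB1 3EF", "age": 27, "score": 64},
    {"name": "Participant C", "postcode": "AB2 4GH", "age": 41, "score": 58},
]

# Assumed set of fields that identify a person directly.
DIRECT_IDENTIFIERS = {"name", "postcode"}


def redact(record: dict) -> dict:
    """Return a copy of the record without the direct identifiers."""
    return {key: value for key, value in record.items() if key not in DIRECT_IDENTIFIERS}


def age_band(age: int, width: int = 10) -> str:
    """Map an exact age to a coarser band such as '20-29'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def aggregate_by_age_band(records: list) -> dict:
    """Count records per age band, discarding all individual-level detail."""
    return dict(Counter(age_band(record["age"]) for record in records))


if __name__ == "__main__":
    print([redact(record) for record in RAW_RECORDS])
    print(aggregate_by_age_band(RAW_RECORDS))
```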
20. The FAIR Data Principles (0/4)
One of the grand challenges of data-intensive science is
to facilitate knowledge discovery by assisting humans
and machines in their discovery of, access to,
integration and analysis of, task-appropriate scientific
data and their associated algorithms and workflows.
FAIR is a set of guiding principles to make data
• Findable
• Accessible
• Interoperable, and
• Re-usable
21. The FAIR Data Principles (1/4)
To be Findable:
F1. (meta)data are assigned a globally unique and
eternally persistent identifier.
F2. data are described with rich metadata.
F3. (meta)data are registered or indexed in a
searchable resource.
F4. metadata specify the data identifier.
22. The FAIR Data Principles (2/4)
To be Accessible:
A1. (meta)data are retrievable by their
identifier using a standardized communications
protocol.
A1.1. the protocol is open, free, and universally
implementable.
A1.2. the protocol allows for an authentication
and authorization procedure, where necessary.
A2. metadata are accessible, even when the data
are no longer available.
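A1 can be illustrated with a DOI, which resolves over plain HTTPS via the doi.org resolver. The sketch below prepares, but does not send, a request for the metadata of the Vines et al. article cited earlier. The `Accept` value is the CSL JSON media type used in DOI content negotiation; whether a particular registration agency honours it for a given DOI is an assumption here, and `build_metadata_request` is a hypothetical helper name.

```python
import urllib.request

DOI_RESOLVER = "https://doi.org/"


def build_metadata_request(doi: str) -> urllib.request.Request:
    """Prepare (but do not send) an HTTPS request for a DOI's metadata."""
    return urllib.request.Request(
        DOI_RESOLVER + doi,
        # Ask for machine-readable metadata instead of the landing page.
        headers={"Accept": "application/vnd.citationstyles.csl+json"},
    )


if __name__ == "__main__":
    request = build_metadata_request("10.1016/j.cub.2013.11.014")
    print(request.get_method(), request.full_url)
    # Sending the request needs network access, so it is left commented out:
    # with urllib.request.urlopen(request) as response:
    #     print(response.read().decode("utf-8"))
```

The identifier and the protocol are all a client needs here: no bespoke software or proprietary API is involved, which is the point of A1 and A1.1.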
23. The FAIR Data Principles (3/4)
To be Interoperable:
I1. (meta)data use a formal, accessible, shared,
and broadly applicable language for knowledge
representation.
I2. (meta)data use vocabularies that follow FAIR
principles.
I3. (meta)data include qualified references to
other (meta)data.
24. The FAIR Data Principles (4/4)
To be Re-usable:
R1. (meta)data have a plurality of accurate and
relevant attributes.
R1.1. (meta)data are released with a clear and
accessible data usage license.
R1.2. (meta)data are associated with
their provenance.
R1.3. (meta)data meet domain-relevant
community standards.
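Some of the principles above lend themselves to a simple automated check on a metadata record. The sketch below flags records that lack an identifier, a description, a licence or provenance information. The field names are hypothetical and do not follow any particular metadata schema, and the example DOI is a placeholder. Passing the check shows only that the fields are present; it does not show that the data are FAIR.

```python
# Mapping from a principle label to the (hypothetical) metadata field that evidences it.
CHECKS = {
    "F1 persistent identifier": "identifier",
    "F2 rich description": "description",
    "R1.1 usage licence": "license",
    "R1.2 provenance": "provenance",
}


def missing_fair_fields(metadata: dict) -> list:
    """Return the labels of the checks whose field is absent or empty."""
    return [label for label, field in CHECKS.items() if not metadata.get(field)]


if __name__ == "__main__":
    record = {
        "identifier": "doi:10.1234/example",  # placeholder, not a real DOI
        "description": "Survey responses collected for a hypothetical project.",
        "license": "",  # left empty on purpose to trigger the R1.1 check
        "provenance": "Collected by the project team; cleaned with an in-house script.",
    }
    for label in missing_fair_fields(record):
        print("Missing:", label)
```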
26. FAIR in practice: European data policy
• The EC is currently midway through an extended pilot for Horizon
2020. Other projects can participate voluntarily, and opting in has
been more popular than opting out
• The pilot applies as a minimum to research data underlying
publications, plus any other data as decided by the project
• Participants must:
• Create and maintain a DMP as a project deliverable
• Deposit data in a repository
• Make it possible for others to access, mine, exploit and reuse the data
• Share information on the tools needed
…unless there are compelling reasons not to do so.
(And these reasons should be recorded in the DMP.)
“As open as possible, as closed as necessary”
27. Horizon 2020 – extended pilot (i)
The DMP should include information on:
• the handling of research data during and after the
end of the project
• what data will be collected, processed and/or
generated
• which methodologies and standards will be applied
• whether data will be shared/made open access, and
• how data will be curated and preserved (including
after the end of the project)
28. Horizon 2020 – extended pilot (ii)
• Once funding is approved and the project gets underway, the
first version of the DMP is submitted (as a deliverable)
within the first 6 months
• The EC provides a template (in the Guidelines), use of which
is recommended but voluntary
• The DMP needs to be updated over the course of the
project whenever significant changes arise (e.g. new
datasets created; changes in consortium policies; changes in
consortium members, etc.)
• The DMP should be updated for each periodic evaluation/assessment
of the project, and at a minimum in time for the final review.
30. Making your data FAIR, step-by-step
1. Understand your funder’s policies (e.g. the EC Guidelines)
2. Create a data management plan (e.g. with DMPonline)
3. Decide which data to preserve using the DCC How-To guide and
checklist, “Five Steps to Decide what Data to Keep”
4. Identify a long-term home for your data (e.g. via re3data.org)
5. Link your data to your publications with a persistent identifier
(e.g. via DataCite)
• N.B. Many repositories will do this for you
6. Investigate infrastructure services and resources, e.g. EUDAT,
OpenAIRE, FOSTER, etc…
34. Thank you: any questions?
• For more information about the DCC:
• Website: www.dcc.ac.uk
• Director: Kevin Ashley
(kevin.ashley@ed.ac.uk)
• General enquiries: Alex Delipalta
(alexandra.delipalta@ed.ac.uk)
• Twitter: @digitalcuration
• My contact details:
• Email: martin.donnelly@ed.ac.uk
• Twitter: @mkdDCC
• Slideshare:
http://www.slideshare.net/martindonnelly
This work is licensed under the Creative Commons Attribution 2.5 UK: Scotland License.