Presentation given at the M25 Consortium of Academic Libraries, CPD25 Event on 'The Role of the Library in Supporting Research'. Provides an introduction to data, software and PIDs and a brief look at how libraries can enable researchers to gain impact and credit for their research data and software.
PIDs, Data and Software: How Libraries Can Support Researchers in an Evolving Research Landscape
1. PIDs, Data and Software:
How libraries can support researchers in an
evolving research landscape
Sarah Stewart, The British Library
M25 Consortium CPD25 Event – The Role of the Library in
Supporting Research. London Mathematical Society, June 28, 2018
2. www.bl.uk
Outline
• The Evolving (Digital) Research Landscape…
• Data
• Software
• PIDs
• Developing Research Support Services for Data
and Software
• Conclusion and Questions
4. An Evolving Research Landscape…
• Research is ‘always already’ digital, and becoming
increasingly linked and networked
• Open Research – fosters transparency, validity and
reproducibility of research
• Strong mandates in the UK from funders (e.g. UKRI,
Wellcome) to make data open
• Increasingly, a push from publishers to make ‘non-traditional’
outputs such as data available online
• A role for Linked Open Data (LOD)?
6. Data and the Digital Research Landscape
• Data as a research output (=credit and impact for
researchers!)
• Emergence of data journals, data repositories, global
data-sharing initiatives, scientific working committees
• Mandate from funders to make research data available
for 10+ years – digital preservation
• Force11 (2016): Make data FAIR – Findable, Accessible,
Interoperable and Reusable
• Data Management Plans as part of applications to
funders (e.g. UKRI, Wellcome)
7. The Importance of Research Data Management…
“In their parents' attic, in boxes in the garage, or stored on now-defunct
floppy disks — these are just some of the inaccessible places in which
scientists have admitted to keeping their old research data.”
http://www.nature.com/news/scientists-losing-data-at-a-rapid-rate-1.14416
8. Funder requirements…
“Publicly funded research data are a public good,
produced in the public interest, which should be made
openly available with as few restrictions as
possible…”
RCUK Common Principles on Data Policy
10. What are Data?
• Many formats, volumes, types, ranging
from physical specimens and archival
material to petabytes of high-throughput
automated measurements or
simulations
• The language of data is taken from the
STEM disciplines, but data also exist
in the arts and humanities
• Need a way to describe (to make
discoverable/findable), store, preserve
and ensure access, sharing, and re-use
if this is possible (it may not be!)
11. UKRI Definition of Data
“Research data are the evidence that underpins the answer to the
research question, and can be used to validate findings regardless
of its form (e.g. print, digital, or physical). These might be quantitative
information or qualitative statements collected by researchers in the
course of their work by experimentation, observation, modelling, interview
or other methods, or information derived from existing evidence. Data may
be raw or primary (e.g. direct from measurement or collection) or derived
from primary data for subsequent analysis or interpretation (e.g. cleaned
up or as an extract from a larger data set), or derived from existing
sources where the rights may be held by others….The primary purpose
of research data is to provide the information necessary to support
or validate a research project's observations, findings or outputs.”
– UKRI Concordat on Open Research Data, (2016)
https://www.ukri.org/files/legacy/documents/concordatonopenresearchdata-pdf/
13. Software: What do I do with it?
• Lots of emphasis on ‘data’ management, but software in
research is often neglected.
• Software is sensitive to changes in its ‘environment’
• There is a lot of variation inherent in software (languages,
versions, licensing, etc.)
14. Software as ‘Data’
• ‘Software is used to create, interpret, present, manipulate and
manage data’ (Software Sustainability Institute)
• Data: ‘recorded factual material commonly retained by and
accepted…as necessary to validate research findings’
(EPSRC)
• Software = Data!
16. Software should be preserved if:
• Software can’t be separated from the data or digital object.
• Software is classified as a research output
• Software has intrinsic value
• More resources available at the Software Sustainability
Institute:
https://www.software.ac.uk/software-sustainability-institute
17. Treat software as valuable research output
PyRDM Green Shoots project
Zenodo integrates with GitHub
College survey on distributed version control
Software Sustainability Institute – I am a fellow
19. Why Use Persistent Identifiers?
• Use of persistent identifiers has
increased as scholarly
communications become
increasingly digital.
• ORCID iDs and DOIs support open
science by supporting
interoperability in research
infrastructures.
• For instance, DataCite and
CrossRef can use DOIs and
ORCID iDs in addition to other
metadata to map and link
documents, data and
researchers.
21. What is an ORCID iD?
• ‘Open Researcher & Contributor ID’
• Developed by ORCID, a non-profit community-owned organisation
• Provides a solution to name ambiguity in research and scholarly
communications
• Unique, persistent identifier for you as a researcher/academic.
Linked to your name, rather than to your institution
• Can be applied to your research outputs to identify, validate and
confirm your authorship
• Can be used to track research outputs
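As an illustration of how a library system might check an ORCID iD before attaching it to a research output, here is a minimal Python sketch. The slide does not describe the identifier's format; the 16-character layout and the ISO 7064 MOD 11-2 check character used below follow ORCID's public documentation, and the function names are hypothetical.

```python
import re

# Four hyphen-separated groups of four characters; the final character
# may be "X", which stands for a check value of 10.
_ORCID_PATTERN = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")


def orcid_check_character(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for digit in base_digits:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Return True if the string is well formed and its check character matches."""
    if not _ORCID_PATTERN.match(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_character(digits[:15]) == digits[15]


# Sample iD used in ORCID's own documentation.
print(is_valid_orcid("0000-0002-1825-0097"))
```

This only checks that an iD is structurally valid. It does not confirm that the iD is registered or that it belongs to a particular researcher.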
24. DataCite (and DataCite UK)
• DataCite is a leading global non-profit organisation
that provides persistent identifiers (DOIs) for research
data. Our goal is to help the research community
locate, identify, and cite research data with
confidence.
• Supports the creation and allocation of DOIs and
accompanying metadata.
• Provides services that support the enhanced search
and discovery of research content.
• Promotes data citation and advocacy through our
community-building efforts and responsive
communication and outreach materials.
• DataCite UK is the UK’s national hub for the provision
of persistent identifiers (DOIs) for research data.
25. DOIs (Digital Object Identifiers)
• Persistent identifier used to uniquely identify objects (datasets,
software, journal articles, theses), standardised by the
International Organization for Standardization (ISO)
• Presented as an alphanumeric code consisting of a prefix and
suffix separated by a slash ‘/’. The ‘10’ at the start of the DOI
positions the DOI within the DOI namespace, e.g.
10.1037/rmh0000008
• Uses the ‘Handle’ system, in which a DOI is ‘resolvable’ because
metadata (such as a URL) is bound to the DOI and points to the
object it identifies.
• A DOI is persistent, so it is the publisher’s responsibility to
keep the metadata attached to the DOI up to date; otherwise, the
DOI will resolve to a dead link.
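The prefix/suffix structure described above can be sketched in Python. This is a rough illustration, not an official validator: the regular expression is a common heuristic (a registrant code of four to nine digits is an assumption), the function names are hypothetical, and https://doi.org/ is used as the public resolver.

```python
import re
from typing import NamedTuple
from urllib.parse import quote


class Doi(NamedTuple):
    prefix: str  # e.g. "10.1037": the "10." directory indicator plus registrant code
    suffix: str  # e.g. "rmh0000008": chosen by the registrant


# Heuristic pattern: "10." + registrant code, a slash, then a non-empty suffix.
_DOI_PATTERN = re.compile(r"^(10\.\d{4,9}(?:\.\d+)*)/(\S+)$")


def parse_doi(doi: str) -> Doi:
    """Split a DOI into its prefix and suffix, or raise ValueError."""
    match = _DOI_PATTERN.match(doi.strip())
    if match is None:
        raise ValueError(f"not a DOI: {doi!r}")
    return Doi(prefix=match.group(1), suffix=match.group(2))


def resolver_url(doi: str) -> str:
    """Build the URL that asks the resolver to redirect to the object's current location."""
    parsed = parse_doi(doi)
    return f"https://doi.org/{parsed.prefix}/{quote(parsed.suffix, safe='/')}"


print(parse_doi("10.1037/rmh0000008"))
print(resolver_url("10.1037/rmh0000008"))
```

Where the resolver redirects to depends on the URL the publisher has registered, which is why stale metadata leads to a dead link even though the DOI itself never changes.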
27. DOIs and FAIR Data
• DOIs ensure that data (and metadata about that data) are
preserved for the long term
• Can be searched for and made discoverable and findable
(through DataCite and CrossRef, Google search, re3Data)
• Access and re-use conditions can be clarified. If the data
cannot be made open, the metadata can explicitly state the
terms and conditions of access.
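To make the last point concrete, here is a minimal Python sketch of a metadata record that states access conditions even though the data themselves are not open. The field names are illustrative only, loosely modelled on the kinds of properties found in DataCite metadata; the DOI, names and rights text are placeholders, and the helper functions are hypothetical.

```python
# Fields this sketch treats as required for a dataset to be findable and citable.
REQUIRED_FIELDS = (
    "identifier",
    "creators",
    "title",
    "publisher",
    "publication_year",
    "resource_type",
)

# Placeholder record: none of these values refers to a real dataset.
record = {
    "identifier": "10.1234/example-dataset",
    "creators": ["Example, Researcher"],
    "title": "Interview transcripts (placeholder title)",
    "publisher": "Example University",
    "publication_year": 2018,
    "resource_type": "Dataset",
    "rights": "Restricted: available on request under a data sharing agreement",
}


def missing_fields(metadata: dict) -> list:
    """List required fields that are absent or empty in the record."""
    return [field for field in REQUIRED_FIELDS if not metadata.get(field)]


def access_statement(metadata: dict) -> str:
    """Return the stated access conditions, or flag that none were given."""
    return metadata.get("rights") or "Access conditions not stated"


print(missing_fields(record))
print(access_statement(record))
```

A check like this could run when a dataset is deposited, so that records without an access statement are flagged before a DOI is registered.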
32. Why spend time on RDM?
• It is not a distraction from ‘real work’.
• You can work effectively and efficiently.
• Save time and reduce frustration in the future.
• Set systems that work for you.
33. Engaging Directly with Researchers
• Embedded approach – meet with researchers in situ – in
their labs and offices
• One-on-one or group meetings
• Departmental meetings to inform on policy changes and
updates and provide insight into best practice.
34. Outreach – Love Your Data!
• PhD Training on RDM Basics and DMPOnline (including PhD-specific
DMPOnline template)
• RDM ‘Drop-in Clinics’
• RDM ‘Byte-Size’ sessions – informal sessions on various topics
• Imperial Data Circus
• Open Access Road Show
35. Findings from Imperial College RDM Policy
Development
• 60-100% of grant required to re-generate data used in
publications
• % of data that needs retaining to support publications: ~60%
• Data storage capacity will have to grow significantly
• Concerns around back-up and archiving, esp. considering data
volume
• Popularity of cloud services (as opposed to College storage)
• Researchers want a self-administered, secure, responsive solution
for data sharing, storing and archiving; open APIs preferred
• “Yes [storage] is really important. Basically, whenever we have been
out to talk to researchers, that's the thing they have latched on to and
want to talk about the most.” (10.1371/journal.pone.0114734)
38. The Library Supporting Researchers:
Infrastructure
• Consider workflows for research data
• Assist in the development of research data management plans
(use DMPOnline)
• Integration with existing systems (e.g. CRIS, grant systems)
• Use Your Metadata – Make work findable, discoverable and
accessible
Engagement
• Clear, direct communication
• Outreach and discussion
• Many benefits for researchers – increased efficiency and
impact of research
It may be difficult to think about what’s happening in 20 years’ time, but if policies change, your research might be discredited if there is no data to support it or possibly…
This is current for now, but policies do change, so keep up to date with what your funder, institution or publisher requires.
Most of us find that we have many calls on our time, and that packing everything that needs to be done into the week is often a challenge. That being the case, it’s easy to feel as though research data management is simply one more thing to add to an already endless to-do list – or worse, that it’s a distraction from real work. However, there are a number of key reasons that it’s worth paying some attention to it.
Good data management does require an investment of effort – but ultimately it’s something that can actually save you time, by helping you work more efficiently. You want to complete your research project to the best of your ability, but with minimum stress – and good research data management is one of the tools that can help you to do that.
Think about the frustration of trying to track down a fact or a document we know we have somewhere. Good research data management – setting up an organizational system that works for you, and ensuring everything is properly filed or labelled to enable re-identification and retrieval – can make life a lot easier.
And it’s not just a matter of saving time and reducing unnecessary effort (though clearly that’s a major benefit): having everything well ordered can also help you get a better feel of the shape and scope of your research material, which in turn can enable you to spot patterns or connections that might otherwise get missed.
It’s also well worth doing, because the data you’re producing or working with is valuable.
As well as this being true for your own research, the data might ultimately be of use to other researchers. Having everything well organized and properly labelled also has the potential to save you a lot of time at the end of a research project, when it comes to deciding what to do with your data – but more of that later.
Finally, there may be requirements imposed by your funding body and/or the university which you need to meet.