As an important component of the scholarly record, research data, software and code are increasingly managed as research outputs in their own right, though they are not typically subject to peer review.
In line with the broader ‘open research’ movement, there is a growing impetus for datasets, software and code to be curated in repositories and made openly available wherever possible, subject to relevant legal and ethical constraints.
Data repositories such as Figshare, Dryad and Zenodo routinely allocate DOIs for deposited data, while many UK universities also mint DOIs in their nascent institutional data repositories through DataCite, which means these outputs are automatically tracked by altmetric.com in the same way as journal articles.
While the repository infrastructure continues to develop and there are pockets of best practice, data sharing and reuse are not yet fully established across UK HE. Reward mechanisms are immature and data citation, for example, is limited and not easy to track. Clarivate Analytics’ Data Citation Index coverage of UK-based repositories is still relatively low and, as a subscription-based product, it is not widely accessible. COUNTER-compliant downloads can be derived from IRUSdata-UK (beta), which currently tracks 27 UK-based institutional data repositories.
Altmetrics therefore offers a low-barrier method to track engagement with datasets and, in lieu of a more formal process, might be regarded as a type of informal peer review. We have undertaken a preliminary analysis of repositories that participate in IRUSdata-UK (beta), using it as a source of DOIs to run against the altmetric.com API to discover to what extent research data, software and code are being shared.
This talk will present these preliminary results and explore how and why datasets are being shared across the various platforms tracked by altmetric.com, along with potential barriers. It will consider how data repository managers can encourage and facilitate data sharing through social media networks, blogs and “data journalism”, and will draw on the Research Data Management (RDM) Engagement Award at the University of Leeds, which is exploring linking RDM with the Open Science movement via the Wikimedia suite of tools. What does the altmetric data currently tell us about how research data is being linked to this global platform?
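The DOI lookup behind this analysis can be sketched in a few lines. This is a minimal illustration, not the authors' actual script: it assumes the public Altmetric v1 details endpoint (`https://api.altmetric.com/v1/doi/{doi}`), which returns HTTP 404 for DOIs Altmetric does not track, and the helper names `fetch_altmetric` and `summarise` are hypothetical.

```python
import json
import urllib.error
import urllib.request

# Public Altmetric v1 details endpoint (assumed; returns 404 for untracked DOIs).
ALTMETRIC_API = "https://api.altmetric.com/v1/doi/{doi}"

def fetch_altmetric(doi):
    """Fetch the Altmetric record for a DOI, or None if the DOI is not tracked."""
    try:
        with urllib.request.urlopen(ALTMETRIC_API.format(doi=doi)) as resp:
            return json.load(resp)
    except urllib.error.HTTPError:
        return None  # 404: no attention data recorded for this DOI

def summarise(records, thresholds=(10, 100)):
    """Count how many tracked records exceed each Altmetric score threshold."""
    scores = [r.get("score", 0) for r in records if r]
    return {t: sum(1 for s in scores if s > t) for t in thresholds}

# Mocked records stand in for live API responses (no network needed):
records = [{"score": 150}, {"score": 12}, {"score": 3}, None]
print(summarise(records))  # {10: 2, 100: 1}
```

The thresholds mirror the cut-offs reported later in the talk (scores above 10 and above 100); in practice a real run would iterate over the DOI list harvested from IRUSdata-UK and rate-limit the requests.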
Has anyone seen my data? Incentivising #opendata sharing with altmetrics
1. Has anyone seen my data? Incentivising
#opendata sharing with altmetrics
Nick Sheppard
0000-0002-3400-0274
@mrnick
@OpenResLeeds
2. • Research data, software, code
–increasingly managed as research outputs in their own right
• Data repositories routinely allocate DOIs
–automatically tracked by altmetric.com in the same way as journal articles
• Reward mechanisms are immature
–data citation is limited and not easy to track
• Altmetrics offers a low-barrier method to track engagement with datasets
• Results from UK-based institutional data repositories
• How can data repository managers encourage and facilitate data sharing
through social media networks, blogs and Wikipedia?
Has anyone seen my data? (Original abstract)
3. Publications, Open Access and the REF
“to be eligible for submission to the REF 2021, authors’
final peer-reviewed manuscripts must have been
deposited in an institutional or subject repository.
Deposited material should be discoverable, and free to
read and download, for anyone with an internet
connection”
REF 2021 Open Access policy, HEFCE (2016)
4. Open Research Data
“The primary purpose of research data is to provide the
information necessary to support or validate a research
project's observations, findings or outputs.”
Concordat on Open Research Data (2016)
5. ‘Open Research’
“the central theme of open research is to make clear
accounts of the methodology freely available via the
internet, along with any data or results extracted or derived
from them.” - Wikipedia
As a charity, Wellcome works to ensure that the results of
the research we fund are applied for the public good. This
includes creating an environment that enables and
incentivises researchers to maximise the value of their
research outputs, including data, software and materials.
Policy on data, software and materials management and
sharing (Wellcome Trust)
6. Open Research and the REF
The revised template will also include a section on ‘open
research’, detailing the submitting unit’s open access
strategy, including where this goes above and beyond the
REF open access policy requirements, and wider activity to
encourage the effective sharing and management of
research data. The panels will set out further guidance on
this in the panel criteria.
Initial decisions on the Research Excellence Framework 2021
7. Data citation
All users of research data must formally cite the data they use […]
The obligation to recognise through citation and acknowledgement the
original creators of the data must be respected […] Publishers should
enable the formal citation of data in articles to support these practices.
Concordat on Open Research Data (2016)
• currently no standardised method
• many journals (still) include data as “supplementary” information
Advice at Leeds:
• data files should be deposited in a recognised repository
• data availability statement in the body of the paper AND as an entry
in the reference list
8. A word on peer review
“Ordinary” journals
“Referees may ask to see supporting data not submitted for publication”
Journal of the Royal Society Interface
From September 12th 2018, Nature Communications will be setting a higher standard of data
reporting for papers under peer review. Nature (2018)
Data journals
• A Survey of Peer Review Guidelines - Todd Carpenter (2017)
• 39 peer review policies
• Editorial, Metadata, Data Quality, Methodology Review, and Other
9.
10. Clarivate Analytics Data Citation Index
• Limited coverage
–Master Data Repository List
• Subscription based product
Scholexplorer
“Scholexplorer populates and provides access to
a graph of links between dataset and literature
objects and dataset and dataset objects.”
11. Data from: Towards a worldwide wood economics spectrum
• Cited once according to Data Citation Index
–Original citing paper
• One ‘real’ citation according to Scholexplorer
• 30,000 downloads
• Manual search of WRRO
–25 citing papers
• Dimensions from Digital Science
–198 hits
12. Altmetrics: what’s the score?
• Data source: IRUSdata-UK (beta)
–currently tracks 27 UK-based institutional data repositories
–COUNTER compliant downloads
• Results (Google doc)
–Only one with altmetric score > 100
–Only 15 with altmetric score > 10
–Mostly Twitter
13. Data from: Towards a worldwide wood economics spectrum
• Cited once according to Data Citation Index
–Original citing paper
• One ‘real’ citation according to Scholexplorer
• 30,000 downloads
• Manual search of WRRO
–25 citing papers
• Altmetric Score?
• Data Dryad / UK Data Archive
–Limited sharing
–Anomalies
14.
15. A culture of data sharing?
• Promote data as a scholarly output in
its own right
–Repositories
–DOIs
• Open Research
–Share publications AND underlying data
• Social media networks
–Twitter / blogs
–Wikipedia
By Sander van der Wel from Netherlands (Young toekans sharing food) [CC BY-SA 2.0
(https://creativecommons.org/licenses/by-sa/2.0)], via Wikimedia Commons
16. Manage it locally to share it globally: RDM and Wikimedia Commons
• First Data Management Engagement Award
–Sponsored by SPARC Europe, University of
Cambridge, Jisc
• Link RDM with the open science movement
• Wikimedia suite of tools
–share openly licensed research material via
Wikimedia Commons
–can be used to improve Wikipedia
17. Example: Baxter the Robot
Alomari, Muhannad and Hogg, David C. and Cohn, Anthony G. (2017) Leeds Robotic Commands.
University of Leeds. [Dataset] https://doi.org/10.5518/110
Video from dataset uploaded to Wikimedia
Commons and embedded and cited on
Wikipedia.
18. Get involved
To register your interest in the project please
use this form:
RDM Engagement – we need your help!
Related blogposts:
–RDM Engagement: Project launch
–Open in order to…contribute to the global digital
commons: University collections and Wikimedia
–Wikipedia, information literacy and open access
Editor’s notes
Nick
there is currently no standardised method to describe how supporting data can be accessed
which is unlikely to have a unique identifier, may sit behind a journal paywall and may not be readily discoverable
to provide long term curation with appropriate metadata and to enable proper citation
a prominent data availability statement should be included in the body of the paper AND as an entry in the reference list
“In referee comments I have received back on around 60 articles, I believe referees have only noted twice that they appreciated the availability of the data.”
But what constitutes peer review of research data? What are existing practices related to peer review of research datasets? Since a number of journals specifically focus on the review and publication of datasets, reviewing their policies seems an appropriate place to start in assessing what existing practice looks like in the “real world” of reviewing and publishing data.
Peer review of data is similar to peer review of an article, but it raises additional issues that make the process considerably more complicated. First, a reviewer has to deal with the overall complexity of a research dataset, which can be a large and multifaceted information object. Often the data go through a variety of pre-processing and error-cleansing steps that should be monitored and tracked. Some datasets are constantly changing and being added to over time, so the question must be asked: does every new study based on a given dataset need a new review, or could an earlier review still apply? To conduct a proper analysis, the methodology of the data collection should be considered, an examination that can go as deep as describing instrument calibration and maintenance. Even after a dataset is assembled, analysis can vary significantly according to the software used to process, render, or analyze it. Review of a dataset would therefore likely also require an examination of the software code used to process the data. All of these criteria create more work for already burdened reviewers.
Inevitably misses citations
Many data citations “informal” = hard to track
even where data are “properly” cited, not all publishers pass that information on properly
Data referred to in articles are usually not cited in a consistent or structured fashion. To address this, FORCE11 has developed the Joint Declaration of Data Citation Principles.[1] JATS 1.1d1 has provisions for citing articles and other sources, but does not offer straightforward ways of expressing some of the concepts needed for data citation.
Mietchen et al 2015