Lightning talk given at UKSG 2018 conference in Glasgow. (See notes field for most of content.)
Conference site: https://www.uksg.org/event/conference18
Providing research data services in changing times
1. Providing research data services in changing times
UKSG 2018 Conference
Robin Rice
Data Librarian and Head, Research Data Support
University of Edinburgh
10 April, 2018
14. By Shalom Jacobovitz - SJ1_8558, CC BY-SA 2.0, https://commons.wikimedia.org/w/index.php?curid=9511582
Hello, I’m Robin Rice, a data librarian from the University of Edinburgh. I’m at my first UKSG conference today, here to talk with you about Providing research data services in changing times. I’m using a slide theme from The Data Librarian’s Handbook, which I wrote with another data librarian at Oxford University, because many of the themes I’m exploring are covered in more depth in the book. Apologies, it’s a monograph, not a serial! Until recently I was based in EDINA, a national data service centre (you can visit their stand in the breaks). As of this month I’m based in the University Library, which I think reflects a trend of mainstreaming of data librarianship within academic librarianship in the UK and elsewhere.
So here are some terms I’ll be using and unpacking a bit. About ten years ago RDM (research data management) as a term began to ‘stick’, in the sense that researchers and funders (which are often bodies made up of senior researchers) thought it a suitably sensible notion to support. Unlike open data, it wasn’t something anyone could particularly argue against. Funders viewed it as ensuring value for money on their investments; researchers could embrace it as something they had perhaps always done but not explicitly; and institutions, like my own, began to understand it was something they could improve through policies, support, and infrastructure.
At the University of Edinburgh one marker of the beginning of our research data management services came in 2008, when we built an open access, multi-disciplinary data repository, Edinburgh DataShare, as a collaborative project in the Jisc Managing Research Data Programme. We built it, but nobody really came until later. Researchers were still not very interested in sharing data with so few incentives for doing so, even though they may have agreed with the principle.
At conferences then it was clear that many people didn’t really think that librarians were up to the task of managing research data – including many librarians! Since then librarians have stepped forward to make it clear they are up to the task. But echoes of the sentiment remain: much of the talk about the importance of data skills emphasises data science, with far less emphasis on data stewardship and support.
In 2011 one of the major UK funders, the Engineering and Physical Sciences Research Council, published their ‘expectations’, including that funded institutions should have an RDM Roadmap in place within a year’s time. Suddenly RDM jobs were beginning to appear in the UK, sometimes, though not always, based in academic libraries. Institutions couldn’t risk a major source of research funding by not complying. This was a game changer, flipping the onus of responsibility slightly away from the principal investigator and squarely onto the institution.
But research data management was implicitly always also about sharing data as outputs of research. Otherwise why would funders care that much about data being well managed if it was never going to be used for anyone else’s research? So this meant that training and awareness activities around RDM, including the Research Data MANTRA free online course we developed at Edinburgh, had to not only include tips and tools to help researchers organise their files and document their data, they also had to promote the benefits and ways of sharing data as an end result of good data management practice. Librarians could find their comfort zone in the strange new data landscape, introducing researchers to concepts of open data licences, trusted digital repositories (TDR), persistent identifiers and metadata for discovery and access.
It quickly became clear that to produce shareable data as outputs from research projects, planning would be necessary. Reusable data doesn’t come about by itself, and more importantly researchers are much more likely to share if they plan to share from the start. So the requirement to produce a data management plan (DMP) became a common way for funders to try to ensure good data management practice. Centres of expertise like the Digital Curation Centre helped by promoting the concept of data lifecycles and by providing the DMPonline tool, which offers templates and guidance for writing DMPs.
Many institutions, including my own, were able to map our services onto a data lifecycle in order to identify gaps in support, and to identify services that would need to be developed to meet researchers’ needs.
So what is shaking things up now? Well, the General Data Protection Regulation coming into force on 25th May certainly has both researchers and information professionals on edge. Although there is some continuity with the prior EC Data Protection Directive and its UK embodiment, the 1998 Data Protection Act, there is still a lack of clarity around data sharing. This is due to potentially massive fines for data controllers and processors alike, a broader scope of what constitutes personal data, and recent changes in the UK legal interpretation of the law in terms of consent and the legal basis for processing research data. This new, untested legal environment, combined with flagrant violations of public trust such as the Facebook Cambridge Analytica scandal and ever increasing sophistication in methods of re-identifying individuals from ‘anonymous’ datasets, is making data producers more risk averse.
But is that really fair? Health and social science researchers who rely on human subjects data may suffer from restricted access to data, including painstaking applications for use, long waits for permission, and even the necessity to travel to use data in safe rooms disconnected from the internet, not to mention non-standard licensing through data use agreements that prohibit sharing or publishing of derived datasets. The alternative may be to use ‘safe’ public datasets that are so aggregated or anonymised that the variables of interest are diminished or have virtually disappeared.
How does this situation allow researchers studying medical advances or the effect of human behaviour on grand challenge topics such as climate change or economic inequality to avail themselves of modern innovations of data science? … Innovations such as data integration, automatic record matching, and machine learning algorithms?
Clearly the easiest way to achieve FAIRness is by publishing open data, or at least open metadata, linked through to the publications that explain the datasets through persistent identifiers like DataCite DOIs. The EOSC or European Open Science Cloud’s GO FAIR and other initiatives are beginning to make great strides in standardising the way researchers and the data repositories that serve them can improve the findability, accessibility, interoperability and reusability of data produced with public funds. Reusability stands for so much more in this new, open science/open research landscape. It stands for other R words: replication, reproducibility, rigour, robustness, all of which can be very hard to achieve when working with personal or sensitive data. It’s not as simple as open or closed – we need to help our researchers share data in a more nuanced way, which we’re doing at Edinburgh with our new Data Safe Haven and DataVault functions for sensitive data and managed long-term retention of data.
Open Science is about working in a transparent manner, such as publishing a research project’s plan or protocols before the research is begun. It’s about crowdsourcing solutions to intractable problems, and about upskilling to use the best tools and techniques available for the research problem at hand, perhaps using open source software such as R and Python, code repositories such as GitHub, and perhaps learning from ‘the Carpentries’:
Data Carpentry and Software Carpentry, and now there’s even Library Carpentry for librarians!
All of this leaves many challenges for the librarian-as-data-steward, working in partnership with their research communities to advance the status quo of data curation. And so the intrepid data librarians must go on, problem-solving, supporting, building, and riding the mighty data wave.