1. Managing Research Data for Open Science: the UK experience
Professor Gobinda Chowdhury
Chair, iSchool@Northumbria
Northumbria University, Newcastle, UK
Chair elect, iSchools (www.ischools.org)
2. Open Science
In 2015 European Commissioner Moedas
identified three strategic priorities,
described in Open innovation, Open science,
Open to the world (the 3Os strategy)
Open Science aims at transforming science
through ICT tools, networks and media, to
make research more open, global,
collaborative, creative and closer to society
Open science is about the way research is
carried out, disseminated, deployed and
transformed by digital tools, networks and
media. It relies on the combined effects of
technological development and cultural
change towards collaboration and openness
in research.
(https://ec.europa.eu/digital-single-market/en/news/open-innovation-open-science-open-world-vision-europe)
3. Open Science and Data Sharing: why?
Open science makes scientific processes more efficient, transparent and
effective by offering new tools for scientific collaboration, experiments and
analysis and by making scientific knowledge more easily accessible
(https://ec.europa.eu/digital-single-market/en/open-science)
Societal benefits from making research data open are potentially very
significant, including economic growth, increased resource efficiency, securing
public support for research funding and increasing public trust in research
(http://www.rcuk.ac.uk/documents/documents/concordatopenresearchdata-pdf/)
The $13 billion in government spending on the Human Genome project and its
successors is estimated to have yielded a total economic benefit of about $1 trillion
A British study of its public economic and social research database found that
every £1 invested by the government produced an economic return of £5.40
(The Data Harvest, 2014… An RDA Europe Report.
https://rd-alliance.org/sites/default/files/attachment/The%20Data%20Harvest%20Final.pdf)
4. Open Research Data: Mandates
Stipulated under Article 29.3 of the Horizon 2020 Model Grant Agreement
(including the creation of a Data Management Plan)
EPSRC, UK:
Research organisations will ensure that appropriately structured metadata
describing the research data they hold is published (normally within 12 months of
the data being generated) and made freely accessible on the internet
In each case the metadata must be sufficient to allow others to understand what
research data exists, why, when and how it was generated, and how to access it
Where the research data referred to in the metadata is a digital object it is
expected that the metadata will include use of a robust digital object identifier
(For example as available through the DataCite organisation ‐ http://datacite.org).
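The EPSRC expectations above can be read as a simple completeness check on a metadata record. The following is a minimal sketch under stated assumptions: the field names (`what`, `why`, `when`, `how`, `access`, `doi`) and the function `missing_metadata` are hypothetical and are not part of any EPSRC or DataCite schema, and the DOI test checks only the general shape of the identifier, not whether it is registered.

```python
import re

# Hypothetical field names: the EPSRC text names the questions the metadata
# must answer (what, why, when, how, how to access), not a schema.
REQUIRED_FIELDS = ("what", "why", "when", "how", "access")

# Loose shape check for a DOI such as "10.1234/abcd"; it does not confirm
# that the DOI is registered or that it resolves.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def missing_metadata(record, is_digital_object=True):
    """Return the names of expected fields that are absent or empty."""
    missing = [f for f in REQUIRED_FIELDS if not str(record.get(f) or "").strip()]
    if is_digital_object and not DOI_PATTERN.match(str(record.get("doi") or "")):
        missing.append("doi")
    return missing


example_record = {
    "what": "Interview transcripts on data sharing practices",
    "why": "Collected for a study of researcher attitudes",
    "when": "2017-03",
    "how": "Semi-structured interviews, transcribed manually",
    "access": "",  # left empty on purpose to show a gap
    "doi": "10.1234/example-dataset",  # made-up DOI for illustration only
}

print(missing_metadata(example_record))
```

A check of this kind only flags gaps; whether a description is actually sufficient for others to understand the data still needs human review.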
5. Open Research Data Management: EPSRC, UK Mandate for Universities
Research organisations will ensure that EPSRC-funded research data is
securely preserved for a minimum of 10 years from the date that any
researcher ‘privileged access’ period expires or,
if others have accessed the data, from the last date on which access to the data
was requested by a third party;
All reasonable steps will be taken to ensure that publicly‐funded data is not
held in any jurisdiction where the available legal safeguards provide lower
levels of protection than are available in the UK
Research organisations will ensure that effective data curation is provided
throughout the full data lifecycle, with ‘data curation’ and ‘data lifecycle’
being as defined by the Digital Curation Centre.
(https://epsrc.ukri.org/files/aboutus/standards/clarificationsofexpectationsresearchdatamanagement/)
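The preservation period in the EPSRC mandate above can be worked through as a date calculation. This is a sketch, not an official interpretation: the function names are hypothetical, and it assumes that when a third-party access request exists, the 10 years run from whichever is later, the expiry of privileged access or the last request.

```python
from datetime import date


def add_years(d, years):
    """Shift a date by whole years, mapping 29 February to 28 February."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        # 29 February in a target year that is not a leap year
        return d.replace(year=d.year + years, day=28)


def earliest_disposal_date(privileged_access_expiry, last_access_request=None, years=10):
    """Earliest date on which the minimum preservation period has elapsed.

    Assumption: the clock restarts from the last third-party access request
    when that request is later than the privileged access expiry.
    """
    start = privileged_access_expiry
    if last_access_request is not None and last_access_request > start:
        start = last_access_request
    return add_years(start, years)


# No third-party request: 10 years from the end of privileged access.
print(earliest_disposal_date(date(2018, 1, 1)))
# A later request restarts the 10-year period.
print(earliest_disposal_date(date(2018, 1, 1), date(2021, 6, 15)))
```

Because every new access request can extend the period, a repository would need to recompute this date whenever a request is logged.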
6. What is Research Data
Data is the “glue of a collaboration” and the “lifeblood of research”
Data includes:
text, sound, still images, moving images, models, games, simulations ….
statistics, collections of digital images, sound recordings, transcripts of interviews,
survey data and fieldwork observations with appropriate annotations, an interpretation,
an artwork, archives, found objects, published texts or a manuscript (Concordat on Open
Research Data, https://www.ukri.org/files/legacy/documents/concordatonopenresearchdata-pdf/)
various types of laboratory data including spectrographic, genomic sequencing, and
electron microscopy data; observational data, such as remote sensing, geospatial, and
socioeconomic data, numerical data and other forms of data either generated or
compiled by humans or machines
(Borgman, C.L. (2012). The conundrum of sharing research data. Journal of the American Society for Information Science
and Technology, 63(6), 1059–1078.
Borgman, C.L., Wallis, J.C., & Mayernik, M.S. (2012). Who’s got the data? Interdependencies in science and technology
collaborations. Computer Supported Cooperative Work, 21(6), 485-523.)
7. Research Data Management
Good data management is fundamental to all stages of the research process
and should be established at the outset
“The careful management of data throughout the research process is crucial
if the data arising from research projects is to be rendered openly
discoverable, accessible, intelligible, assessable and usable.”
(https://www.ukri.org/files/legacy/documents/concordatonopenresearchdata-pdf/)
FAIR (Findable, Accessible, Interoperable and Reusable) guidelines
(http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf)
A Data Management Plan (DMP) should include a description of all types of data, a
description of all types of metadata and policies used, plans for archiving and
preservation, and a description of resources required for data management
(Strasser, C. (2015). Research data management: a primer publication of the
National Information Standards Organization. Baltimore, MD: NISO)
8. RDM Challenges and Stakeholders
Good data management is fundamental to all stages of the research process
and should be established at the outset (Researchers + Data Librarian + Institution)
Data management for Open Science (Data Librarian + Researchers + Institutions)
Data curation (Data Librarian/Curator + Institution + Government/Funding Bodies)
Data sharing policies (Government, Funding Bodies, Institutions, Professional Bodies)
9. RDM Technologies and Systems
National, e.g. ANDS (https://www.ands.org.au/)
In-house/Institutional, e.g. Research Data Oxford (http://researchdata.ox.ac.uk/);
RDS Edinburgh University (https://www.ed.ac.uk/information-services/research-support/research-data-service)
Not-for-profit, e.g. DataCite (https://www.datacite.org/)
Subject/Discipline, e.g. UK Data Archive (http://www.data-archive.ac.uk); GitHub
(https://github.com/) …
Commercial, e.g. Figshare (https://figshare.com/)
Aggregator portal: Jisc Research Data Discovery Service
(http://researchdiscoveryservice.jisc.ac.uk/dataset)
Whichever option is chosen RDM is resource-intensive and hence requires a
sustainable business model and supporting policies
10. A big question: Do researchers want to share data?
Does every researcher want to share data?
Do the researchers have the necessary awareness and data management
skills?
Are there specific sharing practices and culture in specific disciplines?
Do the researchers have any concerns around data sharing?
What are the incentives of data sharing?
....... And many more related questions
11. RDM Training Policies
Support for the development of appropriate data skills is recognised as a responsibility for all
stakeholders (Principle 9 of the Concordat on Open Research Data, 2016,
http://www.rcuk.ac.uk/documents/documents/concordatopenresearchdata-pdf/)
Researchers:
For research institutions this should include the provision of researcher training opportunities provided in an
organised and professional manner.
It is imperative also that funding organisations, alongside research institutions, support the provision of such
training through appropriate funding routes.
Individual researchers must also ensure their own data skills are at a level sufficient to meet their own
obligations whilst understanding the benefits to themselves of a higher level of understanding.
Data Scientists:
“The specialised skills of data scientists are crucial in supporting the data management needs of researchers
and institutions
Research institutions and funders should work together to help build underpinning capacity and capability in
this area, and to attract and retain such specialists by developing well designed and sustainable career paths
for them”
12. Key RDM Challenges
Technology
ICT infrastructure for storage, management, curation
Software, metadata, interoperability
Access and reuse
People
Researchers: culture, data literacy, training requirements
Data Scientists: data management, data curation, training
Users: researchers, businesses, governments, policy-makers, general public ….
Policy
Governments, Funding agencies, Institutions, Professional bodies ….
Resources
Financial, human, legal
13. RDM: Technology Issues
Volume, variety & growth of data
Software dependence of data
Multiple file formats
Data curation
Retrieval issues
14. Is Data Retrieval = Information Retrieval?
Most data retrieval services are based on the text retrieval paradigm
The key difference between IR and DR arises from the data elements
Using datasets often requires a number of associated files
Search output in DR is often very large
Search output in DR requires downloading before access
Very little research has been undertaken on data seeking behaviour
No reliable data seeking and retrieval model exists
15. Average File Size of Search Results: Data Retrieval vs. Information Retrieval

Discipline                      Keywords                      Data Retrieval       Information Retrieval
                                                              Average File Size    Average File Size
Arts & Humanities               art museums                   5.708 MB             0.820 MB
                                nineteenth century            2.537 MB             1.042 MB
                                “world war”                   5.766 MB             0.508 MB
                                medieval                      5.053 MB             1.091 MB
                                popular music                 8.353 MB             1.000 MB
Social Sciences                 unemployment                  3.059 MB             0.455 MB
                                cognition                     11.681 MB            1.612 MB
                                imprisonment                  1.837 MB             0.503 MB
                                “labour law”                  1.667 MB             0.410 MB
                                “trade union”                 2.073 MB             0.748 MB
Natural Sciences                marine life                   15.707 MB            1.491 MB
                                “climate change”              1.655 MB             2.497 MB
                                “renewable energy”            758.000 MB           3.606 MB
                                “ultraviolet light”           495.900 MB           1.991 MB
                                “oxidative phosphorylation”   40.242 MB            1.895 MB
Computer & Information Science  search behaviour              656.000 MB           0.731 MB
                                face recognition              1.391 GB             1.535 MB
                                computer vision               1.330 GB             2.782 MB
                                research data sharing         1.014 MB             0.521 MB
                                social media data             16.329 MB            1.078 MB
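
To make the gap between the two columns concrete, the file sizes listed above can be compared as ratios. The sketch below uses four rows copied from the figures above; the helper names are hypothetical, and the conversion of 1 GB to 1024 MB is an assumption, since the source does not say whether binary or decimal units were used.

```python
# (discipline, keywords, data retrieval size, information retrieval size in MB)
ROWS = [
    ("Arts & Humanities", "art museums", "5.708 MB", 0.820),
    ("Natural Sciences", "climate change", "1.655 MB", 2.497),
    ("Natural Sciences", "renewable energy", "758.000 MB", 3.606),
    ("Computer & Information Science", "face recognition", "1.391 GB", 1.535),
]

MB_PER_GB = 1024  # assumption: binary gigabytes; use 1000 for decimal ones


def to_mb(size):
    """Convert a string such as '1.391 GB' or '5.708 MB' to megabytes."""
    value, unit = size.split()
    return float(value) * (MB_PER_GB if unit == "GB" else 1)


def size_ratio(dr_size, ir_size_mb):
    """How many times larger the data retrieval result is than the IR result."""
    return to_mb(dr_size) / ir_size_mb


for discipline, keywords, dr_size, ir_mb in ROWS:
    print(f"{discipline} / {keywords}: {size_ratio(dr_size, ir_mb):.1f}x")
```

The ratios vary widely across keywords, and for “climate change” the data retrieval figure is the smaller of the two, so the averages are better read per keyword than as a single overall factor.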
16. Metadata for RDM
Tools:
DCC Metadata for Research disciplines
(http://www.dcc.ac.uk/resources/metadata-standards)
RDA (https://www.rd-alliance.org/groups/metadata-standards-catalog-working-group.html)
Key questions:
How much metadata is required?
Who will do the tagging?
Who will check for consistency and standards?
How will it be used?
17. Data sharing: Researchers’ culture, awareness, concerns…
Findings from a study on researchers from three countries:
Nearly 80% of researchers do not want to share data with anyone
Fewer than 25% of researchers agree that their university encourages OA data sharing
Only 31% of researchers are familiar with the OA requirements of the funding bodies
Nearly 95% of researchers are either uncertain or do not know whether their
university has a prescribed metadata set
The key concerns for OA and data sharing include: legal and ethical issues, misuse
and misinterpretation of data, and fear of losing the scientific edge
Only a third of the researchers have a unique researcher ID
Over 70% of researchers did not have any formal training in DMP, metadata,
consistent file naming and version control or data citation
18. TULIP: Information Management Research to address RDM Challenges
Technology
Research data repository/services: Local vs. National repository services
Research data management: standards & practices -- ORCID, DOI, Metadata, Citation, Quality, Version Control…
Research data discovery & access -- from IR paradigm to DR paradigm: user-centred & discipline-specific design
Research data sharing/reuse: data quality metrics
Users: research culture, training
Data Literacy and RDM training and advocacy across all disciplines
Librarians
Education and training programmes for data librarians
Industries
New research data service industries; Public-private partnership; Sustainability
Policies
OA mandates; Incentives for researchers; Data quality; Ethics, Curation…
19. Resources
Bugaje, M. and Chowdhury, G. (2018). Data Retrieval = Text Retrieval?
In Chowdhury, G., McLeod, J., Gillet, V. and Willett, P. (eds).
Transforming digital worlds: proceedings of the iConference 2018,
March 25-28, Sheffield, LNCS 10766, Springer, pp. 253-262.
Chowdhury, G., Boustany, J., Kurbanoglu, S., Unal, Y. and Walton, G. (2017).
Preparedness for Research Data Sharing: A Study of University Researchers in
Three European Countries. ICADL 2017, Bangkok, 13-15 November 2017,
LNCS 10647, pp. 104-116.
DCC Checklist for DMP:
http://www.dcc.ac.uk/sites/default/files/documents/resource/DMP/DMP_Checklist_2013.pdf
DCC Curation Lifecycle Model:
http://www.dcc.ac.uk/resources/curation-lifecycle-model