Research Data Management in GLAM: Managing Data for Cultural Heritage

Sarah A. Stewart, The British Library
@Biostew
‘Open Science Infrastructures for Big Cultural Data’ Masterclass,
Dec. 13-15th, Plovdiv, Bulgaria

www.bl.uk
Outline
Thank you for coming today!
• Introduction and Challenges: Data in Cultural Heritage
• Research Data Management ‘In a nutshell’: Key Concepts
• Software as Data
• PIDs for RDM – DataCite and DOIs
• RDM at the British Library – Developing Infrastructure and
Service around Data
• Conclusion and Questions?

The Digital Transformation of Research
http://researchgraph.org/gesis/

Digital Transformations in ‘GLAM’:
The ‘Inside-Out’ Museum
• GLAM institutions are ‘everting’ their collections and
research to the (open) web – ‘Collections without Walls’
• Dynamic, changing research landscape – development
of new tools and techniques for digital research
• New infrastructures to support digital collections,
research and scholarship
• Changing materiality of research – from ‘analog’ to digital
• Greater role for data and metadata
• Research Data Management will play a crucial role!

‘Inside-Out’ Museum: From Specimen to Data

The (Inside-Out) British Library…

“Challenges” (Opportunities?)
• Research is digital, are we?
• Are we still needed for
discovery?
• In an open world, do we still
have a role for access to
digital content?
• Will print become invisible?
• Global content grows so
fast, our collections are
shrinking (relatively)
• Resources? Funding, Time,
Labour

Many Types of Data in CH!

Big Data, Little Data, No Data… (Borgman, 2014)
• Language of ‘Data’ taken from the Sciences, but can be
defined and managed more broadly in all disciplines
• Big Data requires computational methods for analysis and
visualisation – Volume, Velocity, Variety
• Cultural Heritage Data might be ‘messy’ or ‘dirty’ – may be
incomplete, have gaps or require additional metadata (e.g.
‘Box of 19th Century Theatre Posters’)
• Sensitive data can still occur in CH!
• Broad definition of ‘data’ in Cultural Heritage

UKRI Concordat on Open Research Data (2016)
• Data are ‘evidence that underpins the answer to the
research question, and can be used to validate findings
regardless of its form (e.g. print, digital, or physical).’
• The primary purpose of research data is to provide the
information necessary to support or validate a research
project's observations, findings or outputs

Why Manage Research Data?
• Make data Findable, Accessible,
Interoperable and Reusable (FAIR Data)
• Preserve data for long-term use and re-use
• Make Research Transparent/Open
(Validation of Research!) and Reproducible
• Funder and Publisher mandates
• Good Research Practice – GLAM
Institutions are Research Institutions!

Why is Research Data Management Important?
Good Professional Practice:
• Funder mandates and requirements
• Supports institutional integrity
• Supports collaboration through data sharing and re-use
• Reduces redundancy in research
Value to you as a Researcher/Institution:
• Reduce the risk of data loss
• Increased efficiency
• Validated and replicable research
• Increased sharing and re-use (increased possibilities for collaboration)
• Increased citations
• Increased Research Impact!

Missing Data Inhibits Research
https://www.nature.com/news/scientists-losing-data-at-a-rapid-rate-1.14416

What does Managed Data Look like?
Well-managed data is:
• intelligible and verifiable, because it is well-documented
• findable, because it is well organised and uses useful
filenames
• protected against loss, corruption and unauthorised access,
because it is backed up and secured appropriately
• easy to share, because mechanisms for protecting
confidentiality and intellectual property have been considered
• maintainable, because it is managed in a way that suits the
research group that uses it
• compliant with relevant laws and policies

FAIR Data Principles (Force11, 2016)
https://www.force11.org/group/fairgroup/fairprinciples
• Findable
• Accessible
• Interoperable
• Reusable

To be Findable:
• F1. (meta)data are assigned a globally unique and eternally
persistent identifier.
• F2. data are described with rich metadata.
• F3. (meta)data are registered or indexed in a searchable
resource.
• F4. metadata specify the data identifier.
To be Accessible:
• A1. (meta)data are retrievable by their identifier using a
standardized communications protocol.
• A1.1. the protocol is open, free, and universally implementable.
• A1.2. the protocol allows for an authentication and authorization
procedure, where necessary.
• A2. metadata are accessible, even when the data are no longer
available.

To be Interoperable:
• I1. (meta)data use a formal, accessible, shared, and broadly
applicable language for knowledge representation.
• I2. (meta)data use vocabularies that follow FAIR principles.
• I3. (meta)data include qualified references to other (meta)data.
To be Reusable:
• R1. (meta)data have a plurality of accurate and relevant
attributes.
• R1.1. (meta)data are released with a clear and accessible data
usage license.
• R1.2. (meta)data are associated with their provenance.
• R1.3. (meta)data meet domain-relevant community standards.

Metadata
Metadata should be Open
and as Rich as Possible!
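As a rough illustration of what 'rich' metadata can mean in practice, the sketch below checks a record for a handful of fields suggested by the FAIR principles (an identifier, a licence, provenance). The field names, the REQUIRED_FIELDS list and the example record are assumptions made for this illustration, not a published schema, and the identifier is a placeholder rather than a real DOI.

```python
from typing import Any, Mapping, Sequence

# Hypothetical minimum set of fields, loosely inspired by FAIR F1, R1.1 and R1.2.
REQUIRED_FIELDS = ("identifier", "title", "creator", "licence", "provenance")


def is_blank(value: Any) -> bool:
    """Treat None, whitespace-only strings and empty collections as missing."""
    if value is None:
        return True
    if isinstance(value, str):
        return not value.strip()
    if isinstance(value, (list, tuple, set, dict)):
        return len(value) == 0
    return False


def missing_fields(
    record: Mapping[str, Any],
    required: Sequence[str] = REQUIRED_FIELDS,
) -> list[str]:
    """Return the required field names that are absent or blank in the record."""
    return [name for name in required if is_blank(record.get(name))]


# Illustrative record only: the identifier is a placeholder, not a real DOI.
EXAMPLE_RECORD = {
    "identifier": "10.1234/placeholder",
    "title": "Box of 19th Century Theatre Posters",
    "creator": "",
    "licence": "CC BY 4.0",
    "provenance": None,
}
```

Calling missing_fields(EXAMPLE_RECORD) would flag the creator and provenance fields, which is the kind of gap that 'messy' cultural heritage data often has.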

The Research Data Lifecycle

Research Data Management (in a Nutshell)
• Data Management Planning
• Data Preservation (Both Short- and Long-Term)
• Data Sharing (and Sensitive Data)
• Data Discovery, Access and Re-Use

Data Creation: Data Management Plans
• A data management plan should be in place at the
beginning of a project, as standard practice
• Data management plans should provide an outline of uses,
responsibilities, ownership, access and sharing (licensing),
storage, maintenance and archiving (even disposal) of
research data and software
• Online tools for data management plans include DMPonline
(https://dmponline.dcc.ac.uk/)

Data Sharing: Why Share Data?
• Funder and publisher mandates
• Collaborations – Interdisciplinary and (often) International
• Validation/ Transparency, Support for
Open Research
• Citations = Research Impact!
• Sensitive data – Not All Data Can be Shared!

Data Re-Use? You Might Be Surprised!

Software as Data
• ‘Software is used to create, interpret, present, manipulate and
manage data’ (Software Sustainability Institute)
• Data: ‘recorded factual material commonly retained by and
accepted…as necessary to validate research findings’
(EPSRC)
• Software = Data!

Software should be preserved if:
• Software can’t be separated from the data or digital object.
• Software is classified as a research output
• Software has intrinsic value
• More resources available at the Software Sustainability
Institute:
https://www.software.ac.uk/software-sustainability-institute

Digital Preservation (Software and Data)
• Software is a digital object and is also often a vital prerequisite
for the preservation of other digital objects
• Storage, Retrieval, Reconstruction and Replay all involve
complexities relating to code libraries, dependencies and
software engineering overall
• Planning is essential for subsequent preservation
• Software management should be part of a broader plan for
research data management.

Some Strategies for Digital Preservation
• Data integrity and file fixity checks (using checksums) for
source code
• Media and format migrations
• Refreshing (reduces bit-rot)
• Emulation (‘simulates’ the conditions of a legacy system)
• Replication – ‘Lots of Copies Keeps Stuff Safe’
• Encapsulation – linking content with all information required for
operation – rich metadata approach (e.g. ‘README’ file and
annotation)
• Version Control – metadata to support versioning of software
and data
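A minimal sketch of the fixity check mentioned in the first bullet above, assuming SHA-256 checksums and a simple manifest held in memory. The function names (sha256_of, build_manifest, verify_manifest) are hypothetical, and real preservation systems do considerably more than this.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file under root (by relative path) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root: Path, manifest: dict[str, str]) -> list[str]:
    """Report files that are missing or whose checksum no longer matches."""
    problems = []
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file():
            problems.append(f"missing: {rel_path}")
        elif sha256_of(target) != expected:
            problems.append(f"changed: {rel_path}")
    return problems
```

In this sketch a manifest would be built when data or source code is deposited, stored alongside it, and re-verified at intervals; an empty list from verify_manifest means every recorded file is still present and unchanged.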

Open Data and where to Find It
(and Store/Archive It, too…)
• Re3data.org – Directory of subject-specific repositories
• Zenodo.org – Open Data repository run by CERN
• GitHub – Software and code repository

Why Use Persistent Identifiers?
• Use of persistent identifiers has
increased as scholarly
communications become
increasingly digital.
• ORCID iDs and DOIs support open
science by supporting
interoperability in research
infrastructures.
• For instance, DataCite and
Crossref can use DOIs and
ORCID iDs, in addition to other
metadata, to map and link
documents, data and
researchers (Linked Open Data, LOD).

DataCite and DataCite UK
• Non-profit organisation which provides infrastructure for DOIs
(Digital Object Identifiers)
• DOIs make data discoverable and citable, and link datasets with
other related research outputs
• The British Library is the DataCite hub for DOI creation in the
UK.
• https://www.datacite.org/
• ‘To help the research community locate, identify, and cite
research data with confidence.’

DOIs (Digital Object Identifiers)
• Persistent identifier used to uniquely identify objects (datasets,
software, journal articles, theses), standardised by the
International Organization for Standardization (ISO)
• Presented as an alphanumeric code consisting of a prefix and
suffix separated by a slash ‘/’. The ‘10’ at the start of the prefix
places the identifier within the DOI namespace, e.g.
10.1037/rmh0000008
• Uses the Handle System, in which a DOI is ‘resolvable’ because
metadata (such as a URL) describing the object are bound to
that specific DOI.
• The DOI itself is persistent, so it is the publisher’s responsibility
to keep the metadata attached to it up to date; otherwise, the DOI
will resolve to a dead link.
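The prefix/suffix structure described above can be sketched in a few lines of Python. The helper names split_doi and resolver_url are hypothetical, the validation is deliberately loose (it only checks for the leading ‘10.’ and a non-empty suffix), and https://doi.org/ is assumed as the resolver.

```python
from urllib.parse import quote


def split_doi(doi: str) -> tuple[str, str]:
    """Split a DOI into (prefix, suffix) at the first slash."""
    prefix, separator, suffix = doi.strip().partition("/")
    if not separator or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI of the form 10.<registrant>/<suffix>: {doi!r}")
    return prefix, suffix


def resolver_url(doi: str) -> str:
    """Build a resolvable URL, percent-encoding unusual characters in the suffix."""
    prefix, suffix = split_doi(doi)
    return f"https://doi.org/{prefix}/{quote(suffix, safe='/')}"
```

For the example on this slide, split_doi("10.1037/rmh0000008") separates the prefix 10.1037 from the suffix rmh0000008. Whether the resulting URL leads anywhere useful still depends on the publisher keeping the registered metadata current.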

FREYA Ambassadors’ Programme
• 3-Year EU-funded Project to advance infrastructure for persistent identifiers as a
core component of Open Research
• For more info, or to join, contact info@project-freya.eu
• https://www.project-freya.eu/en/activities/ambassador-programme
• Funded partners of FREYA include: STFC, PANGAEA, DANS, DataCite,
CERN and the British Library

Build Bridges, Not Siloes!
• Use FAIR Data Principles, Open Metadata and Persistent
Identifiers for Data!

The British Library in Context
• National Library for
the United Kingdom
• Second Largest
Library in the World –
over 150 million items
in most known
languages
• Over 16,000 visitors
per day (on-site and
on-line)
• Legal Deposit

The British Library response to challenges
• Living Knowledge articulates the
vision of the British Library in
2023 as the most open, creative
and innovative institution of its
kind in the world.
• A new Service Strategy for
research and a new Content
Strategy.
• New approach for delivery that
brings together the researcher-
facing departments in a joined-up
roadmap.
• Everything Available is a
strategic change management
portfolio designed to deliver the
transformation of the Library’s
services to researchers and
research organisations.

Six strategic priorities
Find
• Unified discovery workflow
• Unified access workflow
Use
• Registration and identity management
• Workspaces and tools
Share
• Digital collection unification
• Collection management as a service

SHARE: part of a wider ambition

Digital service elements
Digitisation
• On demand
• For institutions
Metadata
• Enhance content
• Provide identifiers
• Build semantic links
• Licensing support
Preservation
• Born digital
• Digitised
• Print
• Preservation as a service
Discovery
• BL & external content
• Feed external services (e.g. Google)
• Discovery as a service
• Single Digital Presence for public libraries
Analysis
• Text and data mining
• Machine interfaces
• Visualisation
• Machine learning
• Dedicated staff support
Access
• Shared platform
• Institutional portals
• Machine interfaces
• Feed external platforms

The UK ‘Research Data’ Landscape…
• UKRI – Data underpinning
research and policy must be
archived for 10+ years
• Data must be made as openly
available as possible (with
constraints for sensitive data)
• Data must have appropriate
metadata and be citable

Vision – Data Collections and Services
Our vision for the British Library is that research
data are as integrated into our collections,
research and services as text is today.
The British Library's users will be able to
consume research data online through tools that
enable it to be analysed, visualised and
understood by non-specialists.

British Library Data Strategy (2017)
• All will be easy to discover and linked to
related research outputs, be they text, data or
multimedia.

Data Services at the British Library
• Development of Infrastructure to support research data
management for data use and re-use at the British Library
• DataCite UK
• FREYA Project for Persistent Identifier (PID) Infrastructure
• Data in the Research Repository
• Discovery Services for Research Data
• Software and Data Carpentry and Software as Data Initiatives
(TBA)

Four Themes
• Data Management
• Data Creation
• Data Archiving and Preservation
• Data Discovery, Access and Reuse

Data Management
• Data management training
• Data Management Plan engagement
• British Library Data Management Plans
• Documented Data Management Processes
Jo, BL staff member
I was working on a grant proposal for ESRC.
They require a data management plan, so when
I was given an outline plan that set out the
Library’s processes for data management, I was
able to reuse that and save myself days of extra
work!

Data Creation
• Engaging and linking with others
• Clarify approach to data collection
Sonja, Epidemiologist
I was able to use the British Library web archive
as a dataset, correlating positive and negative
messages about statin use with NHS
prescription data. The subset of data I extracted
is really useful to others, so I offered it to the
Library who now make it available alongside
their other datasets.

Data Archiving and Preservation
• Digital shared storage
• Data preservation services for third parties
Robin, Consultant
We produce valuable reports and data on the
political environment of emerging market
economies. Now that the British Library is
archiving that data, we can ensure others get to
use it even if our consultancy closes down. We
can also give them DOIs, and track the impact
of the work we produce.

SHARE: Developing a repository platform
• Single BL repository platform
• Refresh national preservation
system (>5m items, petabyte-scale)
• Access layer with multiple
repositories, shared service model
• Repository pilot developed with:
[Diagram labels: Preservation Layer; Services Layer; Access Layer; EThOS; data.bl.uk; BL Institutional Repository; Partner Repositories]

Data Collections – data.bl.uk

Data Discovery, Access, Reuse
• New models of data access
• Third-party data discovery
• Discovery for Library data
Rosslyn, Social Historian
My research on perceptions of gender involves
looking at if and how gender-specific words evolve
into derogatory terms. The British Library gave me
great advice on which collections I could use, and
how to connect tools to them. This allowed me to
automate analysis and visualisation of the data,
finding things I didn’t expect.

Data Discovery, Access, Reuse
• Tools and skills for data exploration
• Widening access
• DataCite UK
Alice, Post-Graduate Researcher
Being able to persistently identify my data with a
DOI means I can make my research
reproducible. It also means that I can track when
my data is cited, which is really helpful when it
comes to looking at my research impact.

‘Take-Home’ Points
• Data in cultural heritage may be very broadly defined.
• Use FAIR Data Principles as best practice
• Plan for data management following the research data lifecycle
• Data Discovery – Build Bridges, not Siloes
• Consider software as ‘data’ in RDM
• Use Persistent Identifiers to build robust, citable and discoverable
metadata and link outputs

Join the FREYA Project!
info@project-freya.eu

Thank you for giving us your time!
Thank You!
Questions?
Email: sarah.stewart@bl.uk
datasets@bl.uk
@Biostew