Disciplinary RDM
1. Research Data Management from
a disciplinary perspective
Sarah Jones
Digital Curation Centre
sarah.jones@glasgow.ac.uk
Twitter: @sjDCC
Stéphane Goldstein
Research Information Network
stephane.goldstein@researchinfonet.org
Twitter: @stephgold7
2. Disclaimer
Practice varies greatly by
discipline and sub-discipline
so it’s hard to generalise
Apologies for any sweeping
statements and groupings
that don’t fit your model
Image credit: Sweep by Judy Van der Velden CC-BY-NC-ND
www.flickr.com/photos/judy-van-der-velden/6757403261
3. Case studies on disciplinary practice
RIN Information Seeking and Sharing Behaviour
www.rin.ac.uk/our-work/using-and-accessing-information-resources
– Life sciences
– Humanities
– Physical sciences
RIN Open Science Case studies
www.rin.ac.uk/our-work/data-management-and-curation/open-science-case-studies
SCARP case studies www.dcc.ac.uk/resources/case-studies/scarp
Knowledge Exchange Incentives and motivations for sharing research
data (forthcoming)
RLUK research data typology (more from Stéphane)
4. Groups and disciplines
Arts & Humanities
– Creative arts, languages, philosophy, archaeology…
Social Science
– Economics, history, politics, business, psychology...
Sciences & Engineering
– Physics, astronomy, earth sciences, computing…
Life Sciences
– Biology, ecology, medical and veterinary science…
5. Arts & Humanities
Outputs may not be termed
‘data’ e.g. sketches, writing,
performance, artefacts, ‘work’
Focus on literary outputs &
manuscripts in some disciplines
More use of standard tools e.g.
Word, Excel – less likely to
adapt technologies to fit
Arguably lower awareness and
uptake of RDM overall
6. Creative Arts
Several RDM projects in the creative arts e.g. Kultivate,
KAPTUR, VADS4R, CAiRO training...
Resistance to term ‘data’ – too scientific
Importance of personal websites for profile as work is
also conducted outside of academia
Visual Arts Data Service - www.vads.ac.uk
Institutional repositories at arts schools accept a broader
range of outputs and display content more visually to fill
the void e.g. http://research.gold.ac.uk
7. Sonic Arts Research Unit
Collaboration with IR as a
result of losing data
Tension between providing
access in a visual / usable
way and preserving data
Still use soundcloud and
personal websites for
access, but these link to
‘master’ copy of data held
in IR for preservation
www.dcc.ac.uk/resources/developing-rdm-services/repository-radar
8. Digital Humanities
Intentional creation of resources rather than just data as
by-product of research process
More use of standards e.g. XML & TEI in language
resources, image standards and capture quality for
digitisation, Dublin Core metadata…
Often include technical experts in project team
Links with cultural heritage collections
Negotiating copyright often a major issue
Sustainability a big challenge
9. Mapping Edinburgh’s Social History
Historical maps overlaid with all kinds of
open data to chart how the town has changed
through time
Uses open source tools
Allows you to overlay maps
Picks up on common themes
www.mesh.ed.ac.uk
10. Social Sciences
Greater awareness and acceptance
of RDM by community
Methodology is as much a factor in
determining difference as discipline
Nature of data often poses
challenges for sharing
Lots of reuse of large survey data
Established metadata standards e.g.
Data Documentation Initiative (DDI)
Strong international data centre
infrastructure
11. Public health
Ethics predominant concern
– How to negotiate consent
– How to store, transfer & handle data securely
– How to anonymise and share data
Data integration / linking and curation of longitudinal studies are a
major concern, as data is added to over decades
Need for data havens to help control access to data – role for
unis e.g. Grampian Data Safe Haven
UK Data Service - http://ukdataservice.ac.uk
12. Twenty-07: Public health study
Longitudinal study following 4510 people from West of Scotland
over 20 years to investigate the reasons for differences in health
Undertook interviews, questionnaires, physical measurements,
blood samples etc
Strict access controls and guidelines for data collection
Data managed within the MRC Social and Public Health Sciences
Unit and accessible under a data sharing agreement -
http://2007study.sphsu.mrc.ac.uk/Revised-Data-Sharing-Policy-has-been-
13. Life Sciences
Funders arguably more demanding
in terms of data sharing policy
Sharing can be problematic / resisted
given the nature of the data, fear of
misuse or loss of control over IPR
Data sharing agreements and access
committees more common
Data integration & mining key
drivers
Research is well-resourced so greater
capacity to fund local solutions and
tools for RDM during projects
14. Genetics
Vast quantities of data and rapid growth
– DNA sequence data is doubling every 6-8 months
Well established public databases for gene sequences e.g.
GenBank www.ncbi.nlm.nih.gov/genbank
– However even this is on short-term project funding!
Need accession number to publish so driver for sharing and
established workflow
European Data Infrastructure projects too e.g. ELIXIR
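To put the 6-8 month doubling figure in perspective, the arithmetic can be sketched in a few lines of Python. This is a rough illustration only: it assumes steady exponential growth, and the function name growth_factor and the two-year horizon are illustrative choices rather than anything taken from the slides.

```python
def growth_factor(months_elapsed: float, doubling_months: float) -> float:
    """Factor by which a data volume grows, assuming it doubles
    every `doubling_months` months at a steady rate (an assumption)."""
    return 2 ** (months_elapsed / doubling_months)


if __name__ == "__main__":
    # Compare the two ends of the quoted 6-8 month range over two years.
    for doubling in (6, 8):
        factor = growth_factor(24, doubling)
        print(f"doubling every {doubling} months: x{factor:.0f} in 2 years")
```

Under these assumptions, an archive would hold roughly 8 to 16 times as much sequence data after two years, which suggests why storage and funding are recurring concerns.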
15. Neuroscience
Large data volumes due to use of medical imaging
Moving towards larger cohort studies integrating wider range of data types,
which strains the balance with ethical requirements around personal data
Costs of data gathering and advances in analysis technology are making the field
more data intensive - computational methods
Small interdisciplinary teams provide the human infrastructure for RDM, but
historically low funder investment in data management at lab level
Disciplinary archives are immature, which has encouraged a tendency for labs to
treat longitudinal datasets as intellectual capital
16. OMERO – Open Microscopy
Environment
Monash e-Research Centre
helps groups to adopt (and if
needed adapt) existing
technological solutions
Partnered with a research group to
implement OMERO, a secure
central repository to help
researchers organise, analyze
and share images
Resulting tool more
sustainable as tailored to
specific community need
www.dcc.ac.uk/resources/developing-rdm-services/improving-rdm-monash
17. Science & Engineering
Large scale can mean RDM is built in
as standard and sharing part of
workflow e.g. facilities science
Often early adopters and advocates
of new technologies e.g. the Grid,
wikis & arXiv in particle physics
Archiving established in some cases
as data can’t be recreated e.g. NERC
data centres for Earth Sciences
Commercial sensitivities can place
restrictions on sharing in some fields
Industry
partners
18. Mechanical Engineering
Several RDM projects at Bath e.g. ERIM, REDm-MED
Concept of repository well established in industrial engineering
– Product Lifecycle Management (PLM) systems
Preservation issues as data is challenging e.g. CAD files
Less information sharing than other disciplines
– Commercial sensitivities preclude sharing
– Consultancy-style research can lead to internal-only results
– Data generated from private systems, so less applicable to others
19. Crystallography
X-ray examinations, images and videos of crystal structures,
chemical crystallography diffraction images
Established metadata standards e.g. Crystallographic
Information Framework (CIF)
Advocates of open science and use of related tools
UsefulChem - http://usefulchem.wikispaces.com
LabTrove - www.labtrove.org
eCrystals Archive and Crystallography Open Database (COD)
National Crystallography Service - www.ncs.ac.uk
20. Astronomy
Established data standards (e.g. FITS and NOA) maintained by
community
Access to facilities requires the deposit of raw data, although
this can be embargoed
International data centres e.g. Sloan Digital Sky Survey -
www.sdss.org
Large volumes of data so transfer can be difficult
Few IPR issues compared to other disciplines
Data products are not always shared
21. Galaxy Zoo
Citizen Science project started to
classify a million galaxies imaged by
the Sloan Digital Sky Survey
Over 50 million classifications in the
first year, contributed by more than
150,000 people
Classifications were as good as those
from professional astronomers
Further projects in astronomy,
climatology, biology, humanities… www.galaxyzoo.org
22. Research data typology
Commissioned by RLUK
Aim: to help librarians improve their ability to
engage with researchers on RDM matters; and
to enable them to acquire a better
understanding of the needs of researchers
A resource structured around a suggested
typology of research data, looking at different
ways in which data might be categorised
23. Broad data types
1. How do researchers generate and process data, and
for what purpose?
1.1 Method of creation and collection of research data:
where the data comes from
1.2 Readiness of research data: extent to which data
has been processed
1.3 Use of research data: researchers' main purpose for
accessing and using data
2. In what file formats, media and volumes do researchers
generate data?
2.1 Medium and format for research data: objects in which
data is captured and recorded, electronic storage and file
types
2.2 Electronic data volumes: size of files (this is subjective,
and based largely on the perception of researchers)
3. How do researchers manage and store their data?
3.1 Storage of research data: where and how data is kept
3.2 Types of metadata: not an exhaustive list, but these are
widely-recognised metadata standards
3.3 Metadata standards
3.4 Degree of openness: founded on Royal Society's
categorisation of 'intelligent openness'
3.5 Licensing of research data: legal rights appertaining to the
use of the data
24. An expandable resource
A scaffold onto which disciplinary examples can be
hung
Dynamic resource: community input (from librarians,
but maybe others too?), crowdsourcing
Turning it into an online interactive tool
Refreshing, curating, adapting the resource
Basic introduction at
http://www.powtoon.com/show/fZDm1s0W6TI/research-data-typology-for-rluk-draft/
25. Conclusions
Lots of work still to do!
Domains differ in all respects: data, methods, key
RDM concerns, level of infrastructure and support…
Differences exist at sub-discipline level
Need to understand the area
Developing and using RLUK’s typology
26. How to plug the gaps?
Dozens of different repositories or databases
specialising in sub-domains or data types, but still major
gaps
– Shared services?
– Institutional services – specialising rather than generic?
– Role of publishers and learned societies?
– Funder calls for domain specific infrastructure?
– Unis to support ground-up development of tools / services?
• How can the sector help domain-specific solutions to
mature and thrive?