Presentation from the CIRCE workshop on ISS data preservation and use. Presents findings from the RECODE project on the value of making data open from the perspective of different research disciplines.
Realising the value of open data: some disciplinary perspectives
1. Realising the value of open data: some disciplinary perspectives
Susan Reilly, LIBER Projects Manager
susan.reilly@kb.nl
@skreilly
2. Overview
• Introduction: Policy RECommendations
for Open access to research Data in Europe
(RECODE)
• The open research data agenda
• Case studies: drivers and barriers
• The way forward
3. Project RECODE
The project will leverage existing
networks, communities and projects to
address challenges within the open
access and data dissemination and
preservation sector and produce policy
recommendations for open access to
research data based on existing good
practice.
4. Project RECODE Objectives
• Reduce stakeholder fragmentation
• Identify stakeholder values and interrelationships
• Identify gaps, tensions and good practices
• Produce a set of guidelines for the sharing
of scientific data
• Engagement of stakeholders
• Use 5 cases from different disciplines
5. By Ken Lund (Flickr: Why, Arizona (2)) [CC-BY-SA-2.0
(http://creativecommons.org/licenses
6. Clear benefits of open data
But if we really want researchers to open
their data, maybe we should move from
the general to the specific
http://fav.me/d1y5efr
7. Because there are barriers too…
• Cultural differences
• Definition of research data
• Lack of skills/education
• Poorly defined roles and responsibilities
• Lack of infrastructure
• Lack of career incentives
9. Particle Physics
• Practice
– Large scale collaborative
– Numerical data, complex analysis software
and hardware
– Long time scale
– Grid analysis
• Motivation
– Access for comparison, error testing, less
duplication of effort
10. Particle Physics
• Barriers
– Size of data
– Relevance
– Cost of openness
– Complexity
– Needs context (metadata)
– Culture of collaboration + competition
11. Health Science
• Practices
– Interdisciplinary
– Different data types and sources
– Many stakeholders (commercial, government,
practice)
• Motivations
– Faster advancement, more reliable results,
access to negative results, less duplication,
understanding the genome
12. Health Science
• Barriers
– Anonymisation
– Commercial interests (competition)
– Variety of formats
– Quality metadata
13. Archaeology
• Practice
– Highly individual, fieldwork
– Lots of data formats
– Lacks standardisation in language,
terminology and measurement
• Motivations
– Not replicable, cumulative knowledge,
creating narrative
14. Archaeology
• Barriers
– Legacy data
– Not digital
– Context is key: metadata, interoperability
– Unclear research parameters
– Specific skill sets needed (e.g. coding)
– Cost
15. How do we define open access to research data?
• We can define ‘open access’ (see Berlin
Declaration):
a license to copy, use, distribute and display material, subject to
proper attribution of authorship; deposit in an appropriate standard format
in an online repository that enables unrestricted distribution, interoperability,
and long-term archiving.
• But how do we define research data?
Data underlying publications, or all experimental data? Disciplines
need to define what data should be made open.
16. The entire data lifecycle must be addressed
• Open access to data extends across the life
cycle of the production of knowledge, from
ethical concerns about data collection,
characteristics of data collection, data analysis,
data management, access to findings, and the
status of findings.
• Although some developments are shared across
research practices, these are adapted within
specific disciplines
17. Stakeholder fragmentation
• What is the real cost of open data?
• Universities, publishers, public and private
research organizations, software developers,
libraries, funding bodies and repositories within
national, world-regional and global science ecosystems
• High interdependency, but lack
of clarity around roles and
responsibilities
By Oneblackline (Own work) [GFDL (http://www.gnu.org/copyleft/fdl.html) or CC-BY-3.0 (http://creativecommons.org/licenses/by/3.0)], via Wikimedia Commons
18. Infrastructure & technologies
• Interoperability
• Scalability
• Data quality
• Automatically
executable policies
By Anonymous (Guillaume Blanchard, Juillet 2004, Fujifilm S6900.) [CC-BY-SA-2.5-2.0-1.0 (http://creativecommons.org/licenses/by-sa/2.5-2.0-1.0), GFDL
(http://www.gnu.org/copyleft/fdl.html), CC-BY-SA-3.0 (http://creativecommons.org/licenses/by-sa/3.0/) or FAL], via Wikimedia Commons
19. Legal and ethical issues
• Intellectual property
– The Database Directive; copyright agreements
with publishers; can we (libraries/repositories)
change the format of data?
• Data protection
– right to be forgotten
20. A word on the long tail of research data…
• Data that does not fall within the scope of
discipline/government repositories
• https://rd-alliance.org/groups/long-tailresearch-data-ig/wiki/objectives-interestgroup.html