(Open) Research Data Management in H2020 (ISERD – Tel Aviv, Oct 31, 2016)
1. (Open) Research Data Management in H2020
ISERD – Tel Aviv, Oct 31, 2016
@openaire_eu
Natalia Manola
OpenAIRE managing director
Athena Research & Innovation Centre
Credits
The OpenAIRE team
Sarah Jones, Digital Curation Centre (DCC), UK
Marjan Grootveld, DANS, NL
2. Outline
• OpenAIRE – who are we?
• H2020 policies – what’s involved?
• Research data management – what about?
• Data Management Plan (DMP) – how to?
• Lessons learnt – what to avoid?
• OpenAIRE – what do we offer?
RDM Seminar @ ISERD, Tel Aviv - Oct 1, 2016 2
3. Who we are
• EU project
• In 24x7 operation since Dec 2010
• OpenAIRE
• OpenAIREplus
• OpenAIRE2020
• Consortium of 50 partners
• One of 5 key EU e-Infrastructures
Open Access experts
• Institutional, national and international perspectives on OA policies & e-Infrastructures
Information & Computer Science experts
• Building efficient e-Infra technologies
• State of the art technologies (big data, linked data)
Legal experts
• Legal & policy recommendations
Data communities
• Best practices for data
• Linking to data infrastructures
4. Human Network
50 Partners from every EU country, and beyond
Data centers, universities, libraries, repositories, legal experts
Digital Network
… fosters the social and technical links that enable Open Science in Europe and beyond
National Open Access Desks (NOADs)
33 OA expert nodes across Europe
• (OA) policy alignment
• Technical assistance
• Training
5. Integrated Scientific Information System
17.2 million unique publications
760 validated data providers
370K publications linked to projects from 6 funders
28K datasets linked to publications or funders
3.5K links to software
6. H2020 Policies
7. The following European Commission branded slides come from the EC’s open access team and provide an overview of the key points. Content from Jean-Francois Dechamp and colleagues.
Mail: RTD-open-access@ec.europa.eu
Web: http://ec.europa.eu/research/openscience/index.cfm
8. H2020 Open Access policies
• Publications
– Openly accessible and minable. Eligible costs for APCs.
• Research data
– Openly accessible research data can typically be accessed, mined, exploited, reproduced and disseminated free of charge for the user.
9. Publications
13. Three top reasons to opt out
Whether a (proposed) project participates in the ORD or chooses to opt out does not affect the evaluation of that project.
Proposals will not be penalised for opting out.
15. The EC Open Research Data pilot
Key sources of information
• Guidelines on Open Access to Scientific Publications and Research Data in Horizon 2020
http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-pilot-guide_en.pdf
• Guidelines on Data Management in Horizon 2020
http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf
• Annotated model grant agreement, clause 29.3
http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/amga/h2020-amga_en.pdf
• New infographic summarising key policy points
http://ec.europa.eu/research/press/2016/pdf/opendata-infographic_072016.pdf
17. A FAIR approach to data
• Findable
– Assign persistent IDs, provide metadata, register in a searchable
resource...
• Accessible
– Retrievable by their ID using a standard protocol, metadata remain
accessible even if data aren’t...
• Interoperable
– Use formal, broadly applicable languages, use standard vocabularies,
qualified references...
• Reusable
– Rich metadata, clear licences, provenance, use of community
standards...
www.force11.org/group/fairgroup/fairprinciples
18. Findable
• Use metadata and specify standards for metadata creation (if
any). If there are no standards in your discipline describe what
type of metadata will be created and how
• Search keywords
• Persistent and unique identifiers such as DOIs or other
handles
• File and folder naming conventions
• Versioning of the datasets and clear version numbers
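The naming and versioning points above can be made concrete with a small helper. This is only a sketch: the pattern `<project>_<dataset>_<YYYYMMDD>_v<major>.<minor>.<ext>` and the names `dataset_filename` and `follows_convention` are assumptions made for illustration, not a convention prescribed by the EC template.

```python
import re
from datetime import date

# Hypothetical convention (an assumption, not an EC rule):
# <project>_<dataset>_<YYYYMMDD>_v<major>.<minor>.<ext>
NAME_PATTERN = re.compile(r"^[a-z0-9]+_[a-z0-9-]+_\d{8}_v\d+\.\d+\.[a-z0-9]+$")


def dataset_filename(project, dataset, created, major, minor, ext):
    """Build a file name that follows the illustrative convention above."""
    stem = f"{project.lower()}_{dataset.lower()}_{created:%Y%m%d}_v{major}.{minor}"
    return f"{stem}.{ext.lower()}"


def follows_convention(filename):
    """Return True if a file name matches the illustrative convention."""
    return NAME_PATTERN.match(filename) is not None


name = dataset_filename("myproject", "soil-samples", date(2016, 10, 31), 1, 0, "csv")
print(name)                                        # myproject_soil-samples_20161031_v1.0.csv
print(follows_convention(name))                    # True
print(follows_convention("final data (2).xlsx"))   # False
```

Whatever convention a project picks, writing it down in the DMP and checking file names against it keeps versions distinguishable.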
19. Metadata and documentation
• Metadata and documentation are needed to find and understand research data
• Think about what others would need in order to find,
evaluate, understand, and reuse your data
• Get others to check the metadata to improve quality
• Use standards to enable interoperability
20. Where to find metadata standards
Metadata Standards Directory
Broad, disciplinary listing of
standards and tools
Maintained by RDA group
http://rd-alliance.github.io/metadata-directory
Biosharing
A portal of data standards,
databases, and policies for life,
environmental and biomedical
sciences
https://biosharing.org
21. Accessible
• Explain which data can’t be shared openly, if any
• Specify how access will be provided in case of restrictions,
e.g., through a data committee, a license, or arranged with the
repository
• Will methods or software tools needed to access the data (if
any) be included or documented?
• Deposit the data and associated metadata, documentation and
code preferably in certified repositories which support Open
Access
22. Where to find a repository?
More information: https://www.openaire.eu/opendatapilot-repository
What to deposit?
a. the data needed to validate the results
presented in scientific publications, including the
metadata;
b. any other data, including the metadata, as
specified in the DMP;
c. plus for a-b the documentation and the tools
that are needed to validate the results, e.g.
specialised software or software code,
algorithms and analysis protocols (when
possible, these instruments themselves).
24. How to select a repository?
• Look into your community, university, publisher, funder etc.
• Check they match your particular data needs: e.g. formats
accepted; mixture of Open and Restricted Access
• See if they provide guidance on how to cite the deposited data
• Do they assign a persistent & globally unique identifier for sustainable citations and to link back to particular researchers and grants?
• Look for certification as a ‘Trustworthy Digital Repository’ with an explicit ambition to keep the data available in the long term.
www.openaire.eu/opendatapilot-repository
25. Zenodo
Multi-disciplinary repository used for the long-tail of research
data
• An OpenAIRE-CERN joint effort
• Multidisciplinary repository accepting
– Multiple data types
– Publications
– Software – link to Github
• Assigns a Digital Object Identifier (DOI), up to 50 GB per dataset
• Links funding, publications, data & software
www.zenodo.org
26. Choose appropriate file formats
If you want your data to be re-used and sustainable in the long-term, you
typically want to opt for open, non-proprietary formats.
Type | Recommended | Avoid for data sharing
Tabular data | CSV, TSV, SPSS portable | Excel
Text | Plain text, HTML, RTF; PDF/A only if layout matters | Word
Media | Container: MP4, Ogg; Codec: Theora, Dirac, FLAC | Quicktime, H264
Images | TIFF, JPEG2000, PNG | GIF, JPG
Structured data | XML, RDF | RDBMS
Further examples:
www.data-archive.ac.uk/create-manage/format/formats-table
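A quick way to apply the table before depositing is to screen file names by extension. The sketch below is illustrative only: the mapping from the formats named above to concrete file extensions (for example "Excel" to `.xls` and `.xlsx`) is an assumption, and the function names are hypothetical. An extension cannot reveal a codec or whether a PDF is PDF/A, so such files are reported as unknown.

```python
from pathlib import Path

# Assumed extension mapping for the formats named in the table above.
RECOMMENDED = {
    ".csv", ".tsv", ".txt", ".html", ".rtf",
    ".tif", ".tiff", ".jp2", ".png",
    ".xml", ".rdf", ".mp4", ".ogg", ".flac",
}
AVOID = {".xls", ".xlsx", ".doc", ".docx", ".mov", ".gif", ".jpg", ".jpeg"}


def classify_format(filename):
    """Classify a file by extension only; codecs and PDF/A cannot be detected this way."""
    ext = Path(filename).suffix.lower()
    if ext in RECOMMENDED:
        return "recommended"
    if ext in AVOID:
        return "avoid for data sharing"
    return "unknown"


def review(filenames):
    """Map each file name to its classification."""
    return {name: classify_format(name) for name in filenames}


for name, verdict in review(["results.CSV", "survey.xlsx", "report.pdf"]).items():
    print(f"{name}: {verdict}")
```

Files flagged "unknown" need a manual check against the repository's own preferred-format list.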
27. File format considerations
• No clear-cut definition of “sustainable file format”
• Each archive has its own expertise, relative to its designated community
Examples (4TU.ResearchData: Level 1; Level 2 or 3. DANS: Preferred; Accepted):
audio: .wav; .ra, .mp3, .wma; .wav, .flac; .aiff, .mp3, .aac
chemistry: NMR, ChemDoodle, …; .pdb, .xyz
databases: delimited flat file w/DDL; .mdb, .dbf, .acdb; .sql, .siard, .csv; .mdb, .dbf, .hdf5, …
video: .mp1, .mp2, .mp4, .mov, …; .mpg2, .mpg4, .avi, .mov; .mkv
28. Interoperable
• Interoperability on data and metadata, on data exchange
formats and protocols
• Specify what data and metadata vocabularies, standards or
methodologies you will follow to facilitate interoperability
• Standard vocabulary to allow inter-disciplinary interoperability
or a mapping from your vocabulary to more commonly used
ontologies?
Aim for compliance with globally accepted practices
29. Reusable
• Clarify licences early on
• License the data to permit the widest reuse possible
• Specify a data embargo, if needed
• If data re-use by third parties is restricted, explain why
• How long will the data remain reusable?
• Describe data quality assurance processes
30. License research data openly
www.dcc.ac.uk/resources/how-guides/license-research-data
DCC guide outlines the pros and cons of each approach and gives practical advice on how to implement your licence
CREATIVE COMMONS LIMITATIONS
• NC Non-Commercial: What counts as commercial?
• ND No Derivatives: Severely restricts use
Licences with these clauses are not open licences
Horizon 2020 Open Access guidelines point to: CC-0 or CC-BY
31. What should be preserved and shared?
• The data needed to validate results in scientific publications
(minimally!).
• The associated metadata: the dataset’s creator, title, year of publication,
repository, identifier etc.
• Follow a metadata standard in your line of work, or a generic
standard, e.g. OpenAIRE or DataCite, and be FAIR.
• Documentation: code books, lab journals, informed consent forms –
domain-dependent, and important for understanding the data and
combining them with other data sources.
• Software, hardware, tools, syntax queries, machine configurations –
domain-dependent, and important for using the data. (Alternative:
information about the software etc.)
Basically, everything that is needed to replicate a study should
be available. Plus everything that is potentially useful for others.
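As an illustration of the associated metadata listed above (creator, title, year of publication, repository, identifier), a minimal record could be kept alongside each dataset. The field names and the `DatasetMetadata` class are hypothetical and do not follow the DataCite or OpenAIRE schemas exactly; the values are placeholders.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class DatasetMetadata:
    """Minimal descriptive record; field names are illustrative assumptions."""
    creator: str
    title: str
    publication_year: int
    repository: str
    identifier: str


def missing_fields(record):
    """List the fields that are still empty and need filling before deposit."""
    return [key for key, value in asdict(record).items() if value in ("", None)]


# Placeholder values for illustration only.
record = DatasetMetadata(
    creator="Doe, Jane",
    title="Example survey data",
    publication_year=2016,
    repository="Zenodo",
    identifier="",  # e.g. the DOI, once the repository has assigned one
)

print(json.dumps(asdict(record), indent=2))
print(missing_fields(record))  # ['identifier']
```

For an actual deposit, the record should be mapped onto the metadata standard the chosen repository expects.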
32. Rules of thumb for how long
RDNL Selection criteria: http://www.researchdata.nl/en/services/data-management/selecting-research-data/
DCC How-to guide: http://www.dcc.ac.uk/resources/how-guides/appraise-select-data
• When regenerating data is cheaper than archiving, don’t
archive. Select what data you’ll need and want to retain.
• 10 years is often stated in data policies and academic codes,
but data can be valuable for ages, in climatology, sociology,
health sciences, astronomy, linguistics, … Look beyond
minimal retention periods where relevant.
• “The lifetime of software is generally not as long as that of data” (Daniel Katz et al. http://bit.ly/2eScCKp)
33. How much does it cost? Who pays?
• What are the costs for making data FAIR in your project?
• Resources needed for long term preservation
• Check the UK Data Service Costing model
• The High Level Expert Group on the European Open Science Cloud
recommends that “well budgeted data stewardship plans should be
made mandatory and we expect that on average about 5% of
research expenditure should be spent on properly managing and
stewarding data”
• Who pays? How?
UKDS model: http://www.data-archive.ac.uk/create-manage/planning-for-sharing/costing
HLEG report: http://ec.europa.eu/research/openscience/pdf/realising_the_european_open_science_cloud_2016.pdf#view=fit&pagemode=none
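To get a first feel for the order of magnitude, the HLEG figure of "about 5%" can be applied to a project budget. This is a rough sketch, not a costing model: the function name and the example budget are made up, and actual costs should come from a proper costing exercise such as the UKDS model.

```python
def stewardship_budget(research_expenditure, share_percent=5):
    """Estimate data stewardship costs as a percentage of research expenditure.

    The default of 5 reflects the "about 5%" average quoted from the HLEG report;
    it is an indicative figure, not a rule.
    """
    if research_expenditure < 0:
        raise ValueError("research_expenditure must not be negative")
    return research_expenditure * share_percent / 100


# Hypothetical project with a research expenditure of 2,000,000 (any currency).
print(stewardship_budget(2_000_000))  # 100000.0
```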
34. RDM in a nutshell
• Develop a DMP
• Select which data to make open
• License data openly for the widest reuse
• Use established community standards for interoperability
• Provide metadata for data discovery
• Deposit in a data repository
• Share details about the tools and instruments used to allow verification
35. Managing and sharing data – A best practice guide
http://data-archive.ac.uk/media/2894/managingsharing.pdf
37. What is a data management plan?
A brief plan written at the start of a project to define:
• how the data will be created
• how it will be documented
• who will access it
• where it will be stored
• who will back it up
• whether (and how) it will be shared & preserved
DMPs are often submitted as part of grant applications, but are useful whenever researchers are creating data.
The DMP is a living document. You are not required to provide detailed answers to all the questions in the first version of the DMP (due M6).
Explain any selection criteria in the DMP
38. DMPonline
A web-based tool to help researchers write DMPs
Includes a template for Horizon 2020 – FAIR principles
https://dmponline.dcc.ac.uk
39. How the tool works
• Click to write a generic DMP
• Or choose your funder to get their specific template
• Pick your uni to add local guidance and to get their template if no funder applies
• Choose any additional optional guidance
40. DMPonline for H2020
Focus on how you will ensure your data are “FAIR”
KEEP IT UP TO DATE
42. Example plans
• 108 DMPs from the National Endowment for the Humanities
www.neh.gov/divisions/odh/grant-news/data-management-plans-successful-grant-applications-2011-2014-now-available
• 20+ scientific DMPs submitted to the NSF (USA) provided by UCSD
http://libraries.ucsd.edu/services/data-curation/data-management/dmp-samples.html
• Example DMP collection from Leeds University
https://library.leeds.ac.uk/research-data-tools
• Further examples:
www.dcc.ac.uk/resources/data-management-plans/guidance-examples
43. Example H2020 DMPs in Zenodo
• Helix Nebula – High Energy Physics example
• https://zenodo.org/record/48171#.WATexnriF40
• Tweether – engineering (micro-electronics) example
• https://zenodo.org/record/55791#.WATei3riF40
• AutoPost – ICT example
https://zenodo.org/record/56107#.WATefXriF40
44. Different DMPs for different needs
• No strict rules – Every project has different
requirements
• Approach the DMP in whatever way best fits your project
• EC template is intended as a service, not an obligation
• Read the background information and the guidance, and use it
as a checklist.
45. Example: OpenMinTeD
OpenMinTeD aims to create an infrastructure for Text and Data Mining (TDM) of scientific and scholarly content
The project has adopted its own structure to create a ‘Data and Software Management Plan’
http://openminted.eu
46. Example: OpenMinTeD – Data chapter
Six types of datasets identified
1. Scholarly publications
2. Language and knowledge resources
3. Services and workflows
4. Automatically and manually generated annotations
5. Consortium publications
6. Metadata
Described in a table per dataset
E-Infrastructure: data in transit
47. Example OpenMinTeD – Software
Lots of software components
Background vs. foreground
48. Example: CAPSELLA
CAPSELLA aims to develop ICT solutions for farmers and other actors engaged in agrobiodiversity
Devised a questionnaire to collate dataset information from project partners
Identified 13 datasets, 6 of which are imported as is, 3 aggregated, 3 transformed and 1 generated
www.capsella.eu
Focus on data produced
51. Example: TWEETHER
Travelling wave tube based W-band wireless networks with high data rate distribution, spectrum & energy efficiency
The academic institutions participating in TWEETHER have
available appropriate depositories which in fact are linked to
OpenAIRE.
Apart from these repositories, TWEETHER will also use
ZENODO to ensure the maximum dissemination of the
information generated in the project (research publications and
data)…
52. Example metadata description
Minimal metadata
More than one dataset? Describe generically what is
possible and dataset-specific what is necessary.
53. Data description examples
The final dataset will include self-reported demographic and
behavioural data from interviews with the subjects and laboratory
data from urine specimens provided.
NIH data sharing statements
Every two days, we will subsample E. affinis populations growing
under our treatment conditions. We will use a microscope to identify
the life stage and sex of the subsampled individuals. We will document
the information first in a laboratory notebook and then copy the
data into an Excel spreadsheet. The Excel spreadsheet will be saved
as a comma separated value (.csv) file.
DataOne – E. affinis DMP example
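The last step of the E. affinis description, saving the observations as a .csv file, could look like the sketch below. The column names and the sample rows are invented for illustration; the plan itself only states that life stage and sex are recorded.

```python
import csv
import io

# Hypothetical column names; the DMP only says life stage and sex are recorded.
FIELDNAMES = ["sample_date", "treatment", "life_stage", "sex"]


def observations_to_csv(rows):
    """Serialise observation records as comma separated values with a header row."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up example observations.
rows = [
    {"sample_date": "2016-10-01", "treatment": "control", "life_stage": "adult", "sex": "F"},
    {"sample_date": "2016-10-03", "treatment": "control", "life_stage": "nauplius", "sex": "NA"},
]

csv_text = observations_to_csv(rows)
print(csv_text)
```

Writing the CSV directly, rather than exporting from a spreadsheet, avoids silent reformatting of dates and codes.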
54. Metadata examples
Metadata will be tagged in XML using the Data Documentation Initiative (DDI)
format. The codebook will contain information on study design, sampling
methodology, fieldwork, variable-level detail, and all information necessary for
a secondary analyst to use the data accurately and effectively.
ICPSR Framework for Creating a DMP
We will first document our metadata by taking careful notes in the laboratory notebook that refer to
specific data files and describe all columns, units, abbreviations, and missing value
identifiers. These notes will be transcribed into a .txt document that will be stored with the
data file. After all of the data are collected, we will then use EML (Ecological Metadata
Language) to digitize our metadata. EML is one of the accepted formats used in ecology, and
works well for the types of data we will be producing. We will create these metadata using Morpho
software, available through KNB. The metadata will fully describe the data files and the context of
the measurements.
DataOne – E. affinis DMP example
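The intermediate .txt document from the E. affinis example, describing columns, units and missing value identifiers, can be generated from a small structured description. This is a sketch under assumptions: the column entries, the file name and the output layout are invented, and it does not produce EML.

```python
# Hypothetical column descriptions for illustration.
COLUMNS = [
    {"name": "life_stage", "description": "Developmental stage of the individual",
     "unit": "none", "missing": "NA"},
    {"name": "body_length", "description": "Length measured under the microscope",
     "unit": "mm", "missing": "-999"},
]


def data_dictionary(data_file, columns):
    """Render a plain-text description of every column in a data file."""
    lines = [f"Data file: {data_file}", ""]
    for col in columns:
        lines.append(
            f"{col['name']}: {col['description']} "
            f"(unit: {col['unit']}; missing value: {col['missing']})"
        )
    return "\n".join(lines) + "\n"


text = data_dictionary("observations.csv", COLUMNS)
print(text)
```

Storing this text file next to the data file keeps the description and the data together until formal metadata are created.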
55. Data sharing examples
We will make the data and associated documentation available to users under a data-sharing agreement
that provides for: (1) a commitment to using the data only for research purposes and not to identify any
individual participant; (2) a commitment to securing the data using appropriate computer technology; and
(3) a commitment to destroying or returning the data after analyses are completed.
NIH data sharing statements
The videos will be made available via the bristol.ac.uk website (both as streaming media and
downloads) HD and SD versions will be provided to accommodate those with lower bandwidth. Videos
will also be made available via Vimeo, a platform that is already well used by research students at
Bristol. Appropriate metadata will also be provided to the existing Vimeo standard.
All video will also be available for download and re-editing by third parties. To facilitate this Creative
Commons licenses will be assigned to each item. In order to ensure this usage is possible, the required
permissions will be gathered from participants (using a suitable release form) before recording
commences.
University of Bristol Kitchen Cosmology DMP
56. Restriction on use examples
Because the STDs being studied are reportable diseases, we will be collecting
identifying information. Even though the final dataset will be stripped of identifiers
prior to release for sharing, we believe that there remains the possibility of
deductive disclosure of subjects with unusual characteristics. Thus, we will make
the data and associated documentation available to users only under a data-
sharing agreement.
NIH data sharing statements
1. Share data privately within 1 year.
Data will be held in Private Repository, but metadata will be public
2. Release data to public within 2 years.
Encouraged after one year to release data for public access.
3. Request, in writing, data privacy up to 4 years.
Extensions beyond 3 years will only be granted for compelling cases.
4. Consult with creators of private CZO datasets prior to use.
PIs are required to seek consent before using private data they can access
Boulder Creek Critical Zone Observatory DMP
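Staged release rules like the first two above are easy to turn into concrete dates for a project calendar. The sketch assumes the periods are counted from the collection date, which the example plan does not state, and the function names are hypothetical.

```python
from datetime import date


def add_years(start, years):
    """Return the same calendar day a number of years later."""
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        # 29 February with a non-leap target year: fall back to 28 February.
        return start.replace(year=start.year + years, day=28)


def sharing_deadlines(collected):
    """Deadlines for private sharing (1 year) and public release (2 years).

    Counting from the collection date is an assumption made for this sketch.
    """
    return {
        "share_privately_by": add_years(collected, 1),
        "release_publicly_by": add_years(collected, 2),
    }


for step, deadline in sharing_deadlines(date(2016, 10, 31)).items():
    print(step, deadline.isoformat())
```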
57. Archiving examples
Data will be provided in file formats considered appropriate for long-term access, as recommended by the UK
Data Service. For example, SPSS Portable format and tab-delimited text for qualitative tabular data and RTF
and PDF/A for interview transcripts. Appropriate documentation necessary to understand the data will also
be provided. Anonymised data will be held for a minimum of 10 years following project completion, in
compliance with LSHTM’s Records Retention and Disposal Schedule. Biological samples (output 3) will be
deposited with the UK BioBank for future use.
Writing a Wellcome Trust Data Management and Sharing Plan
The investigators will work with staff at the UKDA to determine what to archive and
how long the deposited data should be retained. Future long-term use of the data will
be ensured by placing a copy of the data into the repository.
ICPSR Framework for Creating a DMP
58. Lessons learnt and a few planning tricks
59. Start early
• Negotiations on licences and consent agreements may preclude later sharing if handled carelessly
• Costs cannot be included retrospectively
• Useful to consider data issues at the consortium negotiation stage to make sure potential issues are identified and sorted asap
• Involve all work packages and partners to get a coherent plan
• Focus effort on datasets you’ll create rather than reuse
Decisions made early on affect what you can do later
60. Think backwards
What data organisation would a re-user like?
CREATING DATA → PROCESSING DATA → PRESERVING DATA → GIVING ACCESS TO DATA → RE-USING DATA
Think about the desired end result and plan for this
“Sharing” means “outside the consortium”
62. Note: It’s more than open data
CC-BY Andreas Neuhold
https://commons.wikimedia.org/wiki/File:Open_Science_-_Prinzipien.png
• Access to research facilities
• Access to processing capabilities
• Communication at all levels of research life cycle
63. Limitations on current DMPs
• DMP evaluation is project based
• Too much text – Text and data mining to get aggregate knowledge
• Machine readability
• Linking to infrastructures
• Semi-automated evaluation and comparison
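To show what machine readability could enable, the sketch below stores a DMP as structured data keyed by the six questions a DMP is meant to answer (creation, documentation, access, storage, backup, sharing and preservation) and reports which are still open. The key names, the JSON layout and the answers are invented for illustration; this is not an existing DMP exchange format.

```python
import json

# Keys mirror the six questions a DMP answers; the names are illustrative assumptions.
QUESTIONS = [
    "creation", "documentation", "access",
    "storage", "backup", "sharing_and_preservation",
]


def unanswered(dmp):
    """Return the questions that have no answer yet."""
    return [q for q in QUESTIONS if not dmp.get(q, "").strip()]


def completeness(dmp):
    """Fraction of the six questions that have an answer."""
    return (len(QUESTIONS) - len(unanswered(dmp))) / len(QUESTIONS)


# Made-up first version of a plan, round-tripped through JSON.
dmp_json = json.dumps({
    "creation": "Sensor readings exported daily as CSV",
    "documentation": "README per dataset plus a codebook",
    "access": "Consortium members during the project",
    "storage": "Institutional network drive",
    "backup": "",
})
dmp = json.loads(dmp_json)

print(unanswered(dmp))              # ['backup', 'sharing_and_preservation']
print(round(completeness(dmp), 2))  # 0.67
```

Because the DMP is a living document, a check like this can be rerun on each new version to compare plans across a project or programme.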
64. How can OpenAIRE help?
65. OpenAIRE support materials
Briefing papers, factsheets, webinars, workshops, FAQs
Information on
• Open Research Data Pilot
• Creating a data management plan
• Selecting a data repository
• Personal data
Developing guidance to add to DMPonline
https://www.openaire.eu/opendatapilot
https://www.openaire.eu/support
66. OpenAIRE services
• Researchers
• Zenodo for all types of publications, data and software
• Claiming – linking research results
• Amnesia, an anonymization tool for all
• Data providers – Interoperability Guidelines, validation,…
• Project coordinators – reporting
• Funders and institutions – monitoring
• Research communities – gathering, monitoring all research
DASHBOARDS
67. [Diagram: the OpenAIRE platform]
Data Providers: Literature Repositories, OA Journals, Funding Info, CRIS systems, Data Repositories, Aggregators, Archives
OpenAIRE Platform: Guidelines; Validation, Cleaning, De-duplicating, Inferring, Linking, Enriching, Classification, Clustering, Analysis; Routing; Metadata, Full text, Usage data; Organizations, Projects, Authors, Datasets, Publications, Data Providers, …
Services: Monitoring, Reporting, Evaluation, Impact, Discovery, Crowdsourcing, APIs, Trends; An EU-CRIS system
68. Integrated Scientific Information System
17.3 million unique publications
760+ validated data providers
370K publications linked to projects from 6 funders
28K datasets linked to publications
3.5K links to software repositories
33K organizations
[Diagram: Organizations, Projects, Authors, Datasets, Publications, Data Providers; Software, Facilities, Methods; Research Communities]
OpenAIRE-Connect: from January 2017
69. World-wide alignment & synergies
Interoperability alignment, sharing technologies & services
• LA Referencia: Latin America repository network
• JAIRO – Japanese Institutional Repositories Online
• REMERI – Mexican Network of Institutional Repositories
• …
• ICSU/World Data System – A network of 70 certified data centers
70. Monitoring & reporting
• Funders
• Institutions
+ NSF, NIH (US); NWO (NL); MSES, CSF (HR); SFI (IE); DFG (DE); …
Towards an EU-CRIS system
71. Project level – automatically export results
Project’s web site
EC front-end system
72. …to the EC's participant portal
EC back-end reporting
The use of a Data Management Plan (DMP) is required for projects participating in the Open Research Data Pilot, detailing what data the project will generate, whether and how they will be exploited or made accessible for verification and re-use, and how they will be curated and preserved.
In multi-beneficiary projects it is also possible for specific beneficiaries to keep their data closed if relevant provisions are made in the consortium agreement and are in line with the reasons for opting out.
Please take time to read Background information and the guidance in the Annex, because the questions in the template are not all clear on their own.
What metadata will be created? In case metadata standards do not exist in your discipline, please outline what type of metadata will be created and how.
Are the data produced and/or used in the project discoverable with metadata, identifiable and locatable by means of a standard identification mechanism (e.g. persistent and unique identifiers such as Digital Object Identifiers)?
What naming conventions do you follow?
Will search keywords be provided that optimize possibilities for re-use?
Do you provide clear version numbers?
Metadata is needed to locate and understand the data. When you are deciding what information to capture, think about what others would need in order to find, evaluate, understand, and reuse your data; the EC template also mentions keywords. Also get others to check your metadata to improve the quality and make sure it’s understandable to others. Standards should be used where possible.
To make sure their data can be understood by themselves, their community and others, researchers should create metadata and documentation.
Metadata is basic descriptive information to help identify and understand the structure of the data e.g. title, author...
Documentation provides the wider context. It’s useful to share the methodology / workflow, software and any information needed to understand the data e.g. explanation of abbreviations or acronyms
For Accessibility the Guidance contains more questions:
Which data produced and/or used in the project will be made openly available as the default? If certain datasets cannot be shared (or need to be shared under restrictions), explain why, clearly separating legal and contractual reasons from voluntary restrictions.
Note that in multi-beneficiary projects it is also possible for specific beneficiaries to keep their data closed if relevant provisions are made in the consortium agreement and are in line with the reasons for opting out.
How will the data be made accessible (e.g. by deposition in a repository)? What methods or software tools are needed to access the data? Is documentation about the software needed to access the data included? Is it possible to include the relevant software (e.g. in open source code)?
Where will the data and associated metadata, documentation and code be deposited? Preference should be given to certified repositories which support open access where possible.
Have you explored appropriate arrangements with the identified repository? If there are restrictions on use, how will access be provided? Is there a need for a data access committee? Are there well described conditions for access (i.e. a machine readable license)? How will the identity of the person accessing the data be ascertained?
Are the data produced in the project interoperable, that is allowing data exchange and re-use between researchers, institutions, organisations, countries, etc. (i.e. adhering to standards for formats, as much as possible compliant with available (open) software applications, and in particular facilitating re-combinations with different datasets from different origins)?
What data and metadata vocabularies, standards or methodologies will you follow to make your data interoperable?
Will you be using standard vocabularies for all data types present in your data set, to allow inter-disciplinary interoperability?
In case it is unavoidable that you use uncommon or generate project specific ontologies or vocabularies, will you provide mappings to more commonly used ontologies?
How will the data be licensed to permit the widest re-use possible?
When will the data be made available for re-use? If an embargo is sought to give time to publish or seek patents, specify why and how long this will apply, bearing in mind that research data should be made available as soon as possible.
Are the data produced and/or used in the project useable by third parties, in particular after the end of the project? If the re-use of some data is restricted, explain why.
How long is it intended that the data remains re-usable? Are data quality assurance processes described?
Guidance from the DCC can also help researchers to understand data licensing. This guide outlines the pros and cons of each approach e.g. the limitations of some CC options
The OA guidelines under Horizon 2020 point to CC-0 or CC-BY as a straightforward and effective way to make it possible for others to mine, exploit and reproduce the data. See p11 at: http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-pilot-guide_en.pdf
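A simple check can flag Creative Commons licences that carry the NC or ND clauses described as not open. This is a sketch that works on short licence identifiers such as "CC-BY-NC"; the identifier format and the function names are assumptions, and the check is no substitute for reading the licence text.

```python
# Clauses named as limiting reuse: NC (Non-Commercial) and ND (No Derivatives).
RESTRICTIVE_CLAUSES = {"NC", "ND"}


def restrictive_clauses(license_id):
    """Return the NC/ND clauses found in a Creative Commons licence identifier."""
    parts = license_id.upper().split("-")
    if not parts[0].startswith("CC"):
        raise ValueError(f"not a Creative Commons identifier: {license_id!r}")
    return sorted(RESTRICTIVE_CLAUSES.intersection(parts))


def cc_licence_is_open(license_id):
    """True when the identifier carries neither the NC nor the ND clause."""
    return not restrictive_clauses(license_id)


for candidate in ["CC0", "CC-BY", "CC-BY-NC", "CC-BY-NC-ND"]:
    print(candidate, cc_licence_is_open(candidate), restrictive_clauses(candidate))
```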
Re Software etc: you might also think of virtual machines with the corresponding setup information.
In many cases copyright will prevent the archiving of software and tools. The alternative is a sensible description of configuration settings etc.
From the start, the DCC has offered guidance, independent of funder or discipline. EUDAT and OpenAIRE and others are developing extra guidance as well.
EC: “Since DMPs are expected to mature during the project, more developed versions of the plan can be included as additional deliverables at later stages. (…) New versions of the DMP should be created whenever important changes to the project occur due to inclusion of new data sets, changes in consortium policies or external factors.”
Make sure that you know what will be asked of you for the mid-term and the final review: the focus here is on enabling reuse of your data – by your future self and others.
In subsequent reviews (or any time they feel like) the PO and reviewers may check to see if the DMP is followed (e.g., data files deposited, access status, metadata format, ...).
Example: the OpenMinTeD project combines software with the H2020 DMP issues.
OpenMinTeD is an EINFRA project, which means that it is building an e-Infrastructure and data is passing through.
Basically, the project partners have selected from the long list of the SSI template what is relevant for them.
CAPSELLA is an ICT project (RIA).
Let’s adopt the perspective of a future data user – maybe yourself: what should your data organisation – folders with data, metadata and documentation – look like at the moment that you start sharing - outside your team - and archiving?
When you are part of a large project which has been going on for some years already, this may be obvious, but for many researchers it isn’t clear from the start.
To answer that broad question, you want to come up, at an early stage, with answers regarding:
Types and formats of data;
New and/or existing;
Expected size;
Metadata;
Documentation;
Software.
It’s no fun to do the exercise by yourself, so use this as a communication opportunity.
[final bullet] Acting on requests from the community, DMPonline will add an ‘export to Zenodo’ feature alongside the other export options. You might want to use this to increase your project’s transparency, share good practices, or maybe because you write your DMP as a (kind of) data paper, which is interesting in its own right. At the moment there are a few H2020 DMPs in Zenodo and figshare.
The OpenAIRE infrastructure is a decentralized infrastructure, collecting data from data providers all over Europe, and beyond.
Through the OpenAIRE guidelines, information from these repositories is presented and transferred in a standardised way. The combined data are enriched, cleaned and linked, with further information inferred through TDM, to form one EU-CRIS system linking different information streams.
With all this in place, we can build evaluation, impact and assessment services, or let others build them.