The European Open Science Cloud (EOSC) has become a driving force behind the current evolution of e-Infrastructure to support research. The EOSC offers the vision of an integrated ecosystem of data, services and expertise providing a common platform for open cross-community research in Europe and beyond. In this session, I shall consider the aims of the EOSC and discuss some of the opportunities it offers, and the barriers it needs to overcome to realise the vision. I shall introduce the EOSC-Pilot project, which aims to pave the way towards the EOSC by exploring the opportunities and barriers, and by proposing how the EOSC should evolve, both technically, including its architecture, and organisationally, including how it should be managed. Participants will be invited to consider what the issues for the EOSC are and how it might affect their own domains.
Visit: https://www.eudat.eu/eudat-summer-school
Open Data and Cross Disciplinary Research - EUDAT Summer School (Brian Matthews, EOSC)
www.eudat.eu
EUDAT receives funding from the European Union's Horizon 2020 programme - DG CONNECT e-Infrastructures. Contract No. 654065
Open Data and Cross-disciplinary Research:
The European Open Science Cloud
Brian Matthews
Science and Technology Facilities Council
2. EUDAT Summer School, 3-7 July 2017, Crete
Contents
1. Why Open Science?
2. Research Infrastructures and open science
3. Towards the European Open Science Cloud
4. What do we need to do to build an EOSC?
5. The EOSC Pilot Project
Open Science
Open science is the movement to make scientific research,
data and dissemination accessible to all levels of an
inquiring society, amateur or professional. (Wikipedia).
03/07/17
Open Science is not new
http://www.darwinproject.ac.uk/darwins-letters
The Age of the Journals
But the world became too big
Journals became the main mechanism for scientific communications
• Printing
• Quality control
• Dissemination
• Priority
• Permanent record
• Credit
Worked pretty well for ~100 years
- Particularly since the 1950s
Main basis of evaluation
- Citation and impact factors
But it is a narrow, controlled viewpoint
Disruptive Technology
Computing technology has changed the way people do research
• Generating large amounts of data
• Aggregating large amounts of data
• Processing large amounts of data
• Visualising large amounts of data
Changed the way people talk about research
• Email, websites, newsgroups, blogs, social media, presentations …
Open science offers new ways to do science.
• Meet the new challenges – handling and processing large amounts of data
• Back to the “old ways” – but on a much larger scale
Why Open Science?
Opportunities for Data Exchange (ODE)
EC FP7 Project: 2010-12
Workshops and interviews
Conceptual model
Drivers, barriers, enablers to data sharing
R. Darby, S. Lambert, B. Matthews, M. Wilson, K. Gitmans, S. Dallmeier-Tiessen, S. Mele, J. Suhonen. Enabling Scientific Data Sharing and Re-use. IEEE Conf. on E-Science, Chicago, Oct 2012.
http://www.alliancepermanentaccess.org/index.php/community/current-projects/ode/
Drivers for Open Science
• Better scrutiny of research
• Validation and verification
• Opening up peer review
• Reproducing results
• More than 70% of researchers have tried and failed to reproduce another scientist's experiments (Nature, May 2016, https://www.nature.com/polopoly_fs/1.19970!/menu/main/topColumns/topLeftColumn/pdf/533452a.pdf )
• Prevalence of irreproducible preclinical research exceeds 50% (PLOS Biology 2015,
https://doi.org/10.1371/journal.pbio.1002165 )
• Confidence in the scientific method
Drivers for Open Science
• Better reuse of research
• Easier to Find, Access, Interoperate, Reproduce data
• Not regenerating data needlessly
• Can try data in new situations
• Multidisciplinary science
• Public funded research belongs to the public
More scientific impact
Funders see it as a way of getting more research for the same money
Barriers to open science?
• Availability of a Sustainable Data Management Infrastructure
• And expertise
• And ease of use
• http://cameronneylon.net/blog/as-a-researcher-im-a-bit-bloody-fed-up-with-data-management/
• Not knowing where data is
• Not being able to access it
• Not being able to understand it sufficiently to reuse.
• Trustworthiness of the data
• Data usability
• Finance
• Funding
• Legislation/Regulation
Cultural Barriers to Data Sharing
Publisher Practices
• Journal articles do not describe available data as a publication
• Data not recognised as a citable publication
• Lack of data reviewers to assess data quality
Personal Data Confidentiality
• Anonymity of subjects, in medical and social science in particular
• Perceived conflicts between data protection and freedom of information (FOI)
• Thus unrestricted data access has ethical implications
Research Assessment
• Publication and citation of data not tracked
• Not counted as part of performance evaluation for careers
Academic Defensiveness
• Fear that others will benefit from their data and gain priority for results
• Fear that their results will not be validated
• Fear that misuse of data will harm the data contributor
• Fear that the data will be used to support arguments the data contributor disagrees with
INFRASTRUCTURES FOR OPEN SCIENCE
WLCG: a Global Infrastructure
Varied distributed data model for multi-petabyte datasets. Either:
1. Move, cache and locally process
2. Remote data access (AAA or FAX)
3. Hybrid of 1 & 2 (mainly cached)
4. Event put services for opportunistic HPC and cloud computing
Which model is used depends on many factors, but there is ever-growing exploitation of the wide-area network for remote data access
30GB/s
Global Collaboration
• 42 countries
• 170 computer centres
• 300PB disk
• 380PB tape
• 400,000 cores (1 usable exaflop?)
• >2 million jobs/day
LHC Data placement service
... to construct and operate a shared data infrastructure for Photon and Neutron laboratories...
(Figure: neutron diffraction and X-ray diffraction combined for high-quality structure refinement)
• Common data catalogue
• Integration of users' data from different facilities
• Track provenance of data through analysis stages
• Deploy standards for long-term curation
• Support scalability through parallelisation
• Deploy infrastructure across three different techniques
Open Data Infrastructure (Nov 2011–Apr 2014)
PaN-Data Integration
Shared Data Policy Framework
Federated User Authentication
Federated Data Catalogue
Common Data Format NeXus
Common data environment, common user experience
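A federated data catalogue lets a user search the catalogues of several facilities through one interface. The sketch below is a minimal illustration, not the PaN-data implementation: it assumes each facility's catalogue can be represented as an in-memory list of records, and the record fields, facility names and the `search_federated` function are all hypothetical. The query is fanned out to every catalogue and the hits are merged, using the persistent identifier to drop duplicates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRecord:
    """One catalogue entry; the fields are illustrative assumptions."""
    pid: str        # persistent identifier, used to de-duplicate across catalogues
    title: str
    facility: str
    technique: str


# Hypothetical per-facility catalogues standing in for remote catalogue services.
CATALOGUES = {
    "facility_a": [
        DatasetRecord("pid:a/001", "Lysozyme structure", "facility_a", "neutron diffraction"),
        DatasetRecord("pid:shared/42", "Joint refinement run", "facility_a", "neutron diffraction"),
    ],
    "facility_b": [
        DatasetRecord("pid:b/007", "Lysozyme structure", "facility_b", "x-ray diffraction"),
        DatasetRecord("pid:shared/42", "Joint refinement run", "facility_b", "x-ray diffraction"),
    ],
}


def search_federated(keyword: str) -> list[DatasetRecord]:
    """Query every catalogue and merge the hits, keeping the first record seen per PID."""
    merged: dict[str, DatasetRecord] = {}
    for records in CATALOGUES.values():
        for record in records:
            if keyword.lower() in record.title.lower() and record.pid not in merged:
                merged[record.pid] = record
    # Sort by PID so the merged result does not depend on catalogue order.
    return sorted(merged.values(), key=lambda r: r.pid)
```

For example, `search_federated("lysozyme")` returns the two records `pid:a/001` and `pid:b/007`, while `search_federated("refinement")` returns the shared record only once. Keying the merge on the identifier rather than the title is the main design choice: two facilities can hold different datasets with the same title, but a shared identifier marks the same dataset.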
ELIXIR connects national bioinformatics centres and EMBL-EBI into a sustainable European infrastructure for biological research data
RCUK Principles on Data Policy
Common Principles
1. Public good
2. Preservation
3. Discoverability
4. Confidentiality
5. First use
6. Recognition
7. Costs
There is a tension between these principles.
http://www.rcuk.ac.uk/research/Pages/DataPolicy.aspx
Publicly funded research data are a public good, produced in the public interest, which should be made openly available with as few restrictions as possible in a timely and responsible manner that does not harm intellectual property.
RCUK recognises that there are legal, ethical and commercial constraints on release of research data. To ensure that the research process is not damaged by inappropriate release of data, research organisation policies and practices should ensure that these are considered at all stages in the research process.
Context
• G8 Ministerial Communiqué, 2013
“… [publicly funded] scientific research data should be open…”
G7 Ministerial Communiqué, October 2015
Research Data Alliance
Open-data Science Environment
EC Communication on the European Cloud Initiative, 19 April 2016
European Open Science Cloud (EOSC)
European Data Infrastructure (EDI)
High Level Expert Group on EOSC
Why Europe is not fully tapping into the potential of data:
Data not always open, and a lack of incentives and rewards for data sharing
Lack of the interoperability required for data sharing … noting deep-rooted walls between disciplines
Fragmentation between data infrastructures that are split by scientific and economic domains, countries and governance models
Surging demand for High Performance Computing at a scale above single Member State resources
Legal questions around data reuse employing advanced analysis techniques, adequate protection of personal data, and the forthcoming revision of copyright legislation
Proposed a European Open Science Cloud
Make all scientific data produced by the Horizon 2020 programme open by default.
Raise awareness and change incentive structures for academics, industry and public services to share their data.
Develop specifications for interoperability and data sharing across disciplines and infrastructures.
Create a fit-for-purpose pan-European governance structure to federate scientific data infrastructures and overcome fragmentation.
Develop cloud-based services for Open Science supported by the necessary data infrastructure.
Enlarge the scientific user base to researchers and innovators from all disciplines.
High Level Expert Group for the "European Open Science Cloud"
http://ec.europa.eu/research/openscience/pdf/hleg/hleg-eosc-first-report_(draft).pdf
Definitions
European:
Research and innovation are global - the EOSC cannot be built exclusively in and for Europe.
Europe is in a strong position to lead this initiative, as it is already distributed and collaborative.
Open:
Not all data and tools can be open, e.g. for reasons of confidentiality and privacy.
Open is also often confused with ‘for free’. Free data and services do not exist.
Intelligently open is what we mean.
Science:
Explicitly includes all disciplines, including the arts and humanities.
Also societal innovation and productivity.
Supports broad societal participation in Open Innovation and Open Science.
Cloud:
It can be misinterpreted to indicate that the EOSC is mostly about hard ICT infrastructure.
But it is much more a commons of data, software, standards, expertise and policy related to data-driven science and innovation.
WHAT DO WE NEED TO BUILD AN EOSC?
EOSCpilot Challenges
Three types of challenges addressed by the EOSCpilot:
Scientific Challenges: deploying the EOSC to deliver Open Science
Technical Challenges: developing technical solutions that meet the scientific needs
Cultural Challenges: adopting new, more open ways of working
Scientific Challenges are really Opportunities
Technical Challenges are Barriers to overcome
Cultural Challenges are also Barriers
Challenges: Interoperability
Accessing and understanding data within and across disciplines
Interoperability of data, tools and services
‒ Common services, common APIs, service catalogues
‒ Common formats, common metadata
‒ Persistent Identifiers: constancy of reference for data, people, software, things …
Deepening understanding
‒ Context and provenance: assessing the quality of data
‒ Comparison between experiment/observation and simulation
‒ Preserving the record of science
‒ Reproducible Science
Working internationally
‒ crossing borders and communities
‒ across the world
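Persistent identifiers give constancy of reference because a resolver maps the identifier to wherever the object currently lives. As an illustration, the sketch below takes a DOI (one widely used kind of PID, as in the PLOS Biology reference cited earlier) in any of its common written forms and builds the `https://doi.org/` resolver URL. The validation pattern is a simplified assumption that checks only the `10.<registrant>/<suffix>` shape, not the full DOI syntax, and the function names are hypothetical.

```python
import re
from urllib.parse import quote

# Simplified DOI shape check: "10." + registrant code + "/" + non-empty suffix.
# A pragmatic pattern only; it does not implement the full DOI syntax.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

# Common written forms of a DOI that should all resolve to the same identifier.
_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalise_doi(text: str) -> str:
    """Strip whitespace and any known resolver or URI prefix, then validate the DOI."""
    doi = text.strip()
    for prefix in _PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a DOI: {text!r}")
    return doi


def resolver_url(text: str) -> str:
    """Build the doi.org resolver URL, percent-encoding unusual suffix characters."""
    return "https://doi.org/" + quote(normalise_doi(text), safe="/:;()._-")
```

For example, `resolver_url("doi:10.1371/journal.pbio.1002165")` returns `https://doi.org/10.1371/journal.pbio.1002165`. Citing the resolver URL rather than a publisher's landing page keeps the reference stable if the landing page moves.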
Challenges: Social and cultural
Changing culture to make the most of open science
Sharing of data and services
‒ Data: “You can use mine”
‒ Services: “I will use yours”
Developing Skills
‒ Data Scientists: data engineers, data custodians, data analysts
‒ Expertise in quality software engineering
Credit where credit’s due
‒ Recognition for sharing
‒ Recognition for contributing
‒ Rewards should follow the contribution
Challenges: Infrastructure
Accessing shared resources to realise the promise of data-intensive open science
Accessing Data
‒ Storing, accessing and integrating data at scale: common data centres and services
‒ Moving data at scale: limitations of networks
‒ Keeping data for the long term: digital preservation
Accessing Compute
‒ Access to scarce large-scale computing architectures (HPC, HTC, HPDA)
‒ Co-location of data and compute
‒ Cloud interfaces and Virtual Research Environments
‒ User identity and trusted work-spaces
Accessing Software
‒ Complex code for computational modelling and simulation
‒ Adapting code to large-scale computing architectures
‒ Data analysis algorithms becoming more sophisticated
‒ Sustainability of software for the long term
www.eoscpilot.eu
So what do we need to do?
• Bring the current Research Infrastructures together
• We do not want to replace their work
• Bring the e-Infrastructure projects together
• GÉANT, PRACE
• EGI, EUDAT, OpenAIRE
• Open up their services
• Catalogue of services
• Allow people to select services to build new infrastructures
• Open up their data
• FAIR services
• Interoperable standards and metadata
• Allow new resources to be added
• Cloud providers, HPC providers, data providers
• Within the common governance and resourcing processes
• Need some set of core services and processes to hold the EOSC together
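The idea of a catalogue from which people select services to build new infrastructures can be sketched as a simple capability lookup. This is a hypothetical illustration only: the service names, providers and capability labels are invented, and a real EOSC catalogue would also carry the governance and resourcing information mentioned above. For each capability a new infrastructure needs, `compose` picks the first catalogue entry that offers it and fails loudly when nothing does.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ServiceEntry:
    """A hypothetical catalogue entry: a named service and the capabilities it offers."""
    name: str
    provider: str
    capabilities: frozenset[str] = field(default_factory=frozenset)


# Illustrative catalogue; names, providers and capability labels are invented.
CATALOGUE = [
    ServiceEntry("bulk-storage", "provider-x", frozenset({"storage", "replication"})),
    ServiceEntry("cloud-vms", "provider-y", frozenset({"compute"})),
    ServiceEntry("pid-minting", "provider-z", frozenset({"pid"})),
    ServiceEntry("hpc-queue", "provider-w", frozenset({"compute", "hpc"})),
]


def compose(required: list[str]) -> dict[str, ServiceEntry]:
    """Pick, for each required capability, the first catalogue entry that offers it."""
    chosen: dict[str, ServiceEntry] = {}
    for capability in required:
        match = next((s for s in CATALOGUE if capability in s.capabilities), None)
        if match is None:
            raise LookupError(f"no service in the catalogue offers {capability!r}")
        chosen[capability] = match
    return chosen
```

For instance, `compose(["storage", "hpc", "pid"])` selects `bulk-storage`, `hpc-queue` and `pid-minting`. Taking the first match is the simplest possible policy; a real selection would also weigh cost, location and access policy.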
EOSC-Pilot Project
Setting the EOSC in the right direction
First of the EOSC projects
€10M over 2 years
• Jan 2017 – Dec 2018
33 partners + 15 third parties
• Led by STFC
• A range of e-Infrastructure providers, research institutes, research consortia, across disciplines.
• EGI, EUDAT, OpenAIRE, PRACE, GÉANT
• ELIXIR, ICOS, ECRIN, BBMRI, DESY, CERN, XFEL, CEA
• STFC, CNR, DANS, DCC, BSC, MPG, CNRS
Try to answer some basic questions
• What is the EOSC going to provide?
• How is the EOSC going to operate?
• How is the EOSC going to change how science is done ?
EOSCpilot: High Level Aims
The EOSCpilot project will support the first phase in the development of the EOSC. It will:
Establish the governance framework for the EOSC and contribute to the development of European open science policy and best practice;
Develop a number of demonstrators functioning as high-profile pilots that integrate services and infrastructures to show interoperability and its benefits in a number of scientific domains;
Engage with a broad range of stakeholders, crossing borders and communities, to build the trust and skills required for adoption of an open approach to scientific research.
(More detailed objectives later)
Workpackages
1. Governance
• Propose a governance framework
2. Policy
• Devise a policy environment
3. Demonstrators
• Use real demonstrators to drive the requirements for the EOSC
4. Services
• Specify service architecture, catalogue and pilot services
5. Interoperability
• Identify interfaces and standards to drive interoperability
6. Skills
• Specify a skills and competencies framework for the EOSC
7. Engagement
• Involve as many stakeholders as possible.
Science Demonstrators
First 5 Demonstrators
• Environmental & Earth Sciences - ENVRI Radiative Forcing Integration: enabling harmonised data access and integration across multiple research communities
• High Energy Physics - WLCG: large-scale, long-term preservation and re-use of HEP data in the EOSC, open to other researchers
• Humanities - TEXTCROWD: collaborative semantic enrichment of text-based datasets by making new software available on the EOSC
• Life Sciences - Pan-Cancer Analyses & Cloud Computing within the EOSC: accelerating genomic analysis on the EOSC
• Physics - Photon-neutron community: improving the community’s computing facilities by creating a virtual platform for all users
2nd Set of Demonstrators
• HPCaaS for Fusion - Culham Science Centre, UK
• Life Sciences - Leveraging EOSC to offload updating and standardizing life sciences datasets and to improve studies' reproducibility, reusability and interoperability - CRG, Spain
• Seismology - EPOS Virtual Earthquake and Computational Earth Science e-science environment in Europe - University of Liverpool, UK
• CryoEM - Linking distributed data and data analysis resources as workflows in Structural Biology with cryo-Electron Microscopy: interoperability and reuse - CSIC, Spain
• Astronomy - Open Science Cloud access to LOFAR data - ASTRON, NL
• 5 more demonstrators to be selected in the autumn.
Governance: Approach
The Governance framework will:
• enable and encourage engagement from the key stakeholder communities: European e-Infrastructures, Data and Research Initiatives, Service and cloud providers, Research funders, Research Communities and Institutions, Research Infrastructures, Policy makers.
• enable interoperability and co-ordination within a number of different domains: legal interoperability, interoperability of organisational processes, technical interoperability, operational interoperability, data and information interoperability.
Governance: Status and Next Steps
Undertaken:
Stakeholder mapping exercise
Progressing with a framework which will help conceptualise the range of stakeholders and interoperability objectives
Assessing different governance approaches across these, and how these may fit together.
Next Steps:
Planning to have a strawman framework by late summer
To gather feedback from a broader community
Feedback initially via online tools and forum
Then via workshops, including the EOSCpilot stakeholder event at the end of November.
Service Infrastructure for the EOSC
• EOSC Architecture
• “Systems of Systems” approach
• EOSC Service Portfolio
• Rules of Engagement
• Service demonstrators
• with the Science Demos
• EGI and EUDAT Services
Interoperability
• Service interoperability
‒gap analysis of service frameworks
• Data interoperability
• Recommendations on how to make data interoperable in the EOSC
‒exploring how FAIR principles apply to the EOSC
‒Baseline interoperability metadata
‒Schema.org
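Schema.org's `Dataset` type is one candidate for baseline interoperability metadata because general-purpose search engines and harvesters already understand it. The sketch below builds a minimal JSON-LD description; which properties count as the EOSC baseline is an assumption here, not a recommendation from the project, and the function name and the dataset details in the example (including the DOI) are hypothetical.

```python
import json


def baseline_dataset_record(name: str, description: str, doi: str,
                            licence_url: str, creators: list[str],
                            keywords: list[str]) -> dict:
    """Return a minimal schema.org Dataset description as a JSON-LD dictionary.

    The selection of properties is an illustrative assumption, not an EOSC standard.
    """
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        # Express the PID as a resolvable URL so harvesters can follow it.
        "identifier": f"https://doi.org/{doi}",
        "license": licence_url,
        "creator": [{"@type": "Person", "name": person} for person in creators],
        "keywords": keywords,
    }


if __name__ == "__main__":
    # Hypothetical example values.
    record = baseline_dataset_record(
        name="Example diffraction dataset",
        description="Illustrative record used to show baseline metadata.",
        doi="10.5555/example-dataset",
        licence_url="https://creativecommons.org/licenses/by/4.0/",
        creators=["A. Researcher"],
        keywords=["open science", "diffraction"],
    )
    print(json.dumps(record, indent=2))
```

The printed JSON-LD can be embedded in a dataset landing page inside a `<script type="application/ld+json">` element, which is the usual way schema.org markup is published.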
Service Interoperability: Gap Analysis
Reasons for gaps:
• Gap 1: Diversity and incompatibility of the AAIs (authentication and authorisation infrastructures)
• Gap 2: Network services
• Gap 3: Diversity of services and providers
• Gap 4: Diversity of access policies
• Gap 5: Low awareness of the e-infrastructures and services
• Gap 6: Lack of expertise, training, easy tools, human networks
Service Interoperability: Bridging the Gaps
• Gap 1: Global AAI
• Gap 2: Network services improvement
• Gap 3: Services technical interoperability
• Gap 4: Multidisciplinary mutualised space
• Gap 5: Common vocabulary, global services catalogue, dissemination
• Gap 6: Foster adoption, expertise sharing, user-friendly tools, human networks
Architecture: Systems of Systems Approach
The EOSC needs to be developed as a data infrastructure commons:
• an eco-system of infrastructures
• building on existing capacity and expertise where possible
Components: existing and emerging RIs, e-Infras, data repositories, registries, …
Characteristics of a System of Systems (SoS):
• Operational and managerial independence: each system is independent and achieves its purposes by itself and for its own objectives rather than for the purposes of the SoS
• Geographical distribution: a SoS is distributed over a large geographic extent
• Emergent behaviour: a SoS has capabilities and properties that do not reside in the component systems
• Evolutionary development: a SoS evolves with time and experience
• Heterogeneity of constituent systems: a SoS consists of multiple heterogeneous operating systems embedded in networks at multiple levels
Architecture: schematic (as-a-service provision mode; a work in progress)
Skills and Training
Need to identify
• what skills individuals need to have to work with the EOSC
• what competencies organisations should have to take part effectively in the EOSC
Some first recommendations
• Produce FAIR training material
• Provide “training-as-a-service”
• Highlight the relevance of enabling and rewarding data skills development
• Refine a Skills Framework in the development of careers and expertise in data stewardship
Next Steps
EOSC Pilot goes on to December 2018
- Governance and Policy Framework
- Service architecture, portfolio, rules of engagement, trials
- Interoperability recommendations and trials
- Training and capability recommendations
The EC is setting the governance and funding framework.
Next set of EOSC Projects announced soon:
- Setting the core service provision
EC 2018-21 research infrastructure workprogramme built around EOSC.
Upcoming events
D4IR conference 30 Nov-1 Dec, Brussels