EGI: a spark to transform science, business and society
1. www.egi.eu
@EGI_eInfra
The work of the EGI Foundation
is partly funded by the European Commission
under H2020 Framework Programme
Yannick.legre@egi.eu
Yannick Legré
2. @EGI_eInfra www.egi.eu 4 November 2019
Outline
• Big science; Open Science; Innovation
• The EGI Federation
• EGI Services
• Use Cases
• European Open Science Cloud
4.
Science goes digital, science goes big
Yearly data volumes
Square Kilometre Array phase 1 (2023): ~300 PB/year science data
Facebook uploads: 180 PB
LHC science data: 200 PB
LHC raw data: 15 PB
2016
Google searches: 98 PB
Google Internet archive: ~15 EB
Square Kilometre Array phase 2 (mid 2020s): ~1 EB science data
High Luminosity Large Hadron Collider: ~600 PB raw data (2026)
High Luminosity Large Hadron Collider: ~1 EB physics data (2026)
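To put the yearly volumes on one scale, the short sketch below converts every figure from the slide to petabytes and compares them. It assumes decimal units (1 EB = 1000 PB); the dictionary labels and the helper `ratio` are illustrative names, not part of the original slide.

```python
# Yearly data volumes as quoted on the slide, expressed in petabytes.
# Assumption: decimal units, i.e. 1 EB = 1000 PB.
PB_PER_EB = 1000

volumes_pb = {
    "SKA phase 1 (2023)": 300,
    "Facebook uploads": 180,
    "LHC science data": 200,
    "LHC raw data": 15,
    "Google searches": 98,
    "Google Internet archive": 15 * PB_PER_EB,
    "SKA phase 2 (mid 2020s)": 1 * PB_PER_EB,
    "HL-LHC raw data (2026)": 600,
    "HL-LHC physics data (2026)": 1 * PB_PER_EB,
}


def ratio(larger: str, smaller: str) -> float:
    """Return how many times larger one quoted volume is than another."""
    return volumes_pb[larger] / volumes_pb[smaller]


# Largest sources first, for a quick textual version of the chart.
ranked = sorted(volumes_pb.items(), key=lambda item: item[1], reverse=True)
for name, pb in ranked:
    print(f"{name:30s} {pb:>8,d} PB")

# Growth in raw data from the LHC figure to the High Luminosity LHC figure.
print("HL-LHC raw vs LHC raw:", ratio("HL-LHC raw data (2026)", "LHC raw data"))
```

On these figures the High Luminosity LHC raw data volume is 40 times the LHC raw data volume quoted on the same slide.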
5.
Challenges
From global research
to implementation in industry
• Data-driven research
• Skills
• Knowledge transfer
Researcher perspective
• Reputation, incentives, services
Industry perspective
• Return on investment
• Market impact
6.
Opportunities
New sharing culture via innovative
communication models
Knowledge extraction & exchange
strengthened by AI
Increasing number of platforms & tools
• Stimulate and facilitate adoption by end-users
• Favour common approaches between sectors
• Leverage value of both Big & Small data
Use pan-European e-infrastructures
8.
Mission
Create and deliver open solutions for research and innovation by
federating digital capabilities, resources and expertise between
domains and across geographical/organisational boundaries.
Vision
Researchers and Innovators from all disciplines have easy, integrated
and open access to the advanced scientific computing capabilities,
resources and expertise needed to collaborate and to carry out
data/compute intensive applications.
EGI: Advanced computing for research
9.
EGI is a federation of over 200 computing and data centres spread across Europe and the rest of the world
47 countries
71,000 users
12 integrated e-Infrastructures
1,700 Open Access publications in 2018
31 large-scale research collaborations
20+ business use cases
10.
EGI Federation & Cloud federation (Jun 2019)
5+ billion CPU core wall time
>1 million computing cores
>740 PB disk & tape
2,915 service end-points
Resource centres
Cloud providers
15.
EGI Cloud and EGI Notebooks
An AI-ready platform
Federation of IaaS providers supporting AI and Big Data needs:
• Powerful GPUs and high-memory instances
• Bring computation near data
• Appliance Catalogue
Ready-to-use JupyterLab environment:
• No local installation, just log in
• Access EGI services from your notebooks
• Highly customisable to support your specific language and library needs
EGI Cloud Compute
A multi-cloud IaaS with Single Sign-On
EGI Notebooks
Jupyter on the EGI cloud
16.
Open science with EGI Services
Reproducible and discoverable analysis
Notebooks
DataHub
Share Reproduce
Notebooks + Binder
DataHub
18.
Big Data Analytics for agricultural monitoring using Copernicus Sentinels and EU open data sets
• Agricultural monitoring with Earth Observation data in Europe with a hybrid cloud approach
• Applying Machine Learning to Sentinel-1 time series to determine crop types
• Looking towards scaling to EU continental scale
19.
Deep Learning image classification
Convolutional Neural Networks trained to identify species from images: phytoplankton, Conus, flowers
Training on GPU-powered cloud VMs, classification on regular VMs
Open web front-ends and mobile app enable citizen science
20.
AGINFRA+: notebooks as a collaborative platform for big data
AGINFRA+ supports Agriculture and Food research communities with VREs
EGI Notebooks:
Big data analysis prototyping
Customized deployment, integration with other AGINFRA+ features
21.
EISCAT_3D
European Incoherent Scatter Scientific Association
Studying interactions in the auroral ionosphere and magnetospheric cusp regions
• 109 arrays of 91 antennas (+2 receiver stations)
• Up to 100 simultaneous beams
• Maximum data rate after beamforming > 50 Gb/s
First data expected in 2021…
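To give a feel for what a 50 Gb/s stream means for storage and transfer, the sketch below converts the quoted rate into a daily volume. It assumes decimal units and, by default, a rate sustained for a full 24 hours; the `duty_cycle` parameter and the function name are illustrative assumptions, since the slide only gives the maximum rate.

```python
BITS_PER_BYTE = 8
SECONDS_PER_DAY = 24 * 60 * 60


def daily_volume_tb(rate_gbit_per_s: float, duty_cycle: float = 1.0) -> float:
    """Convert a data rate in Gb/s into terabytes per day (decimal units).

    duty_cycle is the assumed fraction of the day during which the
    instrument actually streams at this rate (1.0 = continuously).
    """
    bytes_per_second = rate_gbit_per_s * 1e9 / BITS_PER_BYTE
    return bytes_per_second * SECONDS_PER_DAY * duty_cycle / 1e12


# 50 Gb/s is the figure quoted on the slide for the rate after beamforming.
print(daily_volume_tb(50))        # sustained all day
print(daily_volume_tb(50, 0.25))  # hypothetical 25% duty cycle
```

At a sustained 50 Gb/s this comes to 540 TB per day, which is the scale that drives the data transfer and computing requirements for this use case.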
22.
Service integration
• Data transfer
• Cloud & container computing
• Identity & Access Management
• User portal (DIRAC)
User engagement
• Early adoption
• Technical training
• Continuous service improvement
EISCAT_3D support
23.
ELIXIR
To build a sustainable European infrastructure for biological information, supporting life science research and its translation to medicine, environment, bioindustries and society.
• ELIXIR Nodes build local bioinformatics capacity based on national strengths and priorities
• ELIXIR Hub provides European coordination
24.
ELIXIR Competence Centre
1. Technical integration
• Identity & Access Management
• Cloud Infrastructure
• Virtual Applications Store
• File replication
2. Business model
• Capacity allocation to users and to user projects
3. Data access policies
• Sensitive data
4. Training
• Train the resource centers
25.
WeNMR: Structural Biology community
Bringing together complementary research teams in the structural biology and life science area into a virtual research community at a worldwide level
Provide them with a platform integrating and streamlining the computational approaches necessary for data analysis and modelling
26.
WeNMR: Architecture behind the portals and support from EGI
Cloud services
High-Throughput computing
Access to distributed CPUs and GPGPUs
Platform-enabling services
Workload manager
Identity & Access Management
28.
Role of the Industry in EGI
Procurement
Participate in the procurement framework
Customer
Making use of existing EGI services
Provider
Offering services to EGI
Partner
Co-development
29.
EGI Business Engagement Services
Piloting and co-design
• Pilots/proofs of concept
• Service/Product design
• PaaS/SaaS integration
• Performance verification
• Testing
Training & Support
• Technical consultancy
• Service management
• Commercialization and business coaching
• Brokerage to funding and opportunities
Technical access
• Compute (HTC, HPC, Cloud)
• Storage (Online/Archive)
• Data management
• Research data
• Tools & applications
Visibility
• Media exposure
• Participation in events
• Promotional print material
• Inclusion in marketplace
• Networking
30.
MOXOFF
VAMOS - Video Analysis for Movement Optimization and Statistical analysis
Real-time analysis of video powered by GPU-enabled VMs on EGI Cloud
Upload one video and get a complete report about the training sessions
Consistency at toss: 63%
Consistency at impact: 28%
Overall consistency: 32%
32.
European Open Science Cloud
A policy initiative from the European Commission
supported by the Member States
The role of the EOSC is to ensure that European scientists reap the full benefits of
data-driven science, by offering:
“1.7 million European researchers and 70 million professionals in science and
technology a virtual environment with free at the point of use, open and seamless
services for storage, management, analysis and re-use of research data, across
borders and scientific disciplines”
2016 Communication on the “European Cloud Initiative”
33.
EOSC-hub: mission
€33M
100+ partners
2018-2020
Coordinated by EGI Foundation
The EOSC-hub project mobilises providers of European relevance offering services, software and data for advanced data-driven research and innovation.
These resources are offered via the Hub – the integration and management system of the European Open Science Cloud, acting as a European-level entry point for all stakeholders.
34.
The EOSC-hub project established ‘the hub’
• Data
• Applications & tools
• Baseline services (storage, compute, connectivity)…
• Training, consultants
• Marketplace
• AAI
• Accounting
• Monitoring
Usage according to rules of participation
From the consortium and from external contributors
• Lightweight certification of providers
• SLA negotiation
• Customer Relationship Management
Based on FitSM
• Security regulations
• Compliance to standards
• Terms of use
• FAIR implementation guidelines
35.
For researchers accessing EOSC: https://eosc-portal.eu
Some services accessible without authentication
Some services work with automated login
Some require manual approval of users
‘Simple’ requests: passed to individual service providers
Complex requests: supported by EOSC-hub tech team
36. This work by the EGI Foundation
is licensed under a Creative Commons
Attribution 4.0 International License.
Questions?
Thank you
for your attention.
EGI: Advanced Computing for Research
Editor's notes
EGI Cloud is a federated multi-cloud IaaS with Single Sign-On. The main message is that we support AI requirements (typically GPUs and high-performance/high-memory VMs).
The federation lets you move your computation near the data easily and provides a comprehensive appliance catalogue covering the application needs of most research use cases.
EGI Notebooks are a user-friendly way to create Jupyter notebooks (which can be considered the standard development environment for AI) with the modern JupyterLab interface. The service is accessed from your browser, with no need to install anything. It runs on the EGI Cloud, and from the user notebooks you can easily access other EGI services, as we are fully integrated with the rest of the environment.
We support the common libraries and kernels, but if something is missing, the environment can be customised as needed.
Finally, we are looking into providing reproducible environments with Binder (see slide later) that let you share your results with confidence, in a way that others can easily reproduce them.
EGI supports Open Science with the Notebooks
Researchers can analyse open data sets using the EGI Notebooks.
Combined with access to open datasets via the EGI DataHub, this is a complete environment where researchers can produce meaningful analyses together with a description of their computing environment.
These analyses can be shared on GitHub and Zenodo, thus turning them into citable artifacts that can be included in any publication. Fellow researchers can easily discover these artifacts, which can then be reproduced in the notebooks environment extended with Binder, a powerful tool to recreate computing environments, so results can be guaranteed to be the same as the original analysis.
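As an illustration of what "a description of the computing environment" can look like in practice, here is a minimal sketch using only the Python standard library. It records the interpreter version and the installed package versions and renders them as pinned `name==version` lines, the format used in a `requirements.txt` file that Binder can build an environment from. The function names are illustrative assumptions and this is not how EGI Notebooks implements the feature.

```python
import platform
from importlib import metadata


def describe_environment() -> dict:
    """Collect the interpreter version and installed package versions."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions with broken or missing metadata
            packages[name] = dist.version
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": dict(sorted(packages.items())),
    }


def to_requirements(env: dict) -> str:
    """Render the package list as pinned 'name==version' lines."""
    return "\n".join(f"{name}=={version}" for name, version in env["packages"].items())
```

Calling `to_requirements(describe_environment())` inside a notebook and saving the result next to the analysis gives fellow researchers a pinned package list to rebuild from.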
Use case from EU JRC to distinguish, identify and measure the main crop production areas in Europe, estimate production early in the year and check the validity of farmers’ applications for EU subsidies.
Supported by EGI providers (CESNET) and 2 EO-focused cloud providers: CREODIAS and EODC
Now running with areas of Denmark and Germany, but looking towards scaling to continental scale
This example shows we are capable of supporting big data and Machine Learning based applications in close collaboration with other cloud providers.
This use case was selected as a EuroGEOSS demonstrator
LifeWatch is the European e-Science infrastructure for Biodiversity and Ecosystem Research (ESFRI)
Neural network tools can be applied in many cases:
Bird recognition (by sound)
Satellite data (land type, land use, water…)
Species classification
GPUs give much better performance than CPUs when training neural networks.
Citizen science has become a powerful force for scientific inquiry, providing researchers with access to a vast array of data points while connecting non-scientists to the real process of science. This citizen-researcher relationship creates a very interesting synergy, allowing for the creation, execution, and analysis of research projects. With this in mind, different convolutional neural networks have been trained to identify phytoplankton and Conus.
AGINFRA+ is an example of how we can tailor the EGI Notebooks to match the needs of a given community. With AGINFRA+ we have added tight integration with the VREs, so users can prototype their big data analysis, which then gets pushed to the AGINFRA+ analytics services (these services can also be invoked from the notebooks). From the notebooks, users can access a common workspace so they can easily share the notebooks. In addition, the research can be shared in the VRE data catalogues, where it can easily be visualised without the need to run it.
Overview of ELIXIR: life science infrastructure for biological information
Fundamentally, ELIXIR is about delivering a set of services that address the challenge of life-science data:
handling volumes and data access, ensuring quality through curation and metadata, and making sure different data types can be integrated for analysis and computation.
We will make sure the services work together and are around for a long time. And we will train people in how to use them.
Build on Europe’s strength in curation of data, which is important for quality exploitation and of course for the systems biology community
We see that the ELIXIR nodes often take an important role in national data management planning and data access policies (e.g. the Swedish node is minting DOIs for datasets; we come back to this in some of the pilot actions)
ELIXIR’s data infrastructure will…
data integration by making the best use of Europe's collective, expanding capacity
principles for optimisation of existing data capacity to meet rising demand - for example, assessing which data should be stored and made available to users
single interface to a distributed infrastructure
Compute infrastructure Capacity Compute & Storage
Life scientists produce an enormous quantity of heterogeneous data which needs to be integrated
Harmonise the staggering number of analytical software tools
Support typical analyses using multiple tools linked together into data processing 'pipelines'
Be developed based on the following principles:
discoverability
ease of use
benchmarking
Interoperability
Standards:
Incorporate :
Programmatic access
Nomenclatures
Controlled vocabularies and ontologies
Reporting requirements for guiding deposition and facilitating exchange of information
Training:
Be adequate to accommodate the increase in demand for training
Seek to complement, and build on, user training programmes in the Member States
Coordinate training capacity throughout Member Countries
Enable the development of new training programmes, including ‘training the trainer’, especially in new accession states
delivering interoperable and sustained services that will make up a distributed e-Infrastructure for bioinformatics.
As an EOSC-hub business pilot, Moxoff developed VAMOS - Video Analysis for Movement Optimization and Statistical analysis – a web application where each authenticated user can analyse and monitor the performance of an athletic gesture.
VAMOS is scalable and automatic, i.e. there is no need for human supervision: the final user uploads one video per gesture, together with some additional information, and receives as output a report with advanced statistics on their training sessions, such as consistency, best repetition and quantitative indicators.
Working within the EOSC-hub project gave Moxoff the possibility to reach a wide audience and a very active network working on mathematical modelling, data science and optimization. The international experience created new opportunities for the future. From the technical point of view, the cloud infrastructure that was made available increased Moxoff’s computational capability, which is necessary to scale up and to process huge amounts of data.
Trusted and open virtual environment with seamless access to services (with highest TRLs) addressing the whole research data life cycle:
Federate and connect existing or planned RIs
Make data FAIR, store them, ensure long-term preservation
Services to find, access, combine, analyse and process data
Protected, personalized work environment
Multi-layered federation (federating core, rest of the ecosystem) bringing together supply and demand in a trusted environment
Open, transparent, rule of law based: no lock-in by individual service providers, data portability, IPR, cloud security…
Adaptively user-oriented and inclusive (across borders and disciplines)
Accessible through a simple, universal access point
Governed by clear Rules of Participation