Marios Chatziangelou presents the EGI applications database | OSFair2017 Workshop
Workshop overview:
This collaborative workshop takes place in the context of coordinating EOSC-related activities across large European infrastructures at European and national level. The workshop will offer an opportunity for cross-pollination on issues ranging from open scholarship to technical service provision, training, community engagement and support. OpenAIRE NOADs, EGI NGIs, GEANT NRENs and other national e-Infrastructure representatives will discuss gaps, synergies, coordination and service integration opportunities.
DAY 3 - PARALLEL SESSION 6 & 7
1. www.egi.eu EGI-Engage is co-funded by the Horizon 2020 Framework Programme
of the European Union under grant number 654142
Marios Chatziangelou, et al. <mhaggel@iasa.gr>
Institute of Accelerating Systems and Applications (IASA), Athens, Greece
http://www.iasa.gr/
EGI Applications Database
https://appdb.egi.eu/
2. Capabilities
A community-driven, central EGI service that stores and provides:
software solutions (in the form of native software and/or virtual appliances),
originating from almost every scientific area/discipline
references to scientific datasets (pilot - under development)
the programmers and scientists responsible for them
the publications derived from the registered items (SW, VA & datasets)
https://appdb.egi.eu
3. Software Marketplace
Registry for software items
(applications, tools, workflow frameworks and
instances, Science Gateways, MW products)
Offers release management capabilities
unlimited series of releases
lightweight, collaborative release
management process
Acts as a repository for binary artifacts
unlimited number of repositories per registered software item
generic tarballs, RPM & DEB (32bit/64bit) binaries
multiple flavor / operating system combinations
simplified, web-based process for uploading the binary artifacts
YUM & APT repositories for automatic distribution
artifacts populated through the UMD Community Repository
4. Reference Datasets
In the context of the Life Sciences Data Replication VT,
AppDB is being extended into a dataset registry
Initial focus is on Life Sciences reference datasets
Integration with the Elixir Tools and Data
Services Registry is in the works
Key characteristics:
Primary datasets represent original datasets, as posted by the provider
Derived datasets are based on a primary dataset, but only part of the information is
kept, or only part of the data entries are selected
Indicative metadata: name, description, disciplines, homepage link, licensing, and
a version list
Each dataset version may host one or more locations where data can be accessed
Locations may be tagged as master or replica
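The key characteristics of the dataset registry can be pictured as a small data model. The following is only a sketch under assumptions: the class and field names (Dataset, DatasetVersion, Location, derived_from) and the example.org URLs are hypothetical and are not taken from the AppDB schema.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class LocationRole(Enum):
    """A location is tagged either as the master copy or as a replica."""
    MASTER = "master"
    REPLICA = "replica"


@dataclass
class Location:
    url: str
    role: LocationRole


@dataclass
class DatasetVersion:
    label: str
    locations: List[Location] = field(default_factory=list)

    def masters(self) -> List[Location]:
        """Return only the locations tagged as master."""
        return [loc for loc in self.locations if loc.role is LocationRole.MASTER]


@dataclass
class Dataset:
    # Indicative metadata listed on the slide (field names are assumptions).
    name: str
    description: str
    disciplines: List[str]
    homepage: str
    licensing: str
    versions: List[DatasetVersion] = field(default_factory=list)
    # None for a primary dataset; set to the source for a derived dataset.
    derived_from: Optional["Dataset"] = None

    @property
    def is_primary(self) -> bool:
        return self.derived_from is None


# Hypothetical example data.
primary = Dataset(
    name="Example reference set",
    description="Original dataset as posted by the provider",
    disciplines=["Life Sciences"],
    homepage="https://data.example.org",
    licensing="CC-BY-4.0",
    versions=[DatasetVersion("2017-01", [
        Location("https://data.example.org/ref", LocationRole.MASTER),
        Location("https://mirror.example.org/ref", LocationRole.REPLICA),
    ])],
)
subset = Dataset(
    name="Example subset",
    description="Only part of the data entries are selected",
    disciplines=["Life Sciences"],
    homepage="https://data.example.org/subset",
    licensing="CC-BY-4.0",
    derived_from=primary,
)
```

Modelling "derived" as an optional link to the source dataset keeps the primary/derived distinction in a single field, which is one possible design among several.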
5. Cloud Marketplace
Registry for virtual appliances (VA)
a logical container of versioned image file &
metadata bundles
Registry for software appliances (SA)
a logical container of VAs & contextualization
scripts bundles
VA distribution medium
distributing endorsed VAs to the resource
providers/sites
Resource providers catalogue
list of the VAs which are available at each site/resource provider
Virtual Organizations (VO) catalogue
list of the VAs which are available for each VO member
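The container relationships and the two catalogues described above can be sketched as follows. This is an illustration under assumptions only: the class names, the field names and the build_catalogue helper are hypothetical and do not reflect how AppDB stores or computes this information.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set, Tuple


@dataclass
class ImageBundle:
    """One versioned image file together with its metadata."""
    version: str
    image_file: str
    metadata: Dict[str, str] = field(default_factory=dict)


@dataclass
class VirtualAppliance:
    """Logical container of versioned image file & metadata bundles."""
    name: str
    bundles: List[ImageBundle] = field(default_factory=list)


@dataclass
class SoftwareAppliance:
    """Logical container of VAs & contextualization scripts."""
    name: str
    appliances: List[VirtualAppliance] = field(default_factory=list)
    context_scripts: List[str] = field(default_factory=list)


def build_catalogue(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Group (owner, VA name) pairs into a catalogue keyed by owner.

    The owner may be a site/resource provider or a VO, so the same helper
    can produce both catalogues mentioned on the slide.
    """
    catalogue: Dict[str, Set[str]] = defaultdict(set)
    for owner, va_name in pairs:
        catalogue[owner].add(va_name)
    return dict(catalogue)


# Hypothetical example: which VAs are available at which site.
site_catalogue = build_catalogue([
    ("site-a", "va-ubuntu"),
    ("site-a", "va-centos"),
    ("site-b", "va-ubuntu"),
])
```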
6. The AppDB VMops dashboard
The objective (EGI-Engage DoW): “The EGI
Applications Database (AppDB) will evolve from its
current role as catalogue of applications and
virtual machines images (VMI) to include a
graphical user interface allowing authorized users
to perform VM management operations”
Highlighted features for the end-user
Create a new topology with one (or more)
VMs
Attach additional storage to the VM
instances
Deploy/Un-deploy a topology
Start/Stop a topology (= all the VM instances
of a topology)
Start/Stop a single VM instance
Configure VM (cloud-init & Ansible)
Execute a bash script at deployment time
https://dashboard.appdb.egi.eu
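The relationship between topology-level and VM-level operations can be illustrated with a small in-memory model. This is a sketch under assumptions: the Topology class, its method names and the rule that starting requires a deployed topology are hypothetical, and nothing here calls the actual dashboard.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict


class VMState(Enum):
    STOPPED = "stopped"
    RUNNING = "running"


@dataclass
class Topology:
    """In-memory stand-in for a topology of one or more VMs."""
    name: str
    deployed: bool = False
    vms: Dict[str, VMState] = field(default_factory=dict)

    def add_vm(self, vm_name: str) -> None:
        self.vms[vm_name] = VMState.STOPPED

    def deploy(self) -> None:
        self.deployed = True

    def undeploy(self) -> None:
        # Assumption: un-deploying stops every VM first.
        self.stop()
        self.deployed = False

    def _require_deployed(self) -> None:
        if not self.deployed:
            raise RuntimeError(f"topology {self.name!r} is not deployed")

    def start(self) -> None:
        """Start a topology = start all the VM instances of the topology."""
        self._require_deployed()
        for vm_name in self.vms:
            self.vms[vm_name] = VMState.RUNNING

    def stop(self) -> None:
        """Stop a topology = stop all the VM instances of the topology."""
        for vm_name in self.vms:
            self.vms[vm_name] = VMState.STOPPED

    def start_vm(self, vm_name: str) -> None:
        self._require_deployed()
        if vm_name not in self.vms:
            raise KeyError(vm_name)
        self.vms[vm_name] = VMState.RUNNING

    def stop_vm(self, vm_name: str) -> None:
        if vm_name not in self.vms:
            raise KeyError(vm_name)
        self.vms[vm_name] = VMState.STOPPED
```

A typical sequence would be add_vm, deploy, start, then stop_vm on a single instance while the rest of the topology keeps running.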
7. Person profiles
Personal details
Access group rights
Contact details and communication mechanisms
Publications
Affiliated organizations
Linked projects
Personal activity:
list of registered software
list of registered Virtual & Software Appliances
list of registered Datasets
……
8. General features (1/2)
dissemination of information
custom RSS/Atom news feeds
news e-mail subscription lists
user focused communication (messaging,
requests, etc)
special dissemination tool for sending ad-hoc
messages to scientists
'follow' button for receiving all the activity related
to a registered item
dissemination features customizable through user
preferences
sharing content with social networks
information retrieval
advanced searching mechanism (rated
search results)
'faceted search' mechanism for refinements
quality of information
content tagging, rating, commenting
per registered item contact expertise
information
problem and comment abuse report
centrally managed quality control
taxonomy
technical classification
scientific classification
tagging
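The 'faceted search' refinement listed under information retrieval can be illustrated generically: count how many items fall under each value of a facet, then narrow the result set by the selected values. The sketch below is a generic illustration; the function names and the sample items are hypothetical and say nothing about how AppDB implements its search.

```python
from collections import Counter
from typing import Dict, Iterable, List

Item = Dict[str, str]


def facet_counts(items: Iterable[Item], facet: str) -> Counter:
    """Count the items per value of one facet (e.g. per discipline)."""
    return Counter(item[facet] for item in items if facet in item)


def refine(items: Iterable[Item], **selected: str) -> List[Item]:
    """Keep only the items that match every selected facet value."""
    return [
        item for item in items
        if all(item.get(facet) == value for facet, value in selected.items())
    ]


# Hypothetical registered items.
ITEMS: List[Item] = [
    {"name": "tool-a", "discipline": "Life Sciences", "category": "tool"},
    {"name": "gateway-b", "discipline": "Life Sciences", "category": "science gateway"},
    {"name": "tool-c", "discipline": "Physics", "category": "tool"},
]
```

For example, refine(ITEMS, discipline="Life Sciences", category="tool") narrows the three sample items to tool-a only.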
9. General features (2/2)
AuthN/AuthZ and security
advanced AuthN/AuthZ mechanisms
(simpleSAML) integrated with EGI Checkin
service
support for multiple accounts for accessing user’s
personal profile
internally managed AuthZ, based on allowed
actions, roles and permissions
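Authorization based on actions, roles and permissions can be sketched as a lookup from roles to the actions they permit. The role names and action names below are hypothetical; the slides do not list the roles AppDB actually defines.

```python
from typing import Dict, Set

# Hypothetical roles and the actions each one permits.
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "owner": {"read", "edit", "delete", "manage_releases"},
    "contact": {"read", "edit"},
    "visitor": {"read"},
}


def is_allowed(roles: Set[str], action: str) -> bool:
    """Return True if at least one of the user's roles permits the action."""
    return any(action in ROLE_PERMISSIONS.get(role, set()) for role in roles)
```

Unknown roles grant nothing, so a profile reached through several accounts is only as privileged as the roles attached to it.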
Relations…
… between all the entities listed below are possible:
– software
– virtual appliances
– datasets
– persons
– virtual organizations
– sites / resource providers
– organizations
– projects
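Relations between arbitrary entity types can be pictured as an undirected graph whose nodes are (entity type, identifier) pairs. This is a sketch under assumptions: the RelationStore class and the sample identifiers are hypothetical, and AppDB's own storage model is not described in the slides.

```python
from collections import defaultdict
from typing import DefaultDict, List, Optional, Set, Tuple

# (entity type, identifier), e.g. ("software", "tool-a")
Entity = Tuple[str, str]


class RelationStore:
    """Keeps symmetric links between entities of any type."""

    def __init__(self) -> None:
        self._links: DefaultDict[Entity, Set[Entity]] = defaultdict(set)

    def relate(self, a: Entity, b: Entity) -> None:
        """Record a relation in both directions."""
        self._links[a].add(b)
        self._links[b].add(a)

    def related(self, entity: Entity, entity_type: Optional[str] = None) -> List[Entity]:
        """List the entities linked to `entity`, optionally of one type only."""
        linked = self._links.get(entity, set())
        return sorted(e for e in linked if entity_type is None or e[0] == entity_type)


store = RelationStore()
store.relate(("software", "tool-a"), ("person", "alice"))
store.relate(("software", "tool-a"), ("project", "proj-x"))
```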
10. Integration with AppDB
RESTful API, supporting operations that follow a CRUD
convention.
flexible, stateless API authentication mechanism
using Personal Access Tokens (no need for
X509)
or even by adapting the AppDB Gadget
(easy: copy & paste one line of code, no technical
skills required)
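A client for a CRUD-style REST API with token-based, stateless authentication might prepare its requests roughly as below. Everything specific here is an assumption made for illustration: the placeholder base URL, the resource paths, the "Authorization: Bearer" header, the JSON payload and the CRUD-to-HTTP mapping (the common convention) may all differ from what the AppDB API actually expects, so its documentation should be consulted. The sketch only builds request objects and sends nothing.

```python
import json
from typing import Optional
from urllib.request import Request

# Placeholder endpoint, NOT the real AppDB API address.
BASE_URL = "https://appdb.example.org/rest"

# Common CRUD-to-HTTP convention (assumption; AppDB's mapping may differ).
CRUD_TO_HTTP = {"create": "POST", "read": "GET", "update": "PUT", "delete": "DELETE"}


def build_request(operation: str, resource: str, token: str,
                  payload: Optional[dict] = None) -> Request:
    """Prepare a request that carries the access token on every call.

    Because the token travels with each request, no session state and no
    X509 certificate is needed on the client side.
    """
    body = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"Authorization": f"Bearer {token}"}  # hypothetical header scheme
    if body is not None:
        headers["Content-Type"] = "application/json"  # hypothetical payload format
    return Request(
        f"{BASE_URL}/{resource}",
        data=body,
        headers=headers,
        method=CRUD_TO_HTTP[operation],
    )
```

A prepared request could then be sent with urllib.request.urlopen, once the placeholders have been replaced with the real endpoint and header scheme.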
AppDB already integrated with many EGI services
EGI GOCDB
list of sites, their metadata & downtimes
Top-BDII
fetching sites dynamic information
EGI Checkin
for AuthN and high level AuthZ attributes
Perun and EGI Operations Portal
for VO related details, incl. membership & roles
Argo
retrieving the status of the Cloud-enabled sites
11. Need for creating relations
Entities/Digital Objects available
by the service (either hosted
or harvested):
Software
Datasets
Topologies
Virtual & Software Appliances
Virtual Machines
Researchers
Resource Providers (from the
EGI GOCDB)
Virtual Organizations (from the
EGI Ops Portal & Perun)
Publications (derived from the
registered items)
Globally defined entities/digital
objects to create relations
with:
Projects
Organizations
Publications
Contact profiles
Research Data
… etc
OpenAIRE
12. Integration with OpenAIRE (1/3)
1. Developed a dedicated (not publicly accessible) service for:
periodically consuming the required data over the OpenAIRE OAI-PMH interface
controlling the process (big data volume + complexity)
mapping the OpenAIRE data to the AppDB ones
2. Made the necessary enhancements to our databases for storing the fetched
data/records as well as the produced relations
3. Extended our user interface so that:
the end user can pick from a list of projects/organizations, thus
avoiding manual data entry
the system can make ‘suggestions’ to the end user based on the pre-existing relations
(contact-projects & contact-organizations), as extracted from the OpenAIRE
data
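Consuming data over an OAI-PMH interface generally means issuing ListRecords requests and following resumption tokens until the repository reports no further pages. The sketch below shows that loop in isolation. It is not the dedicated AppDB service: the function name, the injected fetch callable and the choice to collect only record identifiers are assumptions made for illustration, while the verb, the request arguments and the XML namespace come from the OAI-PMH 2.0 protocol.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Iterator
from urllib.parse import urlencode

# XML namespace defined by the OAI-PMH 2.0 protocol.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def harvest_identifiers(base_url: str, metadata_prefix: str,
                        fetch: Callable[[str], str]) -> Iterator[str]:
    """Yield record identifiers page by page until no resumption token is left.

    `fetch` takes a URL and returns the response body as text. Injecting it
    keeps the sketch testable offline and leaves retries, throttling and
    persistence (the "controlling the process" part) to the caller.
    """
    params: Dict[str, str] = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(fetch(base_url + "?" + urlencode(params)))
        for header in root.iterfind(".//oai:record/oai:header", OAI_NS):
            identifier = header.findtext("oai:identifier", namespaces=OAI_NS)
            if identifier:
                yield identifier
        token = root.findtext(".//oai:resumptionToken", default="", namespaces=OAI_NS)
        if not token or not token.strip():
            return  # a missing or empty token marks the last page
        # A follow-up request carries only the verb and the resumption token.
        params = {"verb": "ListRecords", "resumptionToken": token.strip()}
```

Writing the harvester as a generator lets the caller store each record as it arrives instead of holding a very large result set in memory.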
14. Integration with OpenAIRE (3/3)
Summarizing: AppDB acts as a consumer of the OpenAIRE repository, getting
data with respect to Projects, Organizations and Contact persons
Next steps:
Consume, store and utilize data related to publications
Considerations: big (very big) data volume, overcome complexity
Stabilize the process & periodicity of data harvesting
Considerations: again, data volume (takes more than a day for a single fetch)
Act as a repository (producer), populating enriched datasets back to OpenAIRE
Considerations: need to develop the necessary mechanisms
15. www.egi.eu
Thank you for your attention.
Questions?