Ontological Infrastructure for Interoperable Research Information Systems: HERCULES project
1. EUROPEAN REGIONAL DEVELOPMENT FUND (ERDF)
A way to make Europe
Reyes Hernández-Mora Martínez
Hércules Project Office
(reyes@um.es)
VIVO talks! December 7, 2022
Prof. Diego López-de-Ipiña
University of Deusto
(dipina@deusto.es)
@dipina
Hercules is an initiative promoted by the Conference of Rectors of Spanish Universities (CRUE) and led by the University of Murcia. It has a total budget of €5,462,600. The European Regional Development Fund (ERDF) finances 80% of this amount, a contribution of €4,370,080. The contribution is made through the Ministry of Economy, Industry and Competitiveness (currently the Ministry of Science, Innovation and Universities) as Intermediate Body of the Smart Growth Operational Programme (POCint), now known as the Cross-regional Operational Programme of Spain (POPE).
The project started in 2017 and is scheduled to end in 2022.
➢ 84 universities with different research management systems
➢ Each university has its own procedures, and the systems are based on different data models and schemata
➢ Poor standardization
➢ Research results are transferred to society in a disaggregated way
➢ Inefficiencies in knowledge management and dissemination
➢ Additional costs to use combined research information
➢ Difficulties in accessing European funds
➢ Data is represented at different levels of granularity
• Needs
➢ Integration of the normalized CV
➢ Transparency
➢ Collaboration between universities and research groups
➢ Unified criteria for extracting information
➢ Open access to research results for society
➢ Availability of combined information from different universities/sources
➢ Improved technology transfer and university-company collaboration
Innovative Prototype of the Research Information System
1. Semantic Architecture and Research Ontology (ASIO)
2. Tool for the Management and Enrichment of the researcher's CV, using Data Enrichment and Artificial Intelligence technologies
3. Advanced National Research Portal for Spanish Universities, using Semantic and Artificial Intelligence technologies: Advanced Search, Dashboard, Research Indicators, Collaboration Clusters
HERCULES solution's architecture
[Figure: architecture diagram. The blocks below are connected through APIs.]
• ASIO - Ontological Infrastructure and Semantic Architecture: ROH, Knowledge Graph, Linked Data Server, API
• Semantic Enrichment (ED)
• Analysis Methods (MA)
• Research Information Management System (CRIS/RIMS)
• University: Library; Human Resources, Economic & Academic management
• External sources: Scopus, WOS, OpenAire, ORCID, SemanticScholar, CrossReference, Slideshare, FigShare, Zenodo
• CVN + Researcher portal
ASIO: Semantic Data Architecture and Ontological Infrastructure
• ASIO is an innovative solution grounded on a Semantic Architecture and Ontological Infrastructure, to be used by the University of Murcia and by the other Spanish universities in the CRUE that have similar needs and responsibilities.
• The core innovative features of ASIO are:
o Ontological Infrastructure: creation of an Ontology Network that can describe research-domain data with fidelity and high granularity.
o Reuses already existing ontologies.
o Aligned with FAIR principles.
o Semantic Data Architecture: an efficient platform to store, manage and publish research information. It is based on the Ontological Infrastructure and can synchronize instances installed in different universities.
Design of ROH: Hercules Ontology Network
1. Requirement analysis: an analysis of the requirements for modelling academic information was delivered, describing all the concepts to be modelled within ROH, based on usage scenarios.
2. Selection and analysis of ontologies describing the academic domain: a state-of-the-art review of academic domain ontologies was performed, and the set of ontologies to be reused during the development of ROH was selected.
3. Implementation of the main concepts and relationships for the academic domain: from the requirements detected and the ontologies selected, the main concepts required for representing the academic domain were implemented, as well as the relationships among them.
4. Evaluation of the flexibility, completeness and integrity of ROH, through three different evaluation processes:
• Competency Questions set up by the University of Murcia after a thorough survey issued to domain experts, in order to check whether the developed network of ontologies meets the requirements
• Use of SHACL (Shapes Constraint Language) for validating the data modelled according to ROH, particularly during instantiation
• Mapping of FECYT's CVN (Standardised Curriculum Vitae) to ROH.
5. Continuous refinement validated by automated regression tests: a test suite based on SPARQL Competency Questions, integrated in a CI/CD (Continuous Integration and Continuous Delivery) workflow
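The regression idea in step 5 can be sketched roughly as follows. This is a minimal illustration, not the project's actual test suite: the question names (`CQ-a`, `CQ-b`), the query strings, the minimum row counts and the `fake_run_query` stand-in for a SPARQL engine are all hypothetical.

```python
def failing_questions(run_query, cases):
    """Return the names of competency questions that return too few rows."""
    failures = []
    for name, query, min_rows in cases:
        rows = run_query(query)
        if len(rows) < min_rows:
            failures.append(name)
    return failures


# Hypothetical stand-in for a SPARQL engine: maps query text to canned rows.
CANNED = {
    "SELECT ?g WHERE { ?g a roh:ResearchGroup }": [("group-1",)],
    "SELECT ?p WHERE { ?p a vivo:Project }": [],
}


def fake_run_query(query):
    return CANNED.get(query, [])


# Each case: (name, query text, minimum number of rows expected).
cases = [
    ("CQ-a", "SELECT ?g WHERE { ?g a roh:ResearchGroup }", 1),
    ("CQ-b", "SELECT ?p WHERE { ?p a vivo:Project }", 1),
]

failed = failing_questions(fake_run_query, cases)
print(failed)
```

In a CI job, a non-empty list of failures would typically be turned into a non-zero exit code so that a change to the ontology that breaks a competency question blocks the build.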
Design of ROH: 5 Design principles
• Reusability - concepts are not remodelled if an existing ontology already models them comprehensively.
o E.g., a position held by a person in an academic organization is extensively documented in the VIVO Ontology for Research Discovery.
• Extensibility - although many aspects of academic information modelling are shared universally, some are specific to a country or institution.
o E.g., the six-year periods in Spain, or the job positions contemplated at the University of Murcia or other universities.
• Maintainability - the network of ontologies is modularized into distinct contextualized refinements to make ROH easier to maintain.
• Integrity - preserved by applying ontological restrictions and validation scripts in languages like SHACL.
• Usability - ROH aims to be not only comprehensive and exhaustive, but usable.
o In ontological design, entities and properties are often only superficially described, following the Open World principle.
o ROH seeks to make the devised network of ontologies usable by those who need to instantiate it.
Design of ROH: Requirements analysis
Scenario | Identified entities
1. Improving the structure of research funding and analytical accounting | Project (name, duration, consortium, researchers); Funding entity; Income and types of income; Expenditure and types of expenditure; Project categories; Organisation (university, faculty, department, research group); Consortium; Participating researchers and roles
2. National knowledge map / Identification of knowledge hubs in different areas | Organisation; Geographical area (geonames); Knowledge areas; RIS3 areas of specialisation (http://www.ris3mur.es/); Lines of research; Thematic tags; Publications; Types of publications
3. Flexible research management dashboard | Areas of expertise; RIS3 specialisation areas; Project (type, funding sources, consortium, distribution); Organisation and geographical distribution; Researchers and research groups involved; Indicators/metrics
4. Search engine for partners at national level | Profile of a group; Areas of knowledge; Project; Organisation and geographical distribution; People (Researchers)
5. Group selection / automatic consortium configuration | Research group; Expert; Profile
6. Improving the chances of obtaining European/International research funding | Call for proposals; Projects; Research groups; Profiles
7. Generator of researcher, group and organization pages + CVs + Research | CV, research group website or research report
Design of ROH: Non-functional requirements
1. Follow Linked Open Data principles: 1) use URIs as names for things, 2) use HTTP URIs so that people can look up those names, 3) when someone looks up a URI, provide useful information, and 4) include links to other URIs, so that they can discover more things.
2. Follow FAIR principles: data must be Findable through a persistent identifier and accompanying metadata, Accessible through the universal HTTP protocol, Interoperable by using widely adopted vocabularies, and Reusable by being published under licenses that promote reuse.
3. Use of persistent identifiers: IDs that remain permanently assigned to a resource even if the location of the resource changes over time, for example through purl.org or w3id.org.
4. Multilingualism: labels and descriptions of both classes and properties of the ontology should be expressed in English, Spanish and the co-official languages, through the rdfs:label and rdfs:comment properties.
5. Interoperability with existing ontologies: ontologies sometimes become unavailable on the Internet, usually for lack of maintenance, so the reused ontologies are hosted by ROH to avoid this.
6. Integration with existing information sources, both from the university itself and from third-party organizations.
7. Integration of the CRIS/RMS with external knowledge networks, such as DBPedia or Wikidata.
8. Release of ontologies and source code, using GNU v3 LICENSE or an equivalent license
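As an illustration of requirement 4, the sketch below emits one rdfs:label statement per language for a class and reports which required languages are still missing. It is only a sketch: the class URI is the roh:ResearchObject URI that appears elsewhere in this deck, but the label texts, the set of required languages and both helper names are assumptions made for the example.

```python
def label_triples(class_uri, labels):
    """Return Turtle statements attaching one rdfs:label per language tag."""
    lines = []
    for lang, text in sorted(labels.items()):
        # Escape backslashes and double quotes for a Turtle string literal.
        escaped = text.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'<{class_uri}> rdfs:label "{escaped}"@{lang} .')
    return lines


def missing_languages(labels, required):
    """Return the required language tags that have no label yet."""
    return sorted(set(required) - set(labels))


# Illustrative labels only; they are not copied from the ROH ontology.
labels = {"en": "Research object", "es": "Objeto de investigación"}
statements = label_triples("http://purl.org/roh#ResearchObject", labels)
todo = missing_languages(labels, ["en", "es", "ca", "eu", "gl"])

print("\n".join(statements))
print(todo)
```

A check like `missing_languages` could run alongside the other validation steps to flag classes or properties whose labels have not yet been translated.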
Design of ROH: Identification of entities & relationships
Design of ROH: Modularity
• The ROH network of ontologies is divided into 2 main parts:
• The generic ontology, or core module, contains the most important entities and properties needed to model information in the academic domain.
o It covers the academic domain and is agnostic to the country or the research organization.
• A set of vertical modules, which include:
1. Specializations of some academic concepts for a given country domain. For instance, the figure of Associate Professor in the Spanish academic domain is found in the vertical module university-HR-es and is assigned the URI http://w3id.org/roh/university-hr/es#ProfesorTitularDeUniversidad.
2. Controlled vocabularies, according to the SKOS ontology, for different important areas in the academic domain, namely geographical locations (geopolitical), knowledge areas, project types (project-classification) or positions in universities.
Design of ROH: restrictions & classifications
• Ontological constraints were created to regulate class instantiation and enable reasoning
• Defined classes were created, i.e. classes whose instances are derived by a reasoner by verifying that they meet certain constraints: roh:ResearchObject, roh:AccreditationIssuer & vivo:FundingOrganization
• Class hierarchies were combined with mappings to term taxonomies
• roh:ProjectClasification through roh:hasProjectCategorization
ROH ontology: documentation
1. Automatic documentation generated with Widoco:
https://herculescrue.github.io/ROH/roh/ (ASIO-v1.0)
2. Manual documentation in MarkDown:
https://herculescrue.github.io/ROH/
3. Table with all entities, object properties and data:
https://herculescrue.github.io/ROH/1-%20OntologyDocumentation.pdf
ROH main entities: roh:Funding
❑ Use case: Q36 - List groups sorted by funding received
❑ Dataset:
<http://purl.org/roh/data#another-collaborative-project>
    a vivo:Project ;
    :isSupportedBy
        [ a :Funding ;
          ro:hasPart [ a :FundingAmount ;
                       :grants <http://purl.org/roh/data#centro-investigacion-1> ;
                       :monetaryAmount "25000"^^xsd:decimal ] ;
          :fundedBy <http://purl.org/roh/data#european-funding-program> ;
          :publicFunding "true"^^xsd:boolean
        ] ,
        [ a :Funding ;
          ro:hasPart [ a :FundingAmount ;
                       :grants <http://purl.org/roh/data#centro-investigacion-3> ;
                       :monetaryAmount "35000"^^xsd:decimal ] ;
          :fundedBy <http://purl.org/roh/data#european-funding-program> ;
          :publicFunding "true"^^xsd:boolean
        ] ;
    vivo:dateTimeInterval
        [ a vivo:DateTimeInterval ;
          vivo:end [ a vivo:DateTimeValue ;
                     vivo:dateTime "2021-06-30T00:00:00"^^xsd:dateTime ] ;
          vivo:start [ a vivo:DateTimeValue ;
                       vivo:dateTime "2018-01-01T00:00:00"^^xsd:dateTime ]
        ] ;
    vivo:relates
        [ a vivo:MemberRole ;
          :roleOf <http://purl.org/roh/data#centro-investigacion-1> ;
          vivo:relatedBy <http://purl.org/roh/data#another-collaborative-project>
        ] ;
    vivo:relates
        [ a vivo:LeaderRole ;
          :roleOf <http://purl.org/roh/data#centro-investigacion-3> ;
          vivo:relatedBy <http://purl.org/roh/data#another-collaborative-project>
        ] .
❑ SPARQL query:
PREFIX roh: <http://purl.org/roh#>
PREFIX ro: <http://purl.org/roh/mirror/obo/ro#>
SELECT ?organization ?fundingProgram
       (SUM(?monetaryAmount) as ?totalFunding)
WHERE {
    ?fundingProgram a roh:FundingProgram ;
                    roh:funds ?funding .
    ?funding ro:hasPart ?fundingAmount .
    ?fundingAmount roh:grants ?organization ;
                   roh:monetaryAmount ?monetaryAmount .
} GROUP BY ?organization ?fundingProgram
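The aggregation that the query performs can be mirrored in plain Python as a rough sketch. The rows below are flattened by hand from the example dataset, on the assumption that each funding is reachable from its programme (the query follows roh:funds, while the dataset states the link as :fundedBy). The final sort is an addition for the "sorted by funding received" part of the use case, since the query as shown has no ORDER BY clause. The names `funding_amounts`, `total_funding` and `ranked` are illustrative.

```python
from collections import defaultdict
from decimal import Decimal

BASE = "http://purl.org/roh/data#"

# (funding program, organization, amount) rows taken from the example dataset.
funding_amounts = [
    (BASE + "european-funding-program", BASE + "centro-investigacion-1", Decimal("25000")),
    (BASE + "european-funding-program", BASE + "centro-investigacion-3", Decimal("35000")),
]


def total_funding(rows):
    """Sum amounts per (organization, funding program), like GROUP BY + SUM."""
    totals = defaultdict(Decimal)
    for program, organization, amount in rows:
        totals[(organization, program)] += amount
    return dict(totals)


totals = total_funding(funding_amounts)

# Largest total first; this ordering step is not part of the query shown above.
ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)

for (organization, program), amount in ranked:
    print(organization, program, amount)
```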
ROH main entities: roh:ResearchObject
Class | Object property | Range class | Datatype property | Range datatype (if typed)
(For properties, bold indicates an explicit domain; otherwise a restriction.)
roh:ResearchObject | roh:hasKnowledgeArea | skos:Concept | |
 | roh:correspondingAuthor | foaf:Person | |
 | roh:producedBy | roh:Project | |
bibo:Collection | | | bibo:oclcnum | rdfs:Literal
roh:Dossier | vivo:relates | roh:ProjectContract or bibo:Report or roh:Project | vivo:identifier | xsd:string
 | vivo:dateTimeInterval | vivo:DateTimeInterval | roh:title | xsd:string
 | | | vivo:description |
bibo:Periodical | vivo:publisher | foaf:Organization | bibo:eissn | rdfs:Literal
 | | | bibo:issn | rdfs:Literal
bibo:Journal | vivo:dateIssued | vivo:DateTimeValue | vivo:abbreviation | rdfs:Literal
bibo:Magazine | | | |
bibo:Document | vivo:publishedIn | bibo:Collection or bibo:Book | bibo:doi | xsd:string
 | bibo:authorList | rdf:Seq | bibo:abstract | xsd:string
 | vivo:dateIssued | vivo:DateTimeValue | bibo:pageStart |
 | bibo:editorList | rdf:Seq | bibo:pageEnd |
 | | | bibo:volume | rdfs:Literal
 | | | roh:title | xsd:string
vivo:Abstract | | | |
bibo:Article | | | bibo:issue |
bibo:AcademicArticle | | | |
obo-iao:JournalArticle | roh:hasMetric | roh:PublicationMetric | |
vivo:ConferencePaper | bibo:presentedAt | bibo:Conference | |
roh:WorkshopPaper | | | |
vivo:EditorialArticle | | | |
ROH main entities: roh:ResearchObject
❑ Use case: Q8 - Scientific output (ResearchObjects) of a research group
❑ Dataset:
<http://purl.org/roh/data#journal-article-1-metric>
    a :PublicationMetric ;
    :impactFactor "2.5"^^xsd:float ;
    :quartile "Q2"^^xsd:string .
<http://purl.org/roh/data#journal-article-1>
    a iao:IAO_0000013 ;
    dc:title "My great journal article" ;
    :hasKnowledgeArea uneskos:1203 ;
    :correspondingAuthor <http://purl.org/roh/data#investigador-1> ;
    bibo:authorList [ a rdf:Seq ;
                      rdf:_1 <http://purl.org/roh/data#investigador-1> ;
                      rdf:_2 <http://purl.org/roh/data#investigador-3>
                    ] ;
    vivo:dateIssued [ a vivo:DateTimeValue ;
                      vivo:dateTime "2020-04-27T00:00:00"^^xsd:dateTime
                    ] ;
    vivo:hasPublicationVenue <http://purl.org/roh/data#excelent-journal> ;
    :hasMetric <http://purl.org/roh/data#journal-article-1-metric> .
❑ SPARQL query:
PREFIX vivo: <http://purl.org/roh/mirror/vivo#>
PREFIX roh: <http://purl.org/roh#>
PREFIX bibo: <http://purl.org/roh/mirror/bibo#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?researchGroup ?researchObject ?researchObjectClass
WHERE {
    ?researchObject a roh:ResearchObject ;
                    a ?researchObjectClass ;
                    bibo:authorList ?authorList .
    ?authorList ?order ?author .
    ?author roh:hasPosition ?position .
    ?position vivo:relates ?researchGroup .
    ?researchGroup a roh:ResearchGroup .
    FILTER NOT EXISTS {
        ?researchObject a ?otherClass .
        ?otherClass rdfs:subClassOf ?researchObjectClass .
        FILTER (?otherClass != ?researchObjectClass)
    }
    FILTER (str(?researchObjectClass) != "http://purl.org/roh#ResearchObject")
}
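The FILTER NOT EXISTS block keeps only the most specific class of each research object: a class is discarded when the object also has another type that is a subclass of it. A rough Python equivalent of that filter is sketched below. The type set and the subclass pairs are illustrative assumptions (in particular, treating bibo:Document as a subclass of roh:ResearchObject), and `most_specific` is a hypothetical helper name.

```python
def most_specific(types, subclass_of):
    """Keep the types that have no other asserted type as a proper subclass.

    `subclass_of` is a set of (subclass, superclass) pairs, playing the role
    of the rdfs:subClassOf triples visible to the query.
    """
    return {
        cls
        for cls in types
        if not any(other != cls and (other, cls) in subclass_of for other in types)
    }


# Types a reasoner might assert for one article (assumed for illustration).
types = {
    "roh:ResearchObject",
    "bibo:Document",
    "bibo:Article",
    "bibo:AcademicArticle",
}

# Direct subclass links between those types (assumed for illustration).
subclass_of = {
    ("bibo:Document", "roh:ResearchObject"),
    ("bibo:Article", "bibo:Document"),
    ("bibo:AcademicArticle", "bibo:Article"),
}

print(most_specific(types, subclass_of))
```

The query's last FILTER additionally drops roh:ResearchObject itself, which matters for objects that carry no more specific type.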
ROH main entities: roh:Activity
❑ Use case: Q26 - Get the list of congresses/workshops and science dissemination events in which I have participated, indicating the role I have played: organiser, exhibitor, etc.
❑ Dataset:
<http://purl.org/roh/data#a-great-conference>
    a bibo:Conference ;
    bfo:BFO_0000055 [ a vivo:AttendeeRole ;
                      ro:RO_0000052 <http://purl.org/roh/data#investigador-2> ] .
<http://purl.org/roh/data#investigador-2>
    a foaf:Person ;
    :hasKnowledgeArea uneskos:120304 , uneskos:570508 ;
    :hasPosition [ a :ResearcherPosition ;
                   vivo:dateTimeInterval
                       [ a vivo:DateTimeInterval ;
                         vivo:start [ a vivo:DateTimeValue ;
                                      vivo:dateTime "2013-05-10T00:00:00"^^xsd:dateTime ]
                       ] ;
                   vivo:relates <http://purl.org/roh/data#centro-investigacion-1> ,
                                <http://purl.org/roh/data#investigador-2>
                 ] ;
    foaf:name "Maria" ;
    foaf:gender "female" ;
    vivo:relatedBy [ a :ResearcherPosition ;
                     vivo:relates <http://purl.org/roh/data#centro-investigacion-1> ,
                                  <http://purl.org/roh/data#investigador-2>
                   ] ;
    :hasCV :CurriculumVitae .
❑ SPARQL query:
PREFIX ro: <http://purl.org/roh/mirror/obo/ro#>
PREFIX bibo: <http://purl.org/roh/mirror/bibo#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX bfo: <http://purl.org/roh/mirror/obo/bfo#>
SELECT ?researcher ?conference ?roleClass
WHERE {
    ?conference a bibo:Conference ;
                bfo:BFO_0000055 ?role .
    ?role a ?roleClass ;
          ro:RO_0000052 ?researcher .
    FILTER NOT EXISTS {
        ?role a ?otherClass .
        ?otherClass rdfs:subClassOf ?roleClass .
        FILTER (?otherClass != ?roleClass)
    }
}
ROH validation: competency question validation for assessing flexibility, completeness and usability
1. A collection of SPARQL queries covering diverse competency questions has been made available in the ROH repo
2. These queries are executed over two instance datasets:
• Synthetic dataset, generated to produce valid instances of diverse concepts
• Real dataset based on the database behind the MORElab research group web page
• Generated with the Morph-KGC tool
• Contains more than 600 researchers, 150 projects and 400 research articles
ROH validation: FECYT CVN mapping & SHACL validation for testing interoperability with other systems
• The CVN defines a standard format for presenting a researcher's CV, which enables interoperability among different databases of the Spanish public administration. The CVN lets researchers present their CV in a unified way in funding calls from the Spanish and regional governments
• A CVN-to-ROH tool has been developed: it takes the XML version of the CVN as input and generates an RDF file that maps the CVN to ROH
• ROH graph materializations were used to test the scalability of the system when loading a CRIS dataset from different sources (e.g. CERIF or Hercules SGI Data Model)
• SHACL shapes have been defined in alignment with the ontology definitions and applied to the datasets before loading them into a graph modelled following ROH
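The XML-to-RDF direction of the CVN-to-ROH tool can be illustrated with a toy converter. This is not the actual tool and does not use the real CVN schema: the XML element and attribute names below are invented for the example, the output is plain N-Triples text, and the use of the standard FOAF namespace for names (rather than a ROH-hosted copy) is an assumption. Only the roh:title property and the data namespace come from this deck.

```python
import xml.etree.ElementTree as ET

# Hypothetical input; the real CVN XML schema is different.
SAMPLE_XML = """
<cv researcherId="investigador-2">
  <name>Maria</name>
  <publication id="journal-article-1">
    <title>My great journal article</title>
  </publication>
</cv>
"""

DATA = "http://purl.org/roh/data#"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"  # assumed namespace
ROH_TITLE = "http://purl.org/roh#title"


def literal(text):
    """Format text as an N-Triples string literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def cv_to_ntriples(xml_text):
    """Map the toy CV document to a list of N-Triples statements."""
    root = ET.fromstring(xml_text.strip())
    person = f"<{DATA}{root.attrib['researcherId']}>"
    triples = [f"{person} <{FOAF_NAME}> {literal(root.findtext('name'))} ."]
    for pub in root.findall("publication"):
        subject = f"<{DATA}{pub.attrib['id']}>"
        triples.append(f"{subject} <{ROH_TITLE}> {literal(pub.findtext('title'))} .")
    return triples


triples = cv_to_ntriples(SAMPLE_XML)
print("\n".join(triples))
```

In the workflow described above, output of this kind would then be checked against the SHACL shapes before being loaded into the graph.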
ROH validation: Continuous refinement
• MLOps (or ML Ops) is a set of practices that aims to deploy and maintain machine learning models in production reliably and efficiently.
• In ROH, DevOps practices have been applied to ontology development, which we call Ontology Ops or OntoOps.
• OntoOps is made feasible by a CI/CD workflow implemented with GitHub Actions
• More details at https://github.com/HerculesCRUE/ROH/blob/main/.github/workflows/widoco-and-validation-questions.yaml
CRIS/RIMS
• Scientific Production Management
• Calls, Requests and Projects Management
• Research Groups Management
• Technology-Based Companies Management
• Intellectual and Industrial Property Management
• Grants and Contracts Management
• Research Ethics Management
Researcher portal
Creation of a knowledge graph and research portal with semantic capabilities and infrastructures.
• Joint exploitation of university research information
• Enabling greater scientific dissemination of research results
• Facilitate the detection of R&D synergies between universities and/or research groups.
• Unify the criteria for information gathering, offering greater guarantees of an adequate interpretation and, therefore, of the accuracy of the obtained indicators.
• Strengthen the transfer of R&D results to the private sector and university-industry collaboration.
[Figure: researcher portal data flow diagram, with the following labels]
• University SGI
• CVN
• Internet: Scopus, WoS, OpenAire, FigShare, SlideShare, …
• Personal data, Projects, Education, Research Areas, …
• Research Objects: Code, Datasets, Experimental Protocols, Papers, …
• Entity linking
• Semantic Similarity
• Similar ROs
• Knowledge Graphs
• Researcher Dashboard
Conclusions
➢ HÉRCULES will bring into the Spanish University System a highly innovative product based on semantic web technologies. It will allow new information to be obtained by integrating existing information from multiple nodes with heterogeneous ontologies and vocabularies, enabling greater inductive knowledge from apparently unconnected data.
➢ As open-source software, these techniques will be applicable to a wide range of geographical and knowledge environments.
➢ https://github.com/HerculesCRUE/ROH
Conclusions
➢ As a public service, the implementation of HÉRCULES will be a great step forward, as it will allow the different University Information and Research Nodes to interoperate semantically, resulting in:
• More efficient research management in terms of licensing and software maintenance costs.
• Greater efficiency in public investment by decreasing duplication of R&D&I investment.
• More efficient research in universities.
Conclusions
➢ Improve the scientific dissemination of research results.
➢ Create systems to support the detection of R&D&I synergies between universities.
➢ Possess semantic capabilities to strengthen the transfer of R&D&I results to companies.
➢ Create a system for university research data analysis → ACADEMIC ANALYTICS
➢ Improve interoperability with the corresponding European, national and regional governmental bodies.
HERCULES WEBSITE
http://hercules.um.es
Thank you for your attention
hercules@um.es