Towards an Open Research Knowledge Graph

Sören Auer

Gottfried Wilhelm Leibniz
* 21 June / 1 July 1646 in Leipzig
† 14 November 1716 in Hannover
Namesake Member of
Library of
Namesake

Had to do some research on
serials…

The World of Publishing & Communication has profoundly changed
New means adapted to the new possibilities were developed, e.g. "zooming", dynamics
Business models changed completely
More focus on data, interlinking of data and services, and search in the data
Integration and crowdsourcing play an important role

What about Scholarly
Communication?

Scientific publishing in the 17th century
One of the earliest research journals: Philosophical Transactions of the Royal Society
© CC BY Henry Oldenburg

Scientific publishing today
We have:
BUT
• Mainly based on PDF
• Is only partially machine-readable
• Does not preserve structure
• Does not allow embedding of semantics
• Does not facilitate interactivity/dynamicity/repurposing
• …

Science is Seriously Flawed
Proliferation of scientific literature
Duplication and inefficiency
Deficiency of peer-review
Reproducibility crisis

Proliferation of scientific literature
Science and engineering articles by region, country: 2004 and 2014
Source: National Science Foundation: Science and Engineering Publication Output Trends: https://www.nsf.gov/statistics/2018/nsf18300/nsf18300.pdf

Reproducibility Crisis
1,500 scientists lift the lid on reproducibility
Monya Baker in Nature, 2016. 533 (7604): 452–454. doi:10.1038/533452a:
• 70% failed to reproduce at least one other scientist's experiment
• 50% failed to reproduce one of their own experiments
Failure to reproduce results among disciplines (own results in brackets):
• chemistry: 87% (64%),
• biology: 77% (60%),
• physics and engineering: 69% (51%),
• Earth sciences: 64% (41%).
© Stanford Medicine - Stanford University

Duplication and Inefficiency
How can we avoid duplication if the terminology, research problems, approaches, methods, characteristics, evaluations, … are not properly defined and identified?
How would you build an engine or a building without properly defining its parts, relationships, materials, characteristics, …?

Root Cause - Deficiency of Scholarly Communication?
Lack of:
• Transparency – information is hidden in text
• Integrability – fitting different research results together
• Machine assistance – unstructured content is hard to process
• Identifiability of concepts beyond metadata
• Collaboration – one brain barrier
• Overview – scientists look for the needle in the haystack

Realizing Vannevar Bush's vision of Memex

Linked Data Principles
1. Use URIs to identify the “things” in your data
2. Use http:// URIs so people (and machines) can look them up on the web
3. When a URI is looked up, return a description of the thing in the W3C
Resource Description Framework (RDF)
4. Include links to related things
http://www.w3.org/DesignIssues/LinkedData.html
[1] Auer, Lehmann, Ngomo, Zaveri: Introduction to Linked Data and Its Lifecycle on the Web. Reasoning Web 2013
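
Principles 2 and 3 can be illustrated with a small sketch of an HTTP lookup that asks for an RDF description. It only prepares the request and does not send it. The helper name `build_rdf_request` and the choice of `text/turtle` as the requested media type are assumptions made for illustration; whether a given server honours the header depends on that server.

```python
from urllib.request import Request


def build_rdf_request(uri: str, media_type: str = "text/turtle") -> Request:
    """Prepare (but do not send) a GET request asking for an RDF description of `uri`.

    `media_type` is an assumption: Turtle is one common RDF serialisation,
    and a server may offer others or none at all.
    """
    return Request(uri, headers={"Accept": media_type})


# dbpedia:Atlanta from the slides, written out as a full http:// URI.
request = build_rdf_request("http://dbpedia.org/resource/Atlanta")
print(request.get_method(), request.full_url)
print("Accept:", request.get_header("Accept"))
```

Passing the prepared request to `urllib.request.urlopen` would perform the actual lookup over the network.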

RDF & Linked Data in a Nutshell
1. Graph-based RDF data model consisting of S-P-O (Subject, Predicate, Object) statements (facts)
[Diagram: NASIG -- conf:organizes --> NasigConf2018; NasigConf2018 -- conf:starts --> 09.06.2018; NasigConf2018 -- conf:takesPlaceIn --> dbpedia:Atlanta]
2. Serialised as RDF triples:
NASIG conf:organizes NasigConf2018 .
NasigConf2018 conf:starts "2018-06-09"^^xsd:date .
NasigConf2018 conf:takesPlaceIn dbpedia:Atlanta .
3. Publication under a URL on the Web, intranet or extranet
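
As a rough illustration of the S-P-O model, the three statements can be held as plain tuples and queried by pattern. This is a minimal sketch using only the standard library, not an RDF library; the names `Triple`, `match` and `serialise` are hypothetical, and the terms are kept as the prefixed strings from the slide (using the diagram's `conf:takesPlaceIn`) rather than expanded to full URIs.

```python
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# The three statements from the slide, kept as plain strings.
GRAPH = [
    Triple("NASIG", "conf:organizes", "NasigConf2018"),
    Triple("NasigConf2018", "conf:starts", '"2018-06-09"^^xsd:date'),
    Triple("NasigConf2018", "conf:takesPlaceIn", "dbpedia:Atlanta"),
]


def match(graph: List[Triple],
          subject: Optional[str] = None,
          predicate: Optional[str] = None,
          obj: Optional[str] = None) -> List[Triple]:
    """Return all triples that agree with every position that is not None."""
    return [
        t for t in graph
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


def serialise(graph: List[Triple]) -> str:
    """Write one 'subject predicate object .' statement per line."""
    return "\n".join(f"{t.subject} {t.predicate} {t.obj} ." for t in graph)


print(serialise(GRAPH))
print(match(GRAPH, subject="NasigConf2018", predicate="conf:takesPlaceIn"))
```

Leaving a position as `None` acts as a wildcard, which is the same idea a triple pattern in a graph query language expresses.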

Creating Knowledge Graphs with RDF
Linked Data
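[Diagram: example knowledge graph]
DHL -- full name --> DHL International GmbH
DHL -- headquarters --> Post Tower
DHL -- industry --> Logistics
Post Tower -- height --> 162.5 m
Post Tower -- located in --> Bonn
Logistics -- label --> Logistik
Logistics -- label --> 物流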

RDF Data Model (a bit more technical)
Graph consists of:
• Resources (identified via URIs)
• Literals: data values with data type (URI) or language (multilinguality integrated)
• Attributes of resources are also URI-identified (from vocabularies)
Various data sources and vocabularies can be arbitrarily mixed and meshed
URIs can be shortened with namespace prefixes; e.g. dbp: → http://dbpedia.org/resource/
[Diagram: the example graph with URIs and typed literals]
dbp:DHL_International_GmbH foaf:name "DHL International GmbH"^^xsd:string
dbp:DHL_International_GmbH ex:headquarters dbp:Post_Tower
dbp:DHL_International_GmbH dbo:industry dbp:Logistics
dbp:Post_Tower gn:locatedIn dbp:Bonn
dbp:Post_Tower ex:height [ rdf:value "162.5"^^xsd:decimal ; ex:unit unit:Meter ]
dbp:Logistics rdfs:label "Logistik"@de
dbp:Logistics rdfs:label "物流"@zh
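
The prefix shortening and the two kinds of literals can be sketched as follows. This is an illustrative sketch only: `expand` and `Literal` are hypothetical names, the `dbp:` namespace is the one given on the slide, the `rdfs:`, `xsd:` and `foaf:` namespaces are the standard ones, and the rule that a literal has either a datatype or a language tag is a simplification made for this example.

```python
from dataclasses import dataclass
from typing import Optional

PREFIXES = {
    "dbp": "http://dbpedia.org/resource/",  # given on the slide
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def expand(curie: str) -> str:
    """Turn a prefixed name such as 'dbp:Bonn' into a full URI."""
    prefix, _, local = curie.partition(":")
    if prefix not in PREFIXES:
        raise KeyError(f"unknown prefix: {prefix}")
    return PREFIXES[prefix] + local


@dataclass(frozen=True)
class Literal:
    value: str
    datatype: Optional[str] = None  # prefixed name, e.g. 'xsd:decimal'
    language: Optional[str] = None  # language tag, e.g. 'de'

    def __post_init__(self) -> None:
        # Simplification for this sketch: one or the other, never both.
        if self.datatype and self.language:
            raise ValueError("use a datatype or a language tag, not both")

    def n3(self) -> str:
        """Render the literal with its language tag or expanded datatype URI."""
        if self.language:
            return f'"{self.value}"@{self.language}'
        if self.datatype:
            return f'"{self.value}"^^<{expand(self.datatype)}>'
        return f'"{self.value}"'


print(expand("dbp:Post_Tower"))
print(Literal("162.5", datatype="xsd:decimal").n3())
print(Literal("Logistik", language="de").n3())
```

Keeping the prefix table separate from the data is what lets several vocabularies be mixed in one graph without their names colliding.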

Knowledge Graphs – A definition
• Fabric of concept, class, property, relationships, entity descriptions
• Uses a knowledge representation formalism (typically RDF, RDF-Schema, OWL)
• Holistic knowledge (multi-domain, multi-source, multi-granularity):
  • instance data (ground truth),
  • open (e.g. DBpedia, Wikidata), private (e.g. supply chain data), closed data (product models),
  • derived, aggregated data,
  • schema data (vocabularies, ontologies),
  • meta-data (e.g. provenance, versioning, documentation, licensing),
  • comprehensive taxonomies to categorize entities,
  • links between internal and external data,
  • mappings to data stored in other systems and databases
Smart Data for Machine Learning

Emerging Knowledge Graphs & Data Spaces
Search Engine Optimization & Web-Commerce
• Schema.org used by >20% of Web sites
• Major search engines exploit semantic descriptions
Pharma, Life Sciences
• Mature, comprehensive vocabularies and ontologies
• Billions of disease, drug, clinical trial descriptions
Digital Libraries
• Many established vocabularies (Dublin Core, FRBR, EDM)
• Millions of items aggregated from thousands of memory institutions in Europeana, German Digital Library

Paradigm Change in Scholarly Communication
Towards more Knowledge-based Information Flows

Paradigm Change in Scholarly Communication: Knowledge-based Information Flows in Science & Technology
Challenges: digitalisation of science, monopolisation by commercial actors, proliferation of publications, reproducibility crisis

Mathematics
• Definitions
• Theorems
• Proofs
• Methods
• …
Physics
• Experiments
• Data
• Models
• …
Chemistry
• Substances
• Structures
• Reactions
• …
Computer Science
• Concepts
• Implementations
• Evaluations
• …
Technology
• Standards
• Processes
• Elements
• Units, sensor data
Architecture
• Regulations
• Elements
• Models
• …
Open Research Knowledge Graph
Overarching Concepts
• Research problems
• Definitions
• Research approaches
• Methods
Artefacts
• Publications
• Data
• Software
• Image/Audio/Video
• Knowledge Graphs / Ontologies
Domain-specific concepts
The Open Research Knowledge Graph makes overarching and subject-specific concepts clearly identifiable and links them semantically (with clearly described relations) to each other and to other relevant artefacts.

Search for CRISPR: >4,000 results

Chemistry Example: CRISPR/Cas Genome Editing

Semantic Representation using a Knowledge Graph
1. Original Publication
A practical guide to CRISPR/cas9 editing in Lepidoptera <https://doi.org/10.1101/130344>
2. Graph Curation Form
Author: Robert Reed
Research Problem: Genome editing in Lepidoptera
Methods: CRISPR/cas9
Experimental Data: https://doi.org/10.5281/zenodo.896916
Related Concepts: Lepidoptera; Genome editing; CRISPR
3. Graph representation
Nodes:
• Robert Reed <https://orcid.org/0000-0002-6065-6728>
• A practical guide to CRISPR/cas9 editing in Lepidoptera <https://doi.org/10.1101/130344>
• Genome editing in Lepidoptera
• CRISPR/cas9
• Experimental Data <https://doi.org/10.5281/zenodo.896916>
• Genome editing <https://www.wikidata.org/wiki/Q24630389>
Edge labels: isAuthorOf, addresses, isImplementedBy, isEvaluatedWith, relatesConcept
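
One way to picture the graph representation is as a list of labelled edges that can be traversed. The sketch below uses the identifiers and edge labels from the slide, but which node each edge connects is an assumption made here for illustration, since the extracted slide text lists nodes and edge labels without their endpoints. The function names `outgoing` and `reachable` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str, str]

PAPER = "https://doi.org/10.1101/130344"
AUTHOR = "https://orcid.org/0000-0002-6065-6728"
DATA = "https://doi.org/10.5281/zenodo.896916"
CONCEPT = "https://www.wikidata.org/wiki/Q24630389"
PROBLEM = "Genome editing in Lepidoptera"
METHOD = "CRISPR/cas9"

# ASSUMPTION: the endpoints of each edge are a guess for illustration;
# only the labels and the node identifiers come from the slide.
EDGES: List[Edge] = [
    (AUTHOR, "isAuthorOf", PAPER),
    (PAPER, "addresses", PROBLEM),
    (PROBLEM, "isImplementedBy", METHOD),
    (METHOD, "isEvaluatedWith", DATA),
    (PROBLEM, "relatesConcept", CONCEPT),
]


def outgoing(edges: List[Edge]) -> Dict[str, List[Tuple[str, str]]]:
    """Index edges by their source node."""
    index: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for source, label, target in edges:
        index[source].append((label, target))
    return index


def reachable(edges: List[Edge], start: str) -> Set[str]:
    """Collect every node reachable from `start` by following edges forwards."""
    index = outgoing(edges)
    seen: Set[str] = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for _, target in index.get(node, []):
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return seen


# Everything connected to the author: paper, problem, method, data, concept.
for node in sorted(reachable(EDGES, AUTHOR)):
    print(node)
```

Because every node carries a persistent identifier (ORCID, DOI, Wikidata), the same traversal could continue into other data sets that use those identifiers.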

Automatic Generation of Comparisons/Surveys

Open Research Knowledge Graph
interlinks existing Services and Resources

Interlinking Article, Software, Video and
Graph resources describing the research

Advantages of knowledge-based scholarly communication
• Clear identification of all relevant artifacts, concepts, attributes, relationships → terminological and conceptual precision and sharpness, less ambiguity
• Better and explicit networking of all relevant artifacts and information sources → traceability
• ORKG machine-readability → new search, retrieval, mining and assistance applications
• Avoidance of media discontinuities in the different phases of scientific work → increased efficiency
• Use of concepts and relationships across disciplinary boundaries → interdisciplinarity and transdisciplinarity
• Halting the proliferation of scientific publications → less duplication
• Facilitating the entry of young academics or laypersons → Open Science

Outlook
There is a lot to do:
• Equip existing services with Linked Data interfaces
• Enable the deep semantic description of research, which requires:
  • Good user interfaces
  • Scalable storage and search facility
  • Collaboration between scientists, librarians, knowledge engineers, machines
Stay tuned
• Mailing list/group: https://groups.google.com/forum/#!forum/orkg
• Coming soon: Open Research Knowledge Graph: https://orkg.org
• Next workshop at TIB on November 22nd (after DILS Conference: https://events.tib.eu/dils2018/)

https://de.linkedin.com/in/soerenauer
https://twitter.com/soerenauer
https://www.xing.com/profile/Soeren_Auer
http://www.researchgate.net/profile/Soeren_Auer
TIB & Leibniz University of Hannover
Soeren.Auer@tib.eu
Sören Auer

References
Said Fathalla, Sahar Vahdati, Sören Auer, Christoph Lange: Towards a Knowledge Graph Representing Research Findings by Semantifying Survey Articles. TPDL 2017: 315-327, https://www.researchgate.net/publication/319419350
Sahar Vahdati, Natanael Arndt, Sören Auer, Christoph Lange: OpenResearch: Collaborative Management of Scholarly Communication Metadata. EKAW 2016: 778-793, https://www.researchgate.net/publication/309700661
Sören Auer: Towards an Open Research Knowledge Graph. https://zenodo.org/record/1157185
Sören Auer, Viktor Kovtun, Manuel Prinz, Anna Kasprzik, Markus Stocker: Towards a Knowledge Graph for Science. https://doi.org/10.15488/3401
