Document-oriented workflows in science have reached (or already exceeded) the limits of adequacy, as highlighted, for example, by recent discussions on the increasing proliferation of scientific literature and the reproducibility crisis. It is now possible to rethink this dominant paradigm of document-centered knowledge exchange and transform it into knowledge-based information flows by representing and expressing knowledge through semantically rich, interlinked knowledge graphs. At the core of establishing knowledge-based information flows is the creation and evolution of information models that establish a common understanding of data and information between the various stakeholders, together with the integration of these technologies into the infrastructure and processes of search and knowledge exchange in the research library of the future. By integrating these information models into existing and new research infrastructure services, the information structures that are currently still implicit and deeply hidden in documents can be made explicit and directly usable. This has the potential to revolutionize scientific work because information and research results can be seamlessly interlinked with each other and better mapped to complex information needs. Research results also become directly comparable and easier to reuse.
16. 16
New means adapted to the new possibilities were developed, e.g. "zooming",
dynamics
Business models changed completely
More focus on data, interlinking of data and services and search in the data
Integration, crowdsourcing play an important role
The World of Publishing & Communication
has profoundly changed
20. 20
Scientific publishing today
We have:
BUT
• Mainly based on PDF
• Is only partially machine-readable
• Does not preserve structure
• Does not allow embedding of semantics
• Does not facilitate interactivity/dynamicity/
repurposing
• …
21. 21
Proliferation of scientific literature
Duplication and inefficiency
Deficiency of peer-review
Reproducibility crisis
Science is Seriously Flawed
22. 22
Science and engineering articles by region, country: 2004 and 2014
Proliferation of scientific literature
National Science Foundation: Science and Engineering Publication Output Trends: https://www.nsf.gov/statistics/2018/nsf18300/nsf18300.pdf
24. 24
How can we avoid duplication if the terminology, research problems, approaches,
methods, characteristics, evaluations, … are not properly defined and identified?
How would you build an engine/building without properly defining their parts,
relationships, materials, characteristics … ?
Duplication and Inefficiency
25. 25
Lack of:
• Transparency – information is hidden in text
• Integratability – fitting different research results together
• Machine assistance – unstructured content is hard to process
• Identifiability of concepts beyond metadata
• Collaboration – one brain barrier
• Overview – scientists look for the needle in the haystack
Root Cause - Deficiency of Scholarly
Communication?
28. Linked Data Principles
1. Use URIs to identify the “things” in your data
2. Use http:// URIs so people (and machines) can look them up on the web
3. When a URI is looked up, return a description of the thing in the W3C
Resource Description Framework (RDF)
4. Include links to related things
http://www.w3.org/DesignIssues/LinkedData.html
[1] Auer, Lehmann, Ngomo, Zaveri: Introduction to Linked Data and Its Lifecycle on the Web. Reasoning Web 2013
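Principles 2 and 3 above can be sketched in Python as a request that asks a server for an RDF description of a thing. This is a minimal sketch, assuming the server supports HTTP content negotiation and can return Turtle for `Accept: text/turtle`; the helper name `build_rdf_request` is hypothetical and the DBpedia URI is only an example. Nothing is sent over the network unless the commented lines are enabled.

```python
from urllib.request import Request


def build_rdf_request(uri: str, media_type: str = "text/turtle") -> Request:
    """Prepare an HTTP GET that asks for an RDF description of `uri`.

    Assumption: the server honours content negotiation via the Accept header.
    """
    # Principle 2: only http(s) URIs can be looked up on the web.
    if not uri.startswith(("http://", "https://")):
        raise ValueError("expected an http(s) URI that can be looked up")
    # Principle 3: request an RDF serialisation of the description.
    return Request(uri, headers={"Accept": media_type})


req = build_rdf_request("http://dbpedia.org/resource/Atlanta")
print(req.full_url, req.get_header("Accept"))

# To actually dereference the URI (needs network access):
# from urllib.request import urlopen
# with urlopen(req) as resp:
#     turtle = resp.read().decode("utf-8")
```

Principle 4 then shows up in the returned description itself: it contains further URIs that can be looked up in the same way.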
29. Page 29
1. Graph based RDF data model consisting of S-P-O statements (facts)
RDF & Linked Data in a Nutshell
[Diagram: nodes NASIG, NasigConf2018, dbpedia:Atlanta, 09.06.2018; edge labels conf:organizes, conf:starts, conf:takesPlaceIn]
2. Serialised as RDF Triples (Subject Predicate Object):
NASIG conf:organizes NasigConf2018 .
NasigConf2018 conf:starts "2018-06-09"^^xsd:date .
NasigConf2018 conf:takesPlaceIn dbpedia:Atlanta .
3. Publication under URL in Web, Intranet, Extranet
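The S-P-O model can be sketched in plain Python as a list of three-part statements plus a wildcard pattern match. This is only an illustration: the names `Triple`, `match` and `serialise` are hypothetical, the terms are kept as prefixed names exactly as on the slide, and the predicate `conf:takesPlaceIn` follows the diagram label.

```python
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# The three statements from the slide, kept as prefixed names for brevity.
GRAPH = [
    Triple("NASIG", "conf:organizes", "NasigConf2018"),
    Triple("NasigConf2018", "conf:starts", '"2018-06-09"^^xsd:date'),
    Triple("NasigConf2018", "conf:takesPlaceIn", "dbpedia:Atlanta"),
]


def match(graph: List[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> List[Triple]:
    """Return all triples that fit the pattern; None acts as a wildcard."""
    return [t for t in graph
            if (s is None or t.subject == s)
            and (p is None or t.predicate == p)
            and (o is None or t.obj == o)]


def serialise(graph: List[Triple]) -> str:
    """Write one 'subject predicate object .' statement per line."""
    return "\n".join(f"{t.subject} {t.predicate} {t.obj} ." for t in graph)


print(serialise(GRAPH))
print(match(GRAPH, s="NasigConf2018"))  # everything stated about the conference
```

A real application would use an RDF library and full URIs; the point here is only that a graph is a set of independent statements that can be filtered by any of their three positions.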
30. Page 30
Creating Knowledge Graphs with RDF
Linked Data
[Diagram: knowledge graph about DHL. Nodes: DHL, DHL International GmbH, Post Tower, Bonn, Logistics, 162.5 m, Logistik, 物流. Edge labels: full name, industry, headquarters, located in, height, label (twice).]
31. Page 31
Graph consists of:
Resources (identified via URIs)
Literals: data values with data type (URI) or language (multilinguality integrated)
Attributes of resources are also URI-identified (from vocabularies)
Various data sources and vocabularies can be arbitrarily mixed and meshed
URIs can be shortened with namespace prefixes; e.g. dbp: → http://dbpedia.org/resource/
RDF Data Model (a bit more technical)
[Diagram: the DHL graph with URI-identified nodes and typed literals. Resources: dbp:DHL_International_GmbH, dbp:Post_Tower, dbp:Bonn, dbp:Logistics, unit:Meter. Literals: "DHL International GmbH"^^xsd:string, "162.5"^^xsd:decimal, "Logistik"@de, "物流"@zh. Properties: foaf:name, dbo:industry, ex:headquarters, gn:locatedIn, ex:height, rdf:value, ex:unit, rdfs:label (twice).]
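Two points from this slide, namespace prefixes and literals with a data type or a language tag, can be sketched as follows. The `dbp:` prefix is the one given on the slide; the `rdfs:`, `foaf:` and `xsd:` namespace URIs are the standard ones. The names `expand` and `Literal` are hypothetical helpers, not part of any library.

```python
from dataclasses import dataclass
from typing import Optional

PREFIXES = {
    "dbp": "http://dbpedia.org/resource/",            # given on the slide
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def expand(name: str) -> str:
    """Turn a prefixed name such as 'dbp:Bonn' into a full URI."""
    prefix, _, local = name.partition(":")
    if prefix not in PREFIXES:
        raise KeyError(f"unknown prefix: {prefix}")
    return PREFIXES[prefix] + local


@dataclass(frozen=True)
class Literal:
    """A data value with either a datatype URI or a language tag."""
    value: str
    datatype: Optional[str] = None
    lang: Optional[str] = None

    def __post_init__(self) -> None:
        if self.datatype and self.lang:
            raise ValueError("use a datatype or a language tag, not both")

    def n3(self) -> str:
        if self.lang:
            return f'"{self.value}"@{self.lang}'
        if self.datatype:
            return f'"{self.value}"^^<{self.datatype}>'
        return f'"{self.value}"'


print(expand("dbp:Bonn"))
print(Literal("162.5", datatype=expand("xsd:decimal")).n3())
print(Literal("Logistik", lang="de").n3())
```

Because the same resource can carry labels in several languages, multilinguality needs no extra machinery beyond the language tag on each literal.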
32. Page 32
• Fabric of concept, class, property, relationships, entity descriptions
• Uses a knowledge representation formalism
(typically RDF, RDF-Schema, OWL)
• Holistic knowledge (multi-domain, source, granularity):
• instance data (ground truth),
• open (e.g. DBpedia, WikiData), private (e.g. supply chain data),
closed data (product models),
• derived, aggregated data,
• schema data (vocabularies, ontologies)
• meta-data (e.g. provenance, versioning, documentation, licensing)
• comprehensive taxonomies to categorize entities
• links between internal and external data
• mappings to data stored in other systems and databases
Knowledge Graphs – A definition
Smart Data for Machine
Learning
34. Page 34
Search Engine Optimization & Web-Commerce
Schema.org used by >20% of Web sites
Major search engines exploit semantic descriptions
Pharma, Life Sciences
Mature, comprehensive vocabularies and ontologies
Billions of disease, drug, clinical trial descriptions
Digital Libraries
Many established vocabularies (DublinCore, FRBR, EDM)
Millions of items aggregated from thousands of memory institutions in
Europeana, German Digital Library
Emerging Knowledge Graphs & Data Spaces
35. Paradigm Change in Scholarly Communication
Towards more Knowledge-based Information Flows
36. 36
Paradigm Change in Scholarly Communication Knowledge-based
Information Flows in Science & Technology
Challenges: Digitalisation of Science, monopolisation by commercial actors,
Proliferation of publications, Reproducibility Crisis
37. 37
Mathematics
• Definitions
• Theorems
• Proofs
• Methods
• …
Physics
• Experiments
• Data
• Models
• …
Chemistry
• Substances
• Structures
• Reactions
• …
Computer Science
• Concepts
• Implementations
• Evaluations
• …
Technology
• Standards
• Processes
• Elements
• Units,
Sensor data
Architecture
• Regulations
• Elements
• Models
• …
Open Research Knowledge Graph
Overarching Concepts
Research problems
Definitions
Research approaches
Methods
Artefacts
Publications
Data
Software
Image/Audio/Video
Knowledge Graphs / Ontologies
Domain specific concepts
The Open Research Knowledge Graph makes overarching and subject-specific concepts clearly identifiable and links them semantically (with clearly described relations) with each other and with relevant further artifacts.
41. 41
Semantic Representation using a Knowledge Graph
1. Original Publication
A practical guide to CRISPR/cas9 editing in Lepidoptera <https://doi.org/10.1101/130344>
2. Graph Curation Form
Author: Robert Reed
Research Problem: Genome editing in Lepidoptera
Methods: CRISPR/cas9
Experimental Data: https://doi.org/10.5281/zenodo.896916
related Concepts: Lepidoptera; Genome editing; CRISPR
3. Graph representation
Nodes: A practical guide to CRISPR/cas9 editing in Lepidoptera <https://doi.org/10.1101/130344>; Robert Reed <https://orcid.org/0000-0002-6065-6728>; Genome editing in Lepidoptera; CRISPR/cas9; Experimental Data <https://doi.org/10.5281/zenodo.896916>; Genome editing <https://www.wikidata.org/wiki/Q24630389>
Edge labels: isAuthorOf, addresses, isImplementedBy, isEvaluatedWith, relatesConcept
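The step from the curation form to the graph representation can be sketched as a handful of triples. The identifiers are the ones shown on the slide and the predicate names follow its edge labels (with the spelling `addresses`). Which node each edge starts from is an assumption made for this sketch (all edges except `isAuthorOf` start at the publication), and `describe` is a hypothetical helper.

```python
from typing import Dict, List, Tuple

PAPER = "https://doi.org/10.1101/130344"
AUTHOR = "https://orcid.org/0000-0002-6065-6728"
DATA = "https://doi.org/10.5281/zenodo.896916"
CONCEPT = "https://www.wikidata.org/wiki/Q24630389"  # Genome editing

# Assumed wiring: the slide residue lists nodes and edge labels,
# not which node each edge connects.
TRIPLES: List[Tuple[str, str, str]] = [
    (AUTHOR, "isAuthorOf", PAPER),
    (PAPER, "addresses", "Genome editing in Lepidoptera"),
    (PAPER, "isImplementedBy", "CRISPR/cas9"),
    (PAPER, "isEvaluatedWith", DATA),
    (PAPER, "relatesConcept", CONCEPT),
]


def describe(node: str, triples: List[Tuple[str, str, str]]) -> Dict[str, List[str]]:
    """Collect the outgoing edges of `node` as {predicate: [objects]}."""
    out: Dict[str, List[str]] = {}
    for s, p, o in triples:
        if s == node:
            out.setdefault(p, []).append(o)
    return out


for predicate, objects in describe(PAPER, TRIPLES).items():
    print(predicate, objects)
```

Once contributions are stored this way, a question such as "which datasets evaluate work on this research problem?" becomes a lookup over identified nodes rather than a full-text search.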
47. 47
Advantages of knowledge-based scholarly communication
Clear identification of all relevant artifacts, concepts, attributes, relationships
→ terminological and conceptual precision and sharpness, less ambiguity
Better and explicit networking of all relevant artifacts and information sources
→ traceability
ORKG machine-readability
→ new search, retrieval, mining and assistance applications
Avoidance of media discontinuities in the different phases of scientific work
→ increased efficiency
Use of concepts and relationships across disciplinary boundaries
→ interdisciplinarity and transdisciplinarity
Halting the proliferation of scientific publications
→ less duplication
Facilitating the entry of young academics or laypersons
→ Open Science
48. 48
There is a lot to do:
• Equip existing services with Linked Data interfaces
• Enable the deep semantic description of research, which requires:
• Good user interfaces
• Scalable storage and search facility
• Collaboration between scientists, librarians, knowledge engineers, machines
Stay tuned
• Mailing list/group: https://groups.google.com/forum/#!forum/orkg
• Coming soon: Open Research Knowledge Graph: https://orkg.org
• Next workshop at TIB on November 22nd (after DILS Conference:
https://events.tib.eu/dils2018/)
Outlook
Editor's notes
SITUATION Knowledge exchange still takes place by means of documents
■ HIGH EFFORT in creating and reading the documents / articles
■ Machine support for processing / search is possible only to a limited extent
■ Many FRICTION LOSSES due to ambiguity and lack of comparability
GOAL Digitalisation of science by establishing knowledge-based information flows
■ Representation and communication by means of knowledge graphs
■ COMMON UNDERSTANDING of data and information through decentralised, collaborative curation of knowledge graphs
■ INTEGRATION into existing and new services
RESULT Scientific work is revolutionised
■ Information and research results can be INTERLINKED WITH EACH OTHER and better mapped to complex information needs.
■ EFFICIENCY GAINS, since results are directly comparable and easier to reuse
Available genome editing methods
Site-specificity
High targeting accuracy: if a region of 18 nucleotides or more is reliably recognised, the nucleotide sequence is considered to be uniquely recognised. Below that value, the probability of hitting an unintended region of the genome increases.
Ease-of-Use / Cost-Efficiency
Meganucleases. They recognise long nucleotide sequences, but finding a suitable meganuclease for a desired sequence is very laborious. Both the engineering and the screening are costly.
ZFN. High screening costs, since specificity is hard to predict.
The Open Research Knowledge Graph, developed by TIB with partner organisations, (1) represents original research results explicitly and semantically and (2) richly interlinks existing metadata, data, knowledge and information resources. The graph can be curated collaboratively by research communities, secures provenance, and represents scientific discourse and its evolution.