Research Knowledge Graphs at NFDI4DS & GESIS
RKG Workshop
Stefan Dietze, 23.06.2022
NFDI4DS - Consortium
[Figure: consortium partner logos, including FHI and Universität Leipzig]
• Several universities as centers of excellence in DS and AI.
• Several non-university research institutes as key contributors.
• Scientific information centers and infrastructure providers as providers of data and for access to domain user groups.
● Paradigm shift in computer science towards data- and deep learning-driven methods (DS = fourth scientific paradigm).
● Reproducibility crisis.
○ Study on neural recommender publications at top-tier venues finds only 7/18 reproducible and only 1/18 beating state-of-the-art. [Dacrema et al., RecSys2019]
● Bias and discrimination from learned data-induced biases.
○ Neural gender classification performs significantly worse for darker-skinned women. [Buolamwini et al., FAccT2018]
○ Health risk of black people consistently underestimated by predictive algorithms. [Obermeyer et al., Science2019]
Data Science & AI Challenges
▪ Research Data
▪ Publications
▪ Code/Scripts
▪ ML Models
▪ Methods
▪ Claims
▪ Metrics
Relations between scientific resources, data, knowledge
Research Data Cycle
Provenance & Dependencies of Research Data, Resources, Knowledge
Common questions for researchers
• Which top-tier publications cite which data/method? (“dataset authority”)
• Which data was used to train/evaluate which method? Which method was used to produce which data?
• Which claims are supported/cited/rejected by which dataset or publication?
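The questions above become simple lookups once the relations are stored explicitly. The following is a minimal sketch, assuming a toy in-memory list of (subject, predicate, object) triples; all identifiers (`pub:A`, `data:D1`, ...) and predicate names (`cites`, `trainedOn`, ...) are hypothetical and not taken from any NFDI4DS or GESIS schema.

```python
# Hypothetical identifiers and predicates, invented for illustration only.
TRIPLES = [
    ("pub:A", "cites", "data:D1"),
    ("pub:B", "cites", "data:D1"),
    ("pub:B", "cites", "method:M1"),
    ("method:M1", "trainedOn", "data:D1"),
    ("method:M1", "evaluatedOn", "data:D2"),
    ("data:D1", "supports", "claim:C1"),
    ("pub:A", "rejects", "claim:C1"),
]


def match(triples, s=None, p=None, o=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def citing_publications(triples, resource):
    """Which publications cite the given dataset or method?"""
    return sorted(s for s, _, _ in match(triples, p="cites", o=resource))


def data_used_by(triples, method):
    """Which data was used to train or evaluate the given method?"""
    return {
        p: sorted(o for _, _, o in match(triples, s=method, p=p))
        for p in ("trainedOn", "evaluatedOn")
    }


def stances_on(triples, claim):
    """Which resources support or reject the given claim?"""
    return {
        p: sorted(s for s, _, _ in match(triples, p=p, o=claim))
        for p in ("supports", "rejects")
    }


if __name__ == "__main__":
    print(citing_publications(TRIPLES, "data:D1"))
    print(data_used_by(TRIPLES, "method:M1"))
    print(stances_on(TRIPLES, "claim:C1"))
```

A real research knowledge graph would typically answer the same questions with SPARQL over RDF; the wildcard pattern match here only mirrors the idea of a triple pattern.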
Challenges
• Data & metadata about resources and concepts are not represented in a structured, machine-interpretable, integrated manner (hidden in publications, web pages, etc.)
• Persistent identifiers (e.g. DOIs) are used inconsistently (e.g. on publications/datasets)
• Relations and semantics are not explicit
• Reproducibility crisis in CS/DS/AI
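One facet of inconsistent identifier use is that the same DOI is written in several ways (bare, with a `doi:` prefix, or as a resolver URL). The sketch below shows one possible normalization step; the function names are hypothetical, the `10.1234` prefix in the usage note is a made-up example, and the validity check is a heuristic rather than the full DOI syntax.

```python
import re

# Common prefixes under which the same DOI appears in practice.
_PREFIX = re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:\s*)", re.IGNORECASE)


def normalize_doi(raw):
    """Reduce different spellings of a DOI to one lowercase 'prefix/suffix' form.

    Returns None when the input does not look like a DOI (heuristic check).
    """
    candidate = _PREFIX.sub("", raw.strip())
    # Heuristic: '10.', a numeric registrant code, a slash, and a non-empty suffix.
    if not re.match(r"^10\.\d{4,9}/\S+$", candidate):
        return None
    # DOI names are case-insensitive, so lowercasing gives a stable key.
    return candidate.lower()


def group_by_doi(mentions):
    """Group raw identifier strings that refer to the same DOI."""
    groups = {}
    for raw in mentions:
        groups.setdefault(normalize_doi(raw), []).append(raw)
    return groups
```

For example, `group_by_doi(["doi:10.1234/X", "https://doi.org/10.1234/x", "n/a"])` places the first two strings under the key `"10.1234/x"` and the unparseable one under `None`.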
• Data interoperability and reuse through established W3C standards for data sharing (on the Web), e.g. RDF, JSON, shared vocabularies (e.g. schema.org, DCAT, DDI), APIs for data reuse and linking
• Making links between resources and concepts explicit & machine-interpretable (e.g. which publications cite what dataset?)
• Consistent use of persistent IDs (e.g. URIs, DOIs) across all data, e.g. concepts, resources etc. (“DOIs for all”)
• Partners in NFDI4DS/TA2 (“RKGs”)
Knowledge Graphs for FAIR Research Data in NFDI4DS
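As an illustration of making a "publication cites dataset" link explicit with a shared vocabulary, here is a minimal sketch that emits JSON-LD using the schema.org types `ScholarlyArticle` and `Dataset` and the `citation` property. The helper names and the DOIs in the usage note are hypothetical placeholders; the slides do not prescribe this particular modelling.

```python
import json


def describe_publication(pub_doi, title, dataset_doi, dataset_name):
    """Build a JSON-LD description of a publication that cites a dataset."""
    return {
        "@context": "https://schema.org",
        "@type": "ScholarlyArticle",
        # Persistent identifiers double as node identifiers in the graph.
        "@id": f"https://doi.org/{pub_doi}",
        "name": title,
        "citation": {
            "@type": "Dataset",
            "@id": f"https://doi.org/{dataset_doi}",
            "name": dataset_name,
        },
    }


def cited_dataset_ids(doc):
    """Return the identifiers of all datasets cited in a JSON-LD document."""
    cited = doc.get("citation", [])
    if isinstance(cited, dict):  # a single citation is allowed as a bare object
        cited = [cited]
    return [c["@id"] for c in cited if c.get("@type") == "Dataset"]


if __name__ == "__main__":
    # Placeholder DOIs, not real records.
    doc = describe_publication(
        "10.1234/paper.1", "An example study", "10.1234/data.1", "Example survey data"
    )
    print(json.dumps(doc, indent=2))
    print(cited_dataset_ids(doc))
```

Because both nodes are identified by resolvable DOI URLs, the same link can be merged with statements about the dataset published elsewhere.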
Resources
▪ Datasets
▪ Publications
▪ Code
▪ Software
Concepts
▪ Terms &
Definitions
▪ Claims
▪ Methods
▪ Topics
▪ Entities
GESIS & TIB RKGs in NFDI4DS
• Community
• Expert-curation/annotation
workflow/tools
• Focus point & hub
• PIDs
https://www.nfdi4datascience.de/
• Large-scale knowledge
graphs
• Automated deep learning-
based methods for
extracting KGs
• Populating ORKG
Research KGs in Practice: integrated search @ GESIS
https://search.gesis.org/
Dataset
Rel. Publications
From publications to machine-interpretable metadata KGs
Disambiguation of dataset & software/script citations
https://data.gesis.org/softwarekg
▪ Training a deep learning-based model for extracting software & data references from large-scale data (3.5 M publications)
▪ Data lifting into KG
▪ 300+ M triples / statements
▪ Search across
data/software/publications
(GESIS Search)
From publications to machine-interpretable metadata KGs
Understanding scientific software/data usage
https://data.gesis.org/softwarekg
(Schindler et al., CIKM2021)
▪ Understanding SW
usage, citation habits
and their evolution
across disciplines
▪ Rise of data science =
rise of software usage
▪ Top adopters of data
science/AI/software…
▪ …follow the worst
citation habits
From social media to machine-interpretable research data KGs
Building a public research knowledge graph from Twitter data
https://data.gesis.org/tweetskb
[Figure: example tweet annotation linking the entities http://dbpedia.org/resource/Tim_Berners-Lee and http://dbpedia.org/resource/Solid, with sentiment annotations wna:positive-emotion and wna:negative-emotion and onyx:Intensity values "0.75" and "0.0"]
TweetsKB – a large-scale research KG of societal opinions
▪ Harvesting & archiving of 10 billion tweets (permanent collection from the Twitter 1% sample since 2013)
▪ Information extraction pipeline to build a KG of entities, interactions & sentiments (distributed batch processing via Hadoop MapReduce)
o Entity linking with knowledge graph/DBpedia (“president”/“potus”/“trump” => dbp:DonaldTrump)
o Sentiment analysis/annotation
o Geotagging
o Lifting into knowledge graph schema
▪ Public, privacy-aware, large-scale research corpus of public opinions and their evolution => interdisciplinary research
P. Fafalios, V. Iosifidis, E. Ntoutsi, and S. Dietze, TweetsKB: A Public
and Large-Scale RDF Corpus of Annotated Tweets, ESWC'18.
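The pipeline steps listed above (entity linking, sentiment annotation, lifting into a graph schema) can be sketched on a toy scale as follows. This is not the TweetsKB implementation: the dictionary lookup, the word lists, the scoring, and the predicate names (`mentions`, `positiveEmotion`, `negativeEmotion`) are all simplifying assumptions, and geotagging and distributed processing are omitted. Only the surface forms mapped to `dbp:DonaldTrump` come from the slide.

```python
import re

# Toy surface-form dictionary; the mapping mirrors the slide's example.
ENTITY_LEXICON = {
    "president": "dbp:DonaldTrump",
    "potus": "dbp:DonaldTrump",
    "trump": "dbp:DonaldTrump",
}
# Hypothetical miniature sentiment word lists.
POSITIVE = {"great", "good", "love"}
NEGATIVE = {"bad", "sad", "hate"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def link_entities(tokens):
    """Map known surface forms to entity identifiers (context-free lookup)."""
    return sorted({ENTITY_LEXICON[t] for t in tokens if t in ENTITY_LEXICON})


def sentiment(tokens):
    """Return (positive, negative) scores as the share of matching tokens."""
    if not tokens:
        return 0.0, 0.0
    pos = sum(t in POSITIVE for t in tokens) / len(tokens)
    neg = sum(t in NEGATIVE for t in tokens) / len(tokens)
    return pos, neg


def lift(tweet_id, text):
    """Turn one tweet into (subject, predicate, object) statements."""
    tokens = tokenize(text)
    pos, neg = sentiment(tokens)
    subject = f"tweet:{tweet_id}"
    triples = [(subject, "mentions", entity) for entity in link_entities(tokens)]
    triples.append((subject, "positiveEmotion", pos))
    triples.append((subject, "negativeEmotion", neg))
    return triples


if __name__ == "__main__":
    for triple in lift("1", "POTUS gave a great speech"):
        print(triple)
```

A context-free lookup like this would wrongly link every "president" to one person; real entity linking resolves mentions against the surrounding text, which is why the slide lists it as a dedicated pipeline step.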
RKG-based social science research using TweetsKB
https://dd4p.gesis.org
Investigating Vaccine Hesitancy in DACH countries
Germany suspends vaccinations with AstraZeneca
Twitter discourse on “Impfbereitschaft” (willingness to be vaccinated)
RKG-based discourse analysis using TweetsKB
Vaccine hesitancy: key topics in the “safety” category
„Schwangerschaft“ (pregnancy), „Kimmich“, „Alter“ (age), „Nebenwirkungen“ (side effects), „Herzinfarkt“ (heart attack), „Zulassung“ (approval)
Summary: Research KGs @ GESIS
Tools for constructing scholarly knowledge graphs
● NLP and deep learning-powered methods for extracting large-scale KGs
about methods, claims, data, software involved in the scientific process
Large-scale scholarly KGs, e.g.
● KGs about scholarly use of software & research data
(e.g. SoftwareKG: 1.8 M disambiguated software mentions extracted
from 3 M publications, https://data.gesis.org/softwarekg/)
● Web mined KGs of social science research data, e.g. public opinions,
claims and attitudes expressed on social media
(e.g. TweetsKB: > 10 Bn semantically annotated tweets, sentiments,
https://data.gesis.org/tweetskb)
Semantic Search powered by KGs and related tools
● RKG-powered search across scholarly publications, datasets, methods
and their relations (e.g. GESIS Search, https://search.gesis.org)
https://gesis.org/en/kts
https://search.gesis.org
Outlook: a joint Research KG in NFDI4DS
Resources
▪ Datasets
▪ Publications
▪ Code
▪ Software
Concepts
▪ Terms &
Definitions
▪ Claims
▪ Methods
▪ Topics
▪ Entities
• Community
• Expert-curation/annotation
workflow/tools
• Focus point & hub
• PIDs
https://www.nfdi4datascience.de/
• Large-scale knowledge
graphs
• Automated deep learning-
based methods for
extracting KGs
• Populating ORKG
@stefandietze
http://stefandietze.net

Decarbonising Buildings: Making a net-zero built environment a reality
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 

Research Knowledge Graphs at NFDI4DS & GESIS

  • 1. Research Knowledge Graphs at NFDI4DS & GESIS. RKG Workshop. Stefan Dietze, 23.06.2022
  • 2. NFDI4DS - Consortium. [Logos: FHI, Universität Leipzig] Several universities as centers of excellence in DS and AI. Several non-university research institutes as key contributors. Scientific information centers and infrastructure providers as providers of data and for access to domain user groups.
  • 3. Data Science & AI Challenges. Paradigm shift in computer science towards data- and deep-learning-driven methods (DS = fourth scientific paradigm). Reproducibility crisis: a study of neural recommender publications at top-tier venues finds only 7/18 reproducible and only 1/18 beating the state of the art [Dacrema et al., RecSys2019]. Bias and discrimination from learned, data-induced biases: gender classification by neural approaches performs significantly worse for darker-skinned women [Buolamwini et al., FAccT2018]; the health risk of Black people is consistently underestimated by predictive algorithms [Obermeyer et al., Science2019].
  • 4. Provenance & Dependencies of Research Data, Resources, Knowledge. Relations between scientific resources, data, knowledge: Research Data, Publications, Code/Scripts, ML Models, Methods, Claims, Metrics. [Figure: Research Data Cycle]
  • 5. Provenance & Dependencies of Research Data, Resources, Knowledge. Common questions for researchers: Which top-tier publications cite which data/method ("dataset authority")? Which data was used to train/evaluate which method? Which method was used to produce what data? Which claims are supported/cited/rejected by what dataset or publication? Relations between scientific resources, data, knowledge: Research Data, Publications, Code/Scripts, ML Models, Methods, Claims, Metrics.
  • 6. Provenance & Dependencies of Research Data, Resources, Knowledge. Challenges: Data & metadata about resources and concepts are not represented in a structured, machine-interpretable, integrated manner (hidden in publications, web pages etc.). Persistent identifiers (e.g. DOIs) are used inconsistently (e.g. on publications/datasets). Relations and semantics are not explicit. Reproducibility crisis in CS/DS/AI. Relations between scientific resources, data, knowledge: Research Data, Publications, Code/Scripts, ML Models, Methods, Claims, Metrics.
  • 7. Knowledge Graphs for FAIR Research Data in NFDI4DS. Data interoperability and reuse through established W3C standards for data sharing (on the Web), e.g. RDF, JSON, shared vocabularies (e.g. schema.org, DCAT, DDI), APIs for data reuse and linking. Making links between resources and concepts explicit & machine-interpretable (e.g. which publications cite what dataset?). Consistent use of persistent IDs (e.g. URIs, DOIs) across all data, e.g. concepts, resources etc. ("DOIs for all"). Partners in NFDI4DS/TA2 ("RKGs"). Resources: Datasets, Publications, Code, Software. Concepts: Terms & Definitions, Claims, Methods, Topics, Entities.
  • 8. GESIS & TIB RKGs in NFDI4DS, https://www.nfdi4datascience.de/ Community; expert-curation/annotation workflow/tools; focus point & hub; PIDs. Large-scale knowledge graphs; automated deep learning-based methods for extracting KGs; populating ORKG. Resources: Datasets, Publications, Code, Software. Concepts: Terms & Definitions, Claims, Methods, Topics, Entities.
  • 9. Research KGs in Practice: integrated search @ GESIS, https://search.gesis.org/ [Screenshot labels: Dataset, Rel. Publications]
  • 10. From publications to machine-interpretable metadata KGs. Disambiguation of dataset & software/script citations, https://data.gesis.org/softwarekg Training a deep learning-based model for extracting software & data references from large-scale data (3.5 M publications). Data lifting into KG. 300+ M triples / statements. Search across data/software/publications (GESIS Search).
  • 11. From publications to machine-interpretable metadata KGs. Understanding scientific software/data usage, https://data.gesis.org/softwarekg (Schindler et al., CIKM2021). Understanding SW usage, citation habits and their evolution across disciplines. Rise of data science = rise of software usage.
  • 12. From publications to machine-interpretable metadata KGs. Understanding scientific software/data usage, https://data.gesis.org/softwarekg Top adopters of data science/AI/software…
  • 13. From publications to machine-interpretable metadata KGs. Understanding scientific software/data usage, https://data.gesis.org/softwarekg Top adopters of data science/AI/software… …follow the worst citation habits.
  • 14. From social media to machine-interpretable research data KGs. Building a public research knowledge graph from Twitter data, https://data.gesis.org/tweetskb [Figure labels: http://dbpedia.org/resource/Tim_Berners-Lee, wna:positive-emotion, onyx:Intensity "0.75", onyx:Intensity "0.0", http://dbpedia.org/resource/Solid, wna:negative-emotion]
  • 15. From social media to machine-interpretable research data KGs, https://data.gesis.org/tweetskb TweetsKB: a large-scale research KG of societal opinions. Harvesting & archiving of 10 billion tweets (permanent collection from the Twitter 1% sample since 2013). Information extraction pipeline to build a KG of entities, interactions & sentiments (distributed batch processing via Hadoop Map/Reduce): entity linking with knowledge graph/DBpedia ("president"/"potus"/"trump" => dbp:DonaldTrump); sentiment analysis/annotation; geotagging; lifting into knowledge graph schema. Public, privacy-aware, large-scale research corpus of public opinions and their evolution => interdisciplinary research. P. Fafalios, V. Iosifidis, E. Ntoutsi, and S. Dietze, TweetsKB: A Public and Large-Scale RDF Corpus of Annotated Tweets, ESWC'18.
  • 16. RKG-based social science research using TweetsKB. Investigating Vaccine Hesitancy in DACH countries, https://dd4p.gesis.org
  • 17. RKG-based social science research using TweetsKB. Investigating Vaccine Hesitancy in DACH countries, https://dd4p.gesis.org Twitter discourse on "Impfbereitschaft" (willingness to be vaccinated). Annotated event: Germany suspends vaccinations with AstraZeneca.
  • 18. RKG-based discourse analysis using TweetsKB. Vaccine Hesitancy: key topics in the "safety" category: "Schwangerschaft" (pregnancy), "Kimmich", "Alter" (age), "Nebenwirkungen" (side effects), "Herzinfarkt" (heart attack), "Zulassung" (approval). https://dd4p.gesis.org
  • 19. Summary: Research KGs @ GESIS. Tools for constructing scholarly knowledge graphs: NLP and deep learning-powered methods for extracting large-scale KGs about methods, claims, data, software involved in the scientific process. Large-scale scholarly KGs, e.g. KGs about scholarly use of software & research data (e.g. SoftwareKG: 1.8 M disambiguated software mentions extracted from 3 M publications, https://data.gesis.org/softwarekg/) and Web-mined KGs of social science research data, e.g. public opinions, claims and attitudes expressed on social media (e.g. TweetsKB: > 10 Bn semantically annotated tweets, sentiments, https://data.gesis.org/tweetskb). Semantic search powered by KGs and related tools: RKG-powered search across scholarly publications, datasets, methods and their relations (e.g. GESIS Search, https://search.gesis.org). See also https://gesis.org/en/kts
  • 20. Outlook: a joint Research KG in NFDI4DS, https://www.nfdi4datascience.de/ Resources: Datasets, Publications, Code, Software. Concepts: Terms & Definitions, Claims, Methods, Topics, Entities. Community; expert-curation/annotation workflow/tools; focus point & hub; PIDs. Large-scale knowledge graphs; automated deep learning-based methods for extracting KGs; populating ORKG.
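
The TweetsKB steps listed on slide 15 (entity linking, sentiment annotation, lifting into a knowledge graph schema) can be illustrated with a minimal Python sketch. This is not the actual TweetsKB pipeline: the alias table, the tiny sentiment lexicon, the `ex:` predicate names and the `tweet:` prefix are all hypothetical stand-ins, and a real system would presumably use a trained entity linker and sentiment model running as distributed batch jobs. Only the alias examples ("president"/"potus"/"trump" => dbp:DonaldTrump) come from the slides.

```python
import re

# Hypothetical alias table (surface form -> entity URI); stands in for a real entity linker.
ALIASES = {
    "president": "dbp:DonaldTrump",
    "potus": "dbp:DonaldTrump",
    "trump": "dbp:DonaldTrump",
}

# Hypothetical toy sentiment lexicon; a real pipeline would use a trained model.
POSITIVE = {"great", "good", "love"}
NEGATIVE = {"bad", "terrible", "hate"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def link_entities(tokens):
    """Return the sorted, de-duplicated entity URIs whose alias occurs in the tokens."""
    return sorted({ALIASES[t] for t in tokens if t in ALIASES})


def sentiment_scores(tokens):
    """Return (positive, negative) intensities as fractions of all tokens."""
    if not tokens:
        return 0.0, 0.0
    pos = sum(t in POSITIVE for t in tokens) / len(tokens)
    neg = sum(t in NEGATIVE for t in tokens) / len(tokens)
    return pos, neg


def lift_tweet(tweet_id, text):
    """Lift one tweet into (subject, predicate, object) triples.

    Only derived annotations are emitted, not the tweet text itself.
    The ex: predicates are illustrative, not the TweetsKB schema.
    """
    tokens = tokenize(text)
    subject = f"tweet:{tweet_id}"
    triples = [(subject, "ex:mentions", uri) for uri in link_entities(tokens)]
    pos, neg = sentiment_scores(tokens)
    triples.append((subject, "ex:positiveIntensity", pos))
    triples.append((subject, "ex:negativeIntensity", neg))
    return triples


if __name__ == "__main__":
    for triple in lift_tweet("1", "POTUS gave a great speech"):
        print(triple)
```

For the sample tweet, the sketch yields one `ex:mentions` triple pointing at dbp:DonaldTrump, a positive intensity of 0.2 (one lexicon hit in five tokens) and a negative intensity of 0.0. Keeping each stage a separate pure function mirrors the batch-processing style named on the slide, since each stage could be mapped over tweets independently.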