dans.knaw.nl
DANS is an institute of KNAW and NWO
Data standardization process
for arts and humanities
Vyacheslav Tykhonov
Senior Information Scientist
(DANS-KNAW, Netherlands)
Developing the SSHOC Reference Ontology workshop
ICS-FORTH, Heraklion, Crete
21-22 May, 2019
DANS-KNAW core services
Outline
• Standardization process during data deposit and archiving
(metadata level created by users)
• Research data management and harmonization of deposited
datasets (file level)
• Standardization and enrichment of harvested content (metadata
level provided by different data providers)
• Tracking provenance information for data and tools, moving to FAIR
Big problem: researchers and librarians are not talking to each other,
and there is no common reference model!
Metadata schemas
• EASY TDR has its own metadata schema, developed for the Dutch
scientific landscape, but allows Dublin Core export from its OAI-PMH
endpoint
• NARCIS is an aggregator that harvests metadata from various
repositories; it has no standardization pipeline
• Metadata from Dataverse can be exported as:
Controlled vocabulary and thesaurus
• Linked data is a step forward (or rather a step back, in the right
direction) towards solving some of the standardization problems.
• When shared controlled vocabularies (CVs) are created and
maintained by experts in various domains, digital items can be
annotated with them and easily retrieved by other experts from
the same domain, without those experts having to be librarians.
A vocabulary that is shared by a critical mass of users is a clear
indication that it is good enough.
• A thesaurus is a semantic network of unique concepts, including
relationships between synonyms, broader and narrower
(parent/child) contexts, and other related concepts. A thesaurus
thus provides a hierarchy for controlled vocabularies.
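To make the retrieval benefit of a thesaurus concrete, here is a minimal sketch of a thesaurus as a data structure: each concept has a preferred label, synonyms and an optional broader concept, and a search term is expanded to the matching concept plus all of its narrower concepts. The class name, concept ids and labels are hypothetical and not taken from any CESSDA vocabulary.

```python
from collections import defaultdict


class Thesaurus:
    """Toy thesaurus: concepts with a preferred label, synonyms and broader links."""

    def __init__(self):
        self.pref_label = {}              # concept id -> preferred label
        self.label_index = {}             # lower-cased label or synonym -> concept id
        self.narrower = defaultdict(set)  # concept id -> ids of its direct children

    def add_concept(self, cid, pref, synonyms=(), broader=None):
        self.pref_label[cid] = pref
        for label in (pref, *synonyms):
            self.label_index[label.lower()] = cid
        if broader is not None:
            self.narrower[broader].add(cid)

    def expand(self, term):
        """Return the concept matching `term` plus all transitively narrower concepts."""
        cid = self.label_index.get(term.lower())
        if cid is None:
            return set()
        result, stack = set(), [cid]
        while stack:  # depth-first walk down the parent/child hierarchy
            current = stack.pop()
            if current not in result:
                result.add(current)
                stack.extend(self.narrower[current])
        return result


if __name__ == "__main__":
    t = Thesaurus()
    t.add_concept("c1", "Population", synonyms=["Demography"])
    t.add_concept("c2", "Migration", broader="c1")
    t.add_concept("c3", "Emigration", synonyms=["Out-migration"], broader="c2")
    print(sorted(t.expand("demography")))
```

A search for the synonym "demography" therefore also finds items annotated with the narrower concepts "Migration" and "Emigration", which plain keyword matching would miss.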
SSHOC data repository
DANS-KNAW is leading the development of the SSHOC DataverseEU project.
We are developing a multilingual web interface, localizing metadata fields, and have developed a data
standardization technique based on APIs for CESSDA CVs, Topic Classification and the CESSDA CV Manager
services.
SSHOC/CESSDA DataverseEU:
• Hungary (TARKI)
• Sweden (SND)
• Slovenia (ADP)
• Germany (GESIS)
• France (Sciences Po)
• Austria (AUSSDA)
• United Kingdom (UKDA)
• Italy (CNR, UniData)
• Belgium (SODA)
• Latvia (LSZDA)
• Poland (PSNC)
• Norway (DataverseNO)
• Netherlands (DANS-KNAW)
SKOS RDF Vocabularies (CESSDA)
We import thesauri delivered as SKOS RDF, for example:
A REST API endpoint returns JSON suitable for web applications.
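The SKOS-to-JSON step can be sketched with the Python standard library alone. This is a minimal illustration, not the DANS implementation: it assumes the thesaurus is serialized as RDF/XML using typed `skos:Concept` elements (it does not handle `rdf:Description` plus `rdf:type`, or Turtle), and the `skos_to_json` function and the `example.org` concepts are hypothetical. The RDF and SKOS namespace URIs are the W3C ones.

```python
import json
import xml.etree.ElementTree as ET

# Namespace URIs from the W3C RDF and SKOS specifications.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def skos_to_json(rdf_xml: str) -> str:
    """Flatten skos:Concept elements of an RDF/XML document into web-friendly JSON."""
    root = ET.fromstring(rdf_xml)
    concepts = []
    for concept in root.iter(f"{{{SKOS}}}Concept"):
        # One preferred label per language tag ("und" if the tag is missing).
        labels = {}
        for label in concept.findall(f"{{{SKOS}}}prefLabel"):
            labels[label.get(XML_LANG, "und")] = (label.text or "").strip()
        broader = [b.get(f"{{{RDF}}}resource")
                   for b in concept.findall(f"{{{SKOS}}}broader")]
        concepts.append({
            "uri": concept.get(f"{{{RDF}}}about"),
            "prefLabel": labels,
            "broader": broader,
        })
    return json.dumps(concepts, ensure_ascii=False, indent=2)


# Hypothetical input; not an actual CESSDA vocabulary.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept rdf:about="http://example.org/cv/migration">
    <skos:prefLabel xml:lang="en">Migration</skos:prefLabel>
    <skos:prefLabel xml:lang="nl">Migratie</skos:prefLabel>
    <skos:broader rdf:resource="http://example.org/cv/population"/>
  </skos:Concept>
</rdf:RDF>"""

if __name__ == "__main__":
    print(skos_to_json(SAMPLE))
```

Keeping labels keyed by language tag lets a multilingual interface pick the label matching the user's locale.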
Metadata standardization during deposit process
Standardized metadata in Dataverse
Standardized metadata in RDF
All relations are exported and available in the Knowledge Graph,
ready for further querying and exploration:
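As a rough sketch of what "standardized metadata in RDF" enables, assume (for illustration only) that each deposited dataset is linked to the CV concepts of its keywords with `dcterms:subject`. The helper functions and `example.org` URIs are hypothetical; a real Knowledge Graph would be queried through a SPARQL endpoint rather than this in-memory pattern matcher.

```python
# Real vocabulary term: DCMI Metadata Terms "subject".
DCT_SUBJECT = "http://purl.org/dc/terms/subject"

Triple = tuple[str, str, str]  # (subject IRI, predicate IRI, object IRI)


def deposit_to_triples(dataset_uri: str, concept_uris: list[str]) -> list[Triple]:
    """Link a deposited dataset to the CV concepts its keywords were standardized to."""
    return [(dataset_uri, DCT_SUBJECT, c) for c in concept_uris]


def to_ntriples(triples: list[Triple]) -> str:
    """Serialize IRI-only triples in N-Triples syntax."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


def match(triples, s=None, p=None, o=None) -> list[Triple]:
    """Triple-pattern lookup; None plays the role of a SPARQL variable."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


if __name__ == "__main__":
    graph = (deposit_to_triples("http://example.org/dataset/1",
                                ["http://example.org/cv/migration"])
             + deposit_to_triples("http://example.org/dataset/2",
                                  ["http://example.org/cv/population",
                                   "http://example.org/cv/migration"]))
    print(to_ntriples(graph))
    # Which datasets are about migration?
    print([s for s, _, _ in match(graph, p=DCT_SUBJECT, o="http://example.org/cv/migration")])
```

Because both datasets point at the same concept URI rather than at free-text keywords, the query finds them regardless of how the depositor originally spelled the keyword.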
Research data management
The data standardization process plays a key role in the data
management plan of any organization, but the current situation in
research data management is very complex:
• too much chaos in the data within datasets
• no data transparency
• sometimes no standards are available
• no provenance information is attached to the data
• homonyms, synonyms, generalizations, specializations,
spelling variations and mistakes, and language versions all
complicate keyword-based search and retrieval of
information
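Several of the problems listed above (spelling variations and mistakes, synonyms, language versions) can be partly mitigated by mapping every free-text keyword to a concept URI. The sketch below is one possible approach, not the pipeline used at DANS: it normalizes case, diacritics and whitespace, looks the result up in a variant table, and falls back to fuzzy matching with the standard library's `difflib`. The variant table, URIs and the 0.85 cutoff are assumptions. Homonyms are not handled, since they require context to disambiguate.

```python
import difflib
import unicodedata


def normalize(term: str) -> str:
    """Lower-case, strip diacritics and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", term)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.lower().split())


# Hypothetical variant table: synonyms and translations point to one concept URI.
VARIANTS = {
    "migration": "http://example.org/cv/migration",
    "migratie": "http://example.org/cv/migration",
    "population": "http://example.org/cv/population",
    "demography": "http://example.org/cv/population",
    "demografie": "http://example.org/cv/population",
}
INDEX = {normalize(k): v for k, v in VARIANTS.items()}


def standardize(keyword: str, cutoff: float = 0.85):
    """Return the concept URI for a user keyword, or None if nothing is close enough."""
    key = normalize(keyword)
    if key in INDEX:
        return INDEX[key]
    # Fuzzy fallback for typos; the cutoff trades recall against false matches.
    close = difflib.get_close_matches(key, list(INDEX), n=1, cutoff=cutoff)
    return INDEX[close[0]] if close else None


if __name__ == "__main__":
    for kw in ["  DEMOGRAPHY ", "Migratié", "Migraton", "economics"]:
        print(repr(kw), "->", standardize(kw))
```

Unmatched keywords return None, so they can be routed to a curator instead of being silently forced onto the nearest concept.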
Data standardization pipeline based on a chatbot
Resulting mapping produced by the AI:
mappings:
  Image-image:
    predicateobjects:
      - [a, 'http://xmlns.com/foaf/0.1/Image']
      - [a, 'http://schema.org/ImageObject']
      - [a, 'http://schema.org/CreativeWork']
      - [a, 'http://xmlns.com/foaf/0.1/Document']
      - ['http://www.w3.org/2000/01/rdf-schema#label', $(image)]
      - ['http://schema.org/image', $(image)]
    source: dataset-source
    subject: https://data.opendatasoft.com/ld/resources/roman-emperors@public/Image/$(image)
  Person-name:
    predicateobjects:
      - [a, 'http://schema.org/Person']
      - [a, 'http://www.w3.org/2000/10/swap/pim/contact#Person']
      - [a, 'http://xmlns.com/foaf/0.1/Person']
      - [a, 'http://purl.org/dc/terms/Agent']
      - [a, 'http://purl.org/goodrelations/v1#BusinessEntity']
      - [a, 'http://rhizomik.net/ontologies/copyrightonto.owl#LegalPerson']
      - [a, 'http://schema.org/Thing']
      - [a, 'http://www.w3.org/2003/01/geo/wgs84_pos#SpatialThing']
      - [a, 'http://xmlns.com/foaf/0.1/Agent']
      - ['http://www.w3.org/2000/01/rdf-schema#label', $(name)]
    source: dataset-source
    subject: https://data.opendatasoft.com/ld/resources/roman-emperors@public/Person/$(name)
  Place-birth_cty:
    predicateobjects:
      - [a, 'http://schema.org/Place']
      - [a, 'http://purl.org/goodrelations/v1#Location']
      - [a, 'http://rdfs.co/juso/SpatialThing']
      - [a, 'http://schema.org/Thing']
      - ['http://www.w3.org/2000/01/rdf-schema#label', $(birth_cty)]
      - ['http://dbpedia.org/ontology/era', $(era)]
      - ['http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#isDescribedBy', $(era)]
      - ['http://dbpedia.org/ontology/birthDate', $(birth), 'http://www.w3.org/2001/XMLSchema#dateTime']
    source: dataset-source
    subject: https://data.opendatasoft.com/ld/resources/roman-emperors@public/Place/$(birth_cty)
  Place-birth_prv:
    predicateobjects:
      - [a, 'http://schema.org/Place']
      - [a, 'http://purl.org/goodrelations/v1#Location']
      - [a, 'http://rdfs.co/juso/SpatialThing']
      - [a, 'http://schema.org/Thing']
      - ['http://www.w3.org/2000/01/rdf-schema#label', $(birth_prv)]
      - ['http://dbpedia.org/ontology/deathDate', $(death), 'http://www.w3.org/2001/XMLSchema#dateTime']
    source: dataset-source
    subject: https://data.opendatasoft.com/ld/resources/roman-emperors@public/Place/$(birth_prv)
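To show what a mapping like the one above does when it is executed, here is a deliberately simplified interpreter for one mapping entry written as a Python dict. It is a toy, not a replacement for a real RML/YARRRML processor: `$(column)` references are substituted from one CSV row (percent-encoded inside IRIs), `a` becomes `rdf:type`, other objects are treated as literals with an optional datatype, and output is N-Triples. The function names and the CSV row are hypothetical; the column name `name` and the subject template come from the `Person-name` mapping.

```python
import csv
import io
import re
from urllib.parse import quote

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
REF = re.compile(r"\$\(([^)]+)\)")  # matches $(column) references


def fill(template: str, row: dict, iri: bool) -> str:
    """Replace $(column) references; percent-encode values placed inside IRIs."""
    return REF.sub(lambda m: quote(row[m.group(1)], safe="") if iri else row[m.group(1)],
                   template)


def apply_mapping(mapping: dict, row: dict) -> list[str]:
    """Emit N-Triples lines for one row using a simplified YARRRML-style mapping."""
    subject = fill(mapping["subject"], row, iri=True)
    lines = []
    for po in mapping["predicateobjects"]:
        if po[0] == "a":  # shorthand for rdf:type with an IRI object
            lines.append(f"<{subject}> <{RDF_TYPE}> <{po[1]}> .")
            continue
        value = fill(po[1], row, iri=False)
        # Minimal N-Triples literal escaping.
        value = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
        obj = f'"{value}"' + (f"^^<{po[2]}>" if len(po) > 2 else "")
        lines.append(f"<{subject}> <{po[0]}> {obj} .")
    return lines


# Trimmed copy of the Person-name mapping above.
PERSON_NAME = {
    "subject": "https://data.opendatasoft.com/ld/resources/roman-emperors@public/Person/$(name)",
    "predicateobjects": [
        ["a", "http://schema.org/Person"],
        ["a", "http://xmlns.com/foaf/0.1/Person"],
        ["http://www.w3.org/2000/01/rdf-schema#label", "$(name)"],
    ],
}
CSV_DATA = "name,birth_cty\nAugustus,Rome\n"  # hypothetical input row

if __name__ == "__main__":
    for row in csv.DictReader(io.StringIO(CSV_DATA)):
        print("\n".join(apply_mapping(PERSON_NAME, row)))
```

Running a generated mapping like this on a few sample rows is a quick way to spot questionable AI choices, such as a birth date attached to a place rather than to a person.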
NARCIS metadata (example)
No authority linking or controlled vocabulary support, but…
Tracking Provenance information
PROV-O example from the PARTHENOS project
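The PARTHENOS example itself is a figure, so what follows is not a reproduction of it. It is a small sketch of how a standardization run could be described with core PROV-O terms (`prov:Entity`, `prov:Activity`, `prov:SoftwareAgent`, `prov:used`, `prov:wasGeneratedBy`, `prov:wasDerivedFrom`, `prov:wasAssociatedWith`, `prov:startedAtTime`), serialized as N-Triples. The function name and all `example.org` URIs are hypothetical.

```python
from datetime import datetime, timezone

PROV = "http://www.w3.org/ns/prov#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_DATETIME = "http://www.w3.org/2001/XMLSchema#dateTime"


def provenance_triples(output_uri, input_uri, activity_uri, agent_uri,
                       started: datetime) -> list[str]:
    """Describe one standardization run with core PROV-O terms, as N-Triples lines."""
    def iri(x):
        return f"<{x}>"

    timestamp = f'"{started.isoformat()}"^^<{XSD_DATETIME}>'
    # Each tuple is (subject IRI, predicate IRI, already-serialized object).
    statements = [
        (input_uri, RDF_TYPE, iri(PROV + "Entity")),
        (output_uri, RDF_TYPE, iri(PROV + "Entity")),
        (activity_uri, RDF_TYPE, iri(PROV + "Activity")),
        (agent_uri, RDF_TYPE, iri(PROV + "SoftwareAgent")),
        (activity_uri, PROV + "used", iri(input_uri)),
        (activity_uri, PROV + "wasAssociatedWith", iri(agent_uri)),
        (activity_uri, PROV + "startedAtTime", timestamp),
        (output_uri, PROV + "wasGeneratedBy", iri(activity_uri)),
        (output_uri, PROV + "wasDerivedFrom", iri(input_uri)),
    ]
    return [f"<{s}> <{p}> {o} ." for s, p, o in statements]


if __name__ == "__main__":
    for line in provenance_triples(
            "http://example.org/metadata/standardized",
            "http://example.org/metadata/raw",
            "http://example.org/run/1",
            "http://example.org/tool/standardizer",
            datetime(2019, 5, 21, 10, 0, tzinfo=timezone.utc)):
        print(line)
```

Recording the tool as a `prov:SoftwareAgent` ties every standardized record to the exact service that produced it.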
Time Machine association
• large-scale project with 300+
partners
• development and support of
sustainable networked services
• watching trends and tracking
software maturity levels
• reliable governance model
• the foundation for further
innovation!
Conclusion
• development of large-scale networked services out of research
pipelines
• every service should be sufficiently mature and maintainable, and
should follow a continuous integration pipeline
• tracking provenance information for every tool and dataset is the
highest priority
• creation and governance of standardization pipelines based on
services that provide access to domain-specific controlled vocabularies
and ontologies
• providing access to data, metadata and provenance (processes) in the
Knowledge Graph
• further integration of services maintained by different partners and
deployed in the cloud
Questions?
Feel free to ask questions!
Vyacheslav (Slava) Tykhonov
e-mail: vyacheslav.tykhonov@dans.knaw.nl
website: http://dans.knaw.nl (DANS-KNAW)
 
Volatile Oils Pharmacognosy And Phytochemistry -I
Volatile Oils Pharmacognosy And Phytochemistry -IVolatile Oils Pharmacognosy And Phytochemistry -I
Volatile Oils Pharmacognosy And Phytochemistry -I
 
Pests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdfPests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdf
 
Microteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical EngineeringMicroteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical Engineering
 
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
 
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
 
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
 
Bioteknologi kelas 10 kumer smapsa .pptx
Bioteknologi kelas 10 kumer smapsa .pptxBioteknologi kelas 10 kumer smapsa .pptx
Bioteknologi kelas 10 kumer smapsa .pptx
 
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
 
User Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather StationUser Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather Station
 
Citronella presentation SlideShare mani upadhyay
Citronella presentation SlideShare mani upadhyayCitronella presentation SlideShare mani upadhyay
Citronella presentation SlideShare mani upadhyay
 
Speech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxSpeech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptx
 
Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdfPests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
 
The dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxThe dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptx
 
User Guide: Orion™ Weather Station (Columbia Weather Systems)
User Guide: Orion™ Weather Station (Columbia Weather Systems)User Guide: Orion™ Weather Station (Columbia Weather Systems)
User Guide: Orion™ Weather Station (Columbia Weather Systems)
 
《Queensland毕业文凭-昆士兰大学毕业证成绩单》
《Queensland毕业文凭-昆士兰大学毕业证成绩单》《Queensland毕业文凭-昆士兰大学毕业证成绩单》
《Queensland毕业文凭-昆士兰大学毕业证成绩单》
 
Topic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptxTopic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptx
 
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxMicrophone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
 
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 GenuineCall Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
 

Data standardization process for social sciences and humanities

  • 1. Data standardization process for arts and humanities
    Vyacheslav Tykhonov, Senior Information Scientist (DANS-KNAW, Netherlands)
    Developing the SSHOC Reference Ontology workshop, ICS-FORTH, Heraklion, Crete, 21-22 May 2019
    dans.knaw.nl: DANS is an institute of KNAW and NWO
  • 3. Outline
    • Standardization process during data deposit and archiving (metadata level created by users)
    • Research data management and harmonization of deposited datasets (file level)
    • Standardization and enrichment of harvested content (metadata level provided by different data providers)
    • Tracking provenance information for data and tools, moving to FAIR
    Big problem: researchers and librarians are not talking to each other and there is no common Reference model!
  • 4. Metadata schemas
    • EASY TDR has its own metadata schema, developed for the Dutch scientific landscape, but allows Dublin Core export from its OAI-PMH endpoint
    • NARCIS is an aggregator that harvests metadata from various repositories; it has no standardization pipeline
    • Metadata from Dataverse can be exported as:
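The Dublin Core export over OAI-PMH mentioned for EASY can be consumed with a standard `ListRecords` request using `metadataPrefix=oai_dc`. The Python below is a minimal sketch using only the standard library: it builds the request URL and extracts Dublin Core elements from a response. The base URL and the sample record are hypothetical placeholders, so the parser runs offline.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Standard OAI-PMH and Dublin Core namespaces
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL for the oai_dc format."""
    if resumption_token:
        # resumptionToken is an exclusive argument: it replaces metadataPrefix
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return f"{base_url}?{urlencode(params)}"


def parse_dc_records(xml_text):
    """Return one {dc_element: [values]} dict per record in a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for dc in root.iterfind(".//oai:record/oai:metadata/oai_dc:dc", NS):
        fields = {}
        for child in dc:
            name = child.tag.split("}", 1)[1]  # '{namespace}title' -> 'title'
            fields.setdefault(name, []).append((child.text or "").strip())
        records.append(fields)
    return records


# Hypothetical sample response, so the parser can run without network access
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example survey</dc:title>
          <dc:subject>demography</dc:subject>
          <dc:subject>history</dc:subject>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url("https://example.org/oai"))
print(parse_dc_records(SAMPLE))
```

Keeping repeated elements (such as several `dc:subject` values) as lists matters here, because subjects are exactly the fields that later need mapping to controlled vocabularies.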
  • 5. Controlled vocabulary and thesaurus
    • Linked data is one step forward (or actually backward, in the right direction) in solving some standardization problems.
    • When shared controlled vocabularies (CVs) are created and maintained by experts in various domains, digital items can be annotated with them and easily retrieved by other experts from the same domain, without those experts having to be librarians. Shared use is also a clear indication of which vocabulary is good enough and accepted by a critical mass.
    • A thesaurus is a semantic network of unique concepts, including relationships between synonyms, broader and narrower (parent/child) contexts, and other related concepts. In other words, a thesaurus provides the hierarchy for controlled vocabularies.
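To illustrate how synonyms and broader/narrower relations help retrieval, here is a minimal Python sketch of a thesaurus: each concept has a preferred label, alternative labels (synonyms) and an optional parent. A query term is resolved to its concept through any of its labels and then expanded to all narrower concepts. The concept identifiers and labels are hypothetical examples, not entries from a real CESSDA vocabulary.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Concept:
    concept_id: str
    pref_label: str
    alt_labels: List[str] = field(default_factory=list)  # synonyms
    broader: Optional[str] = None  # id of the parent concept, if any


class Thesaurus:
    def __init__(self, concepts):
        self.concepts = {c.concept_id: c for c in concepts}
        # Case-insensitive index of preferred and alternative labels
        self.label_index = {}
        for c in concepts:
            for label in [c.pref_label, *c.alt_labels]:
                self.label_index[label.lower()] = c.concept_id

    def narrower(self, concept_id):
        """Direct children (narrower concepts) of a concept."""
        return [c.concept_id for c in self.concepts.values() if c.broader == concept_id]

    def expand(self, term):
        """Resolve a term or synonym to its concept, plus all narrower concepts."""
        start = self.label_index.get(term.lower())
        if start is None:
            return []
        result, stack, seen = [], [start], set()
        while stack:  # depth-first walk down the hierarchy
            cid = stack.pop()
            if cid in seen:  # guard against cycles in badly formed data
                continue
            seen.add(cid)
            result.append(cid)
            stack.extend(self.narrower(cid))
        return result


# Hypothetical mini-thesaurus
thesaurus = Thesaurus([
    Concept("c1", "Employment"),
    Concept("c2", "Unemployment", ["Joblessness"], broader="c1"),
    Concept("c3", "Youth unemployment", broader="c2"),
    Concept("c4", "Self-employment", broader="c1"),
])
print(thesaurus.expand("joblessness"))  # ['c2', 'c3']
```

A search for "joblessness" thus also finds items annotated only with "Youth unemployment", which a plain keyword match would miss.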
  • 6. SSHOC data repository
    DANS-KNAW is leading the development of the SSHOC DataverseEU project. We are developing a multilingual web interface, localizing metadata fields, and have developed a data standardization technique based on APIs for CESSDA CVs, Topic Classification and the CESSDA CV Manager services.
    SSHOC/CESSDA DataverseEU:
    • Hungary (TARKI)
    • Sweden (SND)
    • Slovenia (ADP)
    • Germany (GESIS)
    • France (Sciences Po)
    • Austria (AUSSDA)
    • United Kingdom (UKDA)
    • Italy (CNR, UniData)
    • Belgium (SODA)
    • Latvia (LSZDA)
    • Poland (PSNC)
    • Norway (DataverseNO)
    • Netherlands (DANS-KNAW)
  • 7. SKOS RDF Vocabularies (CESSDA)
    We import thesauri delivered as SKOS RDF, for example:
    A REST API endpoint returns JSON suitable for web applications.
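The SKOS-to-JSON step can be sketched with the Python standard library alone: parse a SKOS RDF/XML document, collect each `skos:Concept` with its language-tagged `skos:prefLabel` values and its `skos:broader` links, and serialize the result as JSON that a web application could consume. The sample vocabulary and the JSON shape are assumptions for illustration; they are not the actual CESSDA vocabulary format or API response.

```python
import json
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def skos_to_json(rdf_xml):
    """Convert skos:Concept elements in RDF/XML to a JSON list of concepts."""
    root = ET.fromstring(rdf_xml)
    concepts = []
    # Only handles the typed-node form <skos:Concept rdf:about="...">
    for concept in root.iter(f"{{{SKOS}}}Concept"):
        labels = {
            # 'und' (undetermined) when a label carries no xml:lang
            label.get(XML_LANG, "und"): label.text
            for label in concept.findall(f"{{{SKOS}}}prefLabel")
        }
        broader = [b.get(f"{{{RDF}}}resource") for b in concept.findall(f"{{{SKOS}}}broader")]
        concepts.append({"uri": concept.get(f"{{{RDF}}}about"), "prefLabel": labels, "broader": broader})
    return json.dumps(concepts, ensure_ascii=False, indent=2)


# Hypothetical two-concept vocabulary
SAMPLE_SKOS = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept rdf:about="https://example.org/topic/1">
    <skos:prefLabel xml:lang="en">Economics</skos:prefLabel>
    <skos:prefLabel xml:lang="nl">Economie</skos:prefLabel>
  </skos:Concept>
  <skos:Concept rdf:about="https://example.org/topic/2">
    <skos:prefLabel xml:lang="en">Labour market</skos:prefLabel>
    <skos:broader rdf:resource="https://example.org/topic/1"/>
  </skos:Concept>
</rdf:RDF>"""

print(skos_to_json(SAMPLE_SKOS))
```

Keying labels by language is what makes the same concept usable in a multilingual interface. A full importer would use an RDF library, since RDF/XML can also declare concepts via `rdf:type`, which this sketch does not handle.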
  • 10. Standardized metadata in RDF
    All relations are exported to the Knowledge Graph and are ready for further querying and exploration:
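As a toy illustration of what querying standardized metadata in a graph looks like, the Python below keeps RDF triples in memory and matches triple patterns in which `None` acts as a wildcard, roughly like a single SPARQL triple pattern. A two-hop query then finds datasets whose `dct:subject` concept has a given label. A real Knowledge Graph would use a triple store and SPARQL; the dataset and topic URIs here are hypothetical.

```python
DCT_SUBJECT = "http://purl.org/dc/terms/subject"
SKOS_PREF = "http://www.w3.org/2004/02/skos/core#prefLabel"

# Hypothetical triples: datasets annotated with controlled-vocabulary concepts
TRIPLES = {
    ("https://example.org/dataset/1", DCT_SUBJECT, "https://example.org/topic/2"),
    ("https://example.org/dataset/2", DCT_SUBJECT, "https://example.org/topic/1"),
    ("https://example.org/topic/2", SKOS_PREF, "Labour market"),
    ("https://example.org/topic/1", SKOS_PREF, "Economics"),
}


def match(s=None, p=None, o=None, triples=TRIPLES):
    """Return all triples matching the pattern; None is a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o)
    ]


def datasets_about(label):
    """Roughly: SELECT ?ds WHERE { ?c skos:prefLabel label . ?ds dct:subject ?c }"""
    concepts = [s for s, _, _ in match(p=SKOS_PREF, o=label)]
    return sorted(ds for c in concepts for ds, _, _ in match(p=DCT_SUBJECT, o=c))


print(datasets_about("Labour market"))  # ['https://example.org/dataset/1']
```

Because datasets point at concept URIs rather than free-text keywords, the label lookup and the dataset lookup stay separate, and the same query works for any language label attached to the concept.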
  • 11. Research data management
    The data standardization process plays a key role in the data management plan of any organization, but the current situation in research data management is very complex:
    • too much chaos in datasets
    • no data transparency
    • sometimes no standards are available
    • no provenance information attached to data
    • homonyms, synonyms, generalizations, specializations, spelling variations and mistakes, and language versions all complicate keyword-based search and retrieval of information
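One common way to reduce several of the keyword problems listed above (spelling variations and mistakes, synonyms, language versions) is to normalize free-text keywords and map them onto controlled-vocabulary concepts, falling back to fuzzy string matching for misspellings. The sketch below uses Python's `difflib.get_close_matches`; the label table and the 0.8 similarity cutoff are illustrative assumptions, not part of the DANS pipeline, and homonyms would still need context to resolve.

```python
import difflib
import unicodedata

# Hypothetical label table: variant labels (any language) -> concept URI
LABELS = {
    "unemployment": "https://example.org/topic/2",
    "joblessness": "https://example.org/topic/2",
    "werkloosheid": "https://example.org/topic/2",  # Dutch
    "economics": "https://example.org/topic/1",
    "economie": "https://example.org/topic/1",  # Dutch
}


def normalize(keyword):
    """Lower-case, strip accents and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", keyword)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.lower().split())


def to_concept(keyword, cutoff=0.8):
    """Map a free-text keyword to a concept URI, or None if nothing is close enough."""
    key = normalize(keyword)
    if key in LABELS:
        return LABELS[key]
    # Fuzzy fallback for misspellings such as 'unemployement'
    close = difflib.get_close_matches(key, LABELS.keys(), n=1, cutoff=cutoff)
    return LABELS[close[0]] if close else None


for kw in ["Unemployement", "  ECONOMICS ", "Économie", "astronomy"]:
    print(repr(kw), "->", to_concept(kw))
```

The cutoff trades recall against false matches: lowering it catches more typos but risks mapping unrelated short words onto the same concept, so unmatched keywords are better left for human review.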
  • 12. Data standardization pipeline based on a chatbot
  • 13. Mapping produced by AI as a result:
    mappings:
      Image-image:
        predicateobjects:
          - [a, 'http://xmlns.com/foaf/0.1/Image']
          - [a, 'http://schema.org/ImageObject']
          - [a, 'http://schema.org/CreativeWork']
          - [a, 'http://xmlns.com/foaf/0.1/Document']
          - ['http://www.w3.org/2000/01/rdf-schema#label', $(image)]
          - ['http://schema.org/image', $(image)]
        source: dataset-source
        subject: https://data.opendatasoft.com/ld/resources/roman-emperors@public/Image/$(image)
      Person-name:
        predicateobjects:
          - [a, 'http://schema.org/Person']
          - [a, 'http://www.w3.org/2000/10/swap/pim/contact#Person']
          - [a, 'http://xmlns.com/foaf/0.1/Person']
          - [a, 'http://purl.org/dc/terms/Agent']
          - [a, 'http://purl.org/goodrelations/v1#BusinessEntity']
          - [a, 'http://rhizomik.net/ontologies/copyrightonto.owl#LegalPerson']
          - [a, 'http://schema.org/Thing']
          - [a, 'http://www.w3.org/2003/01/geo/wgs84_pos#SpatialThing']
          - [a, 'http://xmlns.com/foaf/0.1/Agent']
          - ['http://www.w3.org/2000/01/rdf-schema#label', $(name)]
        source: dataset-source
        subject: https://data.opendatasoft.com/ld/resources/roman-emperors@public/Person/$(name)
      Place-birth_cty:
        predicateobjects:
          - [a, 'http://schema.org/Place']
          - [a, 'http://purl.org/goodrelations/v1#Location']
          - [a, 'http://rdfs.co/juso/SpatialThing']
          - [a, 'http://schema.org/Thing']
          - ['http://www.w3.org/2000/01/rdf-schema#label', $(birth_cty)]
          - ['http://dbpedia.org/ontology/era', $(era)]
          - ['http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#isDescribedBy', $(era)]
          - ['http://dbpedia.org/ontology/birthDate', $(birth), 'http://www.w3.org/2001/XMLSchema#datetime']
        source: dataset-source
        subject: https://data.opendatasoft.com/ld/resources/roman-emperors@public/Place/$(birth_cty)
      Place-birth_prv:
        predicateobjects:
          - [a, 'http://schema.org/Place']
          - [a, 'http://purl.org/goodrelations/v1#Location']
          - [a, 'http://rdfs.co/juso/SpatialThing']
          - [a, 'http://schema.org/Thing']
          - ['http://www.w3.org/2000/01/rdf-schema#label', $(birth_prv)]
          - ['http://dbpedia.org/ontology/deathDate', $(death), 'http://www.w3.org/2001/XMLSchema#datetime']
        source: dataset-source
        subject: https://data.opendatasoft.com/ld/resources/roman-emperors@public/Place/$(birth_prv)
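The mapping above appears to follow the YARRRML style: each rule has a `subject` template, a `source`, and `predicateobjects` entries in which `$(column)` refers to a field of the source record and `a` abbreviates `rdf:type`. To show how one such rule turns a record into RDF, here is a deliberately simplified Python interpreter emitting N-Triples. It is a sketch, not an RML/YARRRML engine: it only handles `[predicate, object]` and `[predicate, object, datatype]` entries, treats every non-`a` object as a literal, and the sample record is hypothetical.

```python
import re
from urllib.parse import quote

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
TEMPLATE = re.compile(r"\$\((\w+)\)")  # matches $(column)

# The Person-name rule from the mapping above, abridged to three entries
PERSON_RULE = {
    "subject": "https://data.opendatasoft.com/ld/resources/roman-emperors@public/Person/$(name)",
    "predicateobjects": [
        ["a", "http://schema.org/Person"],
        ["a", "http://xmlns.com/foaf/0.1/Person"],
        ["http://www.w3.org/2000/01/rdf-schema#label", "$(name)"],
    ],
}


def fill(template, record, encode=False):
    """Replace each $(column) with the record's value; percent-encode for IRIs."""
    def value(match):
        text = str(record[match.group(1)])
        return quote(text, safe="") if encode else text
    return TEMPLATE.sub(value, template)


def literal(text, datatype=None):
    """Serialize an N-Triples literal, with an optional datatype IRI."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"' + (f"^^<{datatype}>" if datatype else "")


def apply_rule(rule, record):
    """Yield N-Triples lines for one source record (simplified semantics)."""
    subject = f"<{fill(rule['subject'], record, encode=True)}>"
    for po in rule["predicateobjects"]:
        predicate, obj = po[0], po[1]
        if predicate == "a":
            # 'a' abbreviates rdf:type; the object is a class IRI
            yield f"{subject} <{RDF_TYPE}> <{obj}> ."
        else:
            # Other objects are treated as literals, optionally typed
            datatype = po[2] if len(po) > 2 else None
            yield f"{subject} <{predicate}> {literal(fill(obj, record), datatype)} ."


for line in apply_rule(PERSON_RULE, {"name": "Marcus Aurelius"}):
    print(line)
```

Percent-encoding the value only inside the subject IRI keeps names with spaces valid as IRIs while the `rdfs:label` literal keeps the original text.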
  • 14. NARCIS metadata (example)
    No support for authority linking or controlled vocabularies, but…
  • 16. PROV-O example from the PARTHENOS project
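The PARTHENOS example itself is not reproduced in this transcript. As a generic sketch of the kind of provenance record PROV-O supports, the Python below describes one processing step as Turtle: a standardization run that used an input dataset, was associated with a software agent, and generated a derived output dataset. It uses the core PROV-O terms `prov:Entity`, `prov:Activity`, `prov:SoftwareAgent`, `prov:used`, `prov:wasAssociatedWith`, `prov:wasGeneratedBy`, `prov:wasDerivedFrom`, `prov:startedAtTime` and `prov:endedAtTime`; all `ex:` names and timestamps are hypothetical.

```python
PREFIXES = """@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex: <https://example.org/> ."""


def provenance_turtle(activity, tool, source, result, started, ended):
    """Turtle for one activity that turns `source` into `result` using `tool`.

    Names must be valid Turtle local names (letters, digits, '-', '_').
    """
    return "\n".join([
        PREFIXES,
        "",
        f"ex:{source} a prov:Entity .",
        f"ex:{tool} a prov:SoftwareAgent .",
        f"ex:{activity} a prov:Activity ;",
        f"    prov:used ex:{source} ;",
        f"    prov:wasAssociatedWith ex:{tool} ;",
        f'    prov:startedAtTime "{started}"^^xsd:dateTime ;',
        f'    prov:endedAtTime "{ended}"^^xsd:dateTime .',
        f"ex:{result} a prov:Entity ;",
        f"    prov:wasGeneratedBy ex:{activity} ;",
        f"    prov:wasDerivedFrom ex:{source} .",
    ])


ttl = provenance_turtle(
    "standardizationRun", "cvMapper", "rawDataset", "standardizedDataset",
    "2024-01-01T10:00:00Z", "2024-01-01T10:05:00Z",
)
print(ttl)
```

Recording the tool as a `prov:SoftwareAgent` is what allows provenance to be tracked per tool as well as per dataset, so a faulty tool version can be traced to every dataset it produced.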
  • 17. Time Machine association
    • large-scale project with 300+ partners
    • development and support of sustainable networked services
    • watching trends and tracking the maturity level of software
    • reliable governance model
    • the foundation for further innovation!
  • 18. Conclusion
    • development of large-scale networked services out of research pipelines
    • every service should be mature enough and maintainable, and should follow a continuous integration pipeline
    • tracking provenance information for every tool and dataset is the highest priority
    • creation and governance of standardization pipelines based on services that provide access to domain-specific controlled vocabularies and ontologies
    • providing access to data, metadata and provenance (processes) in the Knowledge Graph
    • further integration of services maintained by different partners and deployed in the Cloud
  • 19. Questions?
    Feel free to ask questions!
    Vyacheslav (Slava) Tykhonov
    e-mail: vyacheslav.tykhonov@dans.knaw.nl
    website: http://dans.knaw.nl (DANS-KNAW)