Data standardization process for social sciences and humanities

vty

Presentation for Time Machine conference 2018.

Common problems in data management
Data standardization plays a key role in any organization's data
management plan, but the current state of research data
management is complex:
• too much chaos in datasets
• no data transparency
• sometimes no standards are available
• no provenance information attached to data
• homonyms, synonyms, generalizations, specializations,
spelling variations and mistakes, and language versions all
complicate keyword-based search and retrieval of
information

Controlled vocabulary and thesaurus
• Linked data is one step forward (or actually backward in the right
direction) in solving some standardization problems.
• When experts in various domains create and maintain shared
controlled vocabularies (CVs), digital items can be annotated with
them and easily retrieved by other experts from the same domain,
without needing to be a librarian. This gives a clear
indication of which vocabulary is good enough and shared by a
critical mass.
• A thesaurus is a semantic network of unique concepts, including
relationships between synonyms, broader and narrower
(parent/child) contexts, and other related concepts. A thesaurus
provides the hierarchy for controlled vocabularies.
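
The idea of a thesaurus as a network of unique concepts can be sketched in a few lines of Python. This is only an illustration: the `Concept` and `Thesaurus` classes, the `https://example.org/cv/` namespace and the three sample concepts are all hypothetical and do not come from any CESSDA vocabulary.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Concept:
    """One unique concept in a thesaurus (all sample values are illustrative)."""
    uri: str
    pref_label: str
    alt_labels: List[str] = field(default_factory=list)  # synonyms, spelling variants
    broader: Optional[str] = None  # URI of the parent concept, if any


class Thesaurus:
    def __init__(self, concepts):
        self.by_uri = {c.uri: c for c in concepts}
        # Index every preferred and alternative label in lower case.
        self.by_label = {}
        for c in concepts:
            for label in [c.pref_label, *c.alt_labels]:
                self.by_label[label.lower()] = c

    def lookup(self, keyword):
        """Return the concept matching a free-text keyword, or None."""
        return self.by_label.get(keyword.strip().lower())

    def ancestors(self, concept):
        """Follow broader links up to the top of the hierarchy."""
        chain = []
        while concept.broader is not None:
            concept = self.by_uri[concept.broader]
            chain.append(concept.pref_label)
        return chain


BASE = "https://example.org/cv/"  # hypothetical namespace
thesaurus = Thesaurus([
    Concept(BASE + "society", "Society"),
    Concept(BASE + "labour", "Labour and employment", ["work", "jobs"], BASE + "society"),
    Concept(BASE + "unemployment", "Unemployment",
            ["joblessness", "unemployement"], BASE + "labour"),
])

hit = thesaurus.lookup("  Joblessness ")
print(hit.uri, hit.pref_label, thesaurus.ancestors(hit))
```

A synonym or a common misspelling resolves to the same concept URI, and the broader links make it possible to retrieve an item through any of its parent concepts.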

Timbuctoo datastore
• Developed by Huygens ING (KNAW, Netherlands)
for academic research in the Arts and Humanities,
which often yields complex and heterogeneous
data.
• lives up to academic standards for working with
such content: the infrastructure accommodates
different views on a subject and leaves the
interpretation of the data to the researcher.
• keeps track of data provenance and does not
impose a certain research methodology on its

DataverseEU data repository
Dataverse is a data repository developed by Harvard IQSS.
The DataverseEU project is funded by CESSDA, a consortium that promotes the results of social science
research and supports international research cooperation. We are developing a multilingual web interface,
localizing metadata fields, and have developed a data standardization technique based on APIs for the CESSDA
CVs, Topic Classification and CESSDA CV Manager services.
DataverseEU countries:
• Hungary (TARKI)
• Sweden (SND)
• Slovenia (ADP)
• Germany (GESIS)
• France (Sciences Po)
• Austria (AUSSDA)
• United Kingdom (UKDA)
• Italy (UniData)
• Belgium (SODA)
• Latvia (LSZDA)
• Netherlands (DANS-KNAW)

SKOS RDF vocabularies are a perfect input for Timbuctoo
We import thesauri delivered as SKOS RDF.
The Timbuctoo API endpoint returns JSON suitable for web
applications.
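
As a rough illustration of the SKOS-to-JSON step, the sketch below reads a tiny SKOS vocabulary in RDF/XML and prints a JSON view of it. The vocabulary, the `skos_to_json` helper and the JSON layout are assumptions made for this example; they are not Timbuctoo's importer or its actual API response, and the parser only handles concepts written as `skos:Concept` elements.

```python
import json
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical two-concept vocabulary in SKOS RDF/XML.
SKOS_XML = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept rdf:about="https://example.org/cv/labour">
    <skos:prefLabel xml:lang="en">Labour and employment</skos:prefLabel>
  </skos:Concept>
  <skos:Concept rdf:about="https://example.org/cv/unemployment">
    <skos:prefLabel xml:lang="en">Unemployment</skos:prefLabel>
    <skos:prefLabel xml:lang="nl">Werkloosheid</skos:prefLabel>
    <skos:altLabel xml:lang="en">Joblessness</skos:altLabel>
    <skos:broader rdf:resource="https://example.org/cv/labour"/>
  </skos:Concept>
</rdf:RDF>
""".strip()


def skos_to_json(xml_text):
    """Convert skos:Concept elements into a list of plain dictionaries."""
    root = ET.fromstring(xml_text)
    concepts = []
    for node in root.findall(f"{{{SKOS}}}Concept"):
        broader = node.find(f"{{{SKOS}}}broader")
        concepts.append({
            "uri": node.get(f"{{{RDF}}}about"),
            # One preferred label per language tag.
            "prefLabel": {el.get(XML_LANG): el.text
                          for el in node.findall(f"{{{SKOS}}}prefLabel")},
            "altLabel": [el.text for el in node.findall(f"{{{SKOS}}}altLabel")],
            "broader": broader.get(f"{{{RDF}}}resource") if broader is not None else None,
        })
    return concepts


print(json.dumps(skos_to_json(SKOS_XML), indent=2))
```

A production import would use a full RDF parser, since the same SKOS data can be serialized in several equivalent RDF/XML forms as well as in Turtle or JSON-LD.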

Standardization process during data deposit

Standardized metadata in Dataverse

Record in Dublin Core from Dataverse OAI-PMH endpoint
Here is a problem: the values are standardized, but
we have just lost the controlled vocabulary relationships in the Knowledge Graph!
We need a Linked Data repository (Timbuctoo) to keep all relations
alive.
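
The loss of relations can be seen by looking at what a Dublin Core record carries. The sketch below builds a standard OAI-PMH `GetRecord` request and parses a small `oai_dc` payload. The host name, the `/oai` path, the identifier and the record content are assumptions for illustration, and no network request is made.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

DC = "http://purl.org/dc/elements/1.1/"

# Standard OAI-PMH request parameters; host, path and identifier are hypothetical.
query = urlencode({
    "verb": "GetRecord",
    "metadataPrefix": "oai_dc",
    "identifier": "doi:10.1234/EXAMPLE",
})
request_url = "https://dataverse.example.org/oai?" + query
print(request_url)

# Hypothetical oai_dc payload, as it might appear inside a GetRecord response.
OAI_DC = """
<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Labour Force Survey 2017</dc:title>
  <dc:subject>Unemployment</dc:subject>
  <dc:subject>Labour and employment</dc:subject>
</oai_dc:dc>
""".strip()

record = ET.fromstring(OAI_DC)
subject_elements = record.findall(f"{{{DC}}}subject")
subjects = [el.text for el in subject_elements]

# Each subject is a bare string: no concept URI, no vocabulary name,
# and no broader/narrower link survives in this flat representation.
for el in subject_elements:
    print(repr(el.text), "attributes:", el.attrib)
```

The labels themselves are standardized, but nothing in the record says which vocabulary they came from or how the two subjects relate to each other.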

Standardized RDF harvested by Timbuctoo
All relations are exported and available in the Knowledge Graph,
ready for further querying and exploration.
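
To show what keeping the relations alive could look like, here is a minimal sketch that stores a dataset and its vocabulary concept as triples, serializes them as N-Triples, and follows a broader link in a query. The dataset PID, the concept URIs and both helper functions are hypothetical, and this is not how Timbuctoo stores or queries data internally.

```python
DCTERMS = "http://purl.org/dc/terms/"
SKOS = "http://www.w3.org/2004/02/skos/core#"

DATASET = "https://doi.org/10.1234/EXAMPLE"      # hypothetical dataset PID
CONCEPT = "https://example.org/cv/unemployment"  # hypothetical CV concept
PARENT = "https://example.org/cv/labour"         # hypothetical broader concept

# Each entry is (subject, predicate, object, object_is_literal).
graph = [
    (DATASET, DCTERMS + "title", "Labour Force Survey 2017", True),
    (DATASET, DCTERMS + "subject", CONCEPT, False),
    (CONCEPT, SKOS + "prefLabel", "Unemployment", True),
    (CONCEPT, SKOS + "broader", PARENT, False),
]


def to_ntriples(triples):
    """Serialize the triples as N-Triples, one statement per line."""
    lines = []
    for s, p, o, is_literal in triples:
        if is_literal:
            escaped = o.replace("\\", "\\\\").replace('"', '\\"')
            obj = f'"{escaped}"'
        else:
            obj = f"<{o}>"
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


def datasets_under(parent_uri, triples):
    """Find datasets whose subject is a direct narrower concept of parent_uri."""
    narrower = {s for s, p, o, _ in triples
                if p == SKOS + "broader" and o == parent_uri}
    return sorted(s for s, p, o, _ in triples
                  if p == DCTERMS + "subject" and o in narrower)


print(to_ntriples(graph))
print(datasets_under(PARENT, graph))
```

Because the subject points at a concept URI rather than a plain string, a search for the broader concept still finds the dataset, which the flat Dublin Core record cannot support.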

Timbuctoo GraphiQL data exploration tool
All semantic relations can be found in the @context field, and PIDs of linked datasets in @data.
Standardized metadata is exported to the Linked Open Data (LOD) Cloud!

Questions?
Feel free to ask questions!
Vyacheslav Tykhonov
e-mail: vyacheslav.tykhonov@dans.knaw.nl
website: http://dans.knaw.nl (DANS-KNAW)

Vyacheslav Tykhonov, Senior Information Scientist, Data Archiving and Networked Services (DANS-KNAW, Netherlands). DANS is an institute of KNAW and NWO. Time Machine conference 2018.