SlideShare una empresa de Scribd logo
1 de 17
HDL
Towards a Harmonized Dataset
Model for Open Data Portals
Ahmad Assaf, Raphaël Troncy And Aline Senart
@ahmadaassaf
PROFILES 15 – 2nd International Workshop on Dataset PROFIling & fEderated Search for Linked Data 1st June 2015
HDL Towards a Harmonized Dataset Model for Open Data Portals
Open Data/Linked Open Data
 Open Data (OD) is the data that can be easily discovered, accessed, reused and
redistributed by anyone [Davies et al. 2014]
 Open Data should be placed in public domain under liberal terms of use and available
in electronic formats that are non-proprietary and machine readable.
 Linked Open Data (LOD) refers to the semantically rich, linked and machine readable
open data.
 Open Data has major benefits for citizens, businesses, societies and governments.
2
HDL Towards a Harmonized Dataset Model for Open Data Portals
Metadata
Metadata is structured information that describes, explains, locates or otherwise makes it
easier to retrieve use or manage information resources
Data Discovery,
exploration and
reuse
Organization
&
identification
Archiving
&
preservation
3
HDL Towards a Harmonized Dataset Model for Open Data Portals
Data Portals/Data Management Systems
 Data Portals (Catalogs) are the entry points to discover published
datasets
 Data Portals are a curated collection of datasets metadata providing a
set discovery and integration services.
 Data Portals can be private like datahub.io, publicdata.eu or private like
enigma.io or quandle.com
 Portals are built on top of Data Management Systems (DMS) like
CKAN, DKAN and Socrata
4
HDL Towards a Harmonized Dataset Model for Open Data Portals
Why a Harmonized Model ?
 Exploring/discovering datasets for
(re)use
 Defining a “minimal” set of
information needed to build a
“profile”
 Building tools that will
automatically generate/validate
metadata models
5
 The Data Catalog Vocabulary (DCAT)✝ is a W3C recommendation to facilitate interoperability
between data catalogs on the web
 DCAT is an RDF vocabulary with three main classes: dcat:Catalog, dcat:Dataset and dcat:Distribution
 DCAT Profiles [extensions built upon DCAT]
 DCAT-AP✝✝ defines a minimal set of properties that should be included in a datasets
profile by specifying mandatory and optional properties
 The Asset Description Metadata Schema (ADMS)✝✝✝ is used to semantically describe
assets (code lists, taxonomies, vocabularies)
HDL Towards a Harmonized Dataset Model for Open Data Portals
Dataset Models - DCAT
6
✝ http://w3.org/TR/vocab-dcat/
✝✝ https://joinup.ec.europa.eu/asset/dcat_application_profile/description
✝✝✝ http://www.w3.org/TR/vocab-adms/
HDL Towards a Harmonized Dataset Model for Open Data Portals
Dataset Models - VoID✝
 RDF vocabulary for interlinked datasets
 In addition to describing datasets, VoID
describes the links between datasets
 VoID defines three main classes:
void:Dataset, void:Linkset and void:subset
 A linkset in voiD is a subclass of a dataset,
used for storing triples to express the
interlinking relationship between datasets
7
✝ http://www.w3.org/TR/void/
HDL Towards a Harmonized Dataset Model for Open Data Portals
Dataset Models – CKAN✝/DKAN✝✝
 Data model describes a set of entities (dataset, resource, group, tag)
 Allow additional information to be added via “extra” arbitrary key/value fields
 The core metadata restricted as a JSON file
 Supports Linked Data and RDF by providing a complete and functional mapping of its
model to LD formats
 CKAN support descriptions of vocabularies
 DKAN is a Drupal based DMS
8
✝ http://ckan.org/
✝✝ http://demo.getdkan.com/
 Online collection of best practices
and case studies to help data
publishers
 POD data model is based on DCAT
 Similarly to DCAT-AP, POD defines
three types of metadata elements:
Required, Required-If and
Expanded(optional)
 Metadata extensions using elements
from the “Expanded” fields
HDL Towards a Harmonized Dataset Model for Open Data Portals
Dataset Models - Continued
 Commercial platform to streamline
data publishing, management,
analysis and reusing.
 The model is designed specifically to
represent tabular data
 The model covers a basic set of
metadata properties and has good
support for geospatial data
 A collection of schema used to
markup HTML pages with structured
data
 Covers many domains. We are
interested in the Dataset schema
although we also use various
properties from schemas like
organizations, authors, etc.
9
✝ http://socrata.com/
✝✝ http://schema.org/
✝✝✝ https://project-open-data.cio.gov/
✝ ✝✝ ✝✝✝
10
Ballmer
effect
anyone?
HDL Towards a Harmonized Dataset Model for Open Data Portals
https://xkcd.com/323/
HDL Towards a Harmonized Dataset Model for Open Data Portals
Metadata Classification – Information Groups
11
Organization
Clustering or curation
solely based on
associations with specific
administration parties
Resource
Actual raw data that can
be downloaded or
accessed directly e.g.
JSON, CSV, SPARQL
endpoint
Tag
Descriptive knowledge
about the dataset
contents and structure.
This can range from
simple textual tags to
semantically rich
controlled terms
Group
Organizational units that
share common
semantics. They can be
seen as a cluster or
curation based on shared
themes/categories
HDL Towards a Harmonized Dataset Model for Open Data Portals
Metadata Classification – Information Types
12
General Information
title, description, id
Ownership Information
author, maintainer_email
Provenance Information
version, creation_date, update_date
Access Information
URL, license_title, license_id
Geospatial Information
bbox, layers
Temporal Information
coverage_from, coverage_to
Statistical Information
max_value, uniques, average
Quality Information
rating, availability, freshness
Dataset Metadata
HDL Towards a Harmonized Dataset Model for Open Data Portals
Harmonization Process
 Examine the model or vocabulary specification and documentation
 Examine existing datasets using these models
 Examine the source code for DMS
13
1 Map the information groups [resource, tag, group, organization]
2 Map the information types [general, ownership, provenance, etc.]
HDL Towards a Harmonized Dataset Model for Open Data Portals
Mapping Information Types
14
CKAN maintainer_email
DKAN maintainer_email
POD ContactPoint -> hasEmail
Schema.org CreativeWork:producer -> Person:email
VoID void:Dataset -> dct:creator -> foaf:Person:givenName
DCAT dcat:Dataset -> dct:creator -> foaf:Person:givenName
HDL Towards a Harmonized Dataset Model for Open Data Portals
Extra Information
15
 Examining the models, we noticed an abundance of information filled in “extras” fields
 Using Roomba we generated aggregation reports to inspect those extras on LOD Cloud✝ and
OpenAfrica✝✝
extras>value:extras>name1 Extra fields names and values
resources>resource_type:resources>name2 Types describing resources
 53% of the datasets in OpenAfrica have additional geospatial attached (spatial-reference-system, spatial
harvester, bbox-east-long, bbox-north-long, bbox-south-long, bbox-west-long)
 16% of the datasets have additional provenance and ownership information (frequency-of-update, dataset-
reference-date)
✝ http://datahub.io/group/lodcloud
✝✝ http://africaopendata.org/https://github.com/ahmadassaf/opendata-checker/tree/master/model
HDL Towards a Harmonized Dataset Model for Open Data Portals 16
https://xkcd.com/927/
17HDL Towards a Harmonized Dataset Model for Open Data Portals
Questions?
Ahmad Assaf
http://ahmadassaf.com/
@ahmadaassaf
http://github.com/ahmadassaf

Más contenido relacionado

La actualidad más candente

Advantages of metadata
Advantages of metadataAdvantages of metadata
Advantages of metadataAzeem Sultan
 
Metadata harvesting
Metadata harvestingMetadata harvesting
Metadata harvestingAndrewLIS688
 
Introduction to Metadata
Introduction to MetadataIntroduction to Metadata
Introduction to MetadataJenn Riley
 
Linked Open Data Principles, Technologies and Examples
Linked Open Data Principles, Technologies and ExamplesLinked Open Data Principles, Technologies and Examples
Linked Open Data Principles, Technologies and ExamplesOpen Data Support
 
Applying Digital Library Metadata Standards
Applying Digital Library Metadata StandardsApplying Digital Library Metadata Standards
Applying Digital Library Metadata StandardsJenn Riley
 
Metadata an overview
Metadata an overviewMetadata an overview
Metadata an overviewrobin fay
 
PRELIDA Project Draft Roadmap
PRELIDA Project Draft RoadmapPRELIDA Project Draft Roadmap
PRELIDA Project Draft RoadmapPRELIDA Project
 
FAIR Data in Trustworthy Data Repositories Webinar - 12-13 December 2016| www...
FAIR Data in Trustworthy Data Repositories Webinar - 12-13 December 2016| www...FAIR Data in Trustworthy Data Repositories Webinar - 12-13 December 2016| www...
FAIR Data in Trustworthy Data Repositories Webinar - 12-13 December 2016| www...EUDAT
 
The Web Data Commons Microdata, RDFa, and Microformat Dataset Series @ ISWC2014
The Web Data Commons Microdata, RDFa, and Microformat Dataset Series @ ISWC2014The Web Data Commons Microdata, RDFa, and Microformat Dataset Series @ ISWC2014
The Web Data Commons Microdata, RDFa, and Microformat Dataset Series @ ISWC2014Robert Meusel
 
Interaction with Linked Data
Interaction with Linked DataInteraction with Linked Data
Interaction with Linked DataEUCLID project
 
Flexible metadata schemes for research data repositories - Clarin Conference...
Flexible metadata schemes for research data repositories  - Clarin Conference...Flexible metadata schemes for research data repositories  - Clarin Conference...
Flexible metadata schemes for research data repositories - Clarin Conference...Vyacheslav Tykhonov
 
Wed roman tut_open_datapub
Wed roman tut_open_datapubWed roman tut_open_datapub
Wed roman tut_open_datapubeswcsummerschool
 

La actualidad más candente (20)

Metadata harvesting Tools
Metadata harvesting ToolsMetadata harvesting Tools
Metadata harvesting Tools
 
Advantages of metadata
Advantages of metadataAdvantages of metadata
Advantages of metadata
 
Metadata harvesting
Metadata harvestingMetadata harvesting
Metadata harvesting
 
Introduction to Metadata
Introduction to MetadataIntroduction to Metadata
Introduction to Metadata
 
Linked Open Data Principles, Technologies and Examples
Linked Open Data Principles, Technologies and ExamplesLinked Open Data Principles, Technologies and Examples
Linked Open Data Principles, Technologies and Examples
 
Gap Analysis
Gap AnalysisGap Analysis
Gap Analysis
 
Applying Digital Library Metadata Standards
Applying Digital Library Metadata StandardsApplying Digital Library Metadata Standards
Applying Digital Library Metadata Standards
 
Metadata an overview
Metadata an overviewMetadata an overview
Metadata an overview
 
Metadata
MetadataMetadata
Metadata
 
PRELIDA Project Draft Roadmap
PRELIDA Project Draft RoadmapPRELIDA Project Draft Roadmap
PRELIDA Project Draft Roadmap
 
FAIR Data in Trustworthy Data Repositories Webinar - 12-13 December 2016| www...
FAIR Data in Trustworthy Data Repositories Webinar - 12-13 December 2016| www...FAIR Data in Trustworthy Data Repositories Webinar - 12-13 December 2016| www...
FAIR Data in Trustworthy Data Repositories Webinar - 12-13 December 2016| www...
 
General concepts: DDI
General concepts: DDIGeneral concepts: DDI
General concepts: DDI
 
The Web Data Commons Microdata, RDFa, and Microformat Dataset Series @ ISWC2014
The Web Data Commons Microdata, RDFa, and Microformat Dataset Series @ ISWC2014The Web Data Commons Microdata, RDFa, and Microformat Dataset Series @ ISWC2014
The Web Data Commons Microdata, RDFa, and Microformat Dataset Series @ ISWC2014
 
FAIR Data ecosystem
FAIR Data ecosystemFAIR Data ecosystem
FAIR Data ecosystem
 
Interaction with Linked Data
Interaction with Linked DataInteraction with Linked Data
Interaction with Linked Data
 
Flexible metadata schemes for research data repositories - Clarin Conference...
Flexible metadata schemes for research data repositories  - Clarin Conference...Flexible metadata schemes for research data repositories  - Clarin Conference...
Flexible metadata schemes for research data repositories - Clarin Conference...
 
Meta data
Meta dataMeta data
Meta data
 
Wed roman tut_open_datapub
Wed roman tut_open_datapubWed roman tut_open_datapub
Wed roman tut_open_datapub
 
Providing Linked Data
Providing Linked DataProviding Linked Data
Providing Linked Data
 
Hadoop
HadoopHadoop
Hadoop
 

Destacado

HOTEL EXPO 2016
HOTEL EXPO 2016HOTEL EXPO 2016
HOTEL EXPO 2016HOTEL EXPO
 
LEY DE COMPAÑÍAS
LEY DE COMPAÑÍASLEY DE COMPAÑÍAS
LEY DE COMPAÑÍASjeankrs9
 
Joseph S Stump Resume
Joseph S Stump ResumeJoseph S Stump Resume
Joseph S Stump ResumeJoseph Stump
 
Heroku cloud platform
Heroku cloud platformHeroku cloud platform
Heroku cloud platformHasan Khatib
 
Nascenia: Road to Software Industry
Nascenia: Road to Software IndustryNascenia: Road to Software Industry
Nascenia: Road to Software IndustryNascenia IT
 
боги древних славян
боги древних славянбоги древних славян
боги древних славянШкола№3
 
Bachillerato de humanidades ok
Bachillerato de humanidades okBachillerato de humanidades ok
Bachillerato de humanidades okTECHNO_CISNEROS
 
Reunión de padres y tutores 2016. IES Joaquín Turina
Reunión de padres y tutores 2016. IES Joaquín TurinaReunión de padres y tutores 2016. IES Joaquín Turina
Reunión de padres y tutores 2016. IES Joaquín Turinajesusman
 
Decálogo antibulling
Decálogo antibullingDecálogo antibulling
Decálogo antibullingjesusman
 
Vagrant vs Docker
Vagrant vs DockerVagrant vs Docker
Vagrant vs Dockerjchase50
 
Snyder_Susan-Resume 2016
Snyder_Susan-Resume 2016Snyder_Susan-Resume 2016
Snyder_Susan-Resume 2016Susan Snyder
 
Formatos de manejo de almacén
Formatos de manejo de almacénFormatos de manejo de almacén
Formatos de manejo de almacénValeriaEH888
 
1 q 2016-us-tile-industry-update
1 q 2016-us-tile-industry-update1 q 2016-us-tile-industry-update
1 q 2016-us-tile-industry-updateCristiano Canotti
 

Destacado (20)

HOTEL EXPO 2016
HOTEL EXPO 2016HOTEL EXPO 2016
HOTEL EXPO 2016
 
FPGAs libres
FPGAs libresFPGAs libres
FPGAs libres
 
LEY DE COMPAÑÍAS
LEY DE COMPAÑÍASLEY DE COMPAÑÍAS
LEY DE COMPAÑÍAS
 
2016/10/28: Reset ETSII UPM
2016/10/28: Reset ETSII UPM2016/10/28: Reset ETSII UPM
2016/10/28: Reset ETSII UPM
 
Timeplan
TimeplanTimeplan
Timeplan
 
Joseph S Stump Resume
Joseph S Stump ResumeJoseph S Stump Resume
Joseph S Stump Resume
 
Heroku cloud platform
Heroku cloud platformHeroku cloud platform
Heroku cloud platform
 
Resume 1.4
Resume 1.4Resume 1.4
Resume 1.4
 
Nascenia: Road to Software Industry
Nascenia: Road to Software IndustryNascenia: Road to Software Industry
Nascenia: Road to Software Industry
 
Dina_Condon_Resume_2016
Dina_Condon_Resume_2016Dina_Condon_Resume_2016
Dina_Condon_Resume_2016
 
боги древних славян
боги древних славянбоги древних славян
боги древних славян
 
Useful C++ Features You Should be Using
Useful C++ Features You Should be UsingUseful C++ Features You Should be Using
Useful C++ Features You Should be Using
 
Bachillerato de humanidades ok
Bachillerato de humanidades okBachillerato de humanidades ok
Bachillerato de humanidades ok
 
Inspección atún en lata
Inspección atún en lataInspección atún en lata
Inspección atún en lata
 
Reunión de padres y tutores 2016. IES Joaquín Turina
Reunión de padres y tutores 2016. IES Joaquín TurinaReunión de padres y tutores 2016. IES Joaquín Turina
Reunión de padres y tutores 2016. IES Joaquín Turina
 
Decálogo antibulling
Decálogo antibullingDecálogo antibulling
Decálogo antibulling
 
Vagrant vs Docker
Vagrant vs DockerVagrant vs Docker
Vagrant vs Docker
 
Snyder_Susan-Resume 2016
Snyder_Susan-Resume 2016Snyder_Susan-Resume 2016
Snyder_Susan-Resume 2016
 
Formatos de manejo de almacén
Formatos de manejo de almacénFormatos de manejo de almacén
Formatos de manejo de almacén
 
1 q 2016-us-tile-industry-update
1 q 2016-us-tile-industry-update1 q 2016-us-tile-industry-update
1 q 2016-us-tile-industry-update
 

Similar a HDL Model for Harmonizing Open Data Metadata

Dataset Catalogs as a Foundation for FAIR* Data
Dataset Catalogs as a Foundation for FAIR* DataDataset Catalogs as a Foundation for FAIR* Data
Dataset Catalogs as a Foundation for FAIR* DataTom Plasterer
 
Enabling Self-service Data Provisioning Through Semantic Enrichment of Data |...
Enabling Self-service Data Provisioning Through Semantic Enrichment of Data |...Enabling Self-service Data Provisioning Through Semantic Enrichment of Data |...
Enabling Self-service Data Provisioning Through Semantic Enrichment of Data |...Ahmad Assaf
 
CLARIN CMDI support in Dataverse
CLARIN CMDI support in Dataverse CLARIN CMDI support in Dataverse
CLARIN CMDI support in Dataverse vty
 
Llinked open data training for EU institutions
Llinked open data training for EU institutionsLlinked open data training for EU institutions
Llinked open data training for EU institutionsOpen Data Support
 
Data Wrangling and Visualization Using Python
Data Wrangling and Visualization Using PythonData Wrangling and Visualization Using Python
Data Wrangling and Visualization Using PythonMOHITKUMAR1379
 
Flexible metadata schemes for research data repositories - CLARIN Conference'21
Flexible metadata schemes for research data repositories - CLARIN Conference'21Flexible metadata schemes for research data repositories - CLARIN Conference'21
Flexible metadata schemes for research data repositories - CLARIN Conference'21vty
 
EUDAT data architecture and interoperability aspects – Daan Broeder
EUDAT data architecture and interoperability aspects – Daan BroederEUDAT data architecture and interoperability aspects – Daan Broeder
EUDAT data architecture and interoperability aspects – Daan BroederOpenAIRE
 
IRJET- Data Retrieval using Master Resource Description Framework
IRJET- Data Retrieval using Master Resource Description FrameworkIRJET- Data Retrieval using Master Resource Description Framework
IRJET- Data Retrieval using Master Resource Description FrameworkIRJET Journal
 
Ontologies, controlled vocabularies and Dataverse
Ontologies, controlled vocabularies and DataverseOntologies, controlled vocabularies and Dataverse
Ontologies, controlled vocabularies and Dataversevty
 
Linked Data Planet Key Note
Linked Data Planet Key NoteLinked Data Planet Key Note
Linked Data Planet Key Noterumito
 
Metadata lecture(9 17-14)
Metadata lecture(9 17-14)Metadata lecture(9 17-14)
Metadata lecture(9 17-14)mhb120
 
Let's downscale the semantic web !
Let's downscale the semantic web !Let's downscale the semantic web !
Let's downscale the semantic web !Christophe Guéret
 
Linked Data Tutorial
Linked Data TutorialLinked Data Tutorial
Linked Data TutorialSören Auer
 
A Generic Scientific Data Model and Ontology for Representation of Chemical Data
A Generic Scientific Data Model and Ontology for Representation of Chemical DataA Generic Scientific Data Model and Ontology for Representation of Chemical Data
A Generic Scientific Data Model and Ontology for Representation of Chemical DataStuart Chalk
 
CLARIN CMDI use case and flexible metadata schemes
CLARIN CMDI use case and flexible metadata schemes CLARIN CMDI use case and flexible metadata schemes
CLARIN CMDI use case and flexible metadata schemes vty
 
Vikram Andem Big Data Strategy @ IATA Technology Roadmap
Vikram Andem Big Data Strategy @ IATA Technology Roadmap Vikram Andem Big Data Strategy @ IATA Technology Roadmap
Vikram Andem Big Data Strategy @ IATA Technology Roadmap IT Strategy Group
 

Similar a HDL Model for Harmonizing Open Data Metadata (20)

How to Describe a Dataset. Interoperability Issues, by Valeria Pesce
How to Describe a Dataset. Interoperability Issues, by Valeria PesceHow to Describe a Dataset. Interoperability Issues, by Valeria Pesce
How to Describe a Dataset. Interoperability Issues, by Valeria Pesce
 
Linked Data In Action
Linked Data In ActionLinked Data In Action
Linked Data In Action
 
Dataset Catalogs as a Foundation for FAIR* Data
Dataset Catalogs as a Foundation for FAIR* DataDataset Catalogs as a Foundation for FAIR* Data
Dataset Catalogs as a Foundation for FAIR* Data
 
Enabling Self-service Data Provisioning Through Semantic Enrichment of Data |...
Enabling Self-service Data Provisioning Through Semantic Enrichment of Data |...Enabling Self-service Data Provisioning Through Semantic Enrichment of Data |...
Enabling Self-service Data Provisioning Through Semantic Enrichment of Data |...
 
CLARIN CMDI support in Dataverse
CLARIN CMDI support in Dataverse CLARIN CMDI support in Dataverse
CLARIN CMDI support in Dataverse
 
Llinked open data training for EU institutions
Llinked open data training for EU institutionsLlinked open data training for EU institutions
Llinked open data training for EU institutions
 
Data Wrangling and Visualization Using Python
Data Wrangling and Visualization Using PythonData Wrangling and Visualization Using Python
Data Wrangling and Visualization Using Python
 
Linked data life cycles
Linked data life cyclesLinked data life cycles
Linked data life cycles
 
Flexible metadata schemes for research data repositories - CLARIN Conference'21
Flexible metadata schemes for research data repositories - CLARIN Conference'21Flexible metadata schemes for research data repositories - CLARIN Conference'21
Flexible metadata schemes for research data repositories - CLARIN Conference'21
 
EUDAT data architecture and interoperability aspects – Daan Broeder
EUDAT data architecture and interoperability aspects – Daan BroederEUDAT data architecture and interoperability aspects – Daan Broeder
EUDAT data architecture and interoperability aspects – Daan Broeder
 
Chachra, "Improving Discovery Systems Through Post Processing of Harvested Data"
Chachra, "Improving Discovery Systems Through Post Processing of Harvested Data"Chachra, "Improving Discovery Systems Through Post Processing of Harvested Data"
Chachra, "Improving Discovery Systems Through Post Processing of Harvested Data"
 
IRJET- Data Retrieval using Master Resource Description Framework
IRJET- Data Retrieval using Master Resource Description FrameworkIRJET- Data Retrieval using Master Resource Description Framework
IRJET- Data Retrieval using Master Resource Description Framework
 
Ontologies, controlled vocabularies and Dataverse
Ontologies, controlled vocabularies and DataverseOntologies, controlled vocabularies and Dataverse
Ontologies, controlled vocabularies and Dataverse
 
Linked Data Planet Key Note
Linked Data Planet Key NoteLinked Data Planet Key Note
Linked Data Planet Key Note
 
Metadata lecture(9 17-14)
Metadata lecture(9 17-14)Metadata lecture(9 17-14)
Metadata lecture(9 17-14)
 
Let's downscale the semantic web !
Let's downscale the semantic web !Let's downscale the semantic web !
Let's downscale the semantic web !
 
Linked Data Tutorial
Linked Data TutorialLinked Data Tutorial
Linked Data Tutorial
 
A Generic Scientific Data Model and Ontology for Representation of Chemical Data
A Generic Scientific Data Model and Ontology for Representation of Chemical DataA Generic Scientific Data Model and Ontology for Representation of Chemical Data
A Generic Scientific Data Model and Ontology for Representation of Chemical Data
 
CLARIN CMDI use case and flexible metadata schemes
CLARIN CMDI use case and flexible metadata schemes CLARIN CMDI use case and flexible metadata schemes
CLARIN CMDI use case and flexible metadata schemes
 
Vikram Andem Big Data Strategy @ IATA Technology Roadmap
Vikram Andem Big Data Strategy @ IATA Technology Roadmap Vikram Andem Big Data Strategy @ IATA Technology Roadmap
Vikram Andem Big Data Strategy @ IATA Technology Roadmap
 

Último

Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGSujit Pal
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 

Último (20)

Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAG
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 

HDL Model for Harmonizing Open Data Metadata

  • 1. HDL Towards a Harmonized Dataset Model for Open Data Portals Ahmad Assaf, Raphaël Troncy And Aline Senart @ahmadaassaf PROFILES 15 – 2nd International Workshop on Dataset PROFIling & fEderated Search for Linked Data 1st June 2015
  • 2. HDL Towards a Harmonized Dataset Model for Open Data Portals Open Data/Linked Open Data  Open Data (OD) is the data that can be easily discovered, accessed, reused and redistributed by anyone [Davies et al. 2014]  Open Data should be placed in public domain under liberal terms of use and available in electronic formats that are non-proprietary and machine readable.  Linked Open Data (LOD) refers to the semantically rich, linked and machine readable open data.  Open Data has major benefits for citizens, businesses, societies and governments. 2
  • 3. HDL Towards a Harmonized Dataset Model for Open Data Portals Metadata Metadata is structured information that describes, explains, locates or otherwise makes it easier to retrieve use or manage information resources Data Discovery, exploration and reuse Organization & identification Archiving & preservation 3
  • 4. HDL Towards a Harmonized Dataset Model for Open Data Portals Data Portals/Data Management Systems  Data Portals (Catalogs) are the entry points to discover published datasets  Data Portals are a curated collection of datasets metadata providing a set discovery and integration services.  Data Portals can be private like datahub.io, publicdata.eu or private like enigma.io or quandle.com  Portals are built on top of Data Management Systems (DMS) like CKAN, DKAN and Socrata 4
  • 5. HDL Towards a Harmonized Dataset Model for Open Data Portals Why a Harmonized Model ?  Exploring/discovering datasets for (re)use  Defining a “minimal” set of information needed to build a “profile”  Building tools that will automatically generate/validate metadata models 5
  • 6.  The Data Catalog Vocabulary (DCAT)✝ is a W3C recommendation to facilitate interoperability between data catalogs on the web  DCAT is an RDF vocabulary with three main classes: dcat:Catalog, dcat:Dataset and dcat:Distribution  DCAT Profiles [extensions built upon DCAT]  DCAT-AP✝✝ defines a minimal set of properties that should be included in a datasets profile by specifying mandatory and optional properties  The Asset Description Metadata Schema (ADMS)✝✝✝ is used to semantically describe assets (code lists, taxonomies, vocabularies) HDL Towards a Harmonized Dataset Model for Open Data Portals Dataset Models - DCAT 6 ✝ http://w3.org/TR/vocab-dcat/ ✝✝ https://joinup.ec.europa.eu/asset/dcat_application_profile/description ✝✝✝ http://www.w3.org/TR/vocab-adms/
  • 7. HDL Towards a Harmonized Dataset Model for Open Data Portals Dataset Models - VoID✝  RDF vocabulary for interlinked datasets  In addition to describing datasets, VoID describes the links between datasets  VoID defines three main classes: void:Dataset, void:Linkset and void:subset  A linkset in voiD is a subclass of a dataset, used for storing triples to express the interlinking relationship between datasets 7 ✝ http://www.w3.org/TR/void/
  • 8. HDL Towards a Harmonized Dataset Model for Open Data Portals Dataset Models – CKAN✝/DKAN✝✝  Data model describes a set of entities (dataset, resource, group, tag)  Allow additional information to be added via “extra” arbitrary key/value fields  The core metadata restricted as a JSON file  Supports Linked Data and RDF by providing a complete and functional mapping of its model to LD formats  CKAN support descriptions of vocabularies  DKAN is a Drupal based DMS 8 ✝ http://ckan.org/ ✝✝ http://demo.getdkan.com/
  • 9.  Online collection of best practices and case studies to help data publishers  POD data model is based on DCAT  Similarly to DCAT-AP, POD defines three types of metadata elements: Required, Required-If and Expanded(optional)  Metadata extensions using elements from the “Expanded” fields HDL Towards a Harmonized Dataset Model for Open Data Portals Dataset Models - Continued  Commercial platform to streamline data publishing, management, analysis and reusing.  The model is designed specifically to represent tabular data  The model covers a basic set of metadata properties and has good support for geospatial data  A collection of schema used to markup HTML pages with structured data  Covers many domains. We are interested in the Dataset schema although we also use various properties from schemas like organizations, authors, etc. 9 ✝ http://socrata.com/ ✝✝ http://schema.org/ ✝✝✝ https://project-open-data.cio.gov/ ✝ ✝✝ ✝✝✝
  • 10. 10 Ballmer effect anyone? HDL Towards a Harmonized Dataset Model for Open Data Portals https://xkcd.com/323/
  • 11. HDL Towards a Harmonized Dataset Model for Open Data Portals Metadata Classification – Information Groups 11 Organization Clustering or curation solely based on associations with specific administration parties Resource Actual raw data that can be downloaded or accessed directly e.g. JSON, CSV, SPARQL endpoint Tag Descriptive knowledge about the dataset contents and structure. This can range from simple textual tags to semantically rich controlled terms Group Organizational units that share common semantics. They can be seen as a cluster or curation based on shared themes/categories
  • 12. HDL Towards a Harmonized Dataset Model for Open Data Portals Metadata Classification – Information Types 12 General Information title, description, id Ownership Information author, maintainer_email Provenance Information version, creation_date, update_date Access Information URL, license_title, license_id Geospatial Information bbox, layers Temporal Information coverage_from, coverage_to Statistical Information max_value, uniques, average Quality Information rating, availability, freshness Dataset Metadata
  • 13. HDL Towards a Harmonized Dataset Model for Open Data Portals Harmonization Process  Examine the model or vocabulary specification and documentation  Examine existing datasets using these models  Examine the source code for DMS 13 1 Map the information groups [resource, tag, group, organization] 2 Map the information types [general, ownership, provenance, etc.]
  • 14. HDL Towards a Harmonized Dataset Model for Open Data Portals Mapping Information Types 14 CKAN maintainer_email DKAN maintainer_email POD ContactPoint -> hasEmail Schema.org CreativeWork:producer -> Person:email VoID void:Dataset -> dct:creator -> foaf:Person:givenName DCAT dcat:Dataset -> dct:creator -> foaf:Person:givenName
  • 15. HDL Towards a Harmonized Dataset Model for Open Data Portals Extra Information 15  Examining the models, we noticed an abundance of information filled in “extras” fields  Using Roomba we generated aggregation reports to inspect those extras on LOD Cloud✝ and OpenAfrica✝✝ extras>value:extras>name1 Extra fields names and values resources>resource_type:resources>name2 Types describing resources  53% of the datasets in OpenAfrica have additional geospatial attached (spatial-reference-system, spatial harvester, bbox-east-long, bbox-north-long, bbox-south-long, bbox-west-long)  16% of the datasets have additional provenance and ownership information (frequency-of-update, dataset- reference-date) ✝ http://datahub.io/group/lodcloud ✝✝ http://africaopendata.org/https://github.com/ahmadassaf/opendata-checker/tree/master/model
  • 16. HDL Towards a Harmonized Dataset Model for Open Data Portals 16 https://xkcd.com/927/
  • 17. 17HDL Towards a Harmonized Dataset Model for Open Data Portals Questions? Ahmad Assaf http://ahmadassaf.com/ @ahmadaassaf http://github.com/ahmadassaf

Notas del editor

  1. An asset is something that can be opened and read using a familiar desktop software as opposed to the need to be processed like raw data.
  2. The interlinking is modelled by a linkset (void:Linkset). A linkset in voiD is a subclass of a dataset, used for storing triples to express the interlinking relationship between datasets. In each interlinking triple, the subject is a resource hosted in one dataset and the object is a resource hosted in another dataset. This modelling enables a flexible and powerful way to talk in great detail about the interlinking between two datasets, such as how many links there exist, which kind of links (e.g. owl:sameAs or foaf:knows) are present, or stating who claims these statements.