How to describe a dataset.
Interoperability issues
Valeria Pesce
Global Forum on Agricultural Research
Definition of “dataset”
The term “dataset” has been defined in several ways, all of which
further specify or extend the basic concept of “a collection of data”.
Definition given by the W3C Government Linked Data Working Group:
A dataset is “a collection of data, published or curated by a
single source, and available for access or download in one or
more formats”
The “instances” of the dataset “available for access or
download in one or more formats” are called
“distributions”. A dataset can have many distributions.
Examples of distributions include a downloadable CSV
file, an API or an RSS feed.
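As a rough illustration of the dataset/distribution distinction above, the following Python sketch models one dataset with several distributions. The class names, field names and example URLs are assumptions made for illustration; they are not taken from DCAT or any other vocabulary.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Distribution:
    """One concrete way to access or download a dataset (hypothetical fields)."""
    access_url: str
    media_type: str


@dataclass
class Dataset:
    """A collection of data published or curated by a single source."""
    title: str
    publisher: str
    distributions: List[Distribution] = field(default_factory=list)

    def media_types(self) -> List[str]:
        # List the formats in which this dataset is available.
        return [d.media_type for d in self.distributions]


# Illustrative dataset exposed as a CSV download, an API and an RSS feed.
rainfall = Dataset(
    title="Monthly rainfall",
    publisher="Example Institute",
    distributions=[
        Distribution("http://example.org/rainfall.csv", "text/csv"),
        Distribution("http://example.org/api/rainfall", "application/json"),
        Distribution("http://example.org/rainfall/feed", "application/rss+xml"),
    ],
)
```

The point of the sketch is only that the dataset is described once, while each distribution carries its own access details.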
Definition of “interoperability”
“Data interoperability is a feature of datasets -
and of information services that give access to
datasets - whereby data can easily be retrieved,
processed, re-used, and re-packaged
(“operated”) by other systems.”
Interim Proceedings of International Expert Consultation on “Building the CIARD
Framework for Data and Information Sharing”, CIARD (2011)
The “other systems” are software applications, so
datasets have to be machine-readable
What applications need
Besides information common to any type of resource (name, author /
owner, date…), applications have to find enough metadata about
datasets to understand:
1. the specific coverage of the dataset (type of data, thematic
coverage, geographic coverage)
2. the necessary technical specifications to retrieve and parse a
distribution of the dataset (format, protocol etc.)
3. the conditions for re-use (rights, licenses)
4. the “dimensions” covered by the dataset (e.g. temperature,
time, salinity, gene, coordinates)
5. the semantics of the dimensions (units of measure, time
granularity, syntax, reference taxonomies)
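The five points above can be read as a checklist that an application runs against a dataset description. The sketch below is one possible way to do that in Python; the field names in `REQUIRED_GROUPS` and the sample record are hypothetical and do not come from any vocabulary discussed here.

```python
from typing import Dict, List

# The five kinds of information listed above, mapped to hypothetical field names.
REQUIRED_GROUPS: Dict[str, List[str]] = {
    "coverage": ["data_type", "topics", "spatial"],
    "technical": ["format", "protocol"],
    "reuse": ["rights", "license"],
    "dimensions": ["dimensions"],
    "semantics": ["dimension_semantics"],
}


def missing_metadata(record: Dict[str, object]) -> Dict[str, List[str]]:
    """Return, per group, the fields that are absent or empty in the record."""
    gaps: Dict[str, List[str]] = {}
    for group, fields in REQUIRED_GROUPS.items():
        absent = [name for name in fields if not record.get(name)]
        if absent:
            gaps[group] = absent
    return gaps


# Illustrative record with no protocol, rights or dimension semantics.
record = {
    "data_type": "observations",
    "topics": ["soil salinity"],
    "spatial": "Kenya",
    "format": "text/csv",
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "dimensions": ["time", "salinity"],
}
gaps = missing_metadata(record)
```

Treating every field as mandatory is a simplification; a real application would probably distinguish required from optional metadata.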
Partial answers in existing vocabularies
• DCAT vocabulary
– RDF vocabulary for describing any dataset
– Datasets can be standalone or part of a “catalog”
– Datasets are accessible through several “distributions”
– “Other, complementary vocabularies may be used together with DCAT to provide
more detailed format-specific information. For example, properties from the VoID
vocabulary can be used if that dataset is in RDF format.”
• VOID vocabulary
– RDF vocabulary for expressing metadata about RDF datasets
• (SDMX ) DataCube vocabulary
– RDF vocabulary for describing statistical datasets
– Useful for attaching metadata about the “data structure” to any dataset that
doesn’t follow a known published standard
Coverage of a dataset
• This can be handled by common Dublin Core properties like subject and
coverage.
• DCAT re-uses these DC properties.
Issue 1: No specific property for the type of data covered in a dataset
The values of these properties have to be understood by machines:
- The value should be standardized, possibly a URI
- The URI should be de-referenceable to a thing
- The thing should be part of an authority list / taxonomy
Issue 2: There is no authority vocabulary for types of data
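The requirement that values be machine-understandable can be sketched as a small classification step. The Python below checks whether a value is an HTTP URI and whether it belongs to an authority list. The authority list shown is hypothetical, since no shared vocabulary for types of data is assumed to exist, and the check for de-referenceability is left out because it would need a network request.

```python
from urllib.parse import urlparse

# Hypothetical authority list of data types, for illustration only.
DATA_TYPE_AUTHORITY = {
    "http://example.org/datatypes/observations",
    "http://example.org/datatypes/statistics",
}


def is_http_uri(value: str) -> bool:
    """True if the value looks like an absolute HTTP(S) URI."""
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def check_value(value: str, authority: set) -> str:
    """Classify a metadata value by how well a machine can interpret it."""
    if not is_http_uri(value):
        return "free text"
    if value not in authority:
        return "URI outside the authority list"
    return "URI in the authority list"
```

A value classified as free text can still be displayed to a person, but an application cannot reliably match it against values from other catalogs.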
Conditions for re-use
• DCAT re-uses the license DC property at the level of
distributions
• DCAT re-uses the rights DC property at both the level
of dataset and the level of distribution
dc:license > dc:LicenseDocument
dc:rights > dc:RightsStatement
W3C DCAT > DCAT AP
DCAT core
Technical properties
The necessary technical specifications to retrieve and
parse a distribution of a dataset (format, protocol etc.)
• DCAT re-uses the DC format property;
Issue 1: No property for protocol
The values of these properties have to be understood by
machines, possibly URIs:
Issue 2: No comprehensive RDF authority lists for these
values (partial: DC Types; non-RDF: IANA types)
VOID
VOID can help with the protocol metadata but only for
RDF datasets:
- Property for data dump: dataDump
- Property for SPARQL endpoint: sparqlEndpoint
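A consumer could use these two VOID properties to decide how to reach an RDF dataset. The Python sketch below assumes the VOID description has already been parsed into a plain dictionary keyed by full property URI; that representation, the function name and the preference for the SPARQL endpoint over the dump are all assumptions made for illustration.

```python
from typing import Dict, Optional, Tuple

# Namespace of the VOID vocabulary.
VOID = "http://rdfs.org/ns/void#"


def access_point(description: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """Pick a way to reach an RDF dataset from its VOID properties.

    Preferring the SPARQL endpoint over the dump is an arbitrary choice
    made for this sketch.
    """
    endpoint = description.get(VOID + "sparqlEndpoint")
    if endpoint:
        return ("sparql", endpoint)
    dump = description.get(VOID + "dataDump")
    if dump:
        return ("dump", dump)
    # Neither property is present: the protocol remains unknown.
    return None
```

When neither property is present the function returns `None`, which is the situation non-RDF datasets are always in, since VOID does not apply to them.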
“Dimensions” and their semantics
DCAT does not describe the dimensions of a dataset,
except for a reference to a standard if the dataset
dimensions can be defined by a formalized standard
(e.g. an XML schema or an RDF vocabulary or an ISO
standard)
dc:conformsTo > dc:Standard
Statistical vocabularies can help
with the description of the dimensions
SDMX: data structure and dimensions
SDMX: Statistical Data and Metadata Exchange
The data structure definition is a description of all the metadata needed to
understand the data set structure.
This includes:
• identification of the dimensions (Dimension) according to standard
statistical terminology,
• the key structure (KeyDescriptor),
• the code-lists (CodeList) that enumerate valid values for each dimension
• coded attribute (CodedAttribute), information about whether attributes
are required or optional and coded or free text.
Given the metadata in the data structure definition, all of the data in the
data set becomes meaningful.
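To make the role of dimensions and code lists more concrete, here is a minimal Python sketch of a structure definition used to validate the key of one observation. It is not SDMX itself: the dimension names, the codes and the `validate_key` helper are hypothetical, and attributes and measures are left out.

```python
from typing import Dict, List

# Hypothetical structure definition: each dimension has a code list
# enumerating its valid values.
STRUCTURE: Dict[str, List[str]] = {
    "area": ["KE", "TZ", "UG"],
    "year": ["2010", "2011", "2012"],
    "crop": ["maize", "rice"],
}


def validate_key(key: Dict[str, str], structure: Dict[str, List[str]]) -> List[str]:
    """Return a list of problems found in an observation key."""
    problems = []
    for dimension, codes in structure.items():
        if dimension not in key:
            problems.append(f"missing dimension: {dimension}")
        elif key[dimension] not in codes:
            problems.append(f"invalid code for {dimension}: {key[dimension]}")
    for dimension in key:
        if dimension not in structure:
            problems.append(f"unknown dimension: {dimension}")
    return problems
```

With the structure known in advance, an application can reject or flag an observation before trying to interpret its value.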
DataCube: simplified SDMX in RDF
[Figures: DataCube examples, with callouts “Reference to a concept scheme” and “‘Semantic role’ of the property”]
Combining different vocabularies
• DATASET (DCAT model): Name, URL, Owner, Content type, Topic(s), Language, Metadata set(s), Data structure, Distribution(s), […]
• DISTRIBUTION (DCAT model): Name, Protocol, Endpoint URL, Media type, Format, Size
• DATA STRUCTURE (DataCube model): Dimensions, Attributes, Measures, Value lists
• RDF dataset info (VOID properties; IF media type has RDF or SPARQL response): Vocabulary(ies), SPARQL endpoint, Data dump, Serialization format, Number of triples
• Catalog: the directory
If one or more known published metadata sets are used, just fill “metadata set(s)”, otherwise link to a “data structure” with custom “dimensions”.
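The two conditions on this slide can be sketched as a small decision function that lists which description blocks apply to a given dataset. This is only one reading of the diagram: the function name, the block labels as Python strings and the set of media types treated as RDF or SPARQL responses are assumptions, and the set is not exhaustive.

```python
from typing import List

# Media types treated as RDF or SPARQL responses; this list is an assumption
# made for the sketch and is not exhaustive.
RDF_MEDIA_TYPES = {
    "application/rdf+xml",
    "text/turtle",
    "application/sparql-results+xml",
}


def blocks_to_fill(metadata_sets: List[str], media_types: List[str]) -> List[str]:
    """List which description blocks apply to a dataset."""
    blocks = ["dataset", "distribution"]
    # Known published metadata sets are referenced directly; otherwise a
    # custom data structure with dimensions is described.
    blocks.append("metadata set(s)" if metadata_sets else "data structure")
    if any(m in RDF_MEDIA_TYPES for m in media_types):
        blocks.append("RDF dataset info")
    return blocks
```

For example, a CSV dataset that follows no published metadata set would need the dataset, distribution and data structure blocks, but not the RDF dataset info.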
Tools for managing dataset metadata
• CKAN maintained by the Open Knowledge Foundation
Uses most of DCAT. Doesn’t describe dimensions.
Also provides a global dataset hub called the Datahub
• Dataverse created by Harvard University
Uses a custom vocabulary. Doesn’t describe dimensions.
• Commercial solutions
• Repositories and catalogs:
OpenAIRE, DataCite (using re3data to search repositories) and Dryad
use their own vocabularies.
• CIARD RING
Uses full DCAT AP with some extended properties (protocol, data
type) and local taxonomies with URIs mapped when possible to
authorities.
Next steps: adding DataCube properties for dimensions.
Major outstanding issues
• Some missing properties in existing vocabularies:
→ approach vocabulary owners OR extend vocabularies
• Missing vocabularies for protocols, formats
→ approach standardizing bodies?
→ perhaps specific dataset formats?
• Need for more standardized semantics for
dimensions:
→ Joint discussions with the RDA Data Type Registries WG?
• Lack of interoperability metadata in existing tools
References
• W3C DCAT: http://www.w3.org/TR/vocab-dcat/
• DCAT AP: https://joinup.ec.europa.eu/asset/dcat_application_profile/asset_release/dcat-application-profile-data-portals-europe-final
• DataCube: http://purl.org/linked-data/cube#
• VOID: http://rdfs.org/ns/void-guide
• VIVO Datastar: http://sourceforge.net/projects/vivo/files/Datastar%20ontology/
• CERIF for datasets: https://cerif4datasets.wordpress.com/c4d-deliverables/
• CKAN: http://ckan.org/
• Datahub: http://datahub.io/
• DataCite: http://search.datacite.org/ui?q=subject%3Aagriculture
• Re3data: http://www.re3data.org
• Dryad: http://datadryad.org/
• OpenAIRE: https://www.openaire.eu/
Thank you
Valeria Pesce
Global Forum on Agricultural Research

How to Describe a Dataset. Interoperability Issues, by Valeria Pesce

  • 1. How to describe a dataset. Interoperability issues Valeria Pesce Global Forum on Agricultural Research
  • 2. Definition of “dataset” The term “dataset” has been defined in several ways, all of which further specify or extend the basic concept of “a collection of data”. Definition given by the W3C Government Linked Data Working Group: A dataset is “a collection of data, published or curated by a single source, and available for access or download in one or more formats” The “instances” of the dataset “available for access or download in one or more formats” are called “distributions”. A dataset can have many distributions. Examples of distributions include a downloadable CSV file, an API or an RSS feed.
  • 3. Definition of “interoperability” “Data interoperability is a feature of datasets - and of information services that give access to datasets - whereby data can easily be retrieved, processed, re-used, and re-packaged (“operated”) by other systems.” Interim Proceedings of International Expert Consultation on “Building the CIARD Framework for Data and Information Sharing”, CIARD (2011) software applications datasets have to be machine-readable
  • 4. What applications need Besides information common to any type of resource (name, author / owner, date…), applications have to find enough metadata about datasets to understand: 1. the specific coverage of the dataset (type of data, thematic coverage, geographic coverage) 2. the necessary technical specifications to retrieve and parse a distribution of the dataset (format, protocol etc.) 3. the conditions for re-use (rights, licenses) 4. the “dimensions” covered by the dataset (e.g. temperature, time, salinity, gene, coordinates) 5. the semantics of the dimensions (units of measure, time granularity, syntax, reference taxonomies)
  • 5. Partial answers in existing vocabularies • DCAT vocabulary – RDF vocabulary for describing any dataset – Datasets can be standalone or part of a “catalog” – Datasets are accessible through several “distributions” – “Other, complementary vocabularies may be used together with DCAT to provide more detailed format-specific information. For example, properties from the VoID vocabulary can be used if that dataset is in RDF format.” • VOID vocabulary – RDF vocabulary for expressing metadata about RDF datasets • (SDMX ) DataCube vocabulary – RDF vocabulary for describing statistical datasets – Useful for attaching metadata about the “data structure” to any dataset that doesn’t follow a known published standard
  • 6. Coverage of a dataset • This can be handled by common Dublin Core properties like subject and coverage. • DCAT re-uses these DC properties. Issue 1: No specific property for the type of data covered in a dataset. The values of these properties have to be understood by machines: - The value should be standardized, possibly a URI - The URI should be de-referenceable to a thing - The thing should be part of an authority list / taxonomy. Issue 2: There is no authority vocabulary for types of data.
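The first requirement above, that a coverage value should be a URI rather than a free-text label, can be checked mechanically. The sketch below is only a syntactic test; it does not verify that the URI can be de-referenced or that it belongs to an authority list. The function name `looks_like_uri` and the sample values are hypothetical.

```python
from urllib.parse import urlparse


def looks_like_uri(value):
    """Return True if the value is syntactically an http(s) URI with a host."""
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


print(looks_like_uri("http://example.org/taxonomy/maize"))  # True
print(looks_like_uri("maize"))                              # False
```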
  • 7. Conditions for re-use • DCAT re-uses the license DC property at the level of distributions • DCAT re-uses the rights DC property at both the level of dataset and the level of distribution dc:license > dc:LicenseDocument dc:rights > dc:RightsStatement
  • 8. W3C DCAT > DCAT AP
  • 10. Technical properties The necessary technical specifications to retrieve and parse a distribution of a dataset (format, protocol etc.) • DCAT re-uses the DC format property. Issue 1: No property for protocol. The values of these properties have to be understood by machines, possibly URIs. Issue 2: No comprehensive RDF authority lists for these values (partial: DC Types; non-RDF: IANA types).
  • 11. VOID VOID can help with the protocol metadata but only for RDF datasets: - Property for data dump: dataDump - Property for SPARQL endpoint: sparqlEndpoint
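A consuming application could use the two VoID properties named above to decide how to reach an RDF dataset. The sketch below assumes the VoID description has already been parsed into a plain dictionary keyed by property URI; that dictionary layout, the function `access_points` and the `example.org` URLs are assumptions for illustration, while `void:sparqlEndpoint` and `void:dataDump` are the actual VoID property names.

```python
VOID = "http://rdfs.org/ns/void#"


def access_points(description):
    """List (kind, url) pairs an application could use to reach the RDF data.

    SPARQL endpoints are listed before data dumps.
    """
    found = []
    for prop, kind in ((VOID + "sparqlEndpoint", "sparql"), (VOID + "dataDump", "dump")):
        for url in description.get(prop, []):
            found.append((kind, url))
    return found


# Hypothetical, already-parsed VoID description of one RDF dataset.
void_description = {
    VOID + "dataDump": ["http://example.org/dumps/all.nt"],
    VOID + "sparqlEndpoint": ["http://example.org/sparql"],
}
print(access_points(void_description))
```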
  • 12. “Dimensions” and their semantics DCAT does not describe the dimensions of a dataset, except for a reference to a standard if the dataset dimensions can be defined by a formalized standard (e.g. an XML schema or an RDF vocabulary or an ISO standard) dc:conformsTo > dc:Standard Statistical vocabularies can help with the description of the dimensions
  • 13. SDMX: data structure and dimensions SDMX: Statistical Data and Metadata Exchange The data structure definition is a description of all the metadata needed to understand the data set structure. This includes: • identification of the dimensions (Dimension) according to standard statistical terminology, • the key structure (KeyDescriptor), • the code-lists (CodeList) that enumerate valid values for each dimension • coded attribute (CodedAttribute), information about whether attributes are required or optional and coded or free text. Given the metadata in the data structure definition, all of the data in the data set becomes meaningful.
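The idea that a data structure definition makes the data "meaningful" can be sketched as a validation step: each dimension carries a code list, and an observation is accepted only if every dimension is present with a valid code. The classes below are a simplified, hypothetical model and not the SDMX or DataCube API; the dimension names, codes and measure are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dimension:
    name: str
    code_list: frozenset  # valid values for this dimension


@dataclass(frozen=True)
class DataStructure:
    dimensions: tuple  # ordered; together they form the key
    measure: str       # name of the observed value


def validate(observation, structure):
    """Return a list of problems found in one observation (empty if valid)."""
    errors = []
    for dim in structure.dimensions:
        if dim.name not in observation:
            errors.append(f"missing dimension: {dim.name}")
        elif observation[dim.name] not in dim.code_list:
            errors.append(f"invalid code for {dim.name}: {observation[dim.name]}")
    if structure.measure not in observation:
        errors.append(f"missing measure: {structure.measure}")
    return errors


# Hypothetical structure: crop yield by country and year.
dsd = DataStructure(
    dimensions=(
        Dimension("country", frozenset({"KE", "UG"})),
        Dimension("year", frozenset({"2012", "2013"})),
    ),
    measure="yield_t_per_ha",
)
print(validate({"country": "KE", "year": "2013", "yield_t_per_ha": 1.7}, dsd))
print(validate({"country": "FR", "yield_t_per_ha": 2.0}, dsd))
```

The first call returns an empty list; the second reports the unknown country code and the missing year dimension.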
  • 15. DataCube: simplified SDMX in RDF Reference to a concept scheme
  • 16. DataCube: simplified SDMX in RDF “Semantic role” of the property
  • 17. DataCube: simplified SDMX in RDF “Semantic role” of
  • 18. Combining different vocabularies Catalog: the directory. DCAT model: DATASET (Name, URL, Owner, Content type, Topic(s), Language, Metadata set(s), Data structure, Distribution(s) […]); DISTRIBUTION (Name, Protocol, Endpoint URL, Media type, Format, Size). DataCube model: DATA STRUCTURE (Dimensions, Attributes, Measures, Value lists). VOID properties: RDF dataset info (Vocabulary(ies), SPARQL endpoint, Data dump, Serialization format, Number of triples). If one or more known published metadata sets are used, just fill “metadata set(s)”; otherwise link to a “data structure” with custom “dimensions”. RDF dataset info applies only if the media type is RDF or a SPARQL response.
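The two rules on this slide (use a custom data structure only when no known metadata set applies, and add RDF dataset info only for RDF or SPARQL distributions) can be written as a small decision function. This is a sketch under assumptions: the dictionary layout, the block labels and the function names are hypothetical, and the set of media types treated as RDF or SPARQL is a short illustrative selection, not a complete list.

```python
# Illustrative selection of media types treated as RDF or SPARQL responses.
RDF_MEDIA_TYPES = {
    "application/rdf+xml",
    "text/turtle",
    "application/n-triples",
    "application/ld+json",
    "application/sparql-results+xml",
    "application/sparql-results+json",
}


def required_blocks(dataset):
    """Decide which description blocks a catalog record needs."""
    blocks = ["dataset (DCAT)", "distribution (DCAT)"]
    # No known published metadata set: describe the dimensions explicitly.
    if not dataset.get("metadata_sets"):
        blocks.append("data structure (DataCube)")
    # RDF or SPARQL distributions: add the VoID-based RDF dataset info.
    media_types = {d.get("media_type") for d in dataset.get("distributions", [])}
    if media_types & RDF_MEDIA_TYPES:
        blocks.append("RDF dataset info (VoID)")
    return blocks


csv_only = {"metadata_sets": [], "distributions": [{"media_type": "text/csv"}]}
rdf_with_standard = {
    "metadata_sets": ["http://example.org/standard/some-schema"],
    "distributions": [{"media_type": "text/turtle"}],
}
print(required_blocks(csv_only))
print(required_blocks(rdf_with_standard))
```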
  • 19. Tools for managing dataset metadata • CKAN, maintained by the Open Knowledge Foundation: uses most of DCAT. Doesn’t describe dimensions. Also provides a global dataset hub called the Datahub. • Dataverse, created by Harvard University: uses a custom vocabulary. Doesn’t describe dimensions. • Commercial solutions • Repositories and catalogs: OpenAIRE, DataCite (using re3data to search repositories) and Dryad use their own vocabularies. • CIARD RING: uses full DCAT AP with some extended properties (protocol, data type) and local taxonomies with URIs mapped when possible to authorities. Next steps: adding DataCube properties for dimensions.
  • 20. Major outstanding issues • Some missing properties in existing vocabularies: → approach vocabulary owners OR extend vocabularies • Missing vocabularies for protocols, formats: → approach standardizing bodies? → perhaps specific dataset formats? • Need for more standardized semantics for dimensions: → Joint discussions with the RDA Data Type Registries WG? • Lack of interoperability metadata in existing tools
  • 21. References • W3C DCAT: http://www.w3.org/TR/vocab-dcat/ • DCAT AP: https://joinup.ec.europa.eu/asset/dcat_application_profile/asset_release/dcat-application-profile-data-portals-europe-final • DataCube: http://purl.org/linked-data/cube# • VOID: http://rdfs.org/ns/void-guide • VIVO Datastar: http://sourceforge.net/projects/vivo/files/Datastar%20ontology/ • CERIF for datasets: https://cerif4datasets.wordpress.com/c4d-deliverables/ • CKAN: http://ckan.org/ • Datahub: http://datahub.io/ • DataCite: http://search.datacite.org/ui?q=subject%3Aagriculture • Re3data: http://www.re3data.org • Dryad: http://datadryad.org/ • OpenAIRE: https://www.openaire.eu/
  • 22. Thank you Valeria Pesce Global Forum on Agricultural Research