Digital Enterprise Research Institute                                                www.deri.ie




                      dcat: An RDF vocabulary for
                  interoperability of data catalogues
                                Richard Cyganiak, Fadi Maali, Vassilios Peristeras




 Copyright 2009 Digital Enterprise Research Institute. All rights reserved.
Agenda

           Why catalogue interoperability is important
           A survey of data catalogues
           Introducing the dcat vocabulary
           First experiments with integrated catalogue data
           Where to take this next?
Government data catalogues

           Now more than 30 catalogues online
           National
                  U.S., UK, Australia, New Zealand
           State level
                  New South Wales, California, Massachusetts, Maine
           Regional and local
                  New York, San Francisco, London, Vancouver, Kent County
           Both official and private initiatives

            Catalogue websites do not
           unlock the full potential of the
                collected metadata.
Beyond catalogue websites

           Querying across catalogues
                  Overlapping regional coverage – U.S., California, SF
                  Supra-national catalogues – data.gov.eu?
           New user interfaces
                  Faceted browsing
                  Specialized UI for geographical/statistical/tabular sub-sections
                   of a catalogue
                  Social annotation
           Bulk processing of datasets
                  Search indexes that inspect dataset contents
                  Update notifications
Current state of interoperability

           Most major catalogues do expose their contents in
            a structured format!
                  CSV
                  Atom feeds
                  RDFa
           But using this data is difficult
                  Different formats for each catalogue
                  Different metadata fields in each
                  Metadata fields poorly documented
                  Contents of metadata fields are inconsistent or do not
                   match documentation
A survey of data catalogues

           In-depth review of seven catalogues
                  data.gov, data.gov.uk, data.gov.nz, data.australia.gov.au,
                   datasf.org, data.london.gov.uk, statcentral.ie
           Looking at metadata, not into the datasets
Metadata structure
Consistency and availability
Direct download links

           Download links
                  Can go straight to the data (Excel, CSV, …)
                  Or to a splash page or license page
           % of direct links
                  data.london.gov.uk: 100%
                  data.gov: 95%
                  datasf.org: 10%
                  data.gov.uk: 7%
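
A share of direct links like the figures above could be computed along the following lines. This is only a minimal sketch, not the survey's actual method: the `is_direct_link` heuristic (treating any non-HTML response as a direct link) and the sample Content-Type values are assumptions made for illustration.

```python
# Hypothetical heuristic: a download link is "direct" when it does not
# resolve to an HTML page (splash page, license page, ...).
HTML_TYPES = {"text/html", "application/xhtml+xml"}


def is_direct_link(content_type: str) -> bool:
    """Return True if the Content-Type header value is not an HTML page."""
    media_type = content_type.split(";")[0].strip().lower()
    return media_type not in HTML_TYPES


def direct_link_percentage(content_types) -> float:
    """Percentage of links whose response looks like data rather than HTML."""
    content_types = list(content_types)
    if not content_types:
        return 0.0
    direct = sum(1 for value in content_types if is_direct_link(value))
    return 100.0 * direct / len(content_types)


# Invented sample: Content-Type values observed for four download links.
sample = [
    "text/csv",
    "text/html; charset=utf-8",
    "application/vnd.ms-excel",
    "text/html",
]
print(direct_link_percentage(sample))
```

In practice the Content-Type values would have to be collected by requesting each download link, and some splash pages may need a different test.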
The dcat vocabulary

           Intended as interoperability standard
           Vocabulary expressed in RDF Schema
           http://vocab.deri.ie/dcat#
                  Vocabulary namespace
           http://vocab.deri.ie/dcat-overview
                  Misc information
Design notes

           Hepp’s Law: An integration ontology must not
            introduce distinctions that are finer than the
            distinctions made in the data to be integrated.
           Focus on the metadata fields that are available in all or
            most catalogues
           Require no data cleansing before a catalogue can be
            published in dcat
           Re-use Dublin Core, SKOS, FOAF whenever possible
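
The "no data cleansing" goal can be pictured as a pure renaming step: each catalogue's own field names are mapped onto shared properties and the values pass through unchanged. The sketch below is illustrative only. The catalogue names and source field names are hypothetical, and only the target properties (`dc:title`, `dc:publisher`, `dcat:theme`) are taken from the slides.

```python
# Hypothetical field names for two imaginary catalogues, mapped onto
# shared properties. Fields without a mapping are simply left out.
FIELD_MAPPINGS = {
    "catalogue_a": {
        "Title": "dc:title",
        "Agency": "dc:publisher",
        "Category": "dcat:theme",
    },
    "catalogue_b": {
        "name": "dc:title",
        "department": "dc:publisher",
        "tags": "dcat:theme",
    },
}


def to_shared_record(catalogue: str, row: dict) -> dict:
    """Rename known fields; values are copied as-is (no cleansing)."""
    mapping = FIELD_MAPPINGS[catalogue]
    return {
        mapping[field]: value
        for field, value in row.items()
        if field in mapping
    }


row = {"Title": "School enrolment", "Agency": "Dept. of Education", "Extra": 1}
print(to_shared_record("catalogue_a", row))
```

Because nothing is normalised, inconsistent values in a source field remain inconsistent in the output, which is the trade-off the design accepts.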
Concepts

           dcat:Catalog
           dcat:Dataset
           dcat:CatalogRecord
           dcat:Distribution
                  subclasses dcat:Feed, dcat:WebService
           skos:Concept, skos:ConceptScheme
           foaf:Organization
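
To show how these concepts might fit together, the sketch below prints a small Turtle description of one catalogue, one dataset and one distribution. It is not an official example. The `example.org` URIs are hypothetical, the `dc:` prefix is assumed to mean Dublin Core Terms, and the properties `dcat:dataset`, `dcat:distribution` and `dcat:downloadURL` are the ones used in the SPARQL queries on the later slides.

```python
DCAT = "http://vocab.deri.ie/dcat#"   # namespace given on the vocabulary slide
DC = "http://purl.org/dc/terms/"      # assumption: dc: is Dublin Core Terms


def escape_literal(text: str) -> str:
    """Escape backslashes, quotes and newlines for a Turtle string literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )


def dataset_to_turtle(catalog_uri, dataset_uri, title, download_url) -> str:
    """Describe one dataset and one distribution inside a catalogue."""
    return "\n".join([
        f"@prefix dcat: <{DCAT}> .",
        f"@prefix dc: <{DC}> .",
        "",
        f"<{catalog_uri}> a dcat:Catalog ;",
        f"    dcat:dataset <{dataset_uri}> .",
        "",
        f"<{dataset_uri}> a dcat:Dataset ;",
        f'    dc:title "{escape_literal(title)}" ;',
        "    dcat:distribution [",
        "        a dcat:Distribution ;",
        f"        dcat:downloadURL <{download_url}>",
        "    ] .",
    ])


print(dataset_to_turtle(
    "http://example.org/catalog",          # hypothetical URIs
    "http://example.org/dataset/1",
    "School enrolment",
    "http://example.org/files/enrolment.csv",
))
```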
Vocabulary overview
Initial experiments

           Set up a D2R Server over four catalogues
                  US, AU, SF, London
                  http://lab.linkeddata.deri.ie/govcat/
                  SPARQL interface:
                   http://lab.linkeddata.deri.ie/govcat/snorql/
                  Links to Geonames, DBpedia
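
A client could query a server like this over the standard SPARQL protocol, sending the query as a `query` parameter of an HTTP GET request. The sketch below only builds the request and does not send it. The endpoint path is an assumption: the slides give the Snorql interface URL, not the address of the query endpoint itself, and the server may no longer be online.

```python
import urllib.parse
import urllib.request

# Assumption: the query endpoint sits at /sparql next to the Snorql UI.
ENDPOINT = "http://lab.linkeddata.deri.ie/govcat/sparql"


def build_sparql_request(query: str, endpoint: str = ENDPOINT) -> urllib.request.Request:
    """Build (but do not send) a SPARQL protocol GET request asking for JSON."""
    params = urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        f"{endpoint}?{params}",
        headers={"Accept": "application/sparql-results+json"},
    )


example_query = "SELECT ?s WHERE { ?s a <http://vocab.deri.ie/dcat#Dataset> } LIMIT 5"
request = build_sparql_request(example_query)
print(request.full_url)
# To execute it: urllib.request.urlopen(request)
```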
SPARQL across datasets

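       # Titles and download URLs of education datasets that are
       # available as XML files smaller than 1 MiB (1048576 bytes).
       # The dcat:, dc: and default (:) prefixes are assumed to be
       # declared by the endpoint.
       SELECT ?title ?url
       WHERE {
         ?dataset a dcat:Dataset;
            dc:title ?title;
            dcat:theme :education;
            dcat:distribution ?distribution.
         ?distribution dcat:downloadURL ?url;
            dc:format ?format;
            dcat:size ?size.
         ?size dcat:bytes ?bytes.
         FILTER (?bytes < 1048576 && ?format = "text/xml")
       }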
SPARQL query with external data

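       # Titles of data.gov datasets whose publishing agency has a
       # budget above 50,000,000,000 according to DBpedia.
       SELECT ?title
       WHERE {
         :data.gov dcat:dataset ?dataset.
         ?dataset dc:title ?title;
             dc:publisher ?agency.
         ?agency dbpedia:budget ?budget.
         FILTER (?budget > 50000000000)
       }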
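
Results of a SELECT query like the one above are commonly returned in the SPARQL JSON results format, where each row is a "binding" keyed by variable name. The sketch below shows one way to pull a single column out of such a response. The sample response is invented for illustration and is not real output from the govcat server.

```python
import json


def extract_column(results_json: str, variable: str) -> list:
    """Return the values bound to one variable in a SPARQL JSON result."""
    data = json.loads(results_json)
    return [
        binding[variable]["value"]
        for binding in data["results"]["bindings"]
        if variable in binding  # a variable may be unbound in some rows
    ]


# Invented sample response with the standard head/results layout.
sample_response = json.dumps({
    "head": {"vars": ["title"]},
    "results": {"bindings": [
        {"title": {"type": "literal", "value": "Example dataset A"}},
        {"title": {"type": "literal", "value": "Example dataset B"}},
    ]},
})
print(extract_column(sample_response, "title"))
```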
Benefits of the dcat standard

           Embedded metadata in catalogue web pages
            increases findability
           Enables decentralised publishing
           Enables federated search
           Will enable one-click download and installation of
            data packages
           Serves as manifest file for digital preservation
           Applications can be built once and work with
            multiple catalogues
Where next?

           Get feedback on the vocabulary, improve where
            necessary
           Write up a Guide to using dcat
           Explore how to use it with voiD, SDMX+RDF
           Get more catalogues to expose dcat format

           So far, everything happened in DERI, but we want to
            open this up. Where?
