Detecting Good Practices and Pitfalls when Publishing Vocabularies on the Web

The uptake of Linked Data (LD) has promoted the proliferation of datasets and of the associated ontologies that bring semantics to the data being published. These ontologies should be evaluated at different stages, both during their development and at publication. Publishing and sharing the resulting model, and facilitating its (re)use, is as important as correctly modelling the part of the world the ontology is intended to capture. This paper proposes 11 evaluation characteristics concerning publishing, sharing and facilitating reuse: 6 good practices and 5 pitfalls, together with their associated detection methods. In addition, a grid-based rating system shows the results of analysing the vocabularies gathered in the LOV repository. Both contributions, the set of evaluation characteristics and the grid system, could help ontologists reuse existing LD vocabularies or check the one being built.

Published in: Technology, Design

  1. Detecting Good Practices and Pitfalls when Publishing Vocabularies on the Web
     María Poveda-Villalón (1), Bernard Vatant (2), Mari Carmen Suárez-Figueroa (1), Asunción Gómez-Pérez (1)
     (1) Ontology Engineering Group, Universidad Politécnica de Madrid, Spain. (2) Mondeca, Paris, France.
     mpoveda@fi.upm.es, bernard.vatant@mondeca.com, {mcsuarez, asun}@fi.upm.es
     Speaker: Asunción Gómez-Pérez
     Contact author: María Poveda-Villalón (mpoveda@fi.upm.es)
     Date: 10/28/13
  2. Table of Contents
     •  Introduction
     •  Good practices and pitfalls for publishing vocabularies
     •  Results and analysis over LOV vocabularies
     •  Conclusions and future work
  3. Introduction
     Vocabularies bring semantics to data. (Figure: "Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/")
     •  Different formats: RDFS, OWL, HTML
     •  Different configurations
     •  Do they ease or impede applications consuming vocabularies?
        →  Good practices & pitfalls
     In this work:
     •  Detailed analysis of 355 vocabularies gathered in the LOV registry (http://lov.okfn.org/)
     •  Why LOV: complete information about each vocabulary, namely URI, namespace and prefix
     •  Results:
        1.  a non-exhaustive list of good practices and pitfalls about publishing LD vocabularies
        2.  specific methods for detecting such good practices and pitfalls
        3.  some metadata about ontology quality
        4.  the inclusion of pitfalls in services such as OOPS! (http://www.oeg-upm.net/oops) to help eager vocabulary managers
  4. Table of Contents
     •  Introduction
     •  Good practices and pitfalls for publishing vocabularies
        •  Previous work
        •  Proposed good practices and pitfalls
     •  Results and analysis over LOV vocabularies
     •  Conclusions and future work
  5. Previous work (I)
     Linked Open Data 5 Star rating system (Tim Berners-Lee). http://www.w3.org/DesignIssues/LinkedData.html. 2006 (last change 2009).
     •  LOD1. Available on the web (whatever format) but with an open licence, to be Open Data
     •  LOD2. Available as machine-readable structured data (e.g. Excel instead of an image scan of a table)
     •  LOD3. As (2) plus non-proprietary format (e.g. CSV instead of Excel)
     •  LOD4. All the above plus: use open standards from W3C (RDF and SPARQL) to identify things, so that people can point at your stuff
     •  LOD5. All the above plus: link your data to other people's data to provide context
     Is your linked data vocabulary 5-star? (Bernard Vatant). http://bvatant.blogspot.fr/2012/02/is-your-linked-data-vocabulary-5-star_9588.html. 2012.
     •  LDV1. Publish your vocabulary on the Web at a stable URI
     •  LDV2. Provide human-readable documentation and basic metadata such as creator, publisher, date of creation, last modification, version number
     •  LDV3. Provide labels and descriptions, if possible in several languages, to make your vocabulary usable in multiple linguistic scopes
     •  LDV4. Make your vocabulary available via its namespace URI, both as a formal file and as human-readable documentation, using content negotiation
     •  LDV5. Link to other vocabularies by re-using elements rather than re-inventing
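As an illustration of the content negotiation that LDV4 asks for, the following minimal Python sketch builds a request for one representation of a namespace URI and classifies the `Content-Type` a server sends back. It is not the authors' detection method: the media-type lists and the function names `negotiation_request` and `representation_kind` are assumptions made for this example, and the request is only built, never sent.

```python
from urllib.request import Request

# Assumed media types; the slides do not say which ones the authors test for.
RDF_TYPES = ("application/rdf+xml", "text/turtle")
HTML_TYPES = ("text/html", "application/xhtml+xml")


def negotiation_request(namespace_uri: str, media_type: str) -> Request:
    """Build (but do not send) a GET request asking for one representation."""
    return Request(namespace_uri, headers={"Accept": media_type})


def representation_kind(content_type_header: str) -> str:
    """Classify a Content-Type response header as 'rdf', 'html' or 'other'."""
    # Drop parameters such as "; charset=UTF-8" before comparing.
    media_type = content_type_header.split(";", 1)[0].strip().lower()
    if media_type in RDF_TYPES:
        return "rdf"
    if media_type in HTML_TYPES:
        return "html"
    return "other"
```

To run the check against a live server, one could pass the request to `urllib.request.urlopen` and feed the response's `Content-Type` header to `representation_kind`; a vocabulary would then satisfy the RDF and HTML halves of LDV4 if the same namespace URI yields `"rdf"` for an RDF `Accept` value and `"html"` for an HTML one.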
  6. Previous work (II)
     •  Archer, P., Goedertier, S., and Loutas, N. D7.1.3 – Study on persistent URIs, with identification of best practices and recommendations on the topic for the MSs and the EC. Deliverable. December 17, 2012.
     •  Heath, T., Bizer, C.: Linked data: Evolving the Web into a global data space (1st edition). Morgan & Claypool. 2011.
  7. Proposed good practices and pitfalls
     Proposals:
     •  Good practices
        •  GP1. Provide RDF description
        •  GP2. Provide HTML documentation
        •  GP3. Content negotiation for RDF
        •  GP4. Content negotiation for HTML
        •  GP5. Provide vann metadata
        •  GP6. Well-established prefix
     •  Pitfalls
        •  P36. URI contains file extension
        •  P37. Ontology not available
        •  P38. No OWL ontology declaration
        •  P39. Ambiguous namespace
        •  P40. Namespace hijacking
     Inspired by (brief reminder of previous work):
     •  Linked Open Data 5 Star
        •  LOD1. On the web. Open.
        •  LOD2. Machine-readable
        •  LOD3. Non-proprietary
        •  LOD4. Open standards
        •  LOD5. Link
     •  Is your linked data vocabulary 5-star?
        •  LDV1. Vocabulary on the Web
        •  LDV2. Human-readable and metadata
        •  LDV3. Labels and descriptions
        •  LDV4. Content negotiation
        •  LDV5. Link
     •  10 rules for persistent URIs ✔
        •  Follow the pattern
        •  Re-use existing identifiers
        •  Multiple representations
        •  Implement 303 redirects
        •  Use a dedicated server
        •  Avoid stating ownership
        •  Avoid version numbers
        •  Avoid using auto-increment
        •  Avoid query strings
        •  Avoid file extensions
     •  Linked data: Evolving the Web into a global data space: "Only define new terms in a namespace that you control." ✖
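Pitfall P36 (URI contains file extension) mirrors the "Avoid file extensions" rule listed above. The slides do not spell out the detection method, so what follows is only a rough Python sketch of one way such a check might look. The extension list and the function name `uri_contains_file_extension` are assumptions for this example, not part of the original work.

```python
import posixpath
from urllib.parse import urlparse

# Assumed list of extensions to flag; the actual list used by the authors is not given.
FLAGGED_EXTENSIONS = {".owl", ".rdf", ".rdfs", ".ttl", ".n3", ".xml", ".html", ".php"}


def uri_contains_file_extension(uri: str) -> bool:
    """Return True if the last path segment of the URI ends in a flagged extension."""
    # urlparse separates the path from the query string and the fragment,
    # so "people.owl#Person" is inspected as "people.owl".
    path = urlparse(uri).path
    last_segment = path.rstrip("/").rsplit("/", 1)[-1]
    _, extension = posixpath.splitext(last_segment)
    return extension.lower() in FLAGGED_EXTENSIONS
```

Matching against a fixed list, rather than flagging any dot in the last segment, keeps version-like segments such as `1.0` from being reported as file extensions; whether the original method makes the same choice is not stated in the slides.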
  8. Table of Contents
     •  Introduction
     •  Good practices and pitfalls for publishing vocabularies
     •  Results and analysis over LOV vocabularies
     •  Conclusions and future work
  9. Results and analysis over LOV vocabularies (I)
     Good practices and pitfalls frequency: 355 vocabularies registered in LOV, 19th June, 2013.
     [Chart: Good practices distribution]
     •  GP1. Provide RDF description
     •  GP2. Provide HTML documentation
     •  GP3. Content negotiation for RDF
     •  GP4. Content negotiation for HTML
     •  GP5. Provide vann metadata
     •  GP6. Well-established prefix
     [Chart: Pitfalls distribution]
     •  P36. URI contains file extension
     •  P37. Ontology not available
     •  P38. No OWL ontology declaration
     •  P39. Ambiguous namespace
     •  P40. Namespace hijacking
  10. Results and analysis over LOV vocabularies (II)
      Grid with vocabularies arranged according to the number of good practices and pitfalls observed. Available at http://goo.gl/zu9ZbW
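The grid groups vocabularies by how many good practices and how many pitfalls each one shows. The short Python sketch below illustrates that grouping under stated assumptions: the vocabulary names and observations are invented toy data, not results from the LOV analysis, and `build_grid` is a hypothetical helper, not the authors' implementation.

```python
from collections import defaultdict

# Invented toy observations, for illustration only.
observations = {
    "vocab-a": {"good_practices": {"GP1", "GP2", "GP3"}, "pitfalls": set()},
    "vocab-b": {"good_practices": {"GP1"}, "pitfalls": {"P36", "P38"}},
    "vocab-c": {"good_practices": {"GP1", "GP2", "GP5"}, "pitfalls": set()},
}


def build_grid(observations):
    """Map (number of good practices, number of pitfalls) to vocabulary names."""
    grid = defaultdict(list)
    for name, observed in observations.items():
        cell = (len(observed["good_practices"]), len(observed["pitfalls"]))
        grid[cell].append(name)
    # Sort names so every cell has a stable order.
    return {cell: sorted(names) for cell, names in grid.items()}


grid = build_grid(observations)
```

With this toy input, `vocab-a` and `vocab-c` share the cell for three good practices and no pitfalls, while `vocab-b` sits alone in the cell for one good practice and two pitfalls.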
  11. Table of Contents
      •  Introduction
      •  Good practices and pitfalls for publishing vocabularies
      •  Results and analysis over LOV vocabularies
      •  Conclusions and future work
  12. Conclusions
      •  6 good practices and 5 pitfalls proposed
         •  Based on existing works
         •  Implementation of the detection methods
      •  Grid-based rating system proposed. Useful for:
         •  Vocabulary registry maintainers
         •  Vocabulary developers and creators
      •  Execution over 355 vocabularies
         •  All good practices and pitfalls are observed
         •  Some of them surprisingly so (e.g. P40. Namespace hijacking)
         •  LOV vocabularies seem to be well maintained and likely to be of high quality. Due to semi-handcrafted maintenance instead of crawlers?
  13. Future work (I)
      Reminder of previous work:
      •  Linked Open Data 5 Star: LOD1. On the web. Open. LOD2. Machine-readable. LOD3. Non-proprietary. LOD4. Open standards. LOD5. Link.
      •  Is your linked data vocabulary 5-star?: LDV1. Vocabulary on the Web. LDV2. Human-readable and metadata. LDV3. Labels and descriptions. LDV4. Content negotiation. LDV5. Link.
      Planned work:
      •  Take into account:
         •  metadata about licences
         •  other metadata, e.g., creators, authors, dates, languages, etc.
         •  linguistic information
         •  reused terms from other vocabularies
      •  Provide guidelines to solve pitfalls and to follow good practices
      •  Execute methods over LOV on a regular basis
         •  Observe evaluation of the ecosystem
         •  Draw trends for vocabulary publication
  14. Future work (II)
      •  Integration with third-party systems, e.g.:
         •  LOV search
         •  OOPS! - OntOlogy Pitfall Scanner! (http://oeg-upm.net/oops/)
            ✔  Done for pitfalls
      •  Assign importance levels for good practices and pitfalls
         ✔  Done for pitfalls
  15. Questions? Thanks!