Is the LOD cloud at risk of becoming a museum for datasets? Looking ahead towards a fully collaborative and sustainable LOD cloud

In this talk, I presented and discussed experiences with the Linked Open Data Cloud (http://lod-cloud.net/), analysing whether the visualisation is a truthful representation of the open linked datasets on the web. I also discussed the challenges of making the LOD cloud more sustainable and approaches to address them.

This was presented at the Linked Data on the Web workshop (http://events.linkeddata.org/ldow-lddl/), co-located with The Web Conference 2019 in San Francisco (https://www2019.thewebconf.org/).

Citation:
Jeremy Debattista, Judie Attard, Rob Brennan, and Declan O'Sullivan. 2019. Is the LOD cloud at risk of becoming a museum for datasets? Looking ahead towards a fully collaborative and sustainable LOD cloud. In Companion Proceedings of The 2019 World Wide Web Conference (WWW '19), Ling Liu and Ryen White (Eds.). ACM, New York, NY, USA, 850-858. DOI: https://doi.org/10.1145/3308560.3317075


  1. Is the LOD cloud at risk of becoming a museum for datasets? Looking ahead towards a fully collaborative and sustainable LOD cloud
     Jeremy Debattista, Judie Attard, Rob Brennan, Declan O’Sullivan
     ADAPT Centre, Trinity College Dublin, Ireland
     This research has received funding from the Irish Research Council Government of Ireland Postdoctoral Fellowship award (GOIPD/2017/1204) and the ADAPT Centre for Digital Content Technology, funded under the SFI Research Centres Programme (Grant 13/RC/2106) and co-funded by the European Regional Development Fund.
  2. The Current LOD Cloud
     • 1,239 depicted datasets (an increase of 5 from Jan ’19)
     • Depicts datasets that have been published as Linked Data
     • Clustered catalog of individual domain-specific KGs demonstrating cohesion between interlinks and intralinks
     • An image with embedded metadata
     LOD cloud image: 29/3/2019, CC-BY, http://lod-cloud.net
  3. ... but how much can we access?
  4. ... but how much can we access?
     As of May 8th, only 388 datasets were accessible: 66 fewer than in Jan ’19 (the figure reported in the paper)!
  5. The Open Data definition
     Open Data should be freely used, modified, and shared by anyone for any purpose
     http://opendefinition.org/
     Open Data Defined: Open License, Easily Accessible, Machine Readable, Open Format
  6. ... this is what the LOD Cloud should look like
  7. ... this is what the LOD Cloud should look like
     As of May 8th, only 266 datasets follow the open data definition
  8. Agenda: METADATA ANALYSIS, SUSTAINABLE STRATEGIES
  9. Agenda: METADATA ANALYSIS
  10. Metadata Analysis
      Analysis:
      • Identification of licenses for datasets in metadata
      • Identification of the format/media types of available datasets
      • Identification of dataset access points
      Purpose:
      • Discoverability and Openness of datasets in the LOD cloud
      Not relevant for analysis:
      • Size/Number of Triples in a dataset
      • Number of external interlinks
      NOTE: At this stage it's only metadata analysis
  11. Metadata Analysis – Gathering the Data
      • LOD cloud provides a JSON file with all datasets: https://lod-cloud.net/lod-data.json
      • Discrepancy between the JSON metadata and the voID metadata generated/provided by the LOD cloud
      • Jupyter notebook available: https://github.com/jerdeb/lodexperiments
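The first step of the analysis, reading the catalogue, can be sketched in Python as follows. This is a minimal sketch, not the authors' notebook: it assumes the JSON file is a single object keyed by dataset identifier, and the helper name `load_catalogue` and the inline sample records are hypothetical.

```python
import json
from typing import Any


def load_catalogue(raw_json: str) -> dict[str, dict[str, Any]]:
    """Parse the LOD cloud JSON text into {dataset_id: metadata_record}.

    Assumption: the top level is a JSON object keyed by dataset identifier.
    """
    data = json.loads(raw_json)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object keyed by dataset identifier")
    return data


# Tiny inline sample standing in for the real file (hypothetical records).
SAMPLE = """
{
  "example-a": {"title": "Example A", "license": "http://example.org/license"},
  "example-b": {"title": "Example B"}
}
"""

catalogue = load_catalogue(SAMPLE)
print(len(catalogue), "datasets loaded")
```

For the real file, the text could be fetched with `urllib.request.urlopen(url, timeout=10)` and decoded before being passed to `load_catalogue`.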
  12. Metadata Analysis - Licenses
      • JSON key: license
      • Conformant Licenses: https://opendefinition.org/licenses/
      Results:
      • Number of datasets with a defined license: 619 datasets (~45%, an increase of 5% from the observation done in 2015)
      • Number of datasets with a conformant license: 530 datasets
      • Regex: license or copyright and one of under, grant or right
        • 22 matches: 10 datasets with conformant licenses; 4 bad matches
      • 3 datasets whose description and license field state conflicting licenses
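The regex rule on this slide could look roughly like the sketch below. It reads the rule as "(license or copyright) and (under, grant or right)" and assumes it is applied to a dataset's free-text description. The word-start anchors and the `licence` spelling variant are additions of this sketch, not something the slides specify, and `mentions_license` is a hypothetical name.

```python
import re

# Assumption: the rule means "(license OR copyright) AND (under OR grant OR right)".
# Word-start anchors are added so that "copyright" alone does not satisfy both halves.
LICENSE_TERM = re.compile(r"\b(licen[cs]e|copyright)", re.IGNORECASE)
GRANT_TERM = re.compile(r"\b(under|grant|right)", re.IGNORECASE)


def mentions_license(description: str) -> bool:
    """Return True if a free-text description appears to state licensing terms."""
    return bool(LICENSE_TERM.search(description)) and bool(GRANT_TERM.search(description))


print(mentions_license("Released under the CC-BY license."))
print(mentions_license("Copyright 2015 Example Org."))
```

A match is only a candidate: as the slide reports, some matches were bad, so each hit still needs manual inspection.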
  13. Metadata Analysis - Licenses
  14. Metadata Analysis – Media Types for Datasets
      • JSON key: media_type – for each dataset distribution (download)
      • Ideally using a registered Linked Data media type
      • text/html the most frequently used, but no RDFa embedded
      • A large number of unregistered media types
      • 596 distributions using meta/void and meta/rdf-schema, but these are not registered
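Counting media types per distribution might be done as in this sketch. The `media_type` key comes from the slide, but the names of the lists that hold the distributions (`full_download`, `other_download`) are assumptions about the JSON layout, and `REGISTERED_RDF_TYPES` is only an illustrative subset of registered RDF media types.

```python
from collections import Counter
from typing import Any, Iterable

# Illustrative subset of IANA-registered RDF media types (not an exhaustive list).
REGISTERED_RDF_TYPES = {
    "text/turtle",
    "application/rdf+xml",
    "application/n-triples",
    "application/n-quads",
    "application/ld+json",
    "application/trig",
}


def media_type_counts(
    catalogue: dict[str, dict[str, Any]],
    distribution_keys: Iterable[str] = ("full_download", "other_download"),
) -> Counter:
    """Count media_type values over every distribution of every dataset."""
    keys = tuple(distribution_keys)  # assumed key names, see lead-in
    counts: Counter = Counter()
    for record in catalogue.values():
        for key in keys:
            for dist in record.get(key) or []:
                media_type = (dist.get("media_type") or "").strip().lower()
                if media_type:
                    counts[media_type] += 1
    return counts


def non_rdf_types(counts: Counter) -> dict[str, int]:
    """Media types outside the subset above (this includes text/html)."""
    return {mt: n for mt, n in counts.items() if mt not in REGISTERED_RDF_TYPES}


# Hypothetical records for illustration only.
sample = {
    "a": {"full_download": [{"media_type": "text/turtle"}, {"media_type": "meta/void"}]},
    "b": {
        "full_download": [{"media_type": "text/html"}],
        "other_download": [{"media_type": "meta/void"}],
    },
}
counts = media_type_counts(sample)
print(counts.most_common())
print(non_rdf_types(counts))
```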
  15. Metadata Analysis - Accessibility
      • Potential access point: data dump, SPARQL endpoint, voID description
      • Set criteria for different access points:
        • 10 second timeout (for all)
        • Data dumps: distribution tagged with one of a set of pre-defined media types
        • SPARQL: return result for ASK { ?s ?p ?o }
        • voID: return true for ASK { ?s a void:Dataset } after loading metadata in memory
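The SPARQL criterion can be sketched with the Python standard library as below. The request follows the SPARQL Protocol (a GET with a `query` parameter) and asks for the JSON results format. The function names are hypothetical, and treating only a `true` answer as success is one interpretation of "return result".

```python
import http.client
import json
import urllib.parse
import urllib.request

ASK_ANY_TRIPLE = "ASK { ?s ?p ?o }"
TIMEOUT_SECONDS = 10  # the timeout stated on the slide


def build_ask_request(endpoint_url: str, query: str = ASK_ANY_TRIPLE) -> urllib.request.Request:
    """Build a SPARQL Protocol GET request that asks for a JSON result."""
    separator = "&" if "?" in endpoint_url else "?"
    url = endpoint_url + separator + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})


def parse_ask_result(body: str) -> bool:
    """Read the boolean from a SPARQL JSON results document."""
    result = json.loads(body)
    return isinstance(result, dict) and result.get("boolean") is True


def endpoint_is_alive(endpoint_url: str) -> bool:
    """True if the endpoint answers the ASK query with true within the timeout."""
    try:
        request = build_ask_request(endpoint_url)
        with urllib.request.urlopen(request, timeout=TIMEOUT_SECONDS) as resp:
            return parse_ask_result(resp.read().decode("utf-8"))
    except (OSError, ValueError, http.client.HTTPException):
        # Timeouts, connection errors, bad URLs and malformed responses all count as failure.
        return False
```

Calling `endpoint_is_alive("http://example.org/sparql")` performs a live network request, so it is left out of the snippet itself.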
  16. Metadata Analysis - Accessibility
      Jan ’19 distribution (chart): Datadump 14.5%, SPARQL 5.6%, voID 2.5%, more than 1 discoverability entry 10.5%, none 66.8%
      • Jan ’19: 33% of datasets have a discoverable data access point (454 datasets), 9% less than the observation of 2015
      • Majority of datasets have a data dump distribution (199 datasets)
      • May ’19: only 388 datasets (28%) have an access point, with 226 having a data dump and 65 datasets having more than one discoverability entry point
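The chart categories on this slide can be derived from three per-dataset reachability flags, as in the sketch below. The category labels follow the slide, while the function names and the sample flags are hypothetical and are not the reported data.

```python
from collections import Counter


def access_category(has_dump: bool, has_sparql: bool, has_void: bool) -> str:
    """Map three reachability flags to one chart category."""
    flags = (("Datadump", has_dump), ("SPARQL", has_sparql), ("voID", has_void))
    found = [name for name, ok in flags if ok]
    if not found:
        return "None"
    if len(found) > 1:
        return "More than 1"
    return found[0]


def distribution(flag_rows) -> dict[str, float]:
    """Percentage of datasets per category, rounded to one decimal place."""
    counts = Counter(access_category(*row) for row in flag_rows)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {category: round(100 * n / total, 1) for category, n in counts.items()}


# Hypothetical flags (dump, sparql, void) for four datasets.
sample_flags = [
    (True, False, False),
    (True, True, False),
    (False, False, False),
    (False, False, False),
]
print(distribution(sample_flags))
```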
  17. Agenda: SUSTAINABLE STRATEGIES
  18. The Sustainable LOD Cloud
      C1. Publishers should own and maintain the datasets’ metadata
      C2. Lack of systematic and fine-granular metadata structures
      C3. Invalid metadata descriptions
      C4. Many dead and outdated datasets listed
      C5. Lack of involvement of data consumers in the structure
  19. The Sustainable LOD Cloud
      C1. Publishers should own and maintain the datasets’ metadata
      • Adding a dataset = filling in a Google sheet form
      • Dataset updated != metadata in LOD cloud updated
  20. The Sustainable LOD Cloud
      C2. Lack of systematic and fine-granular metadata structures
      • No systematic structure in terms of properties, the properties’ values, and categorical values
      • Attempts to leverage the DCAT and voID standards
  21. The Sustainable LOD Cloud
      C3. Invalid metadata descriptions
      • Incorrect values for properties
        • e.g. the DBpedia void:dataDump predicate incorrectly links to the DBpedia download page
      • License predicate points to a human-readable page with no semantic description
      • Incorrect media type values
  22. The Sustainable LOD Cloud
      C4. Many dead and outdated datasets listed
      • Datasets not online (e.g. 270a.info datasets), yet still depicted
      • Using LOD Laundromat as a preservation/archive tool
  23. The Sustainable LOD Cloud
      C5. Lack of involvement of data consumers in the structure
      • Difficult to find the relevant dataset
      • Parse the JSON file or, previously, use datahub.io
  24. Sustainable Strategies - Pillars
      Pillars of the LOD Cloud:
      • Stakeholders: Publishers, Maintainers, Consumers
      • Processes: Publishing and Consumption
      • Technologies: Automation and infrastructure
  25. Sustainable Strategies
      Service Operating Model (C1, C5)
      • Federated model
      • Culture of interaction between stakeholders
        • Publishers: provide and maintain metadata, high quality datasets, uptime of endpoints
        • Maintainers: ensure availability of services related to generation, cataloguing, and maintenance of the cloud, and availability of the cloud itself
        • Consumers: comment and vote for different data sources
  26. Sustainable Strategies
      Identification of critical data elements (C2)
      • Who? What? Where? How?
      • One standard metadata model
      • Glossary for values
  27. Sustainable Strategies
      Defining the key activities and control structures (C3, C4)
      • Validation and correctness of candidate dataset’s metadata
      • Heartbeat checks of the availability of dataset distributions
      • Prevention of abuse and spamming
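A heartbeat check of this kind could be sketched as follows. The slides do not describe an implementation, so everything here is hypothetical: the `HeartbeatState` class, the `run_heartbeat` helper, and the choice to flag a distribution as stale after three consecutive failed probes. The probe is passed in as a function so that any availability test can be plugged in.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class HeartbeatState:
    """Availability history for one dataset distribution."""

    url: str
    consecutive_failures: int = 0

    def record(self, reachable: bool) -> None:
        """Reset the failure count on success, increment it on failure."""
        self.consecutive_failures = 0 if reachable else self.consecutive_failures + 1

    def is_stale(self, max_failures: int = 3) -> bool:
        """True once the distribution has failed max_failures probes in a row."""
        return self.consecutive_failures >= max_failures


def run_heartbeat(states: list[HeartbeatState], probe: Callable[[str], bool]) -> list[str]:
    """Probe every distribution once; return the URLs now considered stale."""
    for state in states:
        state.record(probe(state.url))
    return [s.url for s in states if s.is_stale()]


# Fake probe for illustration: only URLs ending in "/a" are reachable.
states = [HeartbeatState("http://example.org/a"), HeartbeatState("http://example.org/b")]
fake_probe = lambda url: url.endswith("/a")
rounds = [run_heartbeat(states, fake_probe) for _ in range(3)]
print(rounds)
```

Requiring several consecutive failures avoids removing a dataset from the cloud because of a single transient outage.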
  28. Capabilities for a sustainable architecture: DISCOVERY, UNDERSTANDABILITY, SOCIAL
  29. Proposed architecture
      [Architecture diagram. Labels: LOD Cloud Service; Linked Data Platform; Metadata Validator; Filters; Dereferencer; Semantic Layer / Vocabularies and Triple Store; Cloud Visualisation; Dataset Search; ...; Consumer Visualisation Features; Publisher Heartbeat; 3rd Party Connector. Actors: Machine/Human Agents; Third Party Service Providers; Publishers; Data Consumers. Flows: Send LD notification; Heartbeat Check; Query/Crawl LOD Cloud; Metadata request and Dataset retrieval; Data Posts.]
  30. Is the LOD cloud at risk of becoming a museum for datasets?
      We need to strategically restructure the LOD cloud as a sustainable service with sound governance, rather than as an academic or research artefact.
      @jerdeb, jeremy.debattista@adaptcentre.ie, www.adaptcentre.ie
