THE FUTURE OF LINKED
OPEN DATA
Ghislain Atemezing, PhD
Director R&D - MONDECA
@gatemezing
ESSnet Linked Open Statistics - Sofia, Bulgaria - 28th May 2019
1
AGENDA
❖ Current status of LOD
❖ Challenges
➢ LOD is NOT (only) about Technology
❖ Signs of Hope
❖ Towards sustainable LOD ecosystem - FAIRS
(FAIR + Sustainable)
2
RDF: Simple or hard to use?
“RDF is hard to sell”
“RDF is heavy” - Eoin MacCuirc
“RDF is simple enough that you can
build a complex system”
“It’s difficult to standardize vocabularies
because of many ego”
"The Semantic Web is . . . an extension
of the current one, in which
information is given well-defined
meaning." "Meaning is expressed by
RDF."
Is RDF hard to use? Why?
3
Google Trends - RDF vs LOD - Last five years
LOD is more popular than RDF
LOD & RDF searches decreasing since 2014
4
LOD Evolution in the last decade
2008: 34 datasets
2014: 570 datasets, 2,909 links
2019: 1,239 datasets, 16,147 links
In the last five years:
- 2X datasets available
- 5X links in the LOD
5
LOD Stats by 2024 - Predictions
LOD will contain at least:
- 2,688 datasets
- 88,808 links
Is this realistic or not?
6
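The prediction above can be reproduced with a few lines of arithmetic. This is a minimal sketch, assuming the growth factors quoted in the speaker notes (2.17 for datasets, 5.5 for links) are simply applied once more to the March 2019 counts; the function name `project` is illustrative and not part of the talk.

```python
import math

# Figures quoted on the slides (LOD cloud, March 2014 and March 2019).
DATASETS_2014, LINKS_2014 = 570, 2_909
DATASETS_2019, LINKS_2019 = 1_239, 16_147

# Growth factors as written in the speaker notes (assumed to repeat for 2019 -> 2024).
DATASET_FACTOR = 2.17
LINK_FACTOR = 5.5


def project(current: int, factor: float) -> int:
    """Apply one five-year growth factor to a count and round down."""
    return math.floor(current * factor)


# Growth actually observed between 2014 and 2019, for comparison with the factors.
observed_dataset_growth = DATASETS_2019 / DATASETS_2014
observed_link_growth = LINKS_2019 / LINKS_2014

datasets_2024 = project(DATASETS_2019, DATASET_FACTOR)
links_2024 = project(LINKS_2019, LINK_FACTOR)

print(f"2014 -> 2019: x{observed_dataset_growth:.2f} datasets, x{observed_link_growth:.2f} links")
print(f"2024 projection: {datasets_2024:,} datasets, {links_2024:,} links")
# prints "2024 projection: 2,688 datasets, 88,808 links"
```

The projection is a straight extrapolation of one five-year interval, which is why the slide asks whether it is realistic.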
LOD benefits are well-known
● Society
● Organisations
● Publishers
● Consumers
7
What’s up LOD? - State of the LOD Cloud in 2019
- How many datasets are available per
domain?
- How many vocabularies per dataset?
- What are the most used predicates for
interlinking, by category?
- How many datasets are linked?
- How many datasets use the Data Cube
vocabulary?
- How many broken links?
8
In 2019, you cannot answer
these questions simply by
looking at the LOD Cloud.
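If per-dataset metadata were available in machine-readable form, several of these questions reduce to simple aggregations. The sketch below is only an illustration: the records, field names, and identifiers are hypothetical and do not reflect the actual LOD Cloud metadata format.

```python
from collections import Counter

# Hypothetical dataset descriptions; a real analysis would load them from a catalogue export.
datasets = [
    {"id": "ds-a", "domain": "government", "vocabularies": ["qb", "skos"], "links": {"ds-b": 120}},
    {"id": "ds-b", "domain": "government", "vocabularies": ["dcat"], "links": {}},
    {"id": "ds-c", "domain": "life-sciences", "vocabularies": ["qb"], "links": {"ds-a": 15, "ds-b": 3}},
    {"id": "ds-d", "domain": "geography", "vocabularies": [], "links": {}},
]

# How many datasets are available per domain?
datasets_per_domain = Counter(d["domain"] for d in datasets)

# How many vocabularies does each dataset use?
vocabularies_per_dataset = {d["id"]: len(d["vocabularies"]) for d in datasets}

# Which datasets have at least one outgoing link?
linked = [d["id"] for d in datasets if d["links"]]

# Which datasets use the Data Cube vocabulary (prefix "qb" in this toy data)?
cube_users = [d["id"] for d in datasets if "qb" in d["vocabularies"]]

# Total number of links across all datasets.
total_links = sum(sum(d["links"].values()) for d in datasets)

print(dict(datasets_per_domain))
print(vocabularies_per_dataset)
print(linked, cube_users, total_links)
```

Counting broken links is not covered here, since it would require dereferencing every link target.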
Challenges with
LOD publication
Photo by Dylan Siebelink on Unsplash 9
Do publishers ever know who is consuming
their datasets?
Not always... why?
Are we building towers of
knowledge?
How can we know who is consuming
our datasets?
What are the incentives for
publishers?
10
Are we (really) data driven ORGs?
Many use cases of semantic technologies in
industry
Why are people still sceptical about RDF?
Maybe the problem is NOT the technology
ORGs should show the way through massive
data generation on the Web
11
(Some) Challenges in creating LOD
Shared vocabulary management
Ontology creation: no clear methodology / lack of
internal expertise
Mappings to ontologies are not trivial
Links to external datasets (which ones? Default:
DBpedia?)
Pan-national interpretation and comparison is
particularly challenging.
12
More Challenges
Maintenance of tools: we can’t trust
tools built by PhDs / interns
Versioning of datasets in LOD
Annual review of datasets (Who?)
General commitment / Find a real
business value
13
Organizational challenges: where is the CDO?
Lack of data governance in our ORGs
Minimal data sharing within ORGs
No existing practice for documenting
knowledge
Lack of vision for harmonizing different
“data lakes”
14
Challenges - Metadata / Versioning
Frequent releases of datasets in LOD
Manage versions and track diffs between dataset versions
Proper use of metadata to track changes / check
data consistency
Data quality and provenance attached to datasets
Licensing issues (how to properly cite and reuse
datasets)
15
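Tracking the diff between two releases of a dataset can be illustrated with plain set operations over triples. This is a minimal sketch under stated assumptions: triples are modelled as tuples of strings, the `ex:` identifiers are invented for the example, and blank nodes (which make naive set comparison unreliable) are assumed to be absent.

```python
Triple = tuple[str, str, str]


def diff_versions(old: set[Triple], new: set[Triple]) -> tuple[set[Triple], set[Triple]]:
    """Return (added, removed) triples between two versions of a dataset."""
    return new - old, old - new


# Two hypothetical releases of the same small statistical dataset.
v1: set[Triple] = {
    ("ex:obs1", "ex:population", "1000"),
    ("ex:obs1", "ex:year", "2018"),
    ("ex:obs2", "ex:population", "2000"),
}
v2: set[Triple] = {
    ("ex:obs1", "ex:population", "1050"),  # corrected value
    ("ex:obs1", "ex:year", "2018"),
    ("ex:obs2", "ex:population", "2000"),
    ("ex:obs3", "ex:population", "300"),   # new observation
}

added, removed = diff_versions(v1, v2)
print(f"{len(added)} added, {len(removed)} removed")
for triple in sorted(removed):
    print("-", *triple)
for triple in sorted(added):
    print("+", *triple)
```

A changed value shows up as one removed and one added triple, so the diff alone does not say whether a change was a correction or an update; that is where provenance metadata comes in.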
Signs of Hope
Many advances in semantic technology are
lowering the barrier to querying billions of triples,
even on an ordinary laptop
Photo by Ron Smith on Unsplash
16
Democratizing Access to LOD
“Fernández, J. D., Beek, W., Martínez-Prieto, M.A., and Arias, M. LOD-a-lot: A Queryable Dump of the
LOD cloud (2017). http://purl.org/HDT/lod-a-lot.”
28 billion unique triples from 650K
datasets - All LOD on a medium-size
laptop:
524 GB of disk space; 15.7 GB of RAM
17
Google Dataset Search or Schema.org In Action
Google Dataset Search launched in 2018
Based on schema.org (cf.
https://toolbox.google.com/datasetsearch/search?query=Site%3Adata.gouv.fr)
Uses DCAT and other structured metadata to discover
open datasets
One DCAT file per dataset / Googlebot is not smart
enough
Link: https://toolbox.google.com/datasetsearch
18
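To make the structured-metadata point concrete, here is a minimal sketch of a schema.org `Dataset` description serialized as JSON-LD. The helper `dataset_jsonld` and all example values (names, example.org URLs) are hypothetical; only the schema.org types and property names (`Dataset`, `DataDownload`, `name`, `description`, `url`, `license`, `distribution`, `encodingFormat`, `contentUrl`) come from the schema.org vocabulary.

```python
import json


def dataset_jsonld(name: str, description: str, url: str,
                   license_url: str, download_url: str, media_type: str) -> dict:
    """Build a schema.org Dataset description as a JSON-LD dictionary."""
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "license": license_url,
        "distribution": [
            {
                "@type": "DataDownload",
                "encodingFormat": media_type,
                "contentUrl": download_url,
            }
        ],
    }


# Hypothetical dataset; every value below is a placeholder.
doc = dataset_jsonld(
    name="Population by region",
    description="Yearly population counts per region, published as linked data.",
    url="https://example.org/datasets/population",
    license_url="https://creativecommons.org/licenses/by/4.0/",
    download_url="https://example.org/datasets/population.ttl",
    media_type="text/turtle",
)

print(json.dumps(doc, indent=2))
```

The resulting JSON would typically be embedded in the dataset's landing page inside a `script` element of type `application/ld+json` so that crawlers can pick it up.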
Semantic technology guarantees
FAIR principles of data published
on the Web
19
Beyond FAIR Principles
Findable: unique ids that are resolvable,
Accessible: common access method,
Interoperable: shared vocabularies &
taxonomies,
Reusable: provenance, license
20
FAIR + Sustainable => FAIRS
Wikidata - A community for Wikibase
56M items, 700M statements, 400 languages, 20K active contributors
per month, 900M edits, 8.5M daily SPARQL queries
Healthy community that helps write SPARQL queries,
showing that the technology is mature
60M links to DBpedia, 7.7 billion triples
Many applications in chatbots
(Apple Siri, research, scientists, etc.)
SPARQL is affordable and usable
21
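As an illustration of how lightweight a SPARQL request can be, the sketch below builds a GET URL for the public Wikidata query service. The query itself (five items that are instances of house cat) is only an example chosen for this sketch, and the code stops at building the URL; actually sending the request and handling rate limits is left out.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

WIKIDATA_ENDPOINT = "https://query.wikidata.org/sparql"

# Example query: five items that are an instance of (P31) house cat (Q146).
QUERY = """
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q146 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 5
"""


def build_query_url(query: str, endpoint: str = WIKIDATA_ENDPOINT) -> str:
    """Encode a SPARQL query as a GET URL asking for JSON results."""
    params = urlencode({"query": query.strip(), "format": "json"})
    return f"{endpoint}?{params}"


url = build_query_url(QUERY)
print(url)

# Decoding the URL gives back the original query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(decoded == QUERY.strip())
```

The same URL can be pasted into a browser or passed to any HTTP client, which is part of what makes the query service approachable for newcomers.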
The future starts today: data is
infinite, the Web is here to
stay, semantic technologies
are mature.
22
Towards a Sustainable LOD Ecosystem
Work on having a board of committed people from
different areas of expertise (tech, academia, industry,
government, etc.)
Gather and promote LOD tools and applications based
on past experience
Learn from the errors of the past
Create a real community of publishers and
consumers outside W3C
Liaise with W3C to create a community group?
23
LOD for Social Good: Killer Apps ?
Develop and use LOD for solving
societal issues
Apps that help achieve any of the 17
Sustainable Development Goals
(SDGs) included in the 2030 Agenda
for Sustainable Development.
LOD to advance work on
misinformation issues on the Web
24
Solutions for future LOD
Create a forum with different stakeholders to discuss
LOD issues and maintenance
Create a new way to manage and maintain datasets in
LOD (W3C community, mix of communities? Foundation
à la Apache?)
New enforcement rules for the LOD management life cycle
More use cases of datasets with probabilistic and
temporal models
25
Graph Of Linked “Insights” Datasets ?
Statistical models also find “insights”
over datasets
Data scientists spend hours
understanding underlying data to
generate reports, dashboards or
applications
How to model that knowledge and
publish it on the Web?
“Current insights after data
analysis gets stored in a
spreadsheet and then gets lost.
We want to create a graph of
insights, link them and generate
new insights” - Lambert
Hogenhout, UN #kgc2019
https://twitter.com/juansequeda/status/1126144558683885569
26
Takeaway message
The maturity of semantic technologies is fully
demonstrated in many real world applications
The Web is a precious means of exchanging
information for both humans and machines
Versioning and dataset updates are still
challenging.
For exchanging knowledge, LOD is probably the
(only) solution - The only way to be able to make
“AI intelligent”
New applications combining AI (autonomous
agents, chatbots, etc)
27
The more you publish datasets as
LOD, the more you are preparing
the next generation of
"prescriptive" autonomous agents.
Classical AI (predictive) alone
(neural networks, machine
learning) can’t make this happen
28
“The future [of the Web] is
still so much bigger than
the past.” - Tim Berners-Lee (2018)
So will be the future of
Linked Open Data…
Just publish and share your
Assets on the Web
https://inrupt.com/blog/one-small-step-for-the-web
29
Thank you for
your attention!!
Questions?
30


Editor's notes

  1. Results for why RDF is not easier for middle 30% of developers?
  2. Popularity of the search terms RDF vs LOD since 2014. LOD is searched more often than RDF (beware: RDF also stands for Rwanda Defence Force). Searches for RDF have been decreasing since 2014; the same holds for LOD.
  3. March 2008: 34 datasets. March 2014: 570 datasets and 2,909 linkage relationships between the datasets. March 2019: 1,239 datasets with 16,147 links. March 2024 (projection): 1,239 × 2.17 -> 2,688+ datasets and 16,147 × 5.5 -> 88,808 links.
  4. “Data, Scientific; Astell, Mathias (2017): Benefits of Open Research Data Infographic. figshare. Figure” https://doi.org/10.6084/m9.figshare.5179006.v3
  5. Figshare from Open Science community could be a way to go…
  6. How many of you have a team of ontologists?
  7. More data governance from publishers is needed. Many tools built during a project are no longer maintained after its completion. Who is to blame?
  8. By using two technologies, RDF binary format and Linked Data Fragments, you can even deploy and run all LOD in a medium size laptop.
  9. May 2019: 14M datasets from 3k repositories. Google crawls #DCAT and http://schema.org metadata. Guidelines for Google Dataset Search: https://developers.google.com/search/docs/data-types/dataset#sitemap You can use one single DCAT file for all the datasets in your domain.
  10. Build a community for LOD, as Wikidata did for Wikibase. Active users / active community. See Grafana: https://grafana.wikimedia.org/d/000000489/wikidata-query-service?refresh=1m&orgId=1&from=now-1y&to=now
  11. We need to review the current challenges to make a more sustainable LOD ecosystem
  12. I really like the idea of having workshops on this topic of SDGs, see a call at ISWC 2019 https://sw4sg2019.github.io/iswc2019/
  13. Todo: best practices for data aggregation and scale for publishing
  14. This is similar to the idea of Eurostat to create a KG of explained statistics.
  15. The more you publish datasets as LOD, the more you are preparing the next generation of “prescriptive” autonomous agents. Classical AI alone (neural networks, machine learning) can’t make this happen. We need to find new ways to maintain and update data in the LOD.