The web of data: how are we doing so far?
Elena Simperl
King’s College London
@esimperl
The web has shaped our understanding of, and interactions with, data
Answering factual questions
Sharing data online
Publishing data for others to use
Creating datasets in collaboration
Creating digital traces
Making algorithms smarter with data
Logbook of the USS Burton Island from a voyage to the Arctic in 1948
(Source: Fensel, 2013) (Source: W3C, 2006)
The theory and practice of the web are not
the same
Release Datasets Links
2022 1573 2.6 M
2020 1440 2.5 M
2018 1308 2.4 M
data.europa.eu (1.6M datasets from
170 publishers in 36 countries)
Elena
loves this
900+ use
cases!!!
Low uptake of linked data, limited vocabulary
reuse, proprietary, non-dereferenceable
vocabularies, reasonable metadata quality
45 million
datasets
(DCAT,
schema.org)
There is a lot of annotated data
online, especially about locations,
products, businesses
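To make the idea of schema.org dataset annotations concrete, here is a minimal sketch of a JSON-LD description for a single dataset. The property names (`name`, `description`, `license`, `keywords`, `distribution`, `encodingFormat`, `contentUrl`) are schema.org terms; the helper function, the dataset and the example.org URL are hypothetical and used only for illustration.

```python
import json


def dataset_jsonld(name, description, license_url, csv_url, keywords):
    """Build a schema.org Dataset description as a JSON-LD dictionary."""
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "license": license_url,
        "keywords": keywords,
        # One downloadable distribution of the dataset, here a CSV file.
        "distribution": [
            {
                "@type": "DataDownload",
                "encodingFormat": "text/csv",
                "contentUrl": csv_url,
            }
        ],
    }


# Hypothetical dataset, invented for this example.
record = dataset_jsonld(
    name="Example air quality readings",
    description="Hypothetical hourly air quality measurements.",
    license_url="https://creativecommons.org/licenses/by/4.0/",
    csv_url="https://example.org/data/air-quality.csv",
    keywords=["air quality", "environment"],
)
print(json.dumps(record, indent=2))
```

A publisher would typically embed the resulting JSON in the dataset's landing page so that crawlers can pick it up.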
(Source: Walker & Simperl, 2017)
The ten guidelines
Organise for use of the datasets - rather than simply for publication
Promote use through data storytelling and community building, borrowing from open-source communities and other peer-production systems
Invest in discoverability best practices, borrowing from e-commerce and web search
Publish good quality metadata - to enhance reuse
Adopt standards to ensure interoperability
Co-locate tools so that a wider range of users can be engaged
Link datasets to enhance value
Be accessible by offering options ranging from APIs to CSV downloads
Co-locate documentation - users should not need to be domain experts to understand the data
Be measurable - as a way to assess how well they are meeting users’ needs
Operationalising the guidance
Literature review to develop 5* schemes that operationalise the indicators.
Application of the schemes to 10 open data portals at different maturity levels.
(Walker & Simperl, 2017)
Example: Organise for use
Each dataset is accompanied by a comprehensive descriptive
record (going beyond a collection of structured metadata)
An extract of the data can be previewed (for sense making)
The portal provides recommendations for related datasets
The portal enables users to review/rate the datasets
Keywords from datasets are linked to other published datasets
Example: Co-locate documentation
Supporting documentation does not exist.
Supporting documentation exists, but as a document found separately from the data.
Supporting documentation is found at the same time as the data (e.g. the link to the document is
next to the link to the data in the search).
Supporting documentation can be immediately accessed from within the dataset but it is not context
sensitive (e.g. a link to the documentation or text contained within the dataset).
Supporting documentation can be immediately accessed from within the dataset and it is context
sensitive so that users can immediately access information about a specific item of concern (e.g. a
link to a specific point in the documentation or the text contained within the dataset).
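The five levels above can be read as a simple decision procedure. The sketch below is one possible encoding, assuming the levels map to one to five stars in the order listed (the slide does not state the numbering explicitly); the function and parameter names are hypothetical.

```python
def documentation_stars(exists, alongside_data=False,
                        within_dataset=False, context_sensitive=False):
    """Rate how well documentation is co-located with a dataset (1 to 5).

    Assumption: the five levels on the slide correspond to 1..5 stars,
    from "does not exist" up to "context-sensitive within the dataset".
    """
    if not exists:
        return 1  # no supporting documentation at all
    if within_dataset:
        # Reachable from inside the dataset; the top level also requires
        # links to the specific item the user is looking at.
        return 5 if context_sensitive else 4
    if alongside_data:
        return 3  # e.g. link shown next to the data in search results
    return 2      # exists, but has to be found separately


print(documentation_stars(exists=True, alongside_data=True))
```

Encoding the scheme this way makes it easy to apply the same rating consistently across many portals.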
Varying open data maturity levels
Be discoverable, co-locate documentation and be
measurable are universally challenging
User-centric principles in
data.europa.eu
A lot of guidance available already
Is there any evidence that it works?
Be measurable: GitHub as a data platform
~1.4 million datasets (e.g. CSV,
Excel) from ~65K repos
Map literature features to both
dataset and repository features
Use engagement metrics as
proxies for data reuse
Train a predictive model to see
what publishing guidance leads to
higher engagement values
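As a rough illustration of the first two steps (mapping guidance to repository features, and using engagement as a proxy for reuse), the sketch below turns one repository record into a feature dictionary and an engagement score. It is not the study's actual pipeline: the `Repo` fields, the chosen features and the use of stars plus forks as the engagement proxy are all assumptions made for this example, and the sample values are invented.

```python
from dataclasses import dataclass


@dataclass
class Repo:
    """Hypothetical record for one repository that holds a data file."""
    readme_chars: int
    description: str
    data_file_bytes: int
    stars: int
    forks: int


def features(repo):
    """Map publishing guidance to measurable repository features."""
    return {
        "has_readme": repo.readme_chars > 0,              # documentation
        "has_description": bool(repo.description.strip()),
        "file_size_mb": repo.data_file_bytes / 1_000_000,  # size
    }


def engagement(repo):
    """Assumed engagement proxy: stars plus forks."""
    return repo.stars + repo.forks


# Invented example values, for illustration only.
repo = Repo(readme_chars=1200, description="Bus timetable data",
            data_file_bytes=2_500_000, stars=3, forks=1)
print(features(repo), engagement(repo))
```

Feature dictionaries like this one, paired with the engagement score as the target, are the kind of input a predictive model would be trained on.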
Size Attributes
Age
Quality
Documentation
Reviews
Recommendations for publishers
Co-locate documentation
◦ Informative, short text about the dataset
◦ Comprehensive README file in a structured form,
links to further information
Co-locate tools
◦ Standard processable file sizes for dataset
distributions
◦ Openable with a standard configuration of a
common library (such as Pandas)
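One way to test the "openable with a standard configuration" recommendation before publishing is to parse the file with default settings and check that the rows line up. The sketch below uses Python's built-in `csv` module as a stand-in for Pandas; the function name and the rule of requiring at least two columns are assumptions, chosen so that a semicolon-delimited file is flagged.

```python
import csv
import io


def opens_with_defaults(text, max_rows=1000):
    """Return True if text parses as comma-separated rows of equal width."""
    reader = csv.reader(io.StringIO(text))  # default dialect: comma, double quote
    width = None
    for i, row in enumerate(reader):
        if i >= max_rows:
            break
        if not row:
            continue  # ignore blank lines
        if width is None:
            width = len(row)
            if width < 2:
                return False  # heuristic: probably a different delimiter
        elif len(row) != width:
            return False  # ragged row: needs custom parsing options
    return width is not None


print(opens_with_defaults("year,value\n2020,1\n2021,2\n"))
print(opens_with_defaults("year;value\n2020;1\n"))
```

A file that fails this check can still be valid data, but users are likely to need non-default parsing options to open it.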
Can people find the data they need?
Analysis of logs and data requests (2018)
• Four national open government data portals, 2.2 million queries (2013 – 2016), 1500 data requests.
• Shorter queries that include temporal and location information.
• Explorative search.
• Native and external queries topically different.
• Data requests offer more context to user intent.
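The query characteristics listed above (length, temporal and location information) can be extracted from a search log with very simple rules. The sketch below is a hypothetical simplification, not the method used in the study: it counts words, looks for a four-digit year, and matches words against a caller-supplied set of place names.

```python
import re

# Four-digit years from 1900 to 2099, matched as whole words.
YEAR = re.compile(r"\b(?:19|20)\d{2}\b")


def describe_query(query, place_names):
    """Summarise one search query: length, temporal and location cues."""
    words = query.lower().split()
    return {
        "length": len(words),
        "has_year": bool(YEAR.search(query)),
        "has_place": any(word in place_names for word in words),
    }


# place_names is an assumed, caller-provided gazetteer of lowercase names.
print(describe_query("population london 2011", {"london", "leeds"}))
```

Aggregating these summaries over a whole log gives the share of queries that carry temporal or location information.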
Analysis of logs (2020 - 21)
844k sessions from 04/2018 to 06/2020, web search as well as
native search sessions from the European Data Portal
Location, provenance, format, licence, time frame and date, publishing
date, location of publication and data schema
Most sessions come from web search; web search and native search users have different
information needs and different success rates
Dataset preview page is important in web search
Linking to stories and other content helps with traffic
Recommendations for publishers
Two types of
users
Spatial and
temporal
queries
Result
presentation
Quality
reviews
Data stories
More logs
needed!
Data documentation and sensemaking
practices
Conclusions
We are at a crucial moment in data availability and use, online
and elsewhere.
There is an increasing body of evidence about people’s
data needs and about how data is published on the web.
We don’t have links and we don’t always have great business
cases for creating and maintaining them on the open,
decentralised web. In fact, with the rise of generative AI we need
to re-think business models for open data publishing yet another
time.
Standards emerge and are adopted when they solve a real
problem. Technology can help rather than hinder and should
never take the centre stage.
Metadata vocabularies are used
where there is a clear business
case. More documentation
needed to make them useful
more widely.
Some data is missing, with
serious consequences
(Source: data.europa.eu, Federica
Fragapane, 2022)
Generative AI as a tool to support
data sensemaking and reuse
Thank you
Talking Datasets: understanding data sensemaking behaviours. L Koesten, K Gregory, P Groth, E Simperl.
International Journal of Human-Computer Studies, 146:102562, 2021
Everything You Always Wanted to Know about a Dataset: Studies in Data Summarisation. L Koesten, E Simperl, E
Kacprzak, T Blount, J Tennison. International Journal of Human-Computer Studies. 2019
Collaborative Practices with Structured Data: Do Tools Support what Users Need? L Koesten, E Kacprzak, E Simperl, J
Tennison. ACM CHI Conference on Human Factors in Computing Systems, 2019.
Dataset search: a survey. A Chapman, E Simperl, L Koesten, G Konstantinidis, LD Ibáñez, E Kacprzak, P Groth. The
International Journal on Very Large Data Bases, 2019.
Characterising dataset search — An analysis of search logs and data requests. E Kacprzak, L Koesten, LD Ibáñez, T
Blount, J Tennison, E Simperl. Journal of Web Semantics, 2018
Making sense of numerical data-semantic labelling of web tables. Kacprzak, E., Giménez-García, J.M., Piscopo, A.,
Koesten, L., Ibáñez, L.D., Tennison, J. and Simperl, E. In European Knowledge Acquisition Workshop (pp. 163-178),
Springer, 2018
The Trials and Tribulations of Working with Structured Data - a Study on Information Seeking Behaviour. L Koesten, E
Kacprzak, J Tennison, E Simperl. ACM CHI Conference on Human Factors in Computing Systems, 2017
Dataset Reuse: Toward Translating Principles to Practice. L Koesten, P Vougiouklis, E Simperl, P Groth. Patterns,
2020
A comparison of dataset search behaviour of internal versus search engine referred sessions. LD Ibáñez, E
Simperl. ACM SIGIR Conference on Human Information Interaction and Retrieval (pp. 158-168), 2022
Characterising Dataset Search on the European Data Portal. L Ibáñez, L Koesten, E Kacprzak, E Simperl. European
Data Portal Analytical Report 18, 2020
Understanding Supply and Demand on the European Data Portal. L Ibáñez, E Simperl. European Data Portal Analytical
Report 19, 2020
The Future of Open Data Portals. J Walker, E Simperl. European Data Portal Analytical Report 8, 2017
Smart Rural: The Open Data Gap. J Walker, G Thuermer, E Simperl, L Carr. Data for Policy, 2020.

Boost Fertility New Invention Ups Success Rates.pdf
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 

The web of data: how are we doing so far

  • 1. The web of data: how are we doing so far? Elena Simperl, King’s College London, @esimperl
  • 2. The web has shaped our understanding and interactions with data
  • 8. Making algorithms smarter with data
      ◦ Logbook of the USS Burton Island from a voyage to the Arctic in 1948
  • 9. (Source: Fensel, 2013) (Source: W3C, 2006)
  • 10. The theory and practice of the web are not the same
  • 11. Datasets and links by release
      ◦ 2022: 1573 datasets, 2.6 M links
      ◦ 2020: 1440 datasets, 2.5 M links
      ◦ 2018: 1308 datasets, 2.4 M links
  • 13. data.europa.eu (1.6M datasets from 170 publishers in 36 countries)
  • 18. Low uptake of linked data, limited vocabulary reuse, proprietary, non-dereferenceable vocabularies, reasonable metadata quality
  • 23. There is a lot of annotated data online, especially about locations, products, businesses
  • 24. (Source: Walker & Simperl, 2017)
  • 25. The ten guidelines
      ◦ Organise for use of the datasets - rather than simply for publication
      ◦ Promote use through data storytelling and community building, borrowing from open-source communities and other peer-production systems
      ◦ Invest in discoverability best practices, borrowing from e-commerce and web search
      ◦ Publish good quality metadata - to enhance reuse
      ◦ Adopt standards to ensure interoperability
      ◦ Co-locate tools so that a wider range of users can be engaged
      ◦ Link datasets to enhance value
      ◦ Be accessible by offering options ranging from APIs to CSV downloads
      ◦ Co-locate documentation - users should not need to be domain experts to understand the data
      ◦ Be measurable - as a way to assess how well they are meeting users’ needs
  • 26. Operationalising the guidance
      ◦ Literature review to develop 5* schemes to operationalise indicators.
      ◦ Application of the schemes to 10 open data portals at different maturity levels.
      ◦ (Walker & Simperl, 2017)
  • 27. Example: Organise for use
      ◦ Each dataset is accompanied by a comprehensive descriptive record (going beyond a collection of structured metadata)
      ◦ An extract of the data can be previewed (for sense making)
      ◦ The portal provides recommendations for related datasets
      ◦ The portal enables users to review/rate the datasets
      ◦ Keywords from datasets are linked to other published datasets
  • 28. Example: Co-locate documentation
      ◦ Supporting documentation does not exist.
      ◦ Supporting documentation exists, but as a document found separately from the data.
      ◦ Supporting documentation is found at the same time as the data (e.g. the link to the document is next to the link to the data in the search).
      ◦ Supporting documentation can be immediately accessed from within the dataset but it is not context sensitive (e.g. a link to the documentation or text contained within the dataset).
      ◦ Supporting documentation can be immediately accessed from within the dataset and it is context sensitive so that users can immediately access information about a specific item of concern (e.g. a link to a specific point in the documentation or the text contained within the dataset).
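The five levels on this slide can be read as an ordered scale. The sketch below is one possible encoding, not the scheme from Walker & Simperl (2017) itself: the numbering from 1 to 5, the level names, and the four boolean inputs of `rate_documentation` are all assumptions made for illustration.

```python
from enum import IntEnum


class DocLevel(IntEnum):
    """The five levels from the slide, lowest to highest.

    Numbering them 1-5 is an assumption; the slide only gives their order.
    """
    NONE = 1                           # no supporting documentation
    SEPARATE = 2                       # exists, but found separately from the data
    ALONGSIDE_DATA = 3                 # found at the same time as the data
    IN_DATASET = 4                     # accessible from within the dataset
    IN_DATASET_CONTEXT_SENSITIVE = 5   # as above, and context sensitive


def rate_documentation(exists, found_with_data, inside_dataset, context_sensitive):
    """Map four yes/no observations (hypothetical inputs) to a level."""
    if not exists:
        return DocLevel.NONE
    if inside_dataset:
        if context_sensitive:
            return DocLevel.IN_DATASET_CONTEXT_SENSITIVE
        return DocLevel.IN_DATASET
    if found_with_data:
        return DocLevel.ALONGSIDE_DATA
    return DocLevel.SEPARATE
```

Because `IntEnum` values compare as integers, ratings for several portals could be sorted or averaged directly, which suits a star-style comparison.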
  • 29. Varying open data maturity levels
      ◦ "Be discoverable", "co-locate documentation" and "be measurable" are universally challenging
  • 31. A lot of guidance available already
  • 32. Is there any evidence that it works?
  • 33. Be measurable: GitHub as a data platform
      ◦ ~1.4 million datasets (e.g. CSV, Excel) from ~65K repos
      ◦ Map literature features to both dataset and repository features
      ◦ Use engagement metrics as proxies for data reuse
      ◦ Train a predictive model to see what publishing guidance leads to higher engagement values
      ◦ Size, attributes, age, quality, documentation, reviews
  • 34. Recommendations for publishers
      ◦ Co-locate documentation: informative, short text about the dataset; comprehensive README file in a structured form, with links to further information
      ◦ Co-locate tools: standard processable file sizes for dataset distributions; openable with a standard configuration of a common library (such as Pandas)
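The "co-locate tools" recommendation can be turned into a quick pre-publication check. The sketch below is illustrative only: the slide names Pandas, but Python's standard `csv` module is used here as a dependency-free stand-in, and the function name `check_distribution`, the report keys, and the 100 MB size limit are hypothetical choices rather than values from the study.

```python
import csv
import io

# Hypothetical limit for a "standard processable file size"; not from the slides.
MAX_BYTES = 100 * 1024 * 1024


def check_distribution(raw: bytes, max_bytes: int = MAX_BYTES) -> dict:
    """Check whether a CSV distribution opens with default parser settings."""
    report = {
        "size_ok": len(raw) <= max_bytes,
        "parses": False,
        "consistent_columns": False,
    }
    try:
        # Default configuration only: UTF-8 text, comma delimiter.
        text = raw.decode("utf-8")
        reader = csv.reader(io.StringIO(text, newline=""))
        rows = [row for row in reader if row]  # ignore blank lines
    except (UnicodeDecodeError, csv.Error):
        return report
    if not rows:
        return report
    report["parses"] = True
    # Every row should have as many fields as the header row.
    width = len(rows[0])
    report["consistent_columns"] = all(len(row) == width for row in rows)
    return report
```

A publisher could run such a check on each distribution before upload and flag files that fail any of the three tests for manual review.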
  • 35. Can people find the data they need?
  • 36. Analysis of logs and data requests (2018)
      ◦ Four national open government data portals, 2.2 million queries (2013 - 2016), 1500 data requests.
      ◦ Shorter queries, which include temporal and location information.
      ◦ Explorative search.
      ◦ Native and external queries are topically different.
      ◦ Data requests offer more context to user intent.
  • 37. Analysis of logs (2020 - 21)
      ◦ 844k sessions from 04/2018 to 06/2020, web search as well as native search sessions from the European Data Portal
      ◦ Location, provenance, format, licence, time frame and date, publishing date, location of publication and data schema
      ◦ Mostly web search; web search and native search users have different information needs and different success rates
      ◦ Dataset preview page is important in web search
      ◦ Linking to stories and other content helps with traffic
  • 38. Recommendations for publishers
      ◦ Two types of users
      ◦ Spatial and temporal queries
      ◦ Result presentation
      ◦ Quality reviews
      ◦ Data stories
      ◦ More logs needed!
  • 39. Data documentation and sensemaking practices
  • 40. Conclusions
      ◦ We are at a crucial moment in data availability and use, online and elsewhere.
      ◦ There is an increasing body of evidence about people’s data needs and about how data is published on the web.
      ◦ We don’t have links and we don’t always have great business cases for creating and maintaining them on the open, decentralised web. In fact, with the rise of generative AI we need to re-think business models for open data publishing yet another time.
      ◦ Standards emerge and are adopted when they solve a real problem.
      ◦ Technology can help rather than hinder and should never take the centre stage.
  • 41. Metadata vocabularies are used where there is a clear business case. More documentation needed to make them useful more widely.
  • 42. Some data is missing, with serious consequences (Source: data.europa.eu, Federica Fragapane, 2022)
  • 43. Generative AI as a tool to support data sensemaking and reuse
  • 44. Thank you
      ◦ Talking Datasets: understanding data sensemaking behaviours. L Koesten, K Gregory, P Groth, E Simperl. International Journal of Human-Computer Studies, 146:102562, 2021
      ◦ Everything You Always Wanted to Know about a Dataset: Studies in Data Summarisation. L Koesten, E Simperl, E Kacprzak, T Blount, J Tennison. International Journal of Human-Computer Studies, 2019
      ◦ Collaborative Practices with Structured Data: Do Tools Support what Users Need? L Koesten, E Kacprzak, E Simperl, J Tennison. ACM CHI Conference on Human Factors in Computing Systems, 2019
      ◦ Dataset search: a survey. A Chapman, E Simperl, L Koesten, G Konstantinidis, LD Ibáñez, E Kacprzak, P Groth. The International Journal on Very Large Data Bases, 2019
      ◦ Characterising dataset search: An analysis of search logs and data requests. E Kacprzak, L Koesten, LD Ibáñez, T Blount, J Tennison, E Simperl. Journal of Web Semantics, 2018
      ◦ Making sense of numerical data - semantic labelling of web tables. E Kacprzak, JM Giménez-García, A Piscopo, L Koesten, LD Ibáñez, J Tennison, E Simperl. European Knowledge Acquisition Workshop (pp. 163-178), Springer, 2018
      ◦ The Trials and Tribulations of Working with Structured Data - a Study on Information Seeking Behaviour. L Koesten, E Kacprzak, J Tennison, E Simperl. ACM CHI Conference on Human Factors in Computing Systems, 2017
      ◦ Dataset Reuse: Toward Translating Principles to Practice. L Koesten, P Vougiouklis, E Simperl, P Groth. Patterns, 2020
      ◦ A comparison of dataset search behaviour of internal versus search engine referred sessions. LD Ibáñez, E Simperl. ACM SIGIR Conference on Human Information Interaction and Retrieval (pp. 158-168), 2022
      ◦ Characterising Dataset Search on the European Data Portal. L Ibáñez, L Koesten, E Kacprzak, E Simperl. European Data Portal Analytical Report 18, 2020
      ◦ Understanding Supply and Demand on the European Data Portal. L Ibáñez, E Simperl. European Data Portal Analytical Report 19, 2020
      ◦ The Future of Open Data Portals. J Walker, E Simperl. European Data Portal Analytical Report 8, 2017
      ◦ Smart Rural: The Open Data Gap. J Walker, G Thuermer, E Simperl, L Carr. Data for Policy, 2020