Linked Open Vocabulary Ranking and Terms Discovery

Ioannis Stavrakantonakis
PhD candidate
University of Innsbruck - STI Innsbruck
Ioannis Stavrakantonakis, Anna Fensel, Dieter Fensel

Discovering vocabulary terms
[Figure: discovery workflow. webpage → (extract) → n keywords → (perform) → n searches → (extract) → n*m search results → (filter) → n result terms. Figure label: use vocab-recommender.]
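The workflow on this slide (n keywords, one search per keyword, up to m results per search, then a filter down to one term per keyword) can be sketched in a few lines. This is only an illustration: `discover_terms`, `fake_search`, the `FAKE_INDEX` data and the `food:` prefix are hypothetical stand-ins, not part of LOV or of the vocab-recommender.

```python
def discover_terms(keywords, search, top_m=3):
    """Run one search per keyword and keep the best-scored term for each."""
    selected = {}
    for keyword in keywords:
        # Up to m results per keyword, so at most n*m results overall.
        results = search(keyword)[:top_m]
        if results:
            # Filter step: keep the single highest-scored (term, score) pair.
            best_term, _ = max(results, key=lambda pair: pair[1])
            selected[keyword] = best_term
    return selected


# Hypothetical in-memory replacement for a vocabulary search service.
FAKE_INDEX = {
    "recipe": [("schema:Recipe", 0.9), ("food:Recipe", 0.6)],
    "ingredient": [("schema:recipeIngredient", 0.8), ("food:Ingredient", 0.7)],
}


def fake_search(keyword):
    return FAKE_INDEX.get(keyword, [])


terms = discover_terms(["recipe", "ingredient", "pizza"], fake_search)
```

Keywords without any search result (here "pizza") simply produce no term, which mirrors the case where a manual search returns nothing useful.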

Outline
• Survey of vocabulary terms discovery.
• State of the art.
• Vocab-recommender: The vocabulary terms
discovery assistant.
• Outlook - What is next?

Survey of vocabulary terms discovery

• 64 participants with valid submissions out of 66.
• Familiar with computer science but without any experience in semantic annotation.
• 4 use cases: article (NASA), exhibition (Louvre), hotel
(room description), recipe (pizza).
• One week to submit the answers.
• Completion time was self-reported by the participants.
• “Use solely the LOV Search for the discovery of terms.”

Number of selected terms
[Figure: box plot of the number of selected terms. Legend: median; box = 50% of data points; whiskers = minimum and maximum.]

Measured selection time
• A few outliers in the term selection time.
• Least skewed measurements for the exhibition and recipe cases.
• Selection takes about 1 hour on average.
• The exhibition and recipe cases have the lowest times.

Distribution of participants
and schema.org
47% of the proposed terms belong to the schema.org namespace.
Is it due to a specific use case? No.

General observations
• 2 out of 66 participants failed to provide a valid
submission (3%).
• Static parts (media) were considered for inclusion by only 10% of the participants.
• The term discovery process itself did not include a guideline to follow.
• Participants encountered cases where the description of a vocabulary term was hard to follow.

State of the art

Linked Open Vocabularies (LOV): inclusion rules
A vocabulary should:
• Be written in RDF and be dereferenceable (URI).
• Be parseable without errors.
• Provide an rdfs:label for its terms.
• Reuse relevant existing vocabularies.
• Provide metadata about itself.
Meeting these rules gives no guarantee about effectiveness.

Ranking
• Atemezing and Troncy: Information Content (IC) in LOV, based on a) the occurrence of a term compared with the maximum term occurrence in the set of vocabularies, and b) the centrality of the vocabulary.
• DWRank: a) Hub score. b) Authority score. Does not consider LOD usage.
• TermPicker: suggests types and properties from vocabularies that other LOD providers have combined with the vocabulary the engineer has already used to model the given part of the data (based on Schema Level Patterns).
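To make the first ingredient of the IC-based ranking concrete, the sketch below turns "term occurrence relative to the maximum term occurrence" into a number. The logarithmic form is an assumption borrowed from the usual definition of information content and may not match the cited metric exactly; the function name and the counts are hypothetical, and the centrality component is not modelled.

```python
import math


def information_content(occurrences):
    """Map each term to log2(max_occurrence / occurrence).

    Assumption: IC is the negative log of the relative frequency,
    where the frequency is normalised by the most frequent term.
    Terms with zero occurrences are skipped.
    """
    max_occurrence = max(occurrences.values())
    return {
        term: math.log2(max_occurrence / count)
        for term, count in occurrences.items()
        if count > 0
    }


# Hypothetical occurrence counts for three terms.
ic = information_content({"ex:a": 8, "ex:b": 2, "ex:c": 1})
```

Under this formulation the most frequent term gets a value of 0 and rarer terms get higher values; how such values are turned into a final rank is not covered by this sketch.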


Vocab-recommender: The vocabulary
terms discovery assistant

Methodology
Aim: Assist the exploration of the vocabulary space for a
given input webpage W.
Output: A set of vocabulary terms covering the needs of a
given W.
Requirements:
• Perform all the discovery steps automatically.
• Rank result terms to provide the best matches.
• Describe the output in a transparent way that helps the user educate herself about the vocabulary space.

Methodology
[Figure: the vocabulary terms set generator takes a webpage as input and produces the result set T. Components shown: Extractor, Searcher, Ranker, Recommender, Static recommender.]
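The component names on this slide suggest a pipeline of interchangeable parts. The following is a minimal sketch under that assumption; the order in which the components are chained, the class name `VocabTermsGenerator` and all toy components are hypothetical and do not describe the actual vocab-recommender code.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class VocabTermsGenerator:
    extract: Callable[[str], List[str]]           # webpage -> keywords
    search: Callable[[str], List[str]]            # keyword -> candidate terms
    rank: Callable[[List[str]], List[str]]        # candidates -> ordered list
    static_recommend: Callable[[str], List[str]]  # webpage -> media terms

    def recommend(self, webpage: str, limit: int = 5) -> List[str]:
        """Build the result set T for one webpage."""
        candidates: List[str] = []
        for keyword in self.extract(webpage):
            for term in self.search(keyword):
                if term not in candidates:
                    candidates.append(term)
        ranked = self.rank(candidates)[:limit]
        # Static recommendations are appended after the ranked terms.
        static = [t for t in self.static_recommend(webpage) if t not in ranked]
        return ranked + static


# Toy stand-ins for the four components (all hypothetical).
TOY_INDEX = {
    "recipe": ["schema:Recipe", "schema:recipeIngredient"],
    "pizza": ["schema:Recipe"],
}
generator = VocabTermsGenerator(
    extract=lambda page: [w for w in page.lower().split() if w in TOY_INDEX],
    search=lambda keyword: TOY_INDEX.get(keyword, []),
    rank=sorted,
    static_recommend=lambda page: ["schema:ImageObject"] if "<img" in page else [],
)
result_set = generator.recommend("Pizza recipe <img src='p.jpg'>")
```

Because each component is just a callable, any of them can be swapped without touching the others, which is the kind of modularity the approach aims for.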

Ranking dimensions
• Vocabulary ranking (backlinks, inactive vocabularies).
• Vocabulary authors' profile.
• Vocabulary term ranking (LOD usage: LODStats, vocab.cc).
• Vocabulary term result sets for similar webpages.

Vocabulary authors
How can we address the low ranking scores of new vocabularies?
Promote newly created vocabularies whose authors have previously published vocabularies of the desired quality level (as reflected in the ranking).
[Figure: vocabularies linked to their authors.]
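One possible way to express this promotion is to blend the score of a new vocabulary with the average score of its authors' earlier vocabularies. The slides do not give a formula, so the blending rule, the `weight` parameter and the function name below are assumptions made purely for illustration.

```python
def score_with_author_prior(own_score, author_history, weight=0.5):
    """Promote a vocabulary using its authors' track record.

    own_score:      ranking score of the (new) vocabulary itself
    author_history: scores of earlier vocabularies by the same authors
    weight:         hypothetical share given to the authors' average
    """
    if not author_history:
        return own_score  # no track record, nothing to promote with
    prior = sum(author_history) / len(author_history)
    blended = (1 - weight) * own_score + weight * prior
    # Only ever promote: a weak track record never lowers the score.
    return max(own_score, blended)


new_vocab_score = score_with_author_prior(0.1, [0.8, 0.6])
```

With these hypothetical numbers a new vocabulary scored 0.1 is lifted to about 0.4, because its authors' earlier vocabularies average 0.7.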

Static recommendations
• Refer to images, videos, and audio objects.
• An important aspect of how search engines interpret the webpage.
• Static mappings to some well-defined schema.org terms.
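A static mapping of this kind can be sketched with the standard library HTML parser. The slides do not list the mapped terms, so the choice of `schema:ImageObject`, `schema:VideoObject` and `schema:AudioObject` (all existing schema.org types) and the names `STATIC_MAPPING` and `static_recommendations` are assumptions.

```python
from html.parser import HTMLParser

# Assumed mapping from HTML media tags to schema.org types.
STATIC_MAPPING = {
    "img": "schema:ImageObject",
    "video": "schema:VideoObject",
    "audio": "schema:AudioObject",
}


class MediaTermCollector(HTMLParser):
    """Collect one schema.org term per kind of media tag found."""

    def __init__(self):
        super().__init__()
        self.terms = []

    def handle_starttag(self, tag, attrs):
        term = STATIC_MAPPING.get(tag)
        if term and term not in self.terms:
            self.terms.append(term)


def static_recommendations(html):
    collector = MediaTermCollector()
    collector.feed(html)
    return collector.terms


media_terms = static_recommendations(
    '<p>Hi</p><img src="a.jpg"><video src="v.mp4"></video><img src="b.jpg">'
)
```

The lookup needs no search or ranking step, which is why these recommendations are called static.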

Implementation
• As a Web service.
• Modular, i.e. any part can be substituted.
• Input: URL or Keywords.
• Output: Set of vocabulary terms described using
the vSearch vocabulary.

Describing search results
The vSearch vocabulary

Comparison with survey
Approach:
• Recall: 71% (missed terms: cooking time, cooking method).
• Precision: 100% (no irrelevant terms).
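For readers less familiar with the two measures, the sketch below computes them over sets of terms. The slide does not state the size of the reference set; the seven-term set used here is hypothetical and was chosen only because missing two of seven terms reproduces a recall of about 71%.

```python
def precision_recall(recommended, reference):
    """Return (precision, recall) of recommended terms against a reference set."""
    recommended, reference = set(recommended), set(reference)
    hits = recommended & reference
    precision = len(hits) / len(recommended) if recommended else 0.0
    recall = len(hits) / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical reference set (e.g. terms chosen by survey participants).
reference_terms = [
    "schema:name", "schema:recipeIngredient", "schema:recipeInstructions",
    "schema:author", "schema:image", "schema:cookTime", "schema:cookingMethod",
]
# Hypothetical recommendation that misses the two cooking-related terms.
recommended_terms = reference_terms[:5]

precision, recall = precision_recall(recommended_terms, reference_terms)
```

Precision is 1.0 because every recommended term appears in the reference set, while recall is 5/7 because two reference terms are missing.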

Outlook

Approach accomplishments
• Provide vocabulary terms recommendations for a given
webpage.
• Simplify the discovery process.
• Address the cold start problem for new vocabularies in
the search.
• Educate users about semantic annotations.
• Provide a general vocabulary that can be used to describe search results.

What is next?
• For the presented approach: Recommendation of
actions given the entities that have been proposed
by the approach.
• For the Web: Utilise the structured and semantically
annotated data to assist our daily lives. Interpret
the annotated websites as APIs.

Thank you!
ioannis.stavrakantonakis@sti2.at
istavrak.com
@istavrak

References
I. Stavrakantonakis, A. Fensel, and D. Fensel. Linked Open Vocabulary ranking and terms discovery. In Proceedings of SEMANTiCS 2016.
I. Stavrakantonakis, A. Fensel, and D. Fensel. Towards a vocabulary terms discovery assistant. In Proceedings of SEMANTiCS 2016 Posters & Demos.
G. A. Atemezing and R. Troncy. Information content based ranking metric for Linked Open Vocabularies. In Proceedings of the 10th International Conference on Semantic Systems, 2014.
P.-Y. Vandenbussche, G. A. Atemezing, M. Poveda-Villalón, and B. Vatant. Linked Open Vocabularies (LOV): a gateway to reusable semantic vocabularies on the web. Semantic Web, 2015.
J. Schaible, T. Gottron, and A. Scherp. TermPicker: Enabling the reuse of vocabulary terms by exploiting data from the Linked Open Data cloud. In European Semantic Web Conference (ESWC), 2016.
Photo credits for the sections’ photos:  
https://unsplash.com/photos/7m2gkYUDfFE 
https://unsplash.com/photos/DJ_kOgH5u0o  
https://unsplash.com/photos/o4-YyGi5JBc  
https://unsplash.com/photos/a8YV2C3yBMk  
https://unsplash.com/photos/s9XMNEm-M9c
