Searching among the more than 500 existing vocabularies has never been easier than it is today,
thanks to the curated Linked Open Vocabularies (LOV) directory. The LOV search provides a
single central point for exploring the vocabulary term space. However,
it can still be cumbersome for non-experts, and even for semantic
annotation experts, to discover the appropriate terms for
describing given website content. To this end, the
proposed approach is the cornerstone of a framework
that aims to facilitate the selection of the highest-ranked
terms from the abundance of registered vocabularies
based on a keyword search. Moreover, it introduces for the
first time the role of the contributors' background, as
retrieved from the LOV repository, in the ranking of the
vocabularies. With this addition, we aim to address the issue
of very low scores for newly published vocabularies.
The paper describes the algorithm that enables the ranking
of vocabularies within the above-mentioned framework and
analyses the results of the corresponding evaluation.
Linked Open Vocabulary Ranking and Terms Discovery
1. Linked Open Vocabulary Ranking and Terms Discovery
Ioannis Stavrakantonakis
PhD candidate
University of Innsbruck - STI Innsbruck
Ioannis Stavrakantonakis, Anna Fensel, Dieter Fensel
3. Discovering vocabulary terms
[Diagram: webpage -> (extract) -> n keywords -> (perform) -> n searches -> (extract) -> n*m search results -> (filter) -> n result terms. Alternative: use vocab-recommender.]
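The discovery flow on this slide (extract n keywords, perform n searches, filter the n*m results down to n terms) could be sketched roughly as follows. This is only an illustration: the keyword extractor is a naive frequency count, and `search` is a hypothetical callable standing in for a vocabulary search such as LOV; neither reflects the actual vocab-recommender code.

```python
import re
from collections import Counter
from typing import Callable

# Tiny stopword list, assumed for illustration only.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "with", "for", "is"}

# A search backend maps one keyword to up to m (term, score) hits.
SearchFn = Callable[[str], list[tuple[str, float]]]


def extract_keywords(text: str, n: int) -> list[str]:
    """Return the n most frequent non-stopword tokens (naive stand-in)."""
    tokens = [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]
    return [word for word, _ in Counter(tokens).most_common(n)]


def discover_terms(text: str, n: int, search: SearchFn) -> dict[str, str]:
    """Run one search per keyword and keep only the best-scored hit for each."""
    result: dict[str, str] = {}
    for keyword in extract_keywords(text, n):
        hits = search(keyword)  # up to m results for this keyword
        if hits:
            best_term, _ = max(hits, key=lambda hit: hit[1])
            result[keyword] = best_term
    return result


def fake_search(keyword: str) -> list[tuple[str, float]]:
    """Hypothetical in-memory index; terms and scores are made up."""
    index = {
        "hotel": [("schema:Hotel", 0.9), ("ex:Hotel", 0.6)],
        "room": [("schema:Room", 0.8)],
    }
    return index.get(keyword, [])


page_text = "The hotel room is a room with a view of the hotel garden. Hotel!"
terms = discover_terms(page_text, 2, fake_search)
```

Passing the search backend as a parameter keeps each step replaceable, so a real search client could be swapped in without touching the filtering logic.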
4. Outline
• Survey of vocabulary terms discovery.
• State of the art.
• Vocab-recommender: The vocabulary terms
discovery assistant.
• Outlook - What is next?
6. Survey of vocabulary terms
discovery
• 64 of 66 participants provided valid submissions.
• Familiar with Computer Science but without any
experience in semantic annotation.
• 4 use cases: article (NASA), exhibition (Louvre), hotel
(room description), recipe (pizza).
• 1 week to submit the answers.
• Completion time self-reported by participants.
• “Use solely the LOV Search for the discovery of terms.”
7. Number of selected terms
[Box plot. Legend: median; box spans 50% of data points; whiskers mark minimum and maximum.]
8. Measured selection time
• A few outliers in the term
selection time.
• Least skewed
measurements for the
exhibition and recipe
cases.
• Takes 1 hr on average.
• Exhibition & recipe cases
have the lowest time.
9. Distribution of participants
and schema.org
47% of the proposed terms belong to the
schema.org namespace.
Is this due to a specific use case? No.
10. General observations
• 2 out of 66 participants failed to provide a valid
submission (3%).
• Static parts (media) were considered for inclusion
by only 10% of the participants.
• The terms discovery process itself did not include
a guideline to follow.
• Participants encountered vocabulary term
descriptions that were hard to follow.
13. Inclusion rules
• Be written in RDF and be dereferenceable (URI).
• Be parseable without errors.
• Terms should have an rdfs:label.
• Reuse relevant existing vocabularies.
• Provide metadata about itself.
No guarantee about the effectiveness.
Linked Open Vocabularies
14. Ranking
• Atemezing and Troncy: Information Content (IC) in LOV.
a) Terms occurrence in comparison to the maximum
term occurrence in the set of vocabularies. b) Centrality
of the vocabulary.
• DWRank: a) Hub score. b) Authority score. (no LOD
usage)
• TermPicker: Suggests types and properties from
vocabularies that other LOD providers have combined
together with the one the engineer has used to model
the given part (using Schema Level Patterns).
17. Methodology
Aim: Assist the exploration of the vocabulary space for a
given input webpage W.
Output: A set of vocabulary terms covering the needs of a
given W.
Requires:
• Perform all the discovery steps in an automatic manner.
• Rank result terms to provide the best matches.
• Describe the output in a transparent way that helps the
user educate herself about the vocabulary space.
19. Ranking dimensions
• Vocabularies ranking (backlinks, inactive vocabs).
• Vocabulary authors profile.
• Vocabulary terms ranking (LOD usage: LODStats,
vocab.cc).
• Vocabulary terms result set for similar webpages.
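One simple way to combine the four dimensions listed above is a weighted sum over normalised signals. The sketch below assumes every signal is already scaled to [0, 1] and uses equal weights; the field names, the weights, and the linear combination itself are assumptions for illustration, not the scoring function used by the approach.

```python
from dataclasses import dataclass


@dataclass
class TermSignals:
    """Per-term signals, each assumed to be normalised to [0, 1]."""
    vocab_rank: float      # vocabulary-level score (e.g. backlinks, activity)
    author_profile: float  # score derived from the authors' earlier vocabularies
    lod_usage: float       # term usage in LOD datasets
    similar_pages: float   # how often the term was returned for similar webpages


# Equal weights, chosen only for illustration.
WEIGHTS = {
    "vocab_rank": 0.25,
    "author_profile": 0.25,
    "lod_usage": 0.25,
    "similar_pages": 0.25,
}


def combined_score(signals: TermSignals, weights: dict[str, float] = WEIGHTS) -> float:
    """Weighted sum of the four ranking dimensions."""
    return sum(weight * getattr(signals, name) for name, weight in weights.items())


def rank_terms(candidates: dict[str, TermSignals]) -> list[str]:
    """Order candidate terms from highest to lowest combined score."""
    return sorted(candidates, key=lambda t: combined_score(candidates[t]), reverse=True)


candidates = {
    "ex:b": TermSignals(0.0, 0.5, 0.0, 0.5),
    "ex:a": TermSignals(1.0, 0.5, 1.0, 0.5),
}
ranking = rank_terms(candidates)
```

Keeping the weights in a plain dictionary makes it easy to experiment with how much each dimension should count.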
20. Vocabulary authors
How can we address the low ranking scores of new
vocabularies?
Promote newly created vocabularies by authors
whose earlier vocabularies reached the desired
quality level (as reflected in their ranking).
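The idea of lifting a new vocabulary through its authors' track record could look roughly like the sketch below. It assumes that each author's earlier vocabularies already have ranking scores in [0, 1] and that the boost is a linear blend controlled by a hypothetical parameter `alpha`; the formula used in the paper may differ.

```python
from statistics import mean


def author_prior(authors: list[str], past_scores: dict[str, list[float]]) -> float:
    """Mean ranking score of all earlier vocabularies by the given authors."""
    scores = [s for author in authors for s in past_scores.get(author, [])]
    return mean(scores) if scores else 0.0


def boosted_score(
    own_score: float,
    authors: list[str],
    past_scores: dict[str, list[float]],
    alpha: float = 0.5,  # assumed blend factor between own score and author prior
) -> float:
    """Blend a vocabulary's own score with its authors' prior."""
    return (1 - alpha) * own_score + alpha * author_prior(authors, past_scores)


# Made-up scores for two hypothetical authors.
past = {"alice": [0.8, 0.4], "bob": [0.6]}

# A brand-new vocabulary has no usage yet, so its own score is 0.
new_vocab = boosted_score(0.0, ["alice", "bob"], past)
unknown_authors = boosted_score(0.0, ["carol"], past)
```

With this blend, a new vocabulary by authors without any history still scores zero, so the boost only applies where there is evidence of earlier quality.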
[Diagram: vocabularies and their authors.]
21. Static recommendations
• Refers to images, videos, audio objects.
• Important aspect of the webpage interpretation by
the search engines.
• Static mappings to some well defined schema.org
terms.
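A static mapping of this kind could be as small as a lookup table from HTML media elements to schema.org types. The sketch below is an assumption about how such a mapping might be applied: it uses the standard-library HTML parser and the schema.org types ImageObject, VideoObject and AudioObject; the terms chosen by the actual service are not listed on the slide.

```python
from html.parser import HTMLParser

# Assumed static mapping from HTML media tags to schema.org types.
STATIC_MAPPINGS = {
    "img": "https://schema.org/ImageObject",
    "video": "https://schema.org/VideoObject",
    "audio": "https://schema.org/AudioObject",
}


class MediaFinder(HTMLParser):
    """Collects the schema.org type of each kind of media element seen."""

    def __init__(self) -> None:
        super().__init__()
        self.found: list[str] = []

    def handle_starttag(self, tag, attrs):
        term = STATIC_MAPPINGS.get(tag)
        if term and term not in self.found:
            self.found.append(term)  # keep first-seen order, no duplicates


def static_recommendations(html: str) -> list[str]:
    """Return the statically mapped terms for the media found in the page."""
    finder = MediaFinder()
    finder.feed(html)
    return finder.found


sample = '<p>Hi</p><img src="a.png"><video src="v.mp4"></video><img src="b.png">'
recommended = static_recommendations(sample)
```

Because the mapping is fixed, these recommendations need no search or ranking step and can be produced for any page that contains media.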
22. Implementation
• As a Web service.
• Modular, i.e. any part can be substituted.
• Input: URL or Keywords.
• Output: Set of vocabulary terms described using
the vSearch vocabulary.
28. Approach accomplishments
• Provide vocabulary terms recommendations for a given
webpage.
• Simplify the discovery process.
• Address the cold start problem for new vocabularies in
the search.
• Educate the users around the semantic annotations topic.
• Provide a general vocabulary that can be used to
describe search results.
29. What is next?
• For the presented approach: Recommendation of
actions given the entities that have been proposed
by the approach.
• For the Web: Utilise the structured and semantically
annotated data to assist our daily lives. Interpret
the annotated websites as APIs.
31. References
Ioannis Stavrakantonakis, Anna Fensel, and Dieter Fensel: Linked Open Vocabulary ranking and terms
discovery. In Proceedings of the SEMANTiCS 2016.
Ioannis Stavrakantonakis, Anna Fensel, and Dieter Fensel: Towards a vocabulary terms discovery
assistant. In Proceedings of the SEMANTiCS 2016 Posters & Demos.
G. A. Atemezing and R. Troncy. Information content based ranking metric for Linked Open
Vocabularies. In Proceedings of the 10th International Conference on Semantic Systems, 2014.
P.-Y. Vandenbussche, G. A. Atemezing, M. Poveda-Villalón, and B. Vatant. Linked Open Vocabularies
(LOV): a gateway to reusable semantic vocabularies on the web. Semantic Web, 2015.
J. Schaible, T. Gottron, and A. Scherp. TermPicker: Enabling the reuse of vocabulary terms by
exploiting data from the Linked Open Data cloud. In European Semantic Web Conference (ESWC),
2016.
Photo credits for the sections’ photos:
https://unsplash.com/photos/7m2gkYUDfFE
https://unsplash.com/photos/DJ_kOgH5u0o
https://unsplash.com/photos/o4-YyGi5JBc
https://unsplash.com/photos/a8YV2C3yBMk
https://unsplash.com/photos/s9XMNEm-M9c