1. Exposing Digital Content as Linked Data,
and Linking them using StoryBlink
Ben De Meester
Tom De Nies, Laurens De Vocht,
Ruben Verborgh, Erik Mannens,
and Rik Van de Walle
Ghent University – iMinds – Multimedia Lab
ben.demeester@ugent.be | @Ben__DM
NLPDBpedia2015@ISWC | October 11th 2015 | Bethlehem, PA
2. We live in a fast world
with a lot of content to sift through
http://blog.qmee.com/qmee-online-in-60-seconds/
6. What do we want?
Automatic content-based metadata
to fuel future recommendation engines
7. Content-based metadata
Get the tags…
DBpedia Spotlight
... use them to represent books’ content …
EPUB CFI, NIF, ITS, …
… and link to other books … in a good way.
TPF, EiCE
StoryBlink!
8. Get the tags
Find out what a book is about…
Semantic tags!
Using NER/NED!
Extract all semantic concepts from the book
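The slides name DBpedia Spotlight as the NER/NED tool. A minimal sketch of the tag-extraction step, assuming a Spotlight-style JSON annotation response (the `Resources`, `@URI`, and `@similarityScore` field names follow Spotlight's output format; the sample response itself is invented for illustration):

```python
import json

# Hypothetical sample of a DBpedia Spotlight /annotate JSON response.
SAMPLE_RESPONSE = json.dumps({
    "Resources": [
        {"@URI": "http://dbpedia.org/resource/Moby-Dick",
         "@surfaceForm": "Moby Dick", "@similarityScore": "0.99"},
        {"@URI": "http://dbpedia.org/resource/Whale",
         "@surfaceForm": "whale", "@similarityScore": "0.87"},
    ]
})

def extract_tags(spotlight_json, min_score=0.5):
    """Turn a Spotlight annotation response into a set of DBpedia concept URIs,
    keeping only annotations above a confidence threshold."""
    data = json.loads(spotlight_json)
    return {
        r["@URI"]
        for r in data.get("Resources", [])
        if float(r["@similarityScore"]) >= min_score
    }

print(sorted(extract_tags(SAMPLE_RESPONSE)))
```

The resulting set of concept URIs is the book's semantic-tag representation used in the later steps.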
21. Keeping all concepts…
Not all mentioned concepts are useful.
The path finding becomes really slow.
What happens if we keep the top X%?
22. [Chart: number of found paths and path-finding time (s) versus amount of considered concepts (%)]
Top 50% of found concepts gives similar paths,
but a lot faster
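The top-X% filter above can be sketched as follows. The slides do not spell out the ranking criterion, so this assumes concepts are ranked by how often they are mentioned in the book (the closing notes suggest raw frequency rather than tf-idf); the `dbr:` mentions are invented examples:

```python
from collections import Counter

def top_percent(concept_mentions, percent=50):
    """Keep only the most-mentioned concepts.

    concept_mentions: list of concept URIs, one entry per mention in the book.
    Returns the top `percent`% of distinct concepts, ranked by mention count.
    """
    counts = Counter(concept_mentions)
    ranked = [concept for concept, _ in counts.most_common()]
    keep = max(1, round(len(ranked) * percent / 100))  # always keep at least one
    return ranked[:keep]

mentions = (["dbr:Whale"] * 5 + ["dbr:Ship"] * 3 +
            ["dbr:Ocean"] * 2 + ["dbr:Harpoon"])
print(top_percent(mentions, 50))  # two of the four distinct concepts survive
```

Halving the concept set shrinks the search space for path finding, which is what yields the speed-up reported in the chart.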
23. [Chart: number of found paths and path-finding time (s) versus amount of considered concepts (%), with a time-out annotation at the high end]
Top 50% of found concepts gives similar paths,
but a lot faster
28. StoryBlink
builds a semantic representation
of the important concepts inside books,
and uses those representations
to connect books content-wise
http://uvdt.test.iminds.be/storyblink
Demo 48
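A minimal sketch of the linking step in its simplest form: books as tag sets, linked by shared concepts. (StoryBlink actually finds paths between concepts through DBpedia, via TPF and EiCE; direct overlap plus a Jaccard score is a simplification, and the book tag sets are invented examples.)

```python
def shared_concepts(book_a, book_b):
    """Concepts tagged in both books: candidate link points between the stories."""
    return book_a & book_b

def link_strength(book_a, book_b):
    """Jaccard overlap of the two tag sets, as a rough content-similarity score."""
    union = book_a | book_b
    return len(book_a & book_b) / len(union) if union else 0.0

moby = {"dbr:Whale", "dbr:Ship", "dbr:Ocean"}
island = {"dbr:Ship", "dbr:Ocean", "dbr:Treasure"}
print(shared_concepts(moby, island))
print(link_strength(moby, island))  # 2 shared / 4 total = 0.5
```

Path finding through DBpedia generalizes this: two books can be linked even without a directly shared tag, as long as their concepts are connected in the knowledge graph.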
I would like to talk about automatically reducing large bodies of text to a small set of representative tags, and how we can use those tags to find out how stories are related: in other words, how we automatically exposed digital content as linked data, and linked those semantic representations using an application we dubbed StoryBlink.
We do not rank concepts with tf-idf, because in a book about paper, the concept "paper" is important, even though tf-idf would down-weight it as a common term.
… so please come check out booth 48 to play around with StoryBlink.