This document discusses DBpedia Spotlight, a tool that semantically enhances text by linking terms to entities in DBpedia. It describes how DBpedia Spotlight works through three stages: spotting terms, mapping them to candidate entities, and disambiguating the correct entity based on context. It also presents examples of how DBpedia Spotlight creates a virtuous cycle between structured Linked Data and unstructured text, by improving both DBpedia and sources like Wikipedia.
A Virtuous Cycle of Semantic Enhancement with DBpedia Spotlight - SemTech Berlin 2012
Pablo N. Mendes, Christian Bizer
pablo.mendes@fu-berlin.de
Web Based Systems Group
Freie Universität Berlin
SemTechBiz Berlin
February 6th, 2012
FREIE UNIVERSITÄT BERLIN
SemTechBiz Berlin, February 2012
http://wbsg.de
Agenda
• What do we mean by semantic enhancement?
• How does DBpedia Spotlight work?
• From Wikipedia to DBpedia Spotlight
• From DBpedia Spotlight to Wikipedia
• In your project
Can you also enable a virtuous cycle?
Pablo N. Mendes, Christian Bizer: A Virtuous Cycle of Semantic Enhancement with DBpedia Spotlight
Semantic Enhancement?
• Generally:
– Making something easier to understand
• For humans:
– Say what you mean (reduce ambiguity)
– Make associations
– Access to definitions, background
• For machines:
– the same, but in structured format
Semantic Enhancement (Example)
http://nyti.ms/qsYAyt
News Annotation
• Links to “topics”
• Topic pages lead to related content
Semantic Enhancement
• Links text to unique identifiers
• Adds background information
• Interconnects related content
A Virtuous Cycle of Semantic Enhancement
DBpedia Spotlight
• DBpedia is a collection of entity descriptions
extracted from Wikipedia & shared as linked data
• DBpedia Spotlight uses data from DBpedia and
text from associated Wikipedia pages
• Learns how to recognize that a DBpedia resource
was mentioned
• Given plain text as input, generates annotated
text
DBpedia Spotlight: Text Annotation
• From:
(…) Upon their return, Lennon and McCartney
went to New York to announce the formation of
Apple Corps.
• To:
(…) Upon their return, Lennon and McCartney
went to [New York] to announce the formation of
[Apple Corps].
“New York” → http://dbpedia.org/resource/New_York_City
“Apple Corps” → http://dbpedia.org/resource/Apple_Corps
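Annotated text can be kept as standoff annotations: each mention records its surface form, its character offset in the input, and the DBpedia URI it links to. A minimal sketch, assuming the links have already been chosen; the `Annotation` record and `annotate_known` helper are hypothetical, not DBpedia Spotlight's API:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Annotation:
    surface_form: str  # text as it appears in the input
    offset: int        # character offset of the mention
    uri: str           # DBpedia resource the mention is linked to

def annotate_known(text: str, links: dict[str, str]) -> list[Annotation]:
    """Hypothetical helper: locate already-disambiguated surface forms.

    Only the first occurrence of each surface form is recorded.
    """
    annotations = []
    for surface, uri in links.items():
        start = text.find(surface)
        if start != -1:  # skip forms that do not occur in the text
            annotations.append(Annotation(surface, start, uri))
    return sorted(annotations, key=lambda a: a.offset)

text = ("Upon their return, Lennon and McCartney went to New York "
        "to announce the formation of Apple Corps.")
links = {
    "Apple Corps": "http://dbpedia.org/resource/Apple_Corps",
    "New York": "http://dbpedia.org/resource/New_York_City",
}
for a in annotate_known(text, links):
    print(a.offset, a.surface_form, a.uri)
```

Recording offsets instead of rewriting the text leaves the original untouched, so several annotation layers can coexist.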
Challenge: Term Ambiguity
• ...this apple on the palm of my hand...
• ...Apple tried to acquire Palm Inc....
• ...eating an apple sitting next to a palm tree...
• What do “apple” and “palm” mean in each case?
• Our objective is to recognize entities/topics and
disambiguate their meaning, generating DBpedia
annotations in text.
Stage 1: Spotting
• Find substrings that seem worthy of annotation
Input:
(…) Upon their return, Lennon and McCartney went to New York
to announce the formation of Apple Corps.
Output:
“Lennon”, “McCartney”, “New York”, “Apple Corps”
• The simplest approach relies on a dictionary of known
entity names.
– Other approaches: Named Entity Recognition, Keyphrase Extraction, ...
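The dictionary-based approach can be sketched as a greedy longest-match scan over word tokens. A minimal sketch, assuming simple regex tokenization, case-sensitive matching, and a toy dictionary; `spot` is a hypothetical function, not Spotlight's spotter. Preferring the longest known name means “Apple Corps” wins over “Apple”:

```python
import re

def spot(text: str, dictionary: set[str], max_words: int = 4) -> list[str]:
    """Hypothetical greedy longest-match spotter over a name dictionary."""
    # Split into word tokens, dropping punctuation such as "," and "."
    tokens = re.findall(r"\w+", text)
    spots, i = [], 0
    while i < len(tokens):
        # Try the longest window first, shrinking until a known name matches
        for n in range(min(max_words, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in dictionary:
                spots.append(candidate)
                i += n  # continue right after the matched name
                break
        else:
            i += 1  # no known name starts here; move to the next token
    return spots

names = {"Lennon", "McCartney", "New York", "Apple", "Apple Corps"}
sentence = ("Upon their return, Lennon and McCartney went to New York "
            "to announce the formation of Apple Corps.")
print(spot(sentence, names))  # ['Lennon', 'McCartney', 'New York', 'Apple Corps']
```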
Stage 2: Candidate Mapping
• Find possible meanings for each of the
spotted substrings.
Input (spotted names):
“Lennon”, “McCartney”, “New York”, “Apple Corps”
Output (candidate map):
“Lennon”: { Lennon_(album), Lennon,_Michigan, … }
“McCartney”: { McCartney_(surname), Paul_McCartney, … }
“New York”: { New_York_State, New_York_City, … }
“Apple Corps”: { Apple_Corps }
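The candidate map is essentially a lookup table from surface form to possible DBpedia resources. A minimal sketch holding only the candidates listed on the slide (the elided “…” entries are left out, not guessed). It assumes the table can be built from (surface form, resource) pairs, such as link anchor texts and their targets; `build_candidate_map` and `candidate_map` are hypothetical helpers:

```python
from collections import defaultdict

# Candidates listed on the slide; the elided "…" entries are omitted.
SLIDE_CANDIDATES = {
    "Lennon": ["Lennon_(album)", "Lennon,_Michigan"],
    "McCartney": ["McCartney_(surname)", "Paul_McCartney"],
    "New York": ["New_York_State", "New_York_City"],
    "Apple Corps": ["Apple_Corps"],
}

def build_candidate_map(pairs):
    """Hypothetical: group (surface form, resource) pairs, e.g. anchor text
    and link target, into surface form -> list of distinct candidates."""
    candidates = defaultdict(list)
    for surface, resource in pairs:
        if resource not in candidates[surface]:  # keep each candidate once
            candidates[surface].append(resource)
    return dict(candidates)

def candidate_map(spots, candidates):
    """Return the candidate resources for each spotted name (empty if unknown)."""
    return {s: candidates.get(s, []) for s in spots}

for surface, resources in candidate_map(
        ["Lennon", "McCartney", "New York", "Apple Corps"], SLIDE_CANDIDATES).items():
    print(surface, "->", resources)
```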
Stage 3: Disambiguation
• Select the correct candidate DBpedia Resource
for a given substring.
• The decision is based on the context (1) in which the
substring was mentioned
con·text (n.)
1. the parts of a discourse that surround a word or
passage and can throw light on its meaning
http://mw1.merriam-webster.com/dictionary/context
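Choosing among candidates can be framed as comparing the words around a mention with the words previously seen around each candidate resource. A minimal sketch using bag-of-words cosine similarity on the “apple” sentences from the ambiguity slide. The context vectors are made-up toy counts and `disambiguate` is a hypothetical function. This is a simplified stand-in, not DBpedia Spotlight's exact scoring:

```python
import math
import re
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def disambiguate(context: str, candidates: dict[str, Counter]) -> str:
    """Hypothetical: pick the candidate whose learned context best matches."""
    words = Counter(re.findall(r"\w+", context.lower()))
    return max(candidates, key=lambda uri: cosine(words, candidates[uri]))

# Toy context vectors, invented for illustration only.
APPLE_CANDIDATES = {
    "Apple_Inc.": Counter({"acquire": 5, "inc": 4, "company": 6, "palm": 2}),
    "Apple": Counter({"eating": 3, "fruit": 7, "tree": 5, "palm": 1}),
}

print(disambiguate("Apple tried to acquire Palm Inc.", APPLE_CANDIDATES))
print(disambiguate("eating an apple sitting next to a palm tree", APPLE_CANDIDATES))
```

Both candidates share the word “palm”, so the decision rests on the other context words, which is the point of the ambiguity slide.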
Learning the Context for a resource
(…) Upon their return, Lennon and McCartney went to New
York to announce the formation of Apple Corps.
• Collect context for DBpedia Resources from all
articles in Wikipedia
e.g., co-occurrence statistics:
John_Lennon = {John:981, Beatles:320, McCartney:100, ...}
• Types of context
– Wikipedia Pages
– Definitions from disambiguation pages
– Paragraphs that link to resources
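Co-occurrence statistics such as the John_Lennon example can be collected by counting the words in every paragraph that links to a resource. A minimal sketch, assuming the paragraphs have already been extracted as (paragraph text, linked resource) pairs. It applies no stop-word removal or weighting and keeps token case as in the slide's example. `collect_context` and the two sample paragraphs are illustrative:

```python
import re
from collections import Counter, defaultdict

def collect_context(linked_paragraphs):
    """Hypothetical: build resource -> word counts from (paragraph, resource) pairs."""
    context = defaultdict(Counter)
    for paragraph, resource in linked_paragraphs:
        # Every word in a paragraph linking to the resource counts as context
        context[resource].update(re.findall(r"\w+", paragraph))
    return dict(context)

paragraphs = [  # illustrative input, not real Wikipedia extracts
    ("John Lennon sang with the Beatles.", "John_Lennon"),
    ("John Lennon met McCartney.", "John_Lennon"),
]
vectors = collect_context(paragraphs)
print(vectors["John_Lennon"])
```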
DBpedia Spotlight
http://spotlight.dbpedia.org/demo
Freely available Web Service;
Open Source (Java/Scala),
Apache License, Version 2.0.
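A client of the web service can be sketched with the standard library alone. This sketch rests on our understanding of the service interface, not on this talk, so verify every detail before use. It assumes the current public endpoint `https://api.dbpedia-spotlight.org/en/annotate`, the `text` and `confidence` query parameters, and the JSON field names `Resources`, `@surfaceForm`, `@offset` and `@URI`. The 2012 host spotlight.dbpedia.org may no longer respond. `build_request` and `parse_resources` are hypothetical helpers:

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint (not from the slides); verify before use.
ENDPOINT = "https://api.dbpedia-spotlight.org/en/annotate"

def build_request(text: str, confidence: float = 0.5) -> urllib.request.Request:
    """Prepare a GET request asking the annotation service for JSON output."""
    query = urllib.parse.urlencode({"text": text, "confidence": confidence})
    return urllib.request.Request(f"{ENDPOINT}?{query}",
                                  headers={"Accept": "application/json"})

def parse_resources(body: str) -> list[tuple[str, int, str]]:
    """Extract (surface form, offset, URI) triples from a JSON response body."""
    data = json.loads(body)
    # Assumed field names; "Resources" is treated as absent when nothing matched.
    return [(r["@surfaceForm"], int(r["@offset"]), r["@URI"])
            for r in data.get("Resources", [])]

# To call the live service (network access required):
# with urllib.request.urlopen(build_request("Lennon went to New York.")) as resp:
#     print(parse_resources(resp.read().decode("utf-8")))
```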
A Virtuous Cycle
The “Suggest” Button
User decides to add a link
The “Suggest” Button
System suggests targets; user chooses
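Behind such a button, the system ranks candidate targets and the editor's choice is written back as ordinary wiki markup. A minimal sketch, assuming candidates arrive with a score and that a DBpedia resource name maps to a Wikipedia title by replacing underscores with spaces. `suggest` and `to_wikilink` are hypothetical helpers and the scores are invented:

```python
def suggest(candidates: dict[str, float], k: int = 3) -> list[str]:
    """Hypothetical: return the k highest-scoring candidate resources, best first."""
    return sorted(candidates, key=candidates.get, reverse=True)[:k]

def to_wikilink(resource: str, surface: str) -> str:
    """Render the chosen resource as a MediaWiki link around the original text."""
    title = resource.replace("_", " ")  # assumed resource-name-to-title mapping
    # Use the short form [[Title]] when the text already equals the title
    return f"[[{title}]]" if title == surface else f"[[{title}|{surface}]]"

# Invented scores for the "New York" mention.
options = suggest({"New_York_City": 0.81, "New_York_State": 0.17})
choice = options[0]  # stands in for the editor clicking a suggestion
print(to_wikilink(choice, "New York"))  # [[New York City|New York]]
```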
Sztakipedia
http://pedia.sztaki.hu
• Developed by Mihály Héder et al. at MTA SZTAKI
(Hungarian Academy of Sciences)
• Adds a toolbar to Wikipedia that can use DBpedia
Spotlight to suggest links
– Also suggests Categories, Infoboxes, Books
• Helps editors to refine knowledge in Wikipedia
– More interconnections, more entity types, more
structured data!
Sztakipedia (screenshots)
http://pedia.sztaki.hu
Source: http://www.youtube.com/watch?v=8VW0TrvXpl4
Beyond Wikipedia? RDFaCE
http://rdface.aksw.org/
• Developed by Ali Khalili at U. Leipzig (AKSW)
• Helps users to add RDFa markup via a
WYSIWYG interface
• Can use DBpedia Spotlight, among other
services, to disambiguate entity names
• Available as a WordPress plugin
– Enables blogs as sources of context for
disambiguation
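Turning entity annotations into RDFa amounts to wrapping each mention in an element that carries the entity URI. A minimal sketch, assuming non-overlapping standoff annotations given as (offset, surface form, URI) and using the RDFa `about` attribute on a `span`. RDFaCE's actual markup may differ, for example by adding types and properties. `to_rdfa` is a hypothetical helper:

```python
from html import escape

def to_rdfa(text: str, annotations: list[tuple[int, str, str]]) -> str:
    """Hypothetical: wrap each mention in a span whose RDFa 'about' names the entity.

    Assumes annotations do not overlap.
    """
    parts, cursor = [], 0
    for offset, surface, uri in sorted(annotations):
        if text[offset:offset + len(surface)] != surface:
            raise ValueError(f"{surface!r} not found at offset {offset}")
        # Copy the escaped plain text up to the mention, then the marked-up mention
        parts.append(escape(text[cursor:offset]))
        parts.append(f'<span about="{escape(uri)}">{escape(surface)}</span>')
        cursor = offset + len(surface)
    parts.append(escape(text[cursor:]))  # trailing text after the last mention
    return "".join(parts)

sentence = "Lennon and McCartney went to New York."
print(to_rdfa(sentence, [(29, "New York", "http://dbpedia.org/resource/New_York_City")]))
```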
RDFaCE
http://rdface.aksw.org
A Virtuous Cycle in My Enterprise
• Select a database of entity identifiers
• Select textual sources that talk about those
entities
• Use semantic enhancement editors (with
automatic suggestions) to annotate text
• Use annotated text to re-train annotator
DBpedia Spotlight can already learn from MediaWiki XML and TSV input.
Other formats to come! Take a look at NIF
http://nlp2rdf.org/nif-1-0
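Before curated annotations can be used to re-train the annotator, they need to be written to a machine-readable file. A minimal sketch that writes editor-accepted annotations as tab-separated rows. The column layout (resource, surface form, offset, context) and the `write_training_tsv` helper are hypothetical. They are not DBpedia Spotlight's documented TSV schema, so check the project's own documentation for the expected columns:

```python
import csv
import io

def write_training_tsv(rows, out) -> None:
    """Hypothetical: write (resource, surface form, offset, context) rows as TSV."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["resource", "surface_form", "offset", "context"])  # hypothetical header
    for resource, surface, offset, context in rows:
        # Collapse tabs/newlines in the context so each annotation stays on one line
        writer.writerow([resource, surface, offset, " ".join(context.split())])

accepted = [  # one annotation an editor confirmed
    ("Apple_Corps", "Apple Corps", 86,
     "Upon their return, Lennon and McCartney went to New York\n"
     "to announce the formation of Apple Corps."),
]
buffer = io.StringIO()
write_training_tsv(accepted, buffer)
print(buffer.getvalue())
```

Writing one row per accepted annotation keeps the file easy to extend as editors confirm more links, which lets the cycle continue.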
Semantic Enhancement Marketplace
• User-generated annotations are valuable
crowdsourced knowledge
• Can be used as currency:
– “sweat for web service provision”
– b2b partnerships
• Example: RoboTagger.com
– entity annotation service (in German)
– entity types are not fixed (crowd-sourced)
– users rewarded with more access to web service
Conclusion
• Unstructured information (text) and
structured information (e.g. RDF)
– Mutually dependent and beneficial
• DBpedia Spotlight sits on the border of two
worlds:
– From Wikipedia, an automatic annotator
– From the auto-annotator, a more interconnected
Wikipedia
• Fosters a semantic enhancement ecosystem!
Thank you!
On Twitter: @pablomendes
E-mail: pablo.mendes@fu-berlin.de
Web: http://pablomendes.com
http://slideshare.net/pablomendes
• Special thanks to Mihály Héder and Iavor Jelev
for many fruitful discussions
• DBpedia Spotlight is partially funded by LOD2.eu
http://spotlight.dbpedia.org
Links
• Download
– DBpedia: http://dbpedia.org/Downloads37/
– DBpedia Spotlight:
http://sourceforge.net/projects/dbp-spotlight/
– RDFaCE
• http://code.google.com/p/rdface/
• Wordpress plugin:
http://wordpress.org/extend/plugins/rdface/