Gathering Alternative Surface Forms
for DBpedia Entities
Volha Bryl
University of Mannheim, Germany  Springer Nature
Christian Bizer, Heiko Paulheim
University of Mannheim, Germany
NLP & DBpedia @ ISWC 2015, Bethlehem, USA, October 11, 2015
Why you need Surface Forms
• The surface forms (SFs) of an entity are the strings it can be
referred to by: synonyms, alternative names, etc.
• Used to support many NLP tasks: co-reference resolution, entity
linking, disambiguation
2Surface Forms for DBpedia Entities, Bryl, Bizer, Paulheim
“Billionaire Elon Musk has spelled out how he plans to
create temporary suns over Mars in order to heat the
Red Planet. Dismissing earlier comments that he
intended to nuke the planet’s surface, he says he wants
to create aerial explosions to heat it up. ”
To link the three mentions, your machine should know that “Red Planet” is
an alternative name for Mars, and that Mars can be referred to just by its
type, “planet”
Surface Forms from Wiki(DB)pedia
• Some of Wikipedia’s (hence, DBpedia’s) crowd-sourced content looks
quite like surface forms
• Page titles
• Redirects
• Account for alternative names, word forms (e.g. plurals), closely related words,
abbreviations, alternative spellings, likely misspellings, subtopics
• Disambiguation pages
• There are 10+ Bethlehems in the US, according to
https://en.wikipedia.org/wiki/Bethlehem_(disambiguation)
• Anchor texts of links between wiki pages
Named after the Roman god of war, it is often referred to as the “Red
Planet”...
Source: Named after the [[Mars (mythology)|Roman god of war]], it is
often referred to as the "Red Planet"...
• …additionally, we use anchor texts of links from external pages to Wikipedia
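The wiki-link syntax shown above can be mined for (page, anchor text) pairs. The following is a minimal sketch, not the DBpedia extractor itself: the regular expression and the function name `anchor_pairs` are illustrative assumptions, and real wikitext (templates, nested links, file links) needs more care.

```python
import re
from collections import Counter

# Matches [[Target]] and [[Target|anchor text]]; a section part (#...) is skipped.
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|([^\[\]]*))?\]\]")

def anchor_pairs(wikitext):
    """Yield (target page, anchor text) for every internal link in the wikitext."""
    for match in WIKILINK.finditer(wikitext):
        target = match.group(1).strip()
        # Without a pipe (or with an empty label), the visible text is the title.
        anchor = (match.group(2) or target).strip()
        yield target, anchor

source = ('Named after the [[Mars (mythology)|Roman god of war]], '
          'it is often referred to as the "Red Planet".')

# Counting pairs over many pages gives the frequency of each candidate SF.
counts = Counter(anchor_pairs(source))
print(counts)
```

Here the anchor text "Roman god of war" becomes a candidate surface form for the page Mars (mythology).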
Mars in BabelNet:
Surface Forms from Wiki(DB)pedia
• Not a new idea
• BabelNet, DBpedia Spotlight, … [see our paper for more links]
• Problem: Quality
• …it is not only that quality is a problem, it is also that it has never been
assessed or addressed
• Reason 1: good quality of Wikipedia content is taken for granted
• Reason 2: hopes are that NLP algorithms won’t be influenced by noise
• Problem: Quality – Why?
• By adding a redirect or the anchor text of an internal Wikipedia link, a Wikipedia
editor might mean not only “same as” or “also known as”, but also “related to”,
“contains”, etc.
• Both variants serve the purpose of pointing to the correct wiki page
Solution: Focus on Quality
• Step 1: Extract
• We extract SFs from Wikipedia labels, redirects, disambiguations, and anchor
texts of internal wiki-links
• Step 2: Evaluate
• We create a gold standard to evaluate SF quality
• Step 3: Filter
• We implement three filters to improve SF quality
• Bonus: More SFs
• We extract SFs from anchor texts of Wiki links found in the Common Crawl
2014 corpus
• All datasets are available at
http://data.dws.informatik.uni-mannheim.de/dbpedia/nlp2014/
SFs Dataset Statistics
• LRD = Labels, Redirects, Disambiguations
• Extracted from DBpedia dumps
• WAT = Wikipedia Anchor Texts
• Extracted by a new DBpedia extractor (based on PageLinksExtractor)
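As a rough illustration of the LRD part, redirect and disambiguation triples from a DBpedia dump can be turned into surface forms by reading the title of the linking page. This is only a sketch under assumptions: the input is taken to be N-Triples lines using the `dbo:wikiPageRedirects` and `dbo:wikiPageDisambiguates` properties, the helper names are made up, the sample triples are illustrative, and label triples are left out for brevity.

```python
import re
from collections import defaultdict
from urllib.parse import unquote

# One URI-only N-Triples statement per line: <s> <p> <o> .
TRIPLE = re.compile(r"^<([^>]+)>\s+<([^>]+)>\s+<([^>]+)>\s*\.\s*$")
RESOURCE = "http://dbpedia.org/resource/"
REDIRECTS = "http://dbpedia.org/ontology/wikiPageRedirects"
DISAMBIGUATES = "http://dbpedia.org/ontology/wikiPageDisambiguates"

def title(uri):
    """Turn a DBpedia resource URI into a readable page title."""
    name = unquote(uri[len(RESOURCE):]).replace("_", " ")
    # Drop a trailing qualifier such as "(disambiguation)"; a simplification.
    return re.sub(r"\s*\([^)]*\)$", "", name)

def surface_forms(lines):
    """Map each entity URI to the SFs implied by redirect/disambiguation triples."""
    forms = defaultdict(set)
    for line in lines:
        match = TRIPLE.match(line)
        if not match:
            continue
        subject, predicate, obj = match.groups()
        if predicate in (REDIRECTS, DISAMBIGUATES):
            # The redirecting (or disambiguating) page names the target entity.
            forms[obj].add(title(subject))
    return forms

# Illustrative sample lines, not copied from an actual dump.
sample = [
    "<http://dbpedia.org/resource/Red_Planet> <" + REDIRECTS + "> <http://dbpedia.org/resource/Mars> .",
    "<http://dbpedia.org/resource/Bethlehem_(disambiguation)> <" + DISAMBIGUATES
    + "> <http://dbpedia.org/resource/Bethlehem,_Pennsylvania> .",
]
print(dict(surface_forms(sample)))
```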
Gold Standard
• Manual annotation, 1 annotator, 2 subsets
• Popular subset: manually selected 34 popular entities of different types
• Denmark, Berlin, Apple Inc., Animal Farm, Michael Jackson, Star Wars, Diego
Maradona, Mars, etc.
• ~82 SFs per entity, linked from other Wiki pages 813,736 times
• Random subset: randomly selected 81 entities each having at least 5 SFs
• Andy_Zaltzman, Bell AH-1 SuperCobra, Biarritz, Castellum, Firefox (film), Kipchak
languages, ParisTech, Psychokinesis, etc.
• ~13 SFs per entity, linked from other Wiki pages 14,760 times
Available at http://data.dws.informatik.uni-mannheim.de/dbpedia/nlp2014/gold/
Gold Standard
• Types of annotations
• correct (“the eternal city” for Rome)
• contained (“Google Japan” for Google), contains (“Turkey” for Istanbul)
• type of (“the city” for Rome)
• partial (“Diego” for Diego Maradona)
• related (“Google Blog” for Google)
• wrong (“during World War I” for United States)
Evaluation: How many correct SFs?
• SFs extracted from labels, redirects, disambiguations
• correct, popular subset: 66.8%
• correct, random subset: 86.6%
• SFs extracted from Wikipedia anchor texts
• correct, popular subset: 38.5%
• correct, random subset: 70.7%
• Combined dataset
• correct, popular subset: 45.7%
• correct, random subset: 75%
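The percentages above are the share of annotated SFs labelled "correct". A small sketch of that computation follows; the row layout (entity, surface form, label) is an assumption about how the gold standard might be loaded, and the rows reuse the annotation examples from the slides rather than the real files.

```python
from collections import Counter

# Hypothetical in-memory rows: (entity, surface form, annotation label).
gold = [
    ("Rome", "the eternal city", "correct"),
    ("Rome", "the city", "type of"),
    ("Google", "Google Japan", "contained"),
    ("Google", "Google Blog", "related"),
    ("Diego Maradona", "Diego", "partial"),
    ("United States", "during World War I", "wrong"),
]

def share_correct(rows):
    """Fraction of annotated surface forms whose label is 'correct'."""
    labels = Counter(label for _, _, label in rows)
    total = sum(labels.values())
    return labels["correct"] / total if total else 0.0

print(f"{share_correct(gold):.1%}")
```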
(1) Filtering: String Patterns
• Data analysis shows that wrong SFs follow certain patterns
• URLs: contain .com or .net (“Berlin-china.net” for Berlin)
• of-phrases, with exceptions for city of, state of, and the like (“Issues of
Toronto” for Toronto)
• in-phrases (“Historical sites in Berlin” for Berlin)
• and-phrases (“Tom Cruise and Katie Holmes” for Tom Cruise)
• list-of (“List of Toronto MPs and MPPs” for Toronto)
• Increase in precision
• popular subset: 1.33%
• popular subset, LRD only: 3.75%
• random subset: less than 1%
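The pattern list above could be expressed as a handful of regular expressions. This is a naive sketch, not the filter used for the reported numbers: the exact expressions, the exception list for of-phrases, and the name `rejected_by` are assumptions, and legitimate names that contain "and" or "in" would be rejected too.

```python
import re

# Heads such as "City of" are kept; this exception list is an assumed example.
OF_EXCEPTIONS = re.compile(r"^(city|state|province|republic|kingdom) of\b", re.IGNORECASE)

# Checked in insertion order, so "list-of" wins over the generic of/and patterns.
PATTERNS = {
    "url": re.compile(r"\.(com|net)\b", re.IGNORECASE),
    "list-of": re.compile(r"^list of\b", re.IGNORECASE),
    "of-phrase": re.compile(r"\bof\b", re.IGNORECASE),
    "in-phrase": re.compile(r"\bin\b", re.IGNORECASE),
    "and-phrase": re.compile(r"\band\b", re.IGNORECASE),
}

def rejected_by(surface_form):
    """Return the name of the first pattern that rejects the SF, or None to keep it."""
    for name, pattern in PATTERNS.items():
        if name == "of-phrase" and OF_EXCEPTIONS.search(surface_form):
            continue
        if pattern.search(surface_form):
            return name
    return None

for sf in ["Berlin-china.net", "Issues of Toronto", "City of Toronto", "Red Planet"]:
    print(sf, "->", rejected_by(sf))
```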
(2) Filtering: Wikidata
• Observation: some SFs are entities on their own in other languages
• E.g. “Neckarau”, a city district of Mannheim, redirects to Mannheim in the English
Wikipedia, but has its own page in the German Wikipedia
• Implementation: use DBpedia-Wikidata dumps, released in May 2015
• Check whether an SF exactly matches or is close (by Levenshtein distance) to any
of the labels of Wikidata entities that have no English Wikipedia page but have
pages in other languages
• Increase in precision
• 0.5% compared to pattern-based filtering
• 1.5% for SFs extracted only from LRD
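The check described above can be sketched with a plain dynamic-programming Levenshtein distance. The slides do not state the distance cut-off or any normalisation, so `max_distance=1` and the case folding are assumptions, as are the function names and the one-element label set.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]

def is_separate_entity(surface_form, foreign_only_labels, max_distance=1):
    """True if the SF matches, or nearly matches, a label of a Wikidata entity
    that has no English Wikipedia page but has pages in other languages."""
    sf = surface_form.casefold()
    return any(levenshtein(sf, label.casefold()) <= max_distance
               for label in foreign_only_labels)

# Hypothetical label set built from the example on the slide.
labels = {"Neckarau"}
print(is_separate_entity("Neckarau", labels), is_separate_entity("Mannheim", labels))
```

An SF flagged this way would be dropped from the entity it redirects to, since it likely names an entity of its own.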
(3) Filtering: Frequency Scores
• For SFs extracted from anchor texts, frequencies are available,
so TF-IDF scores can be computed
• Determining the threshold: values from 1.0 to 8.0 evaluated with a step of 0.2
• Two thresholds selected, those with the highest F1 values: 1.8 and 2.6
• Threshold 0 (no filtering) used as baseline
• Increase in precision
• 20% for popular subset, 10% for random subset
* Filtering done on the dataset to which the pattern- and Wikidata-based filters have already been applied
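The threshold search can be sketched as a sweep over candidate values, scoring each by F1 against the gold standard. The slides do not spell out the TF-IDF weighting, so the textbook form used here (link count times log of total entities over the number of entities the SF points to) is an assumption, as are all function names.

```python
import math
from collections import Counter, defaultdict

def tfidf_scores(links, n_entities):
    """links: iterable of (surface form, entity) pairs, one per anchor occurrence.
    Returns {(sf, entity): tf * idf}; this weighting is an assumption."""
    tf = Counter(links)
    entities_per_sf = defaultdict(set)
    for sf, entity in tf:
        entities_per_sf[sf].add(entity)
    return {
        (sf, entity): count * math.log(n_entities / len(entities_per_sf[sf]))
        for (sf, entity), count in tf.items()
    }

def f1_at(threshold, scores, correct_pairs):
    """F1 of the pairs scoring at or above the threshold, against gold 'correct' pairs."""
    kept = {pair for pair, score in scores.items() if score >= threshold}
    true_pos = len(kept & correct_pairs)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(kept)
    recall = true_pos / len(correct_pairs)
    return 2 * precision * recall / (precision + recall)

# Candidate thresholds 1.0, 1.2, ..., 8.0, as on the slide.
THRESHOLDS = [round(1.0 + 0.2 * i, 1) for i in range(36)]

def best_threshold(scores, correct_pairs):
    """Lowest candidate threshold that reaches the highest F1."""
    return max(THRESHOLDS, key=lambda t: f1_at(t, scores, correct_pairs))
```

A call such as `best_threshold(tfidf_scores(links, n_entities), correct_pairs)` would then return one value from the grid; the slides report two selected thresholds, so the actual selection procedure evidently differed in detail.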
SFs from Common Crawl
• Common Crawl (CC) is the largest publicly available web corpus
• Extraction was done on the Winter 2014 CC corpus, in the context of the Web
Data Commons project
• http://webdatacommons.org/ -- extracting and providing for public download
various types of structured data from CC
• Data required a lot of cleaning
• 3M SFs added to our LRD&WAT corpus
• No annotated gold standard: left for future work
• Available at
http://data.dws.informatik.uni-mannheim.de/dbpedia/nlp2014/lrd-cc/
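For a single crawled HTML page, collecting anchor texts of links that point to Wikipedia might look like the sketch below. It uses only the Python standard library; the class name is made up, reading the Common Crawl archive files is out of scope, and restricting to `en.wikipedia.org` is an assumption. None of the cleaning mentioned above is attempted.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse, unquote

class WikiAnchorParser(HTMLParser):
    """Collect (Wikipedia page title, anchor text) pairs from one HTML page."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._target = None   # title of the Wikipedia page currently linked
        self._text = []       # text fragments seen inside the open <a>

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        url = urlparse(dict(attrs).get("href") or "")
        if url.netloc == "en.wikipedia.org" and url.path.startswith("/wiki/"):
            self._target = unquote(url.path[len("/wiki/"):])
            self._text = []

    def handle_data(self, data):
        if self._target is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._target is not None:
            # Collapse whitespace inside the anchor text.
            anchor = " ".join("".join(self._text).split())
            if anchor:
                self.pairs.append((self._target, anchor))
            self._target = None

parser = WikiAnchorParser()
parser.feed('<p>The <a href="https://en.wikipedia.org/wiki/Mars">Red Planet</a> '
            'and <a href="https://example.org/">another site</a>.</p>')
print(parser.pairs)
```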
Conclusion and Future Work
• Main message
• quality of Wikipedia-based surface forms is often overlooked!
• Contributions
• Gold standard SFs, made available
• 3 filtering strategies: precision improved by > 20% for popular Wikipedia
entities, by > 10% for random entities
• Extracted SFs from Common Crawl corpus
• All data publicly available
• Future work directions
• Task-based evaluation of the resource, further work on the gold standard
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 

Gathering Alternative Surface Forms for DBpedia Entities

  • 1. Gathering Alternative Surface Forms for DBpedia Entities. Volha Bryl (University of Mannheim, Germany → Springer Nature); Christian Bizer, Heiko Paulheim (University of Mannheim, Germany). NLP & DBpedia @ ISWC 2015, Bethlehem, USA, October 11, 2015
  • 2. Why you need Surface Forms
      • The surface form (SF) of an entity is the collection of strings by which it can be referred to: synonyms, alternative names, etc.
      • Used to support many NLP tasks: co-reference resolution, entity linking, disambiguation
  • 3. Why you need Surface Forms
      • The surface form (SF) of an entity is the collection of strings by which it can be referred to: synonyms, alternative names, etc.
      • Used to support many NLP tasks: co-reference resolution, entity linking, disambiguation
      • Example: “Billionaire Elon Musk has spelled out how he plans to create temporary suns over Mars in order to heat the Red Planet. Dismissing earlier comments that he intended to nuke the planet’s surface, he says he wants to create aerial explosions to heat it up.”
      • To link the three mentions, your machine should know that “Red Planet” is an alternative name for Mars, and that Mars can be referred to just by its “type”: planet
  • 4. Surface Forms from Wiki(DB)pedia
      • Some of Wikipedia’s (hence, DBpedia’s) crowd-sourced content looks quite like surface forms
      • Page titles
      • Redirects
          • Account for alternative names, word forms (e.g. plurals), closely related words, abbreviations, alternative spellings, likely misspellings, subtopics
      • Disambiguation pages
          • There are 10+ Bethlehems in the US, according to https://en.wikipedia.org/wiki/Bethlehem_(disambiguation)
      • Anchor texts of links between wiki pages
          • Rendered: Named after the Roman god of war, it is often referred to as the "Red Planet"...
          • Source: Named after the [[Mars (mythology)|Roman god of war]], it is often referred to as the "Red Planet"
          • …additionally, we use anchor texts of links from external pages to Wikipedia
  • 5. Surface Forms from Wiki(DB)pedia
      • Not a new idea
          • BabelNet, DBpedia Spotlight, … [see our paper for more links]
      • [Figure: Mars in BabelNet]
  • 6. Surface Forms from Wiki(DB)pedia
      • Not a new idea
          • BabelNet, DBpedia Spotlight, … [see our paper for more links]
      • Problem: Quality
          • …it is not only that quality is a problem, it is also that it has never been assessed or addressed
          • Reason 1: good quality of Wikipedia content is taken for granted
          • Reason 2: hopes are that NLP algorithms won’t be influenced by noise
      • [Figure: Mars in BabelNet]
  • 7. Surface Forms from Wiki(DB)pedia
      • Not a new idea
          • BabelNet, DBpedia Spotlight, … [see our paper for more links]
      • Problem: Quality. Why?
          • By adding a redirect or an anchor text for an internal Wikipedia link, a Wikipedia editor might mean not only “same as” or “also known as”, but also “related to”, “contains”, etc.
          • Both variants serve the purpose of pointing to the correct wiki page
      • [Figure: Mars in BabelNet]
  • 8. Solution: Focus on Quality
      • Step 1: Extract
          • We extract SFs from Wikipedia labels, redirects, disambiguations, and anchor texts of internal wiki-links
      • Step 2: Evaluate
          • We create a gold standard to evaluate SF quality
      • Step 3: Filter
          • We implement three filters to improve SF quality
      • Bonus: More SFs
          • We extract SFs from anchor texts of Wiki links found in the Common Crawl 2014 corpus
      • All datasets are available at http://data.dws.informatik.uni-mannheim.de/dbpedia/nlp2014/
  • 9. SFs Dataset Statistics
      • LRD = Labels, Redirects, Disambiguations
          • Extracted from DBpedia dumps
      • WAT = Wikipedia Anchor Texts
          • Extracted by a new DBpedia extractor (based on PageLinksExtractor)
  • 10. Gold Standard
      • Manual annotation, 1 annotator, 2 subsets
      • Popular subset: 34 manually selected popular entities of different types
          • Denmark, Berlin, Apple Inc., Animal Farm, Michael Jackson, Star Wars, Diego Maradona, Mars, etc.
          • ~82 SFs per entity, linked from other Wiki pages 813,736 times
      • Random subset: 81 randomly selected entities, each having at least 5 SFs
          • Andy_Zaltzman, Bell AH-1 SuperCobra, Biarritz, Castellum, Firefox (film), Kipchak languages, ParisTech, Psychokinesis, etc.
          • ~13 SFs per entity, linked from other Wiki pages 14,760 times
      • Available at http://data.dws.informatik.uni-mannheim.de/dbpedia/nlp2014/gold/
  • 11. Gold Standard
      • Types of annotations
          • correct (“the eternal city” for Rome)
          • contained (“Google Japan” for Google), contains (“Turkey” for Istanbul)
          • type of (“the city” for Rome)
          • partial (“Diego” for Diego Maradona)
          • related (“Google Blog” for Google)
          • wrong (“during World War I” for United States)
  • 12. Evaluation: How many correct SFs?
      • SFs extracted from labels, redirects, disambiguations
          • correct, popular subset: 66.8%
          • correct, random subset: 86.6%
      • SFs extracted from Wikipedia anchor texts
          • correct, popular subset: 38.5%
          • correct, random subset: 70.7%
      • Combined dataset
          • correct, popular subset: 45.7%
          • correct, random subset: 75%
  • 13. (1) Filtering: String Patterns
      • Data analysis shows that wrong SFs follow patterns
          • URLs: contain .com or .net (“Berlin-china.net” for Berlin)
          • of-phrases, with exceptions for “city of”, “state of”, and the like (“Issues of Toronto” for Toronto)
          • in-phrases (“Historical sites in Berlin” for Berlin)
          • and-phrases (“Tom Cruise and Katie Holmes” for Tom Cruise)
          • list-of (“List of Toronto MPs and MPPs” for Toronto)
      • Increase in precision
          • popular subset: 1.33%
          • popular subset, LRD only: 3.75%
          • random subset: less than 1%
  • 14. (2) Filtering: Wikidata
      • Observation: some SFs are entities in their own right in other languages
          • E.g. “Neckarau”, a city area of Mannheim, redirects to Mannheim in English Wikipedia, but has its own page in German Wikipedia
      • Implementation: use DBpedia-Wikidata dumps, released in May 2015
          • Check whether an SF exactly matches or is close (Levenshtein distance) to any of the labels of Wikidata entities that have no English Wikipedia page but do have pages in other languages
      • Increase in precision
          • 0.5% compared to pattern-based filtering
          • 1.5% for SFs extracted only from LRD
  • 15. (3) Filtering: Frequency Scores
      • For SFs extracted from anchor texts, frequencies are available, so TF-IDF scores can be computed
      • Determining the threshold: values from 1.0 to 8.0 evaluated with a step of 0.2
          • Two thresholds selected, giving the highest F1 values: 1.8 and 2.6
          • Threshold 0 (no filtering) used as baseline
      • Increase in precision
          • 20% for popular subset, 10% for random subset
      • * Filtering done on the dataset to which pattern- and Wikidata-based filters are already applied
  • 16. SFs from Common Crawl
      • Common Crawl (CC) is the largest publicly available web corpus
      • Extraction done on the Winter 2014 CC corpus, in the context of the Web Data Commons project
          • http://webdatacommons.org/ extracts and provides for public download various types of structured data from CC
      • Data required a lot of cleaning
      • 3M SFs added to our LRD&WAT corpus
      • No annotated gold standard: left for future work
      • Available at http://data.dws.informatik.uni-mannheim.de/dbpedia/nlp2014/lrd-cc/
  • 17. Conclusion and Future Work
      • Main message
          • quality of Wikipedia-based surface forms is often overlooked!
      • Contributions
          • Gold standard SFs, made available
          • 3 filtering strategies: precision improved by > 20% for popular Wikipedia entities, by > 10% for random entities
          • Extracted SFs from Common Crawl corpus
          • All data publicly available
      • Future work directions
          • Task-based evaluation of the resource, further work on the gold standard
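
The first two filters (slides 13 and 14) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the regular expressions, the exception list for of-phrases, the function names, and the `max_distance` cutoff are all assumptions, since the slides give only examples and do not state exact rules or a distance threshold.

```python
import re

# Hypothetical patterns modelled on the slide's examples; the real rule set may differ.
WRONG_PATTERNS = [
    re.compile(r"\.(com|net)\b", re.IGNORECASE),  # URLs
    re.compile(r"^list of ", re.IGNORECASE),      # list-of
    re.compile(r"\bin\b", re.IGNORECASE),         # in-phrases
    re.compile(r"\band\b", re.IGNORECASE),        # and-phrases
]
OF_PHRASE = re.compile(r"\bof\b", re.IGNORECASE)
# Assumed exception list ("city of, state of, and the like").
OF_EXCEPTIONS = re.compile(r"^(the )?(city|state) of ", re.IGNORECASE)


def matches_wrong_pattern(sf: str) -> bool:
    """Return True if the surface form matches one of the suspect string patterns."""
    if any(p.search(sf) for p in WRONG_PATTERNS):
        return True
    return bool(OF_PHRASE.search(sf)) and not OF_EXCEPTIONS.search(sf)


def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def is_entity_elsewhere(sf: str, non_english_labels, max_distance: int = 1) -> bool:
    """Return True if sf equals, or is within max_distance edits of, a label of an
    entity that has no English Wikipedia page (labels supplied by the caller)."""
    s = sf.casefold()
    return any(levenshtein(s, lab.casefold()) <= max_distance
               for lab in non_english_labels)


if __name__ == "__main__":
    for sf in ["Berlin-china.net", "Issues of Toronto", "City of Toronto", "Red Planet"]:
        print(sf, "->", "drop" if matches_wrong_pattern(sf) else "keep")
    print(is_entity_elsewhere("Neckarau", {"Neckarau"}))
```

Pairwise comparison against every label is quadratic and would not scale to full Wikidata dumps; a real pipeline would need an index or blocking step, which this sketch deliberately omits.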