Julien Plu, Giuseppe Rizzo, Raphaël Troncy
{firstname.lastname}@eurecom.fr,
@julienplu, @giusepperizzo, @rtroncy
Revealing Entities From Texts
With a Hybrid Approach
On June 21st, I went to Paris to see the Eiffel Tower and
to enjoy the world music day.
§ Goal: link (or disambiguate) entity mentions one can
find in text to their corresponding entries in a
knowledge base (e.g. DBpedia)
db:Paris db:Eiffel_Tower db:Fête_de_la_Musique db:June_21
What is Entity Linking?
2015/10/11 - 3rd NLP & DBpedia International Workshop – Bethlehem, Pennsylvania, USA - 2
§ Extract entities in diverse types of textual documents:
Ø newspaper article, encyclopaedia article,
micropost (tweet, status, photo caption), video subtitle, etc.
Ø deal with grammar-free and short texts that have little context
§ Adapt what can be extracted depending on
guidelines or challenges
Ø #Micropost2014 NEEL challenge: link entities that may belong to:
Person, Location, Organization, Function, Amount, Animal, Event, Product,
Time, and Thing (languages, ethnic groups, nationalities, religions, diseases,
sports and astronomical objects)
Ø OKE2015 challenge: extract and link entities that must belong to:
Person, Location, Organization, and Role
Problems
Research Question
How do we adapt an entity linking system
to solve these problems?
§ Input and output in different formats:
Ø Input: plain text, NIF, Micropost2014 (pruning phase)
Ø Output: NIF, TAC (tsv format), Micropost2014 (tsv format with no offset)
§ Text is classified according to its provenance
§ Text is normalized if necessary
For micropost content, RT symbols (in the case of tweets) and emoticons are removed
[Workflow figure] Text (microposts, newspaper articles, video subtitles,
encyclopaedia articles, ...) → Text Normalization → Entity Extractor →
Entity Linking (backed by the index) → Pruning
ADEL Workflow
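The micropost normalization described above can be sketched as follows. This is a minimal illustration, not ADEL's actual implementation; the function name and the emoticon list are hypothetical:

```python
import re

def normalize_micropost(text: str) -> str:
    """Normalize a tweet before extraction: drop the RT marker and emoticons."""
    # Remove a leading retweet marker such as "RT @user:"
    text = re.sub(r"^RT\s+@\w+:?\s*", "", text)
    # Remove a few common ASCII emoticons (hypothetical sample list)
    for emo in (":)", ":(", ":D", ";)", ":-)", ":-("):
        text = text.replace(emo, "")
    # Collapse whitespace left behind by the removals
    return re.sub(r"\s+", " ", text).strip()

print(normalize_micropost("RT @julienplu: Great day in Paris :) #music"))
# → Great day in Paris #music
```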
§ Multiple extractors can be used:
Ø Possibility to switch on and off an extractor in order to adapt the system
to some guidelines
Ø Extractors can be:
F unsupervised: Dictionary, Hashtag + Mention, Number Extractor
F supervised: Date Extractor, POS Tagger, NER System
§ Overlaps are resolved by choosing the longest extracted
mention
[Figure] Extractors running in parallel — Date Extractor, Number Extractor,
POS Tagger (NNP/NNPS), Dictionary, NER System (Stanford), Hashtag + Mention
Extractor, ... — all feeding Overlap Resolution.
Example: the Date Extractor yields "June 21" and the Number Extractor yields
"21"; overlap resolution keeps "June 21".
Entity Extractor
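The overlap-resolution rule (keep the longest extracted mention) can be sketched as below; the `(start, end, text, extractor)` tuple shape is a hypothetical stand-in for ADEL's internal representation:

```python
def resolve_overlaps(mentions):
    """Keep the longest mention among overlapping spans.

    mentions: list of (start, end, text, extractor) tuples.
    """
    kept = []
    # Consider the longest spans first, then greedily keep non-overlapping ones
    for m in sorted(mentions, key=lambda m: m[1] - m[0], reverse=True):
        if all(m[1] <= k[0] or m[0] >= k[1] for k in kept):
            kept.append(m)
    return sorted(kept, key=lambda m: m[0])

# "June 21" from the Date Extractor subsumes "21" from the Number Extractor
mentions = [(3, 10, "June 21", "date"), (8, 10, "21", "number")]
print(resolve_overlaps(mentions))  # → [(3, 10, 'June 21', 'date')]
```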
§ From DBpedia:
Ø PageRank
Ø Title
Ø Redirects, Disambiguation
§ From Wikipedia:
Ø Anchors
Ø Link references
For example, from the EN Wikipedia article about Xabi Alonso:
Alonso and [[Arsenal F.C.|Arsenal]] player [[Mikel Arteta]]
were neighbours on the same street while growing up in
[[San Sebastián]] and also lived near each other in
[[Liverpool]]. Alonso convinced [[Mikel Arteta|Arteta]] to
transfer to [[Everton F.C.|Everton]] after he told him how
happy he was living in [[Liverpool]].
Resulting index entries: (Arsenal F.C., 1); (Mikel Arteta, 2);
(San Sebastián, 1); (Liverpool, 2); (Everton F.C., 1)
How is the index created?
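A sketch of how anchor counts like those above could be collected from wikitext. The regex and function are illustrative only; the real index also stores PageRank, titles, redirects, and disambiguations:

```python
import re
from collections import Counter

def anchor_index(wikitext: str) -> Counter:
    """Count link targets in a Wikipedia article.

    Both [[Target|label]] and [[Target]] count one reference to Target.
    """
    counts = Counter()
    for target in re.findall(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]", wikitext):
        counts[target.strip()] += 1
    return counts

text = ("Alonso and [[Arsenal F.C.|Arsenal]] player [[Mikel Arteta]] grew up in "
        "[[San Sebastián]] and lived in [[Liverpool]]. He convinced "
        "[[Mikel Arteta|Arteta]] to join [[Everton F.C.|Everton]] while in "
        "[[Liverpool]].")
print(anchor_index(text))
```

Run on the Xabi Alonso passage, this reproduces the index entries shown on the slide: Mikel Arteta and Liverpool twice, the others once.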
§ Generate candidates from a fuzzy match to the index
§ Filter candidates:
Ø Filter out candidates that are not semantically related
to other entities from the same sentence
§ Score each candidate using a linear formula:
score(cand) = (a * L(m, cand) + b * max(L(m, R(cand))) + c * max(L(m, D(cand)))) * PR(cand)
L for the Levenshtein distance, R for the set of redirects, D for the set of disambiguation pages, and PR for PageRank
a, b and c are weights set with a > b > c and a + b + c = 1
[Figure] mention → Candidate Generation → Candidate Filtering → Scoring,
with Candidate Generation querying the index
Entity Linking
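The linear scoring formula can be sketched as below. One assumption to flag: the slide writes L as the Levenshtein distance, but for the weights to favour close string matches, L is treated here as a normalized Levenshtein similarity in [0, 1]. The weights a=0.5, b=0.3, c=0.2 are example values satisfying a > b > c and a + b + c = 1:

```python
def lev(a: str, b: str) -> int:
    # Classic dynamic-programming Levenshtein distance
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def sim(a: str, b: str) -> float:
    # Normalized similarity in [0, 1] derived from the distance
    return 1 - lev(a, b) / max(len(a), len(b), 1)

def score(mention, title, redirects, disambiguations, pagerank,
          a=0.5, b=0.3, c=0.2):
    # score(cand) = (a*L(m,cand) + b*max(L(m,R)) + c*max(L(m,D))) * PR(cand)
    r = max((sim(mention, x) for x in redirects), default=0.0)
    d = max((sim(mention, x) for x in disambiguations), default=0.0)
    return (a * sim(mention, title) + b * r + c * d) * pagerank

# With equal PageRank, the exact title "Paris" outranks "Notre Dame de Paris"
print(score("Paris", "Paris", ["Paname"], ["Paris (disambiguation)"], 1.0))
print(score("Paris", "Notre Dame de Paris", ["Paris Cathedral"], ["Notre Dame"], 1.0))
```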
Sentence: I went to Paris to see the Eiffel Tower.
§ Generate Candidates:
Ø Paris: db:Paris, db:Paris_Hilton, db:Paris,_Ontario, db:Notre_Dame_de_Paris
Ø Eiffel Tower: db:Eiffel_Tower, db:Eiffel_Tower_(Paris,_Tennessee)
§ Filter candidates (keep those semantically related to the other entities in the sentence):
Ø Paris: db:Paris, db:Notre_Dame_de_Paris (db:Paris_Hilton and db:Paris,_Ontario are filtered out)
Ø Eiffel Tower: db:Eiffel_Tower (db:Eiffel_Tower_(Paris,_Tennessee) is filtered out)
§ Scoring:
Ø Score(db:Paris)= (a * L(“Paris”, “Paris”) + b * max(L(“Paris”, R(“Parisien”, “Paname”))) + c *
max(L(“Paris”, D(“Paris (disambiguation)”)))) * PR(db:Paris)
Ø Score(db:Notre_Dame_de_Paris)= (a * L(“Paris”, “Notre Dame de Paris”) + b * max(L(“Paris”, R(“Nôtre
Dame”, “Paris Cathedral”))) + c * max(L(“Paris”, D(“Notre Dame”, “Notre Dame de Paris
(disambiguation)”)))) * PR(db:Notre_Dame_de_Paris)
Entity Linking example
§ k-NN machine learning algorithm training process:
Ø Run the system on a training set
Ø Classify entities as true/false according to the training set Gold Standard
Ø Create a file with the features of each entity and its true/false classification
Ø Train k-NN with the previous file to get a model
§ Use 10 features for the training:
• Length in number of characters
• Extracted mention
• Title
• Type
• PageRank
• HITS
• Number of inLinks
• Number of outLinks
• Redirects number
• Linking score
[Figure] Training set → ADEL → Create file of features → Train k-NN
Pruning
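The pruning classifier can be sketched with a from-scratch k-NN over two of the ten features. This is a toy stand-in: ADEL trains on all ten features listed above, and the data points here are invented for illustration:

```python
import math

def knn_predict(train, query, k=3):
    """Vote among the k nearest training points (Euclidean distance).

    train: list of (feature_vector, keep_entity_bool) pairs, here using a
    hypothetical 2-feature vector (linking score, PageRank).
    """
    nearest = sorted(train, key=lambda t: math.dist(t[0], query))[:k]
    votes = [label for _, label in nearest]
    return votes.count(True) > k // 2

# Invented examples: candidates near the "keep" cluster survive pruning
train = [((0.9, 0.8), True), ((0.8, 0.9), True), ((0.85, 0.7), True),
         ((0.1, 0.2), False), ((0.2, 0.1), False)]
print(knn_predict(train, (0.8, 0.8)))    # → True  (kept)
print(knn_predict(train, (0.15, 0.15)))  # → False (pruned)
```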
§ Tweets dataset
Ø Training set: 2340 tweets
Ø Test set: 1165 tweets
§ Link entities that may belong to one of these ten
types:
Ø Person, Location, Organization, Function, Amount, Animal, Event,
Product, Time, and Thing (languages, ethnic groups, nationalities,
religions, diseases, sports and astronomical objects)
§ Twitter user name dereferencing
§ Disambiguate in DBpedia 3.9
#Micropost2014 NEEL challenge
Results on #Micropost2014
§ Results of ADEL with and without pruning
§ Results of other systems
            | Without pruning               | With pruning
            | Precision  Recall  F-measure  | Precision  Recall  F-measure
Extraction  | 69.17      72.51   70.8       | 70         41.62   52.2
Linking     | 47.39      45.23   46.29      | 48.21      26.74   34.4
           E2E    UTwente  DataTXT  ADEL   AIDA   Hyberabad  SAP
F-measure  70.06  54.93    49.9     46.29  45.37  45.23      39.02
§ Sentences from Wikipedia
Ø Training set: 96 sentences
Ø Test set: 101 sentences
§ Extract and link entities that must belong to one of
these four types:
Ø Person, Location, Organization and Role
§ Must disambiguate co-references
§ Allow emerging entities (NIL)
§ Disambiguate in DBpedia 3.9
OKE2015 challenge
Results on OKE2015
§ Results of ADEL with and without pruning
§ Results of other systems
https://github.com/anuzzolese/oke-challenge
             | Without pruning               | With pruning
             | Precision  Recall  F-measure  | Precision  Recall  F-measure
Extraction   | 78.2       65.4    71.2       | 83.8       9.3     16.8
Recognition  | 65.8       54.8    59.8       | 75.7       8.4     15.1
Linking      | 49.4       46.6    48         | 57.9       6.2     11.1
           ADEL   FOX    FRED
F-measure  60.75  49.88  34.73
#Micropost2015 NEEL challenge
§ Tweets dataset:
Ø Training set: 3498
Ø Development set: 500
Ø Test set: 2027
§ Extract and link entities that must belong to one of
these seven types:
Ø Person, Location, Organization, Character, Event, Product, and Thing
(languages, ethnic groups, nationalities, religions, diseases, sports and
astronomical objects)
§ Twitter user name dereferencing
§ Disambiguate in DBpedia 3.9 + NIL
Results on #Micropost2015
§ Results of ADEL without pruning
§ Results of other systems
Ø Strong type mention match
Ø Strong link match (not considering the type correctness)
             Precision  Recall  F-measure
Extraction   68.4       75.2    71.6
Recognition  62.8       45.5    52.8
Linking      48.8       47.1    47.9
Strong type mention match:
           ousia  ADEL  uva   acubelab  uniba  ualberta  cen_neel
F-measure  80.7   52.8  41.2  38.8      36.7   32.9      0
Strong link match:
           ousia  acubelab  ADEL  uniba  ualberta  uva   cen_neel
F-measure  76.2   52.3      47.9  46.4   41.5      31.6  0
Error Analysis
§ Issue for the extraction:
Ø “FB is a prime number.”
F FB stands for 251 in hexadecimal but will be extracted as the Facebook acronym
by the wrong extractor
§ Issue for the filtering:
Ø “The series of HP books have been sold million times in France.”
F No relation exists in Wikipedia between Harry Potter and France, so no filtering
is applied.
§ Issue for the scoring:
Ø “The Spanish football player Alonso played twice for the national team
between 1954 and 1960.”
F Xabi Alonso will be selected instead of Juan Alonso because of the PageRank.
§ Our system makes it possible to adapt the entity
linking task to different kinds of text
§ Our system makes it possible to adapt the types of
extracted entities
§ Results are similar regardless of the kind of text
§ Performance at extraction stage similar to top
state-of-the-art systems (or slightly better)
§ The large drop in performance at the linking stage is
mainly due to the unsupervised approach
Conclusion
§ Add more adaptive features: language, knowledge base
§ Improve linking by using a graph-based algorithm:
Ø finding the common entities linked to each of the extracted entities
Ø example: "Rafael Nadal is a friend of Alonso". There is no direct link
between Rafael Nadal and Alonso in DBpedia (or Wikipedia), but they have the
entity Spain in common
§ Improve pruning by:
Ø adding additional features:
F relatedness: compute the relation score between one entity and all the others in the
text; if there are more than two, compute the average
F POS tag of the previous and the next token in the sentence
Ø using other algorithms:
F Ensemble Learning
F Unsupervised Feature Learning + Deep Learning
Future Work
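The proposed graph-based improvement (relating entities through the entities they both link to) could look like this sketch; the toy link graph and entity sets are invented to mirror the Rafael Nadal / Alonso example:

```python
def common_neighbours(graph, e1, e2):
    """Entities linked from both e1 and e2 in a link graph.

    graph: dict mapping an entity to the set of entities it links to.
    """
    return graph.get(e1, set()) & graph.get(e2, set())

# Invented toy graph: no direct link between the two, but Spain in common
graph = {
    "db:Rafael_Nadal": {"db:Spain", "db:Tennis"},
    "db:Xabi_Alonso": {"db:Spain", "db:Association_football"},
}
print(common_neighbours(graph, "db:Rafael_Nadal", "db:Xabi_Alonso"))
# → {'db:Spain'}
```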
http://www.slideshare.net/julienplu
http://xkcd.com/1319/

SFBA Splunk Usergroup meeting March 13, 2024SFBA Splunk Usergroup meeting March 13, 2024
SFBA Splunk Usergroup meeting March 13, 2024
 
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptx
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptxTINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptx
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptx
 
ChistaDATA Real-Time DATA Analytics Infrastructure
ChistaDATA Real-Time DATA Analytics InfrastructureChistaDATA Real-Time DATA Analytics Infrastructure
ChistaDATA Real-Time DATA Analytics Infrastructure
 
Cash Is Still King: ATM market research '2023
Cash Is Still King: ATM market research '2023Cash Is Still King: ATM market research '2023
Cash Is Still King: ATM market research '2023
 
The Universal GTM - how we design GTM and dataLayer
The Universal GTM - how we design GTM and dataLayerThe Universal GTM - how we design GTM and dataLayer
The Universal GTM - how we design GTM and dataLayer
 
Mapping the pubmed data under different suptopics using NLP.pptx
Mapping the pubmed data under different suptopics using NLP.pptxMapping the pubmed data under different suptopics using NLP.pptx
Mapping the pubmed data under different suptopics using NLP.pptx
 
Strategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
Strategic CX: A Deep Dive into Voice of the Customer Insights for ClarityStrategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
Strategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
 
CI, CD -Tools to integrate without manual intervention
CI, CD -Tools to integrate without manual interventionCI, CD -Tools to integrate without manual intervention
CI, CD -Tools to integrate without manual intervention
 
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024
 
MEASURES OF DISPERSION I BSc Botany .ppt
MEASURES OF DISPERSION I BSc Botany .pptMEASURES OF DISPERSION I BSc Botany .ppt
MEASURES OF DISPERSION I BSc Botany .ppt
 

Revealing Entities From Texts With a Hybrid Approach

  • 5. ADEL Workflow
  § Input and output in different formats:
    Ø Input: plain text, NIF, Micropost2014 (pruning phase)
    Ø Output: NIF, TAC (tsv format), Micropost2014 (tsv format with no offset)
  § Text is classified according to its provenance: micropost vs. newspaper article, video subtitle, encyclopaedia article, ...
  § Text is normalized if necessary: for micropost content, RT symbols (in the case of tweets) and emoticons are removed
  (Workflow diagram: Text → Text Normalization → Entity Extractor → Entity Linking, backed by an index → Pruning, all within ADEL)
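The normalization step above can be sketched as follows. This is a minimal illustration only: the exact RT and emoticon patterns are assumptions, not ADEL's actual rules.

```python
import re

def normalize_micropost(text: str) -> str:
    """Remove RT markers and emoticons from a tweet-like micropost."""
    # Drop a leading retweet marker such as "RT @user:"
    text = re.sub(r"^RT\s+@\w+:?\s*", "", text)
    # Drop common ASCII emoticons, e.g. :) :-( ;D
    text = re.sub(r"[:;=][-o^]?[)(DPp/\\|]", "", text)
    # Collapse leftover whitespace
    return re.sub(r"\s+", " ", text).strip()
```

Newspaper articles, subtitles and encyclopaedia articles would bypass this step, since they already carry well-formed text.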
  • 6. Entity Extractor
  § Multiple extractors can be used:
    Ø Possibility to switch an extractor on and off in order to adapt the system to given guidelines
    Ø Extractors can be:
      unsupervised: Dictionary, Hashtag + Mention Extractor, Number Extractor
      supervised: Date Extractor, POS Tagger (NNP/NNPS), NER System (Stanford)
  § Overlaps are resolved by choosing the longest extracted mention
  (Example: the Date Extractor yields "June 21" while the Number Extractor yields "21"; overlap resolution keeps "June 21")
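The longest-mention overlap resolution can be sketched as a greedy pass over spans sorted by start offset, longest first. The `(start, end, text, extractor)` tuple shape is an assumption for illustration:

```python
def resolve_overlaps(mentions):
    """Given (start, end, text, extractor) spans produced by several
    extractors, keep the longest span among any overlapping group."""
    # Sort by start offset, then longest first, so a kept span
    # shadows any shorter span that overlaps it.
    ordered = sorted(mentions, key=lambda m: (m[0], -(m[1] - m[0])))
    kept = []
    for m in ordered:
        if kept and m[0] < kept[-1][1]:  # overlaps the last kept span
            continue                     # drop the shorter/later span
        kept.append(m)
    return kept
```

On the slide's example, the Number Extractor's "21" falls inside the Date Extractor's "June 21" and is discarded.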
  • 7. How is the index created?
  § From DBpedia:
    Ø PageRank
    Ø Title
    Ø Redirects, Disambiguation
  § From Wikipedia:
    Ø Anchors
    Ø Link references
  For example, from the EN Wikipedia article about Xabi Alonso:
  "Alonso and [[Arsenal F.C.|Arsenal]] player [[Mikel Arteta]] were neighbours on the same street while growing up in [[San Sebastián]] and also lived near each other in [[Liverpool]]. Alonso convinced [[Mikel Arteta|Arteta]] to transfer to [[Everton F.C.|Everton]] after he told him how happy he was living in [[Liverpool]]."
  is indexed as: (Arsenal F.C., 1); (Mikel Arteta, 2); (San Sebastián, 1); (Liverpool, 2); (Everton F.C., 1)
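Counting link references as on the slide amounts to resolving each `[[Target|label]]` wikilink to its target page and tallying the targets. A sketch, assuming the simple two-bracket wikitext syntax shown above (real Wikipedia markup has more corner cases):

```python
import re
from collections import Counter

# [[Target]] or [[Target|label]]; group 1 captures the target page
WIKILINK = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")

def count_anchors(wikitext: str) -> Counter:
    """Count how often each target page is linked from an article."""
    return Counter(m.group(1) for m in WIKILINK.finditer(wikitext))
```

Run over the Xabi Alonso passage, this yields 2 for Mikel Arteta (linked once directly and once via the "Arteta" label) and 1 for Arsenal F.C.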
  • 8. Entity Linking
  § Generate candidates from a fuzzy match against the index
  § Filter candidates:
    Ø Filter out candidates that are not semantically related to other entities from the same sentence
  § Score each candidate using a linear formula:
    score(cand) = (a * L(m, cand) + b * max(L(m, R(cand))) + c * max(L(m, D(cand)))) * PR(cand)
    where L is the Levenshtein distance, R the set of redirects, D the set of disambiguation pages and PR the PageRank; a, b and c are weights set with a > b > c and a + b + c = 1
  (Pipeline: mention → index query → Candidate Generation → Candidate Filtering → Scoring)
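The scoring formula can be sketched as below. Two assumptions are made for illustration: the Levenshtein distance is normalised into a 0..1 similarity (a raw distance would reward mismatches), and the weights a=0.5, b=0.3, c=0.2 are placeholder values satisfying a > b > c and a + b + c = 1, not the values used in ADEL:

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def similarity(a: str, b: str) -> float:
    """Normalise edit distance into a 0..1 similarity."""
    return 1 - levenshtein(a, b) / max(len(a), len(b), 1)

def score(mention, title, redirects, disambiguations, pagerank,
          a=0.5, b=0.3, c=0.2):
    """Linear combination over title, redirects and disambiguation
    pages, weighted by PageRank (a > b > c, a + b + c = 1)."""
    sr = max((similarity(mention, r) for r in redirects), default=0)
    sd = max((similarity(mention, d) for d in disambiguations), default=0)
    return (a * similarity(mention, title) + b * sr + c * sd) * pagerank
```

With equal PageRank, an exact title match like "Paris" vs db:Paris outscores a partial match such as "Notre Dame de Paris".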
  • 9. Entity Linking example
  Sentence: "I went to Paris to see the Eiffel Tower."
  § Generate candidates:
    Ø Paris: db:Paris, db:Paris_Hilton, db:Paris,_Ontario, db:Notre_Dame_de_Paris
    Ø Eiffel Tower: db:Eiffel_Tower, db:Eiffel_Tower_(Paris,_Tennessee)
  § Filter candidates:
    Ø db:Paris, db:Paris_Hilton, db:Paris,_Ontario, db:Notre_Dame_de_Paris
    Ø db:Eiffel_Tower, db:Eiffel_Tower_(Paris,_Tennessee)
  § Scoring:
    Ø score(db:Paris) = (a * L("Paris", "Paris") + b * max(L("Paris", R("Parisien", "Paname"))) + c * max(L("Paris", D("Paris (disambiguation)")))) * PR(db:Paris)
    Ø score(db:Notre_Dame_de_Paris) = (a * L("Paris", "Notre Dame de Paris") + b * max(L("Paris", R("Nôtre Dame", "Paris Cathedral"))) + c * max(L("Paris", D("Notre Dame", "Notre Dame de Paris (disambiguation)")))) * PR(db:Notre_Dame_de_Paris)
  • 10. Pruning
  § k-NN machine learning training process:
    Ø Run the system on a training set
    Ø Classify entities as true/false according to the training set gold standard
    Ø Create a file with the features of each entity and its true/false classification
    Ø Train a k-NN classifier on this file to get a model
  § Use 10 features for the training: length in number of characters, extracted mention, title, type, PageRank, HITS, number of inLinks, number of outLinks, number of redirects, linking score
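The k-NN pruning decision can be sketched as a majority vote among the k nearest labelled training entities. For illustration only a 3-dimensional numeric feature vector is used (e.g. mention length, PageRank, linking score); the full system uses the 10 features listed above, some of which are categorical and would need encoding:

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Majority vote among the k nearest training points (Euclidean
    distance). Each training item is (feature_vector, keep_label)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

Entities classified False by the model are pruned from the output; as the result slides show, this trades recall for precision.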
  • 11. #Micropost2014 NEEL challenge
  § Tweets dataset:
    Ø Training set: 2340 tweets
    Ø Test set: 1165 tweets
  § Link entities that may belong to one of these ten types:
    Ø Person, Location, Organization, Function, Amount, Animal, Event, Product, Time, and Thing (languages, ethnic groups, nationalities, religions, diseases, sports and astronomical objects)
  § Twitter user name dereferencing
  § Disambiguate in DBpedia 3.9
  • 12. Results on #Micropost2014
  § Results of ADEL with and without pruning:

                 Without pruning                With pruning
                 Precision  Recall  F-measure   Precision  Recall  F-measure
    Extraction   69.17      72.51   70.80       70.00      41.62   52.20
    Linking      47.39      45.23   46.29       48.21      26.74   34.40

  § Results of other systems (F-measure):
    E2E 70.06; UTwente 54.93; DataTXT 49.90; ADEL 46.29; AIDA 45.37; Hyberabad 45.23; SAP 39.02
  • 13. OKE2015 challenge
  § Sentences from Wikipedia:
    Ø Training set: 96 sentences
    Ø Test set: 101 sentences
  § Extract and link entities that must belong to one of these four types:
    Ø Person, Location, Organization, and Role
  § Must disambiguate co-references
  § Allow emerging entities (NIL)
  § Disambiguate in DBpedia 3.9
  • 14. Results on OKE2015
  § Results of ADEL with and without pruning:

                 Without pruning                With pruning
                 Precision  Recall  F-measure   Precision  Recall  F-measure
    Extraction   78.2       65.4    71.2        83.8       9.3     16.8
    Recognition  65.8       54.8    59.8        75.7       8.4     15.1
    Linking      49.4       46.6    48.0        57.9       6.2     11.1

  § Results of other systems (F-measure): ADEL 60.75; FOX 49.88; FRED 34.73
  https://github.com/anuzzolese/oke-challenge
  • 15. #Micropost2015 NEEL challenge
  § Tweets dataset:
    Ø Training set: 3498 tweets
    Ø Development set: 500 tweets
    Ø Test set: 2027 tweets
  § Extract and link entities that must belong to one of these seven types:
    Ø Person, Location, Organization, Character, Event, Product, and Thing (languages, ethnic groups, nationalities, religions, diseases, sports and astronomical objects)
  § Twitter user name dereferencing
  § Disambiguate in DBpedia 3.9 + NIL
  • 16. Results on #Micropost2015
  § Results of ADEL without pruning:

                 Precision  Recall  F-measure
    Extraction   68.4       75.2    71.6
    Recognition  62.8       45.5    52.8
    Linking      48.8       47.1    47.9

  § Results of other systems:
    Ø Strong typed mention match (F-measure): ousia 80.7; ADEL 52.8; uva 41.2; acubelab 38.8; uniba 36.7; ualberta 32.9; cen_neel 0
    Ø Strong link match, not considering type correctness (F-measure): ousia 76.2; acubelab 52.3; ADEL 47.9; uniba 46.4; ualberta 41.5; uva 31.6; cen_neel 0
  • 17. Error Analysis
  § Issue for the extraction:
    Ø "FB is a prime number." Here FB stands for 251 in hexadecimal, but it is extracted as the Facebook acronym by the wrong extractor
  § Issue for the filtering:
    Ø "The series of HP books have been sold million times in France." There is no relation in Wikipedia between Harry Potter and France, so no filtering is applied
  § Issue for the scoring:
    Ø "The Spanish football player Alonso played twice for the national team between 1954 and 1960." Xabi Alonso is selected instead of Juan Alonso because of the PageRank
  • 18. Conclusion
  § Our system makes it possible to adapt the entity linking task to different kinds of text
  § Our system makes it possible to adapt the types of extracted entities
  § Results are similar regardless of the kind of text
  § Performance at the extraction stage is similar to (or slightly better than) top state-of-the-art systems
  § Large drop in performance at the linking stage, mainly due to an unsupervised approach
  • 19. Future Work
  § Add more adaptive features: language, knowledge base
  § Improve linking by using a graph-based algorithm:
    Ø finding the common entities linked to each of the extracted entities
    Ø example: "Rafael Nadal is a friend of Alonso". There is no direct link between Rafael Nadal and Alonso in DBpedia (or Wikipedia), but they have the entity Spain in common
  § Improve pruning by:
    Ø adding additional features:
      relatedness: compute the relation score between one entity and all the others in the text; if there are more than two, compute the average
      POS tag of the previous and the next token in the sentence
    Ø using other algorithms: Ensemble Learning, Unsupervised Feature Learning + Deep Learning
  • 20. http://www.slideshare.net/julienplu
  http://xkcd.com/1319/