Collective entity linking with WSRM DocEng'19
1. 1/18
Introduction Collective Entity Linking Conclusion and perspectives Bibliography
Using Knowledge Base Semantics in
Context-Aware Entity Linking
Cheikh Brahim El Vaigh, François Goasdoué, Guillaume Gravier
and Pascale Sébillot
DocEng ’19, September 23–26, 2019, Berlin, Germany
2. 2/18
Context
Exploring a large archive of a regional newspaper efficiently
iCODA (https://project.inria.fr/icoda/) :
Building a unified graph (RDF KB) from all data sources
Providing human-friendly visualizations for journalists
Bridging content and data : linking content to the RDF Knowledge Base
3. 3/18
RDF Knowledge Bases (KB)?
Specification of RDF graphs with triples :
$(s, p, o) \in (U \cup B) \times U \times (U \cup L \cup B)$, drawn as $s \xrightarrow{p} o$
where U, B and L denote the sets of URIs, blank nodes and literals.
RDF triples for facts and knowledge :
RDF fact              Triple notation
Class assertion       (s, τ, o)
Property assertion    (s, p, o) with p ∉ {τ, subC, subP, d, r}
RDF knowledge         Triple notation
Subclass              (s, subC, o)
Subproperty           (s, subP, o)
Domain typing         (s, d, o)
Range typing          (s, r, o)
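As a minimal sketch of the triple notation, the following Python snippet stores a tiny RDF graph as (s, p, o) tuples and sorts each triple into one of the categories of the table. The prefixed names (rdf:type for τ, rdfs:subClassOf for subC, and so on) and the ex: entities are illustrative assumptions, and the sketch assumes that any triple whose property is not one of the five built-in properties is a property assertion.

```python
# Built-in properties: rdf:type plays the role of tau, the others are the
# four schema properties (subC, subP, d, r). Names are assumptions.
TYPE = "rdf:type"
SCHEMA_PROPERTIES = {
    "rdfs:subClassOf",
    "rdfs:subPropertyOf",
    "rdfs:domain",
    "rdfs:range",
}


def kind(triple):
    """Classify an (s, p, o) triple as knowledge or as one of the two fact kinds."""
    _s, p, _o = triple
    if p in SCHEMA_PROPERTIES:
        return "knowledge"
    if p == TYPE:
        return "class assertion"
    return "property assertion"


# Made-up example graph.
graph = [
    ("ex:SteveJobs", "rdf:type", "ex:Person"),
    ("ex:SteveJobs", "ex:founded", "ex:Apple"),
    ("ex:CEO", "rdfs:subClassOf", "ex:Person"),
    ("ex:founded", "rdfs:domain", "ex:Person"),
]
kinds = [kind(t) for t in graph]
```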
9. 6/18
Standard Entity Linking pipeline
Definition (Entity Linking)
Identifying the entities of a reference knowledge base (KB) that are
mentioned in textual documents
Standard pipeline :
Named Entity Recognition (NER)
Candidate Entity Generation
Candidate Entity Ranking
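The three pipeline stages can be sketched as three small functions. This is only a toy illustration: the alias dictionary, its entity identifiers and its scores are hypothetical, the "NER" step is a plain dictionary lookup, and ranking simply keeps the candidate with the highest prior.

```python
# Hypothetical alias dictionary: surface form -> {entity id: prior score}.
ALIASES = {
    "steve jobs": {"Steve_Jobs": 0.99, "Steve_Jobs_(film)": 0.01},
    "apple": {"Apple_Inc.": 0.8, "Apple_(fruit)": 0.2},
}


def recognize(text):
    """Stage 1 (stand-in for NER): surface forms of the dictionary found in the text."""
    lowered = text.lower()
    return [alias for alias in ALIASES if alias in lowered]


def generate_candidates(mention):
    """Stage 2: candidate entities for a mention, with their prior scores."""
    return ALIASES.get(mention, {})


def rank(candidates):
    """Stage 3: keep the best-scored candidate (None if there is no candidate)."""
    return max(candidates, key=candidates.get) if candidates else None


def link(text):
    """Run the three stages and map each mention to its top-ranked entity."""
    return {m: rank(generate_candidates(m)) for m in recognize(text)}


links = link("Steve Jobs founded Apple.")
```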
10. 7/18
Challenges
Mention ambiguity :
Name variants : Jobs; Steve Jobs; Steven Paul Jobs
Indirect mentions : the CEO of Apple, Steve...
Less popular entities : Ploulec’h (Brittany)
11. 8/18
Limits of Entity Linking Techniques
Disregarding mention context : entity-by-entity linking
Leveraging mostly Wikipedia : hyperlink graphs
Limited use of RDF KBs : binary indicator
12. 9/18
Collective Entity Linking
Definition (Collective Entity Linking)
Identifying all the entities of a reference knowledge base (KB) that
are mentioned in textual documents at once.
n : number of mentions
mi : ith mention in the text
ei : a candidate entity for mi
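$$(\hat{e}_1, \ldots, \hat{e}_n) = \arg\max_{e_1, \ldots, e_n} \Bigg( \underbrace{\sum_{i=1}^{n} \phi(m_i \mid e_i)}_{\text{mention/entity similarity}} + \underbrace{\sum_{i=1}^{n} \sum_{j=1;\, j \neq i}^{n} \psi(e_i \mid e_j)}_{\text{collective coherence score}} \Bigg)$$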
CEL = φ() + ψ() + Optimisation
Optimisation = Graph search or Learning to rank
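For a handful of mentions, the objective can be maximised by exhaustive search, which makes the roles of φ and ψ concrete. The sketch below is an illustration only: the candidate lists and all scores are made up, and brute force stands in for the graph search or learning-to-rank optimisation, since it grows exponentially with the number of mentions.

```python
from itertools import product


def collective_link(candidates, phi, psi):
    """Return the assignment (one entity per mention) with the highest total score.

    candidates: list of candidate lists, one per mention.
    phi(i, e): local score of entity e for mention i.
    psi(e1, e2): coherence score between two entities.
    """
    best, best_score = None, float("-inf")
    for assignment in product(*candidates):
        local = sum(phi(i, e) for i, e in enumerate(assignment))
        coherence = sum(
            psi(ei, ej)
            for i, ei in enumerate(assignment)
            for j, ej in enumerate(assignment)
            if i != j
        )
        if local + coherence > best_score:
            best, best_score = assignment, local + coherence
    return best, best_score


# Made-up scores for the mentions "Jobs" and "Apple".
CANDIDATES = [["Steve_Jobs", "Jobs_(film)"], ["Apple_Inc.", "Apple_(fruit)"]]
LOCAL = {
    (0, "Steve_Jobs"): 0.6, (0, "Jobs_(film)"): 0.4,
    (1, "Apple_Inc."): 0.4, (1, "Apple_(fruit)"): 0.6,
}
RELATED = {
    frozenset({"Steve_Jobs", "Apple_Inc."}): 0.5,
    frozenset({"Jobs_(film)", "Apple_Inc."}): 0.2,
}

best, score = collective_link(
    CANDIDATES,
    phi=lambda i, e: LOCAL[(i, e)],
    psi=lambda a, b: RELATED.get(frozenset({a, b}), 0.0),
)
```

With these toy numbers, the local scores alone would pick Apple_(fruit) for the second mention; the coherence term is what makes the joint choice switch to Apple_Inc.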
13. 10/18
Candidate Entity Generation + Local score
Candidate Entity Generation :
Dictionary : Cross-Wiki
cross-wiki["stevejobs"] => [['7412236', 0.99], ['5042765', 0.01]]
Search engine : Wikipedia search
Local score (φ) :
Cosine similarity : cosine(V_mention, V_candidate entity)
Wikipedia popularity of a candidate entity for a mention :
$$\mathrm{pop}(m, e) = \frac{n(m, e)}{\sum_{e' \in W} n(m, e')} \qquad (1)$$
n(m, e) = number of times m occurs as an anchor of e
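The two local scores can be sketched in a few lines of Python. The anchor counts below are invented so that the result matches the 0.99 / 0.01 scores of the cross-wiki example; the structure of `anchor_counts` and the function names are assumptions of this sketch, not the format of the actual dictionary.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if one is null)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def popularity(anchor_counts, mention):
    """pop(m, e) for every candidate e: share of the anchors m that point to e."""
    counts = anchor_counts.get(mention, {})
    total = sum(counts.values())
    return {e: c / total for e, c in counts.items()} if total else {}


# Hypothetical counts n(m, e): mention -> {entity id: number of anchors}.
anchor_counts = {"stevejobs": {"7412236": 99, "5042765": 1}}
pop = popularity(anchor_counts, "stevejobs")
```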
14. 11/18
Word2vec computes φ()
Learning a word's representation from its context.
Two models
Continuous Bag-of-Words (CBOW) : predicting a word from its context
Skip-Gram : predicting the context of a given word
Example
dataset : the cat sits on the mat.
half-window : 1
CBOW :
([the,sits],cat),([cat,on],sits),([sits,the],on),([on,mat],the)
Skip-Gram :
(cat,[the,sits]),(sits,[cat,on]),(on,[sits,the]),(the,[on,mat])
Reflects semantic proximity
Computes cosine similarity
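The training pairs of the example can be regenerated with a short Python sketch. It follows the convention of the example, where only the words that have a full window on both sides are used as targets; the function name is an assumption, and actual word2vec implementations typically emit one (word, context word) pair per context word rather than grouped contexts.

```python
def context_windows(tokens, half_window=1):
    """Yield (context, target) for every token with a full window on both sides."""
    for i in range(half_window, len(tokens) - half_window):
        context = tokens[i - half_window:i] + tokens[i + 1:i + 1 + half_window]
        yield context, tokens[i]


tokens = "the cat sits on the mat".split()

# CBOW: the context predicts the word.
cbow = list(context_windows(tokens, half_window=1))

# Skip-Gram: the word predicts its context (same pairs, reversed).
skip_gram = [(target, context) for context, target in cbow]
```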
15. 12/18
WSRM : Weighted Semantic Relatedness Measure
Contribution : WSRM
Generic definition over RDF KBs
Takes advantage of RDF KB semantics
More relations = stronger similarity
$$\mathrm{WSRM}(e_i, e_j) = \frac{n(e_i, e_j)}{\sum_{e' \in E} n(e_i, e')} \qquad (2)$$
$$\psi(e_i, e_j) = \frac{1}{2} \big( \mathrm{WSRM}(e_i, e_j) + \mathrm{WSRM}(e_j, e_i) \big) \qquad (3)$$
Pre-computed from the RDF KB
Similar in form to the Wikipedia popularity (Eq. 1)
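Equations (2) and (3) can be sketched over a list of triples as follows. The sketch assumes that n(ei, ej) counts the triples of the KB whose subject is ei and whose object is ej, which is one plausible reading of "more relations"; the triples and property names are made up.

```python
def relation_counts(triples):
    """n[ei][ej] = number of triples (ei, p, ej) in the KB (assumed reading of n)."""
    n = {}
    for s, _p, o in triples:
        n.setdefault(s, {})
        n[s][o] = n[s].get(o, 0) + 1
    return n


def wsrm(n, ei, ej):
    """Eq. (2): share of the relations of ei that link it to ej."""
    total = sum(n.get(ei, {}).values())
    return n.get(ei, {}).get(ej, 0) / total if total else 0.0


def psi(n, ei, ej):
    """Eq. (3): symmetric coherence score, average of both directions."""
    return 0.5 * (wsrm(n, ei, ej) + wsrm(n, ej, ei))


# Made-up KB fragment.
triples = [
    ("Steve_Jobs", "founded", "Apple_Inc."),
    ("Steve_Jobs", "ceoOf", "Apple_Inc."),
    ("Steve_Jobs", "bornIn", "San_Francisco"),
    ("Apple_Inc.", "foundedBy", "Steve_Jobs"),
]
n = relation_counts(triples)
score = psi(n, "Steve_Jobs", "Apple_Inc.")  # 0.5 * (2/3 + 1/1)
```

Since the counts depend only on the KB, the table `n` can be built once and reused for every document, which is what makes pre-computation possible.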
17. 14/18
Local classifications : problem solved
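Learning whether a mention matches a candidate entity (label 0 or 1)
Training a logistic regression with 6 features : cos, pop, sum, max@1, max@2, max@3
$$\mathrm{sum}(e_{ij}; m_1, \ldots, m_n) = \sum_{l=1,\, l \neq i}^{n} \; \sum_{e \in C(m_l)} \psi(e_{ij}, e) \qquad (4)$$
$$\mathrm{max}_k(e_{ij}; m_1, \ldots, m_n) = \operatorname*{max@k}_{l=1,\, l \neq i}^{n} \; \max_{e \in C(m_l)} \psi(e_{ij}, e) \qquad (5)$$
Here e_ij is the j-th candidate entity of mention m_i and C(m_l) is the candidate set of mention m_l.
Global optimization (argmax) ⇔ local classifications
Rank candidates by posterior probability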
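The coherence features of one candidate can be sketched as below. The sketch assumes that max@k denotes the k-th largest of the per-mention maxima and that a missing value is replaced by 0.0; the candidate sets and ψ scores are made up. Together with cos and pop, these values would form the 6-dimensional feature vector given to the logistic regression.

```python
def coherence_features(i, e_ij, candidates, psi, ks=(1, 2, 3)):
    """Return (sum, [max@k for k in ks]) for candidate e_ij of mention i.

    candidates[l] is the candidate set C(m_l) of mention l.
    """
    others = [c for l, c in enumerate(candidates) if l != i]
    # Eq. (4): coherence with every candidate of every other mention.
    total = sum(psi(e_ij, e) for c in others for e in c)
    # Eq. (5): best coherence per other mention, then the k-th largest value.
    best_per_mention = sorted(
        (max((psi(e_ij, e) for e in c), default=0.0) for c in others),
        reverse=True,
    )
    maxes = [
        best_per_mention[k - 1] if k <= len(best_per_mention) else 0.0
        for k in ks
    ]
    return total, maxes


# Made-up candidate sets for four mentions and made-up coherence scores.
candidates = [["A1", "A2"], ["B1", "B2"], ["C1"], ["D1"]]
SCORES = {("A1", "B1"): 0.5, ("A1", "B2"): 0.25, ("A1", "C1"): 0.75}


def toy_psi(a, b):
    return SCORES.get((a, b), SCORES.get((b, a), 0.0))


total, maxes = coherence_features(0, "A1", candidates, toy_psi)
```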
18. 15/18
Datasets
AIDA [4] : entity annotated corpus of Reuters news
documents
Reuters128 [9] : Economic news articles
RSS-500 [9] : RSS feeds including all major worldwide newspapers
TAC-KBP 2016-2017 datasets [5, 6] : Newswire and
forum-discussion documents
Dataset              Nb. docs   Nb. mentions   Avg nb. mentions/doc
TAC-KBP 2016 eval    169        9231           54.6
TAC-KBP 2017 eval    167        6915           41.4
AIDA-train           846        18519          21.9
AIDA-valid           216        4784           22.1
AIDA-test            231        4479           19.4
Reuters128           128        881            6.9
RSS-500              500        1000           2.0
Table: Statistics on the used datasets.
19. 16/18
Results (1/2) : Features Study
Features                                 F1-score
popularity                               72.3
popularity + cosine                      72.9
popularity + cosine + sum                73.2
popularity + cosine + max@1,2,3          75.7
popularity + cosine + sum + max@1,2,3    75.9
Table: Linking accuracy (F1 score) on the TAC-KBP 2017 dataset
Collective coherence improves entity-by-entity linking
SUM and MAX are complementary
Local classification successfully aggregates local and coherence scores
21. 18/18
Conclusion and perspectives
CEL without Wikipedia
Improving CEL with WSRM = Binary indicator
Opening the door to semantic reasoning : paths of size m, m > 1
22. 18/18
Bibliography
[1] Yixin Cao, Lei Hou, Juanzi Li, and Zhiyuan Liu. Neural collective entity linking. In Proceedings of the 27th International Conference on Computational Linguistics, pages 675–686, 2018.
[2] Matthew Francis-Landau, Greg Durrett, and Dan Klein. Capturing semantic similarity for entity linking with convolutional neural networks. In Proceedings of the 15th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1256–1261, 2016.
[3] Octavian-Eugen Ganea, Marina Ganea, Aurelien Lucchi, Carsten Eickhoff, and Thomas Hofmann. Probabilistic bag-of-hyperlinks model for entity linking. In Proceedings of the 25th International Conference on World Wide Web, pages 927–938, 2016.
[4] Johannes Hoffart, Mohamed Amir Yosef, Ilaria Bordino, Hagen Fürstenau, Manfred Pinkal, Marc Spaniol, Bilyana Taneva, Stefan Thater, and Gerhard Weikum. Robust disambiguation of named entities in text. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pages 782–792, 2011.
[5] Heng Ji and Joel Nothman. Overview of TAC-KBP2016 tri-lingual EDL and its impact on end-to-end cold-start KBP. In Proceedings of the 2016 Text Analysis Conference, 2016.
[6] Heng Ji, Xiaoman Pan, Boliang Zhang, Joel Nothman, James Mayfield, Paul McNamee, and Cash Costello. Overview of TAC-KBP2017 13 languages entity discovery and linking. In Proceedings of the 2017 Text Analysis Conference, 2017.
[7] Nikolaos Kolitsas, Octavian-Eugen Ganea, and Thomas Hofmann. End-to-end neural entity linking. In Proceedings of the 22nd Conference on Computational Natural Language Learning, pages 519–529, 2018.
[8] Andrea Moro, Alessandro Raganato, and Roberto Navigli. Entity linking meets word sense disambiguation: a unified approach. Transactions of the Association for Computational Linguistics, 2:231–244, 2014.
[9] Michael Röder, Ricardo Usbeck, Sebastian Hellmann, Daniel Gerber, and Andreas Both. N3 - A collection of datasets for named entity recognition and disambiguation in the NLP Interchange Format. In Proceedings of the 9th International Conference on Language Resources and Evaluation, pages 3529–3533, 2014.
[10] Ricardo Usbeck, Axel-Cyrille Ngonga Ngomo, Michael Röder, Daniel Gerber, Sandro Athaide Coelho, Sören Auer, and Andreas Both. AGDISTIS - Graph-based disambiguation of named entities using linked data. In Proceedings of the International Semantic Web Conference, pages 457–471, 2014.