Presented at WWW'15, Florence.
Gong Cheng, Danyun Xu, Yuzhong Qu. Summarizing Entity Descriptions for Effective and Efficient Human-centered Entity Linking. In Proceedings of the 24th International World Wide Web Conference (WWW), pages 184--194, 2015.
1. Summarizing Entity Descriptions for
Effective and Efficient
Human-centered Entity Linking
Gong Cheng, Danyun Xu, Yuzhong Qu
Websoft Research Group
State Key Laboratory for Novel Software Technology
Nanjing University, China
2. Entity Linking (EL)
Text:
But with the release of the iPhone 6
and the 6 Plus phablet, Apple has finally
gone into big-screen territory, giving
Samsung a challenge in the category
that the company has been dominating
for some time now.
Knowledge Base:
iPhone 6
- type: Smartphone
- ...
Samsung Electronics
- type: IT company
- ...
Candidate entities for "Apple" (?):
Apple (Inc.)
- type: Company
- type: IT company
- product: iPhone 5
- ...
Apple (Fruit)
- type: Fruit
- genus: Malus
- ...
3. Human-centered EL is needed
• for defining gold standard,
• for crowdsourced EL.
4. entity description:
set of property-value pairs (called features)
7. Short, extractive summaries are
adequate for human-centered EL.
Apple (Inc.)
- type: Company
- product: iPhone 5
Apple (Corps)
- type: Company
- product: Let It Be
Apple (Fruit)
- type: Fruit
?… Apple
summarizing entity descriptions → combinatorial optimization
summary of k candidate entity descriptions: k subsets of features (subject to a length limit)
9. Optimization goal (1)
+characterizing power, -information overlap
• Characterizing power of a feature (ch)
ch(type: IT company) < ch(product: iPhone 5)
$$ch(f) = -\log \frac{\text{number of entities having } f}{\text{number of all entities}}$$
Apple (Inc.)
Samsung Electronics
Apple (Inc.)
- type: Company
- type: IT company
- product: iPhone 5
- ...
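The definition of ch can be checked on a toy example. The following is a minimal sketch, assuming a tiny hypothetical knowledge base (the `KB` dictionary and its contents are illustrative, not the data used in the talk) and the natural logarithm.

```python
import math

# Toy knowledge base (hypothetical data): entity -> set of (property, value) features.
KB = {
    "Apple (Inc.)": {("type", "Company"), ("type", "IT company"), ("product", "iPhone 5")},
    "Samsung Electronics": {("type", "IT company")},
    "Apple (Fruit)": {("type", "Fruit"), ("genus", "Malus")},
    "iPhone 6": {("type", "Smartphone")},
}

def ch(feature, kb=KB):
    """Characterizing power: -log of the fraction of entities having the feature."""
    having = sum(1 for feats in kb.values() if feature in feats)
    if having == 0:
        raise ValueError("feature does not occur in the knowledge base")
    return -math.log(having / len(kb))

ch_it = ch(("type", "IT company"))       # shared by 2 of the 4 toy entities
ch_iphone = ch(("product", "iPhone 5"))  # held by 1 of the 4 toy entities
```

In this toy data the rarer feature gets the higher score, matching ch(type: IT company) < ch(product: iPhone 5).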
12. Optimization goal (1)
+characterizing power, -information overlap
• Information overlap between features (ov)
a) logical inference
entailment = maximized ov
ov(type: IT company, type: Company) = MAX
b) string/numerical similarity
ov = max{similarity between properties, similarity between values}
ov(type: IT company, product: iPhone 5) = SMALL
Apple (Inc.)
- type: Company
- type: IT company
- product: iPhone 5
- ...
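The two cases of ov can be sketched as follows. This is only an illustration: the slides do not name the similarity measure, so `difflib.SequenceMatcher` is a stand-in, and `OV_MAX`, `SUBCLASS_OF`, and the one-step entailment check are hypothetical.

```python
from difflib import SequenceMatcher

OV_MAX = 1.0  # assumed upper bound of the overlap score

# Hypothetical subclass axiom used for the logical-inference case.
SUBCLASS_OF = {"IT company": "Company"}

def entails(f, g):
    """True if feature f entails feature g (toy check: one-step type subsumption)."""
    (pf, vf), (pg, vg) = f, g
    return pf == pg == "type" and SUBCLASS_OF.get(vf) == vg

def sim(a, b):
    """String similarity in [0, 1]; SequenceMatcher is a stand-in measure."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def ov(f, g):
    """Information overlap between two (property, value) features."""
    # a) logical inference: entailment gives the maximum overlap
    if f == g or entails(f, g) or entails(g, f):
        return OV_MAX
    # b) max of property similarity and value similarity
    return max(sim(f[0], g[0]), sim(f[1], g[1]))

ov_entailed = ov(("type", "IT company"), ("type", "Company"))
ov_unrelated = ov(("type", "IT company"), ("product", "iPhone 5"))
```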
13. Optimization goal (1)
+characterizing power, -information overlap
• Formulated as k Quadratic Knapsack Problems (QKP)
weight of a feature: length
profit of a pair of features:
to maximize characterizing power
to minimize information overlap
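For intuition, one such QKP can be solved exhaustively on a handful of features. The sketch below is not the solver used in the talk: `solve_qkp` is a hypothetical brute-force helper, and the profit values are made-up numbers in which the diagonal stands for characterizing power and the off-diagonal entries penalize information overlap.

```python
from itertools import combinations

def solve_qkp(weights, profit, capacity):
    """Exhaustively solve a small Quadratic Knapsack Problem.

    weights:  item weights (here: feature lengths in characters)
    profit:   dict mapping (i, j) with i <= j to the profit of selecting both
              i and j; (i, i) is the profit of item i on its own
    capacity: maximum total weight (here: the summary length limit)
    """
    n = len(weights)
    best_value, best_set = 0.0, ()
    for r in range(1, n + 1):
        for subset in combinations(range(n), r):
            if sum(weights[i] for i in subset) > capacity:
                continue  # violates the length limit
            value = sum(profit.get((i, j), 0.0)
                        for a, i in enumerate(subset) for j in subset[a:])
            if value > best_value:
                best_value, best_set = value, subset
    return best_set, best_value

features = ["type: Company", "type: IT company", "product: iPhone 5"]
weights = [len(f) for f in features]
# Hypothetical profits (illustrative values only).
profit = {
    (0, 0): 0.5, (1, 1): 1.0, (2, 2): 2.0,
    (0, 1): -1.5,               # entailed pair: heavy overlap penalty
    (0, 2): -0.1, (1, 2): -0.1,
}
chosen, value = solve_qkp(weights, profit, capacity=35)
```

With these numbers the summary keeps "type: IT company" and "product: iPhone 5" and drops the entailed "type: Company". Enumeration is exponential in the number of features, so it only serves to illustrate the objective.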
14. Optimization goal (2): +differentiating power
• Differentiating power of a pair of features (di)
a) string/numerical dissimilarity
di = property’s value uniqueness * dissimilarity between values
di(type: IT company, type: Fruit) = SMALL*LARGE = MEDIUM
(Single-valued properties are more useful.)
b) logical inference
entailment = minimized di
di(type: IT company, type: Company) = MIN
Apple (Inc.)
- type: Company
- type: IT company
- product: iPhone 5
- ...
Apple (Fruit)
- type: Fruit
- genus: Malus
- ...
Samsung Electronics
- type: IT Company
- ...
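A rough sketch of di follows. The slides do not define "value uniqueness" or the dissimilarity measure, so both are assumptions here: `uniqueness` is taken as the number of entities using a property divided by the total number of its values (1.0 for a single-valued property), `difflib.SequenceMatcher` stands in for string dissimilarity, and features with different properties are simply given `DI_MIN`.

```python
from difflib import SequenceMatcher

DI_MIN = 0.0  # assumed lower bound of the differentiating power

SUBCLASS_OF = {"IT company": "Company"}  # hypothetical subclass axiom

# Hypothetical descriptions: entity -> list of (property, value) features.
KB = {
    "Apple (Inc.)": [("type", "Company"), ("type", "IT company"), ("product", "iPhone 5")],
    "Apple (Fruit)": [("type", "Fruit"), ("genus", "Malus")],
    "Samsung Electronics": [("type", "IT company")],
}

def uniqueness(prop, kb=KB):
    """Assumed measure: entities using prop divided by total values of prop."""
    counts = [sum(1 for p, _ in feats if p == prop) for feats in kb.values()]
    used = [c for c in counts if c > 0]
    return len(used) / sum(used) if used else 0.0

def dissim(a, b):
    """String dissimilarity in [0, 1]; SequenceMatcher is a stand-in measure."""
    return 1.0 - SequenceMatcher(None, a.lower(), b.lower()).ratio()

def di(f, g, kb=KB):
    """Differentiating power of a pair of features from two candidate entities."""
    (pf, vf), (pg, vg) = f, g
    if pf != pg:
        return DI_MIN  # assumption: only same-property features are compared
    # b) logical inference: equal or entailed values do not differentiate
    if vf == vg or SUBCLASS_OF.get(vf) == vg or SUBCLASS_OF.get(vg) == vf:
        return DI_MIN
    # a) value uniqueness times dissimilarity between values
    return uniqueness(pf, kb) * dissim(vf, vg)

di_fruit = di(("type", "IT company"), ("type", "Fruit"))
di_entailed = di(("type", "IT company"), ("type", "Company"))
```

Under these assumptions the multi-valued property type has uniqueness below 1, so even very dissimilar values yield only a moderate di, in line with the SMALL*LARGE = MEDIUM example.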
17. Optimization goal (2): +differentiating power
• Formulated as a Quadratic Multidimensional
Knapsack Problem (QMKP)
weight of a feature: length
profit of a pair of features: differentiating power
18. Optimization goal (3): +relevance to context
• Relevance of a feature to the context of entity mention
• cosine similarity in the class vector model (cs)
Vector(context) = {Smartphone, IT company}
Vector(type: Fruit) = {Fruit}
Vector(product: iPhone 5) = {Smartphone}
cs(context, product: iPhone 5) = HIGH
• class weighting: class frequency – inverse instance frequency (CF-IIF)
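The class vector example can be reproduced with a few lines. This is a minimal sketch that assumes uniform class weights of 1.0 as a placeholder, since the CF-IIF weighting is not spelled out on the slide; `cosine` is a hypothetical helper name.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors given as dicts."""
    dot = sum(w * v.get(c, 0.0) for c, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

# Class vectors from the slide, with uniform weights (an assumption).
context = {"Smartphone": 1.0, "IT company": 1.0}
type_fruit = {"Fruit": 1.0}
product_iphone5 = {"Smartphone": 1.0}

cs_fruit = cosine(context, type_fruit)        # no class in common
cs_iphone = cosine(context, product_iphone5)  # shares the class Smartphone
```

With uniform weights, product: iPhone 5 scores clearly higher than type: Fruit, which shares no class with the context.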
23. Optimization goal (3): +relevance to context
• Solved by k Maximal Marginal Relevance (MMR) frameworks
• Features are iteratively selected.
• In each iteration, candidate features are re-ranked by
• relevance to context
• dissimilarity to selected features
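The iterative selection above can be sketched as a greedy loop for one candidate entity. This is an MMR-style illustration under stated assumptions, not the exact procedure from the talk: `mmr_select`, the trade-off parameter `lam`, the relevance scores, and the toy similarity function are all hypothetical.

```python
def mmr_select(features, relevance, similarity, limit, lam=0.5):
    """Greedy MMR-style feature selection under a character budget.

    features:   list of feature strings
    relevance:  dict mapping each feature to its relevance to the context
    similarity: function (f, g) -> similarity in [0, 1]
    limit:      maximum total length of the summary in characters
    lam:        trade-off between relevance and dissimilarity (assumed 0.5)
    """
    selected, used = [], 0
    candidates = list(features)
    while candidates:
        fitting = [f for f in candidates if used + len(f) <= limit]
        if not fitting:
            break

        def score(f):
            # Penalize similarity to the most similar already-selected feature.
            redundancy = max((similarity(f, s) for s in selected), default=0.0)
            return lam * relevance[f] - (1 - lam) * redundancy

        best = max(fitting, key=score)
        selected.append(best)
        used += len(best)
        candidates.remove(best)
    return selected

features = ["type: Company", "type: IT company", "product: iPhone 5"]
relevance = {"type: Company": 0.3, "type: IT company": 0.7, "product: iPhone 5": 0.6}

def same_property(f, g):
    """Toy similarity: 1.0 if both features use the same property, else 0.0."""
    return 1.0 if f.split(":")[0] == g.split(":")[0] else 0.0

summary = mmr_select(features, relevance, same_property, limit=35)
```

In this toy run "type: IT company" is picked first, then "product: iPhone 5" because "type: Company" is penalized for its similarity to the first pick, and the remaining feature no longer fits the budget.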
25. Experiments: data sets
• Text corpora (with entity mentions linked to Wikipedia)
• AQUAINT
• IITB
• Knowledge base
• DBpedia
• Gold-standard links
• entity mentions → Wikipedia articles → DBpedia entities
26. Experiments: EL tasks
Apple (Inc.)
- type: Company
- product: iPhone 5
Apple (Corps)
- type: Company
- product: Let It Be
Apple (Fruit)
- type: Fruit
?
..., Apple has finally gone
into big-screen territory, …
1 target entity
• gold-standard
2 (very challenging) noise entities
• sharing a common name with the target entity,
obtained from Wikipedia’s disambiguation pages
27. Experiments: approaches
• Proposed approaches
• CHR: +characterizing power, -information overlap
• DFF: +differentiating power
• CNT: +relevance to context
• COMB: CHR+DFF+CNT
• Baseline approaches
• DESC: returns entire entity descriptions
• RELIN: a state-of-the-art entity summarization approach for
generic purposes
• average length of entity descriptions: 680 characters
• length limit for summaries: 100 characters (14.7%)
28. Experiments: extrinsic evaluation
• COMB is the only approach that achieved the following
statistically significant results on both data sets:
• accuracy (% of correct answers): COMB = DESC
• time: COMB < DESC (22-23% faster)