Current research on knowledge graph completion focuses on employing graph embeddings for the task of link prediction. But knowledge graph completion is more than link prediction and tasks such as adding formerly unknown long-tail entities to the graph, extending the schema of the graph with additional properties, and completing and updating numeric values are equally important tasks. In the talk, Christian Bizer will review recent results on using data from large numbers of independent websites to accomplish these tasks. He will focus on two types of web content – relational HTML tables and semantic annotations within HTML pages – and will discuss the potential of these types of content for set completion, schema extension, and fact checking, as well as their utility as training data for matching textual entity descriptions.
JIST2019 Keynote: Completing Knowledge Graphs using Data from the Open Web
1. Data and Web Science GroupCompleting Knowledge Graphs using Data
from the Open Web
1Nov. 25‐27, 2019, Hangzhou, China
Prof. Dr. Christian Bizer
JIST2019
2. Data and Web Science Group
– Cross‐Domain Knowledge Graphs
– Domain‐Specific Knowledge Graphs
Knowledge Graphs
Christian Bizer: Completing Knowledge Graphs. JIST2019, Hangzhou, 2019.11.26 2
Product Knowledge
Graphs
Scholarly
Knowledge Graphs
Life Science
Knowledge Graphs
Enterprise
Knowledge Graphs
4. Data and Web Science Group
4
Types of Incompleteness
P1 P2 P3 P4 … PN ? ? ? …
E1 ? ? ? ? ? ? ?
E2 ? ? ? ? ? ?
E3 ? ? ? ?
… ? ? ? ? ?
Em ? ? ? ? ? ? ? ?
? ? ? ? ? ? ? ? ? ? ?
? ? ? ? ? ? ? ? ? ? ?
? ? ? ? ? ? ? ? ? ? ?
… ? ? ? ? ? ? ? ? ? ?
Class in KG
Relevant facts
are missing
Relevant
properties
are missing
in schema
Relevant
entities
are not
described
Christian Bizer: Completing Knowledge Graphs. JIST2019, Hangzhou, 2019.11.26
5. Data and Web Science Group
5
Knowledge Graph Completion Tasks
P1 P2 P3 P4 … PN ? ? ? …
E1 ? ? ? ? ? ? ?
E2 ? ? ? ? ? ?
E3 ? ? ? ?
… ? ? ? ? ?
Em ? ? ? ? ? ? ? ?
? ? ? ? ? ? ? ? ? ? ?
? ? ? ? ? ? ? ? ? ? ?
? ? ? ? ? ? ? ? ? ? ?
… ? ? ? ? ? ? ? ? ? ?
Link Prediction
add object property values for
existing instances and properties
Literal Slot Filling
add datatype property values for
existing instances and properties
Schema Expansion
add and fill new properties
Entity Expansion
• add new entities
• and their descriptions (values
for existing properties)
Class in KG
Christian Bizer: Completing Knowledge Graphs. JIST2019, Hangzhou, 2019.11.26
6. Data and Web Science Group
Outline
1. Internal Knowledge Graph Completion
2. KG Completion using Small Sets of External Sources
3. KG Completion using Web Table Data
– Profile of HTML Tables on the Web
– Potential for KG Completion
4. KG Completion using Schema.org Annotations
– Profile of Schema.org Annotations on Web
– Potential for KG Completion
5. Conclusions
6Christian Bizer: Completing Knowledge Graphs. JIST2019, Hangzhou, 2019.11.26
7. Data and Web Science Group
1. Internal Knowledge Graph Completion
– Predict missing facts by exploiting patterns in the graph
– Classic value imputation applied to knowledge graphs
– Symbolic methods
– Rule learning, e.g. AIME, AnyBURL
– Heutistics, e.g. SDType
– Sub‐symbolic methods
– Knowledge Graph Embeddings
Christian Bizer: Completing Knowledge Graphs. JIST2019, Hangzhou, 2019.11.26 7
Paulheim: Knowledge Graph Refinement: A Survey of Approaches and Evaluation Methods. SWJ 2017.
Nguyen: An Overview of Embedding Models of Entities and Relationships for Knowledge Base Completion. ArXiv, 2019.
8. Data and Web Science Group
Knowledge Graph Embeddings:
Hits@10 Results on Link Prediction Task
8Christian Bizer: Completing Knowledge Graphs. JIST2019, Hangzhou, 2019.11.26
Nguyen: An Overview of Embedding Models of Entities and Relationships for Knowledge Base Completion. ArXiv, 2019.
9. Data and Web Science Group
Weaknesses of Current Research
on Knowledge Graph Embeddings
1. Only link prediction covered
– What about literal slot filling? Entity expansion? Schema expansion?
2. Evaluations self‐referencial
– Hit@10 not relevant for KB completion
– Hits@1 performance too bad for practical use
– Accuracy evaluated on simplified dataset: FB13, 6 relations dropped!
Christian Bizer: Completing Knowledge Graphs. JIST2019, Hangzhou, 2019.11.26 9
Method Hits@1 on FB15K‐237
ComplEx (2018) 0.26
ConvE (2018) 0.24
TuckER (2019) 0.26
AnyBURL (2019) 0.23 (symbolic rule learner)
Meilicke, et al.: Fine‐grained Evaluation of Rule‐ and Embedding‐based Systems for Knowledge Graph Completion.
Wang, Ruffiinelli: On Evaluating Embeddings Models for Knowledge Base Completion. RepL4NLP‐2019.
10. Data and Web Science Group
– Open License Sources
– Commercial Sources
2. KG Completion Using a Small Set
of External Sources
Christian Bizer: Completing Knowledge Graphs. JIST2019, Hangzhou, 2019.11.26 10
Baike
11. Data and Web Science Group
Integration Process
– Classic data integration
techniques apply
– Small set of sources
– we can invest effort into the
integration of each data source
– write rules, label training data
– quality depends on effort spend
– All task are covered
– link prediction, slot filling,
– entity expansion, schema expansion
Christian Bizer: Completing Knowledge Graphs. JIST2019, Hangzhou, 2019.11.26 11
Data Collection / Extraction
Schema Mapping
Data Translation
Entity Matching
Data Quality Assessment
Data Fusion
AnHai, Halevy, Ives: Principles of Data Integration. Morgan Kaufmann, 2012.
12. Data and Web Science Group
KG Completion Example
– YAGO2 completed using Geonames
– 7 million additonal entities from Geonames
– 10 additional object properties (relations)
– 320 million triples from Geonames
– Accuracy of spartial relations (manual evaluation)
Christian Bizer: Completing Knowledge Graphs. JIST2019, Hangzhou, 2019.11.26 12
Hoffart, et al.: YAGO2: A spatially and temporally enhanced knowledge base extracted from Wikipedia. JWS 2013.
13. Data and Web Science Group
Challenges
1. Coverage of external sources
– Relevant entities and relevant properties might not be covered
– e.g. Wikipedia
2. Up‐to‐dateness of external sources
– Are sources regularly maintained?
– e.g. LOD cloud, research data
3. Cost of external sources
– Complete and up‐to‐date data often costs money
– Large companies can pay, other organizations often not
Christian Bizer: Completing Knowledge Graphs. JIST2019, Hangzhou, 2019.11.26 13
14. Data and Web Science Group
Let‘s use the Open Web
Christian Bizer: Completing Knowledge Graphs. JIST2019, Hangzhou, 2019.11.26 14
15. Data and Web Science Group
Let‘s use the Open Web
– Potentials
1. Coverage of long‐tail entities
2. Coverage of long‐tail properties
3. Content often up‐to‐date
– Challenges
1. Heterogeneity
2. Data quality depends on website
Christian Bizer: Completing Knowledge Graphs. JIST2019, Hangzhou, 2019.11.26 15
16. Data and Web Science Group
Web Tables
16Christian Bizer: Completing Knowledge Graphs. JIST2019, Hangzhou, 2019.11.26
17. Data and Web Science Group
Schema.org Annotations
17
<div itemtype="http://schema.org/Hotel">
<span itemprop="name">Vienna Marriott Hotel</span>
<span itemprop="address" itemtype="http://schema.org/PostalAddress">
<span itemprop="streetAddress">Parkring 12a</span>
<span itemprop="addressLocality">Vienna</span>
</span>
</div>
<div itemtype="http://schema.org/Product">
<h1 itemprop="name">Signature Instant Dome 7</h1>
Item # <span itemprop=“productID">200734246807</span>
<img itemprop=“image“ src=“../20073424680.jpg">
<span itemprop=“review" itemtype="http://schema.org/Review">
<span itemprop=“ratingValue">5</span>
<span itemprop=“reviewBody">Great waterproof tent!</span>
</span>
</div>
Christian Bizer: Completing Knowledge Graphs. JIST2019, Hangzhou, 2019.11.26
18. Data and Web Science Group
3. Web Tables
18
Cafarella, et al.: WebTables: Exploring the Power of Tables on the Web. VLDB 2008.
Crestan, Pantel: Web‐Scale Table Census and Classification. WSDM 2011.
In corpus of 14B raw tables, 154M are relational tables (1.1%).
Cafarella et al. 2008
Our focus
Christian Bizer: Completing Knowledge Graphs. JIST2019, Hangzhou, 2019.11.26
19. Data and Web Science Group
Web Data Commons – Web Tables Corpus
19
Lehmberg, Ritze, Meusel, Bizer: A Large Public Corpus of Web Tables containing Time and Context Metadata.
WWW2016 Companion.
– extracted from Common Crawl 2015
– 1.78 billion pages
– 10.2 billion raw HTML tables
– Size of the web table corpus
– 90 million relational tables (0.9%)
– 139 million attribute/value tables (1,3%)
– uses 100 machines on Amazon EC2
– approx. 2000 machine/hours 250 €
– Download
– http://webdatacommons.org/webtables/
Christian Bizer: Completing Knowledge Graphs. JIST2019, Hangzhou, 2019.11.26
20. Data and Web Science Group
Task 1:
Slot Filling and Link Prediction
20
A1 A2 A3 A4 … AN
E1 ? ? ?
E2 ? ?
E3
… ?
Em ? ? ? ?
DBpedia Class
Christian Bizer: Completing Knowledge Graphs. JIST2019, Hangzhou, 2019.11.26
21. Data and Web Science Group
Two Step Process
21
1. Matching 2. Data Fusion
IATA Code
Christian Bizer: Completing Knowledge Graphs. JIST2019, Hangzhou, 2019.11.26
22. Data and Web Science Group
Holistic Entity‐ and Schema Matching
22
– T2K+ matches
– millions of web tables to the
– DBpedia knowledge base
– Exploits
– wide range of features
– synergies between matching tasks
– table corpus for indirect matching
– patterns from KB for post filtering
– Results
Dominique Ritze, Christian Bizer: Matching Web Tables To DBpedia – A Feature Utility Study. EDBT 2017.
Task Precision Recall F1
Class .94 .94 .94
Instance .90 .76 .82
Property .77 .65 .70
Christian Bizer: Completing Knowledge Graphs. JIST2019, Hangzhou, 2019.11.26
23. Data and Web Science Group
ISWC2019 Challenge on Tabular Data
to Knowledge Graph Matching
– Multiple tasks
– CPA task = property matching
Christian Bizer: Completing Knowledge Graphs. JIST2019, Hangzhou, 2019.11.26 23
http://www.cs.ox.ac.uk/isg/challenges/sem‐tab/
24. Data and Web Science Group
Large‐Scale Matching Results
24
– 1 million out of 30 million tables match DBpedia (~3%)
– 301,450 matching tables have attribute correspondences (~1%)
– Results in 8 million triples describing 717,000 DBpedia entities
Ritze, et al.: Profiling the Potential of Web Tables for Augmenting Cross‐domain Knowledge Bases. WWW2016.
Christian Bizer: Completing Knowledge Graphs. JIST2019, Hangzhou, 2019.11.26
25. Data and Web Science Group
– Fusion Results
– Group Sizes
– Large groups (Head): many values
already exist in the KB
– Small groups (Tail): often new
value but difficult to fuse correctly
Data Fusion Results
25
Fusion Strategy Precision Recall F1
Voting .369 .823 .509
PageRank .365 .814 .504
Knowledge-based Trust .639 .785 .705
Christian Bizer: Completing Knowledge Graphs. JIST2019, Hangzhou, 2019.11.26
26. Data and Web Science Group
What is going wrong?
Topical Mismatch
26
Websites contributing many tables # Tables Topic
apple.com 50,910 Music
baseball‐reference.com 25,647 Sports
latestf1news.com 17,726 Sports
nascar.com 17,465 Sports
amazon.com 16,551 Products
wikipedia.org 13,993 Various
inkjetsuperstore.com 12,282 Products
flightmemory.com 8,044 Flights
windshieldguy.com 7,305 Products
citytowninfo.com 6,293 Cities
blogspot.com 4,762 Various
7digital.com 4,462 Music
Christian Bizer: Completing Knowledge Graphs. JIST2019, Hangzhou, 2019.11.26
27. Data and Web Science Group
What is going wrong?
Attribute Semantics
27
https://www.bls.gov/oes/2017/may/oes_ca.htm
Christian Bizer: Completing Knowledge Graphs. JIST2019, Hangzhou, 2019.11.26
28. Data and Web Science Group
– Matching and fusion methods expect binary relations
– But many relations in Web tables are not binary!
28
Misinterpretation of Attribute Semantics
Occupation
code
Annual
wage
11‐3021 $128,805
15‐0000 $113,689
15‐1111 $39,180
Occupation
code
Annual
wage
11‐3021 $93,805
15‐0000 $83,472
15‐1111 $39,180
Christian Bizer: Completing Knowledge Graphs. JIST2019, Hangzhou, 2019.11.26
29. Data and Web Science Group
Synthesizing N‐ary Relations from Web
Tables
– Exploiting page context: Title and URL
– Stitching tables from the same site
29
Oliver Lehmberg, Christian Bizer: Synthesizing N‐ary Relations from Web Tables. WIMS2019.
Oliver Lehmberg, Christian Bizer: Stitching web tables for improving matching quality. VLDB Endowment, 2017.
Christian Bizer: Completing Knowledge Graphs. JIST2019, Hangzhou, 2019.11.26
30. Data and Web Science Group
Amount of N‐ary Relations
30
Oliver Lehmberg, Christian Bizer: Profiling the Semantics of N‐ary Web Table Data. SBD Workshop @ SIGMOD 2019.
Results for the WDC2015 corpus:
– 37.50% of all relations are non‐binary
– 50.47% of these require context attributes
Time dimension is
often part of keys
but also many
other attributes
Christian Bizer: Completing Knowledge Graphs. JIST2019, Hangzhou, 2019.11.26
31. Data and Web Science Group
Task 2: Add Long‐Tail Entities
to Knowledge Graph
31
A1 A2 A3 A4 … AN ? ? ? …
E1 ? ? ? ? ? ? ?
E2 ? ? ? ? ? ?
E3 ? ? ? ?
… ? ? ? ? ?
Em ? ? ? ? ? ? ? ?
? ? ? ? ? ? ? ? ? ? ?
? ? ? ? ? ? ? ? ? ? ?
? ? ? ? ? ? ? ? ? ? ?
… ? ? ? ? ? ? ? ? ? ?
Entity Expansion:
1. Find new entities
2. Compile descriptions of
these new entities
Class in KB
Christian Bizer: Completing Knowledge Graphs. JIST2019, Hangzhou, 2019.11.26
32. Data and Web Science Group
Long‐Tail Entity Expansion Pipeline
32
Row
Clustering
Entity
Creation
New
Detection
Knowledge
base
New entities added to
knowledge base
Schema
Matching
Web tables
Output of first iteration used to refine the
schema mapping in a second iteration
Yaser Oulabi, Christian Bizer: Extending cross‐domain knowledge bases with long tail entities using web table
data. EDBT 2019.
Christian Bizer: Completing Knowledge Graphs. JIST2019, Hangzhou, 2019.11.26
1. Cluster rows that describe the same instance together
2. Create entities by fusing rows in clusters
3. Determine which entities describe new instances
by comparing then to instances in KB
33. Data and Web Science Group
Entity Expansion Results
33
Class New Entities New Facts New Entities
Precision Goldst.
New Facts
Precision Goldst.
Football Player 13,983 (+67%) 43,800 (+32%) 0.82 0.85
Song 186,943 (+356%) 393,711 (+125%) 0.72 0.79
‐ 50.000 100.000 150.000 200.000 250.000
Football Player
Song
Existing New
Christian Bizer: Completing Knowledge Graphs. JIST2019, Hangzhou, 2019.11.26
DBpedia has low coverage for songs,
thus it is easy to find new ones
34. Data and Web Science Group
Conclusion: Web Tables
– Extremely wide range of very specific attributes
– many n‐ary relations with complex keys / time dimension
– table context is often required to understand the data
– Coverage of long tail entitles beyond Wikipedia
– Things work OK, but not good enough yet
– fact precision slot filling: 0.64 (without stitching)
– new entities accuracy: 0.72‐0.82
– For researchers: Rewarding but challenging domain
– For practitioners: Human in the loop still required for
verification
34Christian Bizer: Completing Knowledge Graphs. JIST2019, Hangzhou, 2019.11.26
35. Data and Web Science Group
4. Schema.org Annotations
35
– ask site owners since 2011 to
annotate data for enriching
search results
– 675 Types: Event, Place, Local
Business, Product, Review, Person
– Encoding: Microdata, RDFa, JSON‐LD
Christian Bizer: Completing Knowledge Graphs. JIST2019, Hangzhou, 2019.11.26
36. Data and Web Science Group
Usage of Schema.org Data
Data snippets
within
search results
Local
businesses on
maps
Data snippets
in knowledge
panels
Christian Bizer: Completing Knowledge Graphs. JIST2019, Hangzhou, 2019.11.26
37. Data and Web Science Group
Web Data Commons – Structured Data
37
– extracts all Microformat, Microdata,
RDFa, JSON‐LD data from the Common Crawl
– analyzes and provides the extracted data for download
– statistics of some extraction runs
– 2018 CC Corpus: 2.5 billion HTML pages 31.5 billion RDF triples
– 2017 CC Corpus: 3.1 billion HTML pages 38.2 billion RDF triples
– 2014 CC Corpus: 2.0 billion HTML pages 20.4 billion RDF triples
– 2010 CC Corpus: 2.8 billion HTML pages 5.1 billion RDF triples
– Download
– http://webdatacommons.org/structureddata/
Christian Bizer: Completing Knowledge Graphs. JIST2019, Hangzhou, 2019.11.26
38. Data and Web Science Group
Overall Adoption 2018
http://webdatacommons.org/structureddata/2018‐12/
38
944 million HTML pages out of the 2.5 billion pages
provide semantic annotations (37.1%).
9.6 million pay-level-domains (PLDs) out of the
32.8 million pay-level-domains covered by the crawl
provide semantic annotations (29.3%).
Christian Bizer: Completing Knowledge Graphs. JIST2019, Hangzhou, 2019.11.26
40. Data and Web Science Group
Entities in WDC Dataset
for KG Completion
40
http://webdatacommons.org/structureddata/2018‐12/stats/schema_org_subsets.html
Class # Entities Names
Product 368,386,801
Organization 176,365,220
Local Business 44,100,625
Place 31,748,612
Event 26,035,295
Movie 5,233,520
Hotel 2,537,089
College or University 2,392,797
Recipe 2,036,417
Music Album 1,594,941
Restaurant 1,480,443
Airport 1,145,323
as Common Crawl is rather shallow
Christian Bizer: Completing Knowledge Graphs. JIST2019, Hangzhou, 2019.11.26
41. Data and Web Science Group
Use Case 1: Completing KG Describing
Books and Movies
– Results of KnowMore experiments
– Entity matching F1: book 0.88, movie 0.83
– New fact precision: book 0.88, movie 0.95
– Density increase for properties of existing entites
Christian Bizer: Completing Knowledge Graphs. JIST2019, Hangzhou, 2019.11.26 41
Yu, et al.: KnowMore–knowledge base augmentation with structured web markup. SWJ 2019.
Movies
Books
42. Data and Web Science Group
42
Top Attributes PLDs Microdata
# %
schema:Product/name 754,812 92 %
schema:Product/offers 645,994 79 %
schema:Offer/price 639,598 78 %
schema:Offer/priceCurrency 606,990 74 %
schema:Product/image 573,614 70 %
schema:Product/description 520,307 64 %
schema:Offer/availability 477,170 58 %
schema:Product/url 364,889 44 %
schema:Product/sku 160,343 19 %
schema:Product/aggregateRating 141,194 17 %
schema:Product/brand 113,209 13 %
schema:Product/category 62,170 7 %
schema:Product/productID 47,088 5 %
… … …
http://webdatacommons.org/structureddata/2018‐12/stats/html‐md.xlsx
Christian Bizer: Completing Knowledge Graphs. JIST2019, Hangzhou, 2019.11.25
Das Samsung Galaxy S4 ist der
unterhaltsame und hilfreiche Begleiter
für Ihr mobiles Leben. Es verbindet Sie
mit Ihren Liebsten. Es lässt Sie
gemeinsam unvergessliche Momente
erleben und festhalten. Es vereinfacht
Ihren Alltag.
UPC 610214632623
000214632623
Use Case 2: Completing Product KGs
43. Data and Web Science Group
HTML Tables Containing
Product Specifications
43
Qui, et al.: DEXTER: Large‐Scale Discovery and Extraction of Product Specifications on the Web. VLDB 2015.
Petrovski, et al: The WDC Gold Standards for Product Feature Extraction and Product Matching. ECWeb 2016.
s:name
Specifications
as HTML Table
s:breadcrumb
Christian Bizer: Completing Knowledge Graphs. JIST2019, Hangzhou, 2019.11.25
44. Data and Web Science Group
Use Product ID Annotations
as Supervision for Product Matching
– Some e‐shops annotate product IDs
– Most e‐shops do not
44
Properties PLDs
# %
schema:Product/name 754,812 92 %
schema:Product/description 520,307 64 %
schema:Product/sku 160,343 19 %
schema:Product/productID 47,088 5 %
schema:Product/mpn 12,882 1.6%
schema:Product/gtin13 7,994 1%
Christian Bizer: Completing Knowledge Graphs. JIST2019, Hangzhou, 2019.11.26
Bizer, Primpeli, Peeters: Using the Semantic Web as a Source of Training Data. Datenbank Spektrum, 2019.
45. Data and Web Science Group
Combine Specification Table Content
from All Offers for the Same Product
45
Product
offer
Product
offer
Product
offer
Product
offerProduct
offerProduct
offer
Product
offer
Product
offer
Product
offer
Product
offer
Product
offer
Product
offer
Clusters of offers
having the same product ID
Match additional
offers without
IDs
Learn
Matcher
Matcher
Product
offer
Product
offer
Combine spec
table content of
matching offers
Christian Bizer: Completing Knowledge Graphs. JIST2019, Hangzhou, 2019.11.26
46. Data and Web Science Group
Use Case 3: Distant Supervision for
Information Extraction
– Goal: Learn how to extract data from websites that do
not provide schema.org annotations using annotations as
training data.
– Example: Small Events Paper @ SIGIR 2015
– extract data about small venue concerts, theatre performances,
garage sales, etc.
– they use 217,000 schema.org events from ClueWeb2012
as distant supervision, e.g.
46
Foley, Bendersky, Josifovski: Learning to Extract Local Events from the Web. SIGIR2015.
<div itemtype="http://schema.org/MusicEvent">
<h1 itemprop="name">School Cool Concert</h1>
<p itemprop="startDate">25.09.2018 15:00</p>
<p itemprop=“location" itemtype="http://schema.org/Place">School Gym Puero High</span>
<div>
Christian Bizer: Completing Knowledge Graphs. JIST2019, Hangzhou, 2019.11.26
47. Data and Web Science Group
Small Events Paper @ SIGIR 2015
– Approach
– train an information extraction method that
combines structural and text features
– extracted attributes: What? When? Where?
– apply method to pages without semantic
annotations
– Result
– extract 200,000 additional small‐scale
events from ClueWeb2012
– double recall at 0.85 precision
47
Foley, Bendersky, Josifovski: Learning to Extract Local Events from the Web. SIGIR2015.
Christian Bizer: Completing Knowledge Graphs. JIST2019, Hangzhou, 2019.11.26
48. Data and Web Science Group
Use Case 4: Training Data for Taxonomy
Matching and Taxonomy Clustering
– E‐Shops annotate their product categorization
– The marketing ideas behind these categorizations differ widely
48
Relevant Properties # PLDs %
schema:Offer/category 62,170 7 % of all shops
schema:WebPage/BreadcrumbList 621,344 11% of all sites
Apparel > Summer > Mens > Jerseys
Philadelphia Eagles > Philadelphia Eagles Mens > Philadelphia Eagles Mens Jersey > over $60 s
Home > Outdoor & Garden > Barbecues & Outdoor Living > Garden Furniture > Tables
Shop > Tables > Dining Tables > Aland Wood Tables
Christian Bizer: Completing Knowledge Graphs. JIST2019, Hangzhou, 2019.11.26
Meusel, etal.: Exploiting Microdata Annotations to Consistently Categorize Product Offers at Web Scale. EC‐Web 2015.
49. Data and Web Science Group
Cool Taxonomy Clustering Problem
– How to cluster 62,000 taxonomies in order to discover the
most frequent categorization ideas within a domain?
– How do shops categorize action cameras?
– What subcategories are frequently used for action cameras?
49Christian Bizer: Completing Knowledge Graphs. JIST2019, Hangzhou, 2019.11.26
50. Data and Web Science Group
Learn How to Match Product Taxonomies
using Schema.org Data as Supervision
50
Category
A
Category
B
Taxonomies that share
some products Unseen categories
Similar category?
P1
P2
P1
P2
P3
Learn
Matcher
Matcher
Christian Bizer: Completing Knowledge Graphs. JIST2019, Hangzhou, 2019.11.26
51. Data and Web Science Group
Conclusions
1. Internal Knowledge Graph Completion
– Interesting, but not ready for prime time yet
– Focus on link prediction, hardly any results on other tasks
2. KG Completion using Small Sets of External Sources
– Production quality results for all tasks
– Manuel effort per source required
– Bottle neck is the availability and costs of external data sources
51Christian Bizer: Completing Knowledge Graphs. JIST2019, Hangzhou, 2019.11.26
52. Data and Web Science Group
Conclusions
3. KG Completion using Web Table Data
– Wide coverage of long‐tail entities
– Wide coverage of long‐tail properties
– Difficult to get property semantics right (page context required)
4. KG Completion using Schema.org Annotations
– Comprehensive coverage of specific domains
– Potential as training data for different KG completion tasks
52Christian Bizer: Completing Knowledge Graphs. JIST2019, Hangzhou, 2019.11.26
53. Data and Web Science Group
Thank you.
– Web Table Corpus
http://webdatacommons.org/webtables/
– Semantic Annotations
http://webdatacommons.org/structureddata/
– Product Matching Training Data and Goldstandard
http://webdatacommons.org/largescaleproductcorpus/v2/
53Christian Bizer: Completing Knowledge Graphs. JIST2019, Hangzhou, 2019.11.26