TAIPAN: Automatic Property Mapping for Tabular Data by Ivan Ermilov and Axel-Cyrille Ngonga Ngomo, in Proceedings of the 20th International Conference on Knowledge Engineering and Knowledge Management (EKAW'2016)
2. Web Scale Data Mining from Web Tables
Web Data Commons
Dresden Table Dataset
Other tables
The Web
TAIPAN
● Structured
● Schemaless
● Not using standards*
● SPARQL
● RDFS
● OWL
5. The Core of TAIPAN
Subject Column Identification
● Unsupervised ML
● Structural features
● Semantic features
○ Support of a column
○ Connectivity
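The two semantic features can be sketched roughly as follows. This is a minimal illustration, not the TAIPAN implementation: the toy knowledge base (`KB_ENTITIES`, `KB_LINKS`), the function names, and the normalisations are all assumptions made for this example. Support is taken here as the fraction of a column's cells that match a knowledge-base entity, and connectivity as the fraction of other columns linked to it by at least one knowledge-base relation. Summing the two scores to pick the subject column is likewise an assumption.

```python
# Toy knowledge base (hypothetical data, for illustration only).
KB_ENTITIES = {"Leipzig", "Dresden", "Berlin"}
# (subject, object) pairs linked by some property in the toy KB.
KB_LINKS = {("Leipzig", "Germany"), ("Dresden", "Germany"), ("Berlin", "Germany")}


def support(column):
    """Fraction of cells in the column that match a KB entity."""
    if not column:
        return 0.0
    hits = sum(1 for cell in column if cell in KB_ENTITIES)
    return hits / len(column)


def connectivity(table, col_idx):
    """Fraction of other columns linked to column col_idx by at least one KB relation."""
    others = [i for i in range(len(table)) if i != col_idx]
    if not others:
        return 0.0
    connected = 0
    for j in others:
        # A column counts as connected if any row pair appears in the toy KB.
        if any((s, o) in KB_LINKS for s, o in zip(table[col_idx], table[j])):
            connected += 1
    return connected / len(others)


def identify_subject_column(table):
    """Return the index of the column with the highest support + connectivity."""
    scores = [support(col) + connectivity(table, i) for i, col in enumerate(table)]
    return max(range(len(table)), key=scores.__getitem__)


# Columns of a toy table: city, country, population (as strings).
table = [
    ["Leipzig", "Dresden", "Berlin"],
    ["Germany", "Germany", "Germany"],
    ["600000", "550000", "3600000"],
]
subject_idx = identify_subject_column(table)
```

In this toy table the city column is selected, since it is the only one whose cells match entities and link to another column.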
Property Mapping
● Retrieve seed entities
● Rank entities
● Return top entity
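A retrieve, rank, return-top pipeline for mapping a column pair to a property might look like the sketch below. It is one possible reading of the three steps, not TAIPAN's actual method: the toy `TRIPLES` data, the function names, and ranking by retrieval frequency are assumptions, since the slide does not specify how seeds are retrieved or scored.

```python
from collections import Counter

# Toy triples (hypothetical data): (subject, property, object).
TRIPLES = [
    ("Leipzig", "country", "Germany"),
    ("Dresden", "country", "Germany"),
    ("Berlin", "country", "Germany"),
    ("Berlin", "capitalOf", "Germany"),
]


def retrieve_seeds(subject_col, value_col):
    """Collect every property linking a subject cell to the value in the same row."""
    seeds = []
    for s, o in zip(subject_col, value_col):
        seeds.extend(p for (ts, p, to) in TRIPLES if ts == s and to == o)
    return seeds


def rank(seeds):
    """Rank candidate properties by how often they were retrieved."""
    return Counter(seeds).most_common()


def map_property(subject_col, value_col):
    """Return the top-ranked property, or None if no candidate was found."""
    ranked = rank(retrieve_seeds(subject_col, value_col))
    return ranked[0][0] if ranked else None


top = map_property(
    ["Leipzig", "Dresden", "Berlin"],
    ["Germany", "Germany", "Germany"],
)
```

Returning `None` when no seed is found keeps unmapped columns distinguishable from mapped ones.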
7. Subject Column Identification Experiments
Rule-based approach achieves only 51.72% accuracy
Using support and connectivity increases precision
Observations
Can be further improved using ML techniques
8. Property Mapping Experiments
TAIPAN achieves better recall, but lower precision than T2K
On the DBD dataset, T2K could match only 1 property
Observations
Overall, TAIPAN performs better than the state of the art
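Precision and recall for property mapping compare the predicted column-to-property pairs against a gold standard. The sketch below shows the arithmetic only; the `gold` and `predicted` sets are hypothetical and are not results from the talk.

```python
def precision_recall_f1(predicted, gold):
    """Compute precision, recall and F1 over sets of (column, property) pairs."""
    true_pos = len(predicted & gold)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical mappings: column index paired with a property name.
gold = {(1, "country"), (2, "population"), (3, "area")}
predicted = {(1, "country"), (2, "populationTotal"), (3, "area"), (4, "mayor")}
p, r, f = precision_recall_f1(predicted, gold)
```

A system that proposes more mappings tends to gain recall at the cost of precision, which is the trade-off noted above.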
9. Conclusions & Future Work
Curated T2D & DBD datasets
Novel TAIPAN approach
Open Table Extraction
Table Extraction Benchmark (HOBBIT)
Integration of TAIPAN into GEISER project
10. Thank you!
Follow us on Twitter: @hobbit_project :)
Ivan Ermilov <iermilov@informatik.uni-leipzig.de>