TAIPAN: Automatic Property Mapping for Tabular Data


Holistic Benchmarking of Big Linked Data

TAIPAN: Automatic Property Mapping for Tabular Data by Ivan Ermilov and Axel-Cyrille Ngonga Ngomo, in Proceedings of the 20th International Conference on Knowledge Engineering and Knowledge Management (EKAW'2016)


TAIPAN: Automatic Property Mapping for Tabular Data
by Ivan Ermilov and Axel-Cyrille Ngonga Ngomo
November 22nd, 2016

Web Scale Data Mining from Web Tables
[Diagram labels: Web Data Commons, Dresden Table Dataset, Other tables, The Web, TAIPAN]
● Structured
● Schemaless
● Not using standards*
● SPARQL
● RDFS
● OWL

TAIPAN Approach Overview
Step 1: Identify Subject Column
Step 2: Atomize a Table
Step 3: Identify Property for Each Table
Step 4: Return Mappings
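The four steps above can be read as a simple pipeline. The following is a minimal Python sketch, not the authors' implementation. It assumes that atomizing a table means pairing the subject column with each remaining column. The functions `identify_subject_column` and `identify_property`, the toy table, and the `toy_lookup` dictionary are hypothetical placeholders introduced only for illustration.

```python
from typing import Dict, List, Optional, Tuple

Table = Dict[str, List[str]]  # column header -> cell values


def identify_subject_column(table: Table) -> str:
    """Step 1 (placeholder heuristic): pick the column with the most distinct values."""
    return max(table, key=lambda col: len(set(table[col])))


def atomize(table: Table, subject: str) -> List[Tuple[str, str]]:
    """Step 2 (assumption): one (subject, other column) pair per non-subject column."""
    return [(subject, col) for col in table if col != subject]


def identify_property(atomic: Tuple[str, str], lookup: Dict[str, str]) -> Optional[str]:
    """Step 3 (placeholder): look up the non-subject column header in a toy dictionary."""
    _, col = atomic
    return lookup.get(col.lower())


def map_table(table: Table, lookup: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Step 4: return the column -> property mappings."""
    subject = identify_subject_column(table)
    return {col: identify_property((subj, col), lookup)
            for subj, col in atomize(table, subject)}


# Toy input, invented for this example.
table = {
    "City": ["Leipzig", "Dresden", "Berlin"],
    "Country": ["Germany", "Germany", "Germany"],
    "Population": ["560000", "550000", "3500000"],
}
toy_lookup = {"country": "dbo:country", "population": "dbo:populationTotal"}

mappings = map_table(table, toy_lookup)
print(mappings)
```

The point of the sketch is the data flow: each non-subject column is handled independently once the subject column is known, so Step 3 can be swapped for a stronger method without touching the other steps.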

TAIPAN Approach Overview (example)
[Figure: worked example of Steps 1 to 4]

The Core of TAIPAN
Subject Column Identification
● Unsupervised ML
● Structural features
● Semantic features
○ Support of a column
○ Connectivity
Property Mapping
● Retrieve seed entities
● Rank entities
● Return top entity
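The slide names support and connectivity as semantic features but does not define them. The Python sketch below therefore rests on assumed definitions: support is taken as the share of cells in a column that match an entity in a knowledge base, and connectivity as the share of other columns linked to the column by at least one knowledge base triple. Summing the two scores to pick the subject column, and ranking candidate properties by a simple vote count, are stand-in techniques chosen for illustration. The toy knowledge base `KB` and all function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, property, object)

# Toy knowledge base, invented for this example.
KB: Set[Triple] = {
    ("Leipzig", "country", "Germany"),
    ("Dresden", "country", "Germany"),
    ("Leipzig", "population", "560000"),
}
ENTITIES: Set[str] = {s for s, _, _ in KB}
LINKED_PAIRS: Set[Tuple[str, str]] = {(s, o) for s, _, o in KB}


def support(column: List[str]) -> float:
    """Assumed definition: share of cells that match a KB entity."""
    if not column:
        return 0.0
    return sum(cell in ENTITIES for cell in column) / len(column)


def connectivity(name: str, table: Dict[str, List[str]]) -> float:
    """Assumed definition: share of other columns linked to this one by a KB triple."""
    others = [c for c in table if c != name]
    if not others:
        return 0.0
    linked = sum(
        any(pair in LINKED_PAIRS for pair in zip(table[name], table[other]))
        for other in others
    )
    return linked / len(others)


def subject_column(table: Dict[str, List[str]]) -> str:
    """Stand-in scoring: the column with the highest support + connectivity."""
    return max(table, key=lambda c: support(table[c]) + connectivity(c, table))


def top_property(subjects: List[str], objects: List[str]) -> Optional[str]:
    """Stand-in ranking: each row votes for the KB properties linking its two cells."""
    votes = Counter(
        p for row in zip(subjects, objects) for s, p, o in KB if (s, o) == row
    )
    return votes.most_common(1)[0][0] if votes else None


table = {
    "City": ["Leipzig", "Dresden", "Berlin"],
    "Country": ["Germany", "Germany", "Germany"],
    "Population": ["560000", "550000", "3500000"],
}
print(subject_column(table))
print(top_property(table["City"], table["Country"]))
```

In this toy table only the City column contains known entities and is linked to both other columns, so it receives the highest combined score under the assumed definitions.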

Experimental setup
For T2K: 128GB, 4 Cores, Ubuntu 14.04
For TAIPAN: 16GB, 4 Cores, Ubuntu 14.04
Dataset 1: curated T2D gold standard (T2D)
Dataset 2: DBpedia table dataset (DBD)

Subject Column Identification Experiments
Observations
● Rule-based approach achieves only 51.72% accuracy
● Using support and connectivity increases precision
● Can be further improved using ML techniques

Property Mapping Experiments
Observations
● TAIPAN achieves better recall, but lower precision than T2K
● On the DBD dataset T2K could match only 1 property
● Overall TAIPAN performs better than the state of the art
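The comparison above is stated in terms of precision and recall. As a reminder of how these scores are commonly computed for property mappings, here is a minimal Python sketch that compares a set of predicted mappings with a gold standard. The mappings are toy values invented for this example and are not results from the experiments; the tuple layout (table, column, property) is an assumption.

```python
from typing import Set, Tuple

Mapping = Tuple[str, str, str]  # (table id, column header, property), assumed layout


def precision_recall_f1(predicted: Set[Mapping], gold: Set[Mapping]) -> Tuple[float, float, float]:
    """Precision, recall and F1 of predicted mappings against a gold standard."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy data, invented for this example.
gold = {
    ("t1", "Country", "dbo:country"),
    ("t1", "Population", "dbo:populationTotal"),
    ("t2", "Author", "dbo:author"),
    ("t2", "Year", "dbo:releaseDate"),
}
predicted = {
    ("t1", "Country", "dbo:country"),
    ("t1", "Population", "dbo:areaTotal"),  # wrong property
    ("t2", "Author", "dbo:author"),
}

precision, recall, f1 = precision_recall_f1(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

A system that proposes a mapping for more columns tends to gain recall at the cost of precision, which is the kind of trade-off the first observation describes.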

Conclusions & Future Work
Curated T2D & DBD datasets
Novel TAIPAN approach
Open Table Extraction
Table Extraction Benchmark (HOBBIT)
Integration of TAIPAN into GEISER project

Thank you!
Follow us on Twitter :)
Ivan Ermilov <iermilov@informatik.uni-leipzig.de>
@hobbit_project


