WDPlus:
Leveraging Wikidata to Link and
Extend Tabular Data
Daniel Garijo, Pedro Szekely
Information Sciences Institute and
Department of Computer Science
@dgarijov
dgarijo@isi.edu
Abundance of data sources on the Web
Users of data face three challenges
• How do I find relevant datasets
for my problem?
• How do I augment my dataset
with existing information?
• How can I share my integrated
results with the community?
Daniel Garijo and Pedro Szekely. WDPlus: Leveraging Wikidata to Link and Extend
Tabular Data. AKTS 2019 (eScience 2019)
Raw data
Popular initiatives for addressing these challenges
• Search individual items
• Search is manual, based on user input
• LOD cloud of connected datasets
• Knowledge engineers are needed to map and
augment content
• ETL frameworks (e.g., Karma, OpenRefine)
• Pipelines are custom, expertise required
• Often not shared
Sources: https://lod-cloud.net/versions/2019-03-29/lod-cloud.png; https://panoply.io/data-warehouse-guide/3-ways-to-build-an-etl-process/
WDPlus
A framework designed to:
• Discover data on the Web
• Improve raw data to make it useful
• Search by querying dataset structure
• Download fresh data
• Combine existing datasets
• Share improved data and methods
[Figure: WDPlus overview. Core: a sample Wikidata entity (Harry Truman, President, USA; male; born 1884-05-08 in Lamar; died 1972-12-26; in office 1945-04-12 to 1953-01-20; spouse Bess Truman). Satellites: weather, shopping, crime, sports. Metadata index.]
WDPlus architecture
Wikidata as a core KG
• 60 Million items
• 700 Million statements
• 20,000 + contributors
• +1 billion edits
• Collaborative!
WDPlus architecture
Satellite organization
• Detailed information on a domain
• Crime records, sport events, etc.
• Linked to the Wikidata core
• Link first strategy
• Custom properties and Qnodes
• Extensions to core model
• Synchronized with core
• Decentralized
• Each satellite may be maintained by its own community
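The satellite idea above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Satellite` class, the `crime:` prefix, the property `crime:P1`, the placeholder core ID and the numeric values are hypothetical and are not taken from WDPlus. It also reads "link first" as "reuse a core Qnode when one exists", which is one possible interpretation of the slide.

```python
from dataclasses import dataclass, field

# Tiny stand-in for the Wikidata core: Qnode ID -> label.
# Q30 is the Wikidata item for the United States; "Q-core-1" is a placeholder.
CORE = {
    "Q30": "United States of America",
    "Q-core-1": "Harry Truman",  # placeholder ID, not a real Qnode
}


@dataclass
class Satellite:
    """A domain-specific graph that extends the core (hypothetical structure)."""

    name: str
    # Custom properties defined by this satellite: property ID -> label.
    properties: dict = field(default_factory=dict)
    # Statements stored as (subject, property, value) triples.
    statements: list = field(default_factory=list)

    def add_statement(self, subject, prop, value):
        # Link first: the subject must be a core Qnode or a node that
        # lives in this satellite's own namespace.
        if subject not in CORE and not subject.startswith(f"{self.name}:"):
            raise ValueError(f"{subject} is neither a core nor a {self.name} node")
        if prop not in self.properties:
            raise KeyError(f"unknown satellite property {prop}")
        self.statements.append((subject, prop, value))

    def about(self, subject):
        """Return all satellite statements whose subject is the given node."""
        return [s for s in self.statements if s[0] == subject]


crime = Satellite("crime", properties={"crime:P1": "reported incidents"})
crime.add_statement("Q30", "crime:P1", 1000)     # made-up value, core subject
crime.add_statement("crime:Q1", "crime:P1", 12)  # made-up value, satellite node
```

Keeping custom properties and new nodes under a satellite prefix is only one way to handle the namespace question; the sketch does not settle whether a single shared namespace would work better.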
WDPlus architecture
Table models
• Tables are not materialized
• Can become a satellite on demand
• Described in a machine-readable
metadata index
• Column names and relevant instances
are indexed for fast retrieval
• Link to table model is preserved
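As a rough illustration of indexing column names and instances without materializing the tables, the following Python sketch builds a small inverted index. The `MetadataIndex` class, its method names, the table IDs and the example.org URLs are all hypothetical; the slides do not describe how the WDPlus index is implemented.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase a string and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class MetadataIndex:
    """Inverted index over column names and sample instances (hypothetical)."""

    def __init__(self):
        self._postings = defaultdict(set)  # token -> set of table IDs
        self._models = {}                  # table ID -> table description

    def add_table(self, table_id, source_url, columns, sample_instances=()):
        # Only a description is stored; the table itself stays at its source.
        self._models[table_id] = {"source_url": source_url, "columns": list(columns)}
        for text in list(columns) + list(sample_instances):
            for token in tokenize(text):
                self._postings[token].add(table_id)

    def search(self, query):
        """Return the IDs of tables that match every token in the query."""
        tokens = tokenize(query)
        if not tokens:
            return []
        hits = set.intersection(*(self._postings.get(t, set()) for t in tokens))
        return sorted(hits)

    def model(self, table_id):
        """Return the stored description, preserving the link to the source."""
        return self._models[table_id]


index = MetadataIndex()
index.add_table("t1", "https://example.org/crime.csv",
                ["State", "Year", "Violent crime rate"], ["California", "Texas"])
index.add_table("t2", "https://example.org/weather.csv",
                ["State", "Date", "Max temperature"], ["California"])
print(index.search("crime rate"))
```

Because each entry keeps its source URL, a search hit can be traced back to the original table, which is the property the last bullet asks for.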
Towards WDPlus
WDPlus framework: Metadata index and
table augmentation
• Search
• Keywords, variables or content
• Wikifier may be used in search
• Download
• Download a dataset or its metadata
• Augment
• Merge your dataset with contents from
other datasets automatically
• Upload
• Add new datasets (automated metadata
profiling and provenance)
• Enrich
• Header enrichment for search efficiency
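The Augment operation can be pictured as a left join on a shared, already-linked key. The Python sketch below is an assumption about what such a merge might look like, not the WDPlus implementation: the `augment` function, the column names, the placeholder node IDs and all values are made up for illustration.

```python
def augment(base_rows, other_rows, key, new_columns):
    """Left-join other_rows onto base_rows by `key`, copying new_columns.

    Rows are plain dicts. Base rows without a match keep their data and
    receive None for each new column.
    """
    lookup = {row[key]: row for row in other_rows}
    augmented = []
    for row in base_rows:
        match = lookup.get(row[key], {})
        extra = {col: match.get(col) for col in new_columns}
        augmented.append({**row, **extra})
    return augmented


# Both datasets are assumed to share a "qnode" column after entity linking.
# "Q-A" and "Q-B" are placeholder identifiers, and the numbers are made up.
my_dataset = [
    {"qnode": "Q-A", "crime_rate": 4.2},
    {"qnode": "Q-B", "crime_rate": 3.1},
]
other_dataset = [
    {"qnode": "Q-A", "population": 1000},
]

merged = augment(my_dataset, other_dataset, key="qnode", new_columns=["population"])
print(merged)
```

Joining on a linked identifier rather than on raw labels is the design choice that makes automatic merging plausible, since two datasets may spell the same entity differently.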
WDPlus framework: T2WML
• Table overview
• Cell-based mapping, saved in WDPlus for future reference
• Entity linking
• Result sample: easy to share!
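To make the cell-based mapping and entity-linking steps concrete, here is a small Python sketch. It is not T2WML syntax and does not use the T2WML tool: the `mapping` dictionary layout, the `apply_mapping` and `link_entity` helpers, the property `crime:P1` and the table values are all hypothetical. Only Q30 (United States) is a real Wikidata identifier.

```python
def link_entity(label, known_entities):
    """Look up a cell label in a label -> Qnode dictionary (case-insensitive)."""
    return known_entities.get(label.strip().lower())


def apply_mapping(table, mapping, known_entities):
    """Turn table cells into (subject, property, value) statements.

    `mapping` says which column holds the subject and which columns hold
    values, and which property each value column maps to.
    """
    statements = []
    for row in table[mapping["first_data_row"]:]:
        label = row[mapping["subject_column"]]
        # Fall back to the raw label when no entity is found.
        subject = link_entity(label, known_entities) or label
        for column, prop in mapping["value_columns"].items():
            statements.append((subject, prop, row[column]))
    return statements


table = [
    ["Country", "Year", "Incidents"],  # header row
    ["USA", "2017", "100"],            # made-up values
    ["Atlantis", "2017", "7"],         # label with no known entity
]
mapping = {
    "first_data_row": 1,
    "subject_column": 0,
    "value_columns": {2: "crime:P1"},  # column index -> hypothetical property
}
known_entities = {"usa": "Q30"}

result = apply_mapping(table, mapping, known_entities)
print(result)
```

Because the mapping is plain data, it can be stored next to the table and reapplied later, which is in the spirit of saving the mapping for future reference.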
Creating Wikidata Satellites: Challenges
• Identify new properties to model satellites
• Currently done by hand by knowledge engineers
• Creation of new Qnodes for satellite instances
• Identified a schema for each satellite
• Feedback loop to Wikidata
• How to select a “trusty” statement when several values are available?
• Namespace issues
• Single namespace, or namespace per satellite?
• Inter-satellite linkages
Conclusions
• Tabular data exists in heterogeneous formats
• Difficult to find, use, augment and share
• WDPlus is a framework to help discover, improve, search, augment,
combine and share tabular data
• WDPlus framework for profiling and enriching datasets
• T2WML language to generate linked instances from tabular data
• Encouraging early results on usability
Help us extend WDPlus!
Do you have comments, suggestions or use cases? Contact me at:
dgarijo@isi.edu
