Cutting Long Stories Short
Fact Extraction from Wikipedia
Marco Fossati
fossati@spaziodati.eu
Poznan, 25th June 2015
What?
A Google Summer of Code Project for DBpedia
What?
Teaching Machines to Read Natural Language
Why?
Text Contains a Huge Amount of Knowledge
Why?
DBpedia Focuses on Semi-structured Data
Discovery of New Relations
Automatic Knowledge Base Population
How?
Machine Learning + Lexical Semantics
How?
Poland victory World Cup 2014
“Poland won the World Cup in 2014”
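The example above pairs a sentence with the facts it carries: a lexical unit ("won") evokes a frame, and the surrounding words fill its Frame Elements. A minimal rule-based sketch of that mapping follows. It is not the project's implementation: the lexicon, the frame name `Victory`, the role names, and the entity types are all hypothetical, and entity recognition is assumed to happen upstream.

```python
import re

# Hypothetical lexicon: word form of a lexical unit -> frame it evokes.
# Frame and role names are illustrative, not FrameNet/Kicktionary labels.
LEXICON = {"win": "Victory", "wins": "Victory", "won": "Victory"}

# Hypothetical rule table: coarse entity type -> Frame Element (role).
ROLE_BY_TYPE = {"Team": "Winner", "Competition": "Competition"}

# Four-digit years from 1800 to 2099.
YEAR = re.compile(r"\b(?:1[89]|20)\d{2}\b")

def classify(sentence, entities):
    """Return (frame, frame_elements), or None if no lexical unit occurs.

    `entities` maps a surface string to a coarse type; it is assumed
    to come from an upstream entity linker (not shown)."""
    tokens = re.findall(r"\w+", sentence)
    trigger = next((t for t in tokens if t.lower() in LEXICON), None)
    if trigger is None:
        return None
    elements = {}
    # Rule 1: typed entities that appear in the sentence fill the role
    # associated with their type.
    for surface, etype in entities.items():
        role = ROLE_BY_TYPE.get(etype)
        if role and surface in sentence:
            elements[role] = surface
    # Rule 2: a year anywhere in the sentence fills the Time role.
    year = YEAR.search(sentence)
    if year:
        elements["Time"] = year.group(0)
    return LEXICON[trigger.lower()], elements

print(classify("Poland won the World Cup in 2014",
               {"Poland": "Team", "World Cup": "Competition"}))
# ('Victory', {'Winner': 'Poland', 'Competition': 'World Cup', 'Time': '2014'})
```

Hand-written rules like these are cheap and transparent but brittle; the supervised variant in the approach would learn the word-to-role mapping from annotated sentences instead.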
Approach: The Data-driven Way
1. Lexical Units
1.1. Extraction via POS Tagging
1.2. Statistical Ranking
2. Frame Database (FrameNet, Kicktionary)
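Steps 1.1 and 1.2 can be sketched as follows. The slides name POS tagging and statistical ranking but not the tagger or the statistic, so this sketch assumes pre-tagged input with Penn Treebank tags (a verb is any tag starting with `VB`) and uses TF-IDF summed over documents as the ranking score. Function names are hypothetical; an Italian tagger would need the verb test adapted to its own tagset.

```python
import math
from collections import Counter

def extract_verbs(tagged_doc):
    """Candidate lexical units: lowercased tokens with a Penn Treebank
    verb tag (VB, VBD, VBZ, ...). No lemmatization is applied."""
    return [token.lower() for token, tag in tagged_doc if tag.startswith("VB")]

def rank_lexical_units(tagged_corpus, top_k=10):
    """Rank candidate verbs by TF-IDF summed over all documents.

    A verb occurring in every document gets idf = log(1) = 0, which
    pushes ubiquitous verbs such as auxiliaries to the bottom."""
    docs = [extract_verbs(doc) for doc in tagged_corpus]
    df = Counter()  # document frequency of each verb
    for verbs in docs:
        df.update(set(verbs))
    scores = Counter()
    for verbs in docs:
        if not verbs:
            continue
        for verb, count in Counter(verbs).items():
            idf = math.log(len(docs) / df[verb])
            scores[verb] += (count / len(verbs)) * idf
    return scores.most_common(top_k)

corpus = [
    [("Poland", "NNP"), ("is", "VBZ"), ("strong", "JJ"), ("and", "CC"), ("won", "VBD")],
    [("Italy", "NNP"), ("is", "VBZ"), ("fast", "JJ"), ("and", "CC"), ("scored", "VBD")],
    [("Spain", "NNP"), ("is", "VBZ"), ("tired", "JJ"), ("but", "CC"), ("won", "VBD")],
]
# "is" occurs in every document, so it scores 0.0 and ranks last.
print(rank_lexical_units(corpus))
```

TF-IDF rewards verbs concentrated in few documents, which is why "scored" outranks the more frequent "won" here; in a domain corpus where the target verbs recur across many articles, a frequency-weighted score might suit better. Lemmatizing first would also merge forms such as "won" and "win".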
Approach (continued)
3. Frame + Frame Elements Classification
3.1. Unsupervised, Rule-based
3.2. Supervised
4. Crowdsourced Training Set Construction
5. RDF Serialization
Crowdsourcing the Annotation
Label words with Frame Elements
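The slides do not say how labels from several crowd workers are combined into a training set. A common choice, assumed here, is per-word majority voting with an agreement threshold; the data layout and function name below are hypothetical.

```python
from collections import Counter

def aggregate(judgments, min_agreement=0.5):
    """Combine crowd labels by majority vote.

    `judgments` is a list of dicts, one per worker, mapping a word to
    the Frame Element label that worker chose (None means "no role").
    Returns word -> (label, agreement) for words whose majority label
    is not None and whose agreement is strictly above `min_agreement`."""
    votes = {}
    for worker in judgments:
        for word, label in worker.items():
            votes.setdefault(word, Counter())[label] += 1
    result = {}
    for word, counter in votes.items():
        label, count = counter.most_common(1)[0]
        agreement = count / sum(counter.values())
        if label is not None and agreement > min_agreement:
            result[word] = (label, agreement)
    return result

judgments = [
    {"Poland": "Winner", "World Cup": "Competition", "2014": "Time"},
    {"Poland": "Winner", "World Cup": "Competition", "2014": None},
    {"Poland": "Winner", "World Cup": "Time", "2014": "Time"},
]
print(aggregate(judgments))
```

With the default strict threshold of 0.5, a tie between two labels never passes, so ambiguous words are simply left out of the training set rather than resolved arbitrarily.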
Use Case
Soccer Domain
Widely Represented (223,000 articles)
Lots of Semi-structured Data
Italian Wikipedia
Wanna contribute?
https://github.com/dbpedia/fact-extractor
That’s all Folks!
Marco Fossati
fossati@spaziodati.eu