DBpedia ♥ Commons 
Gaurav Vaidya - Dimitris Kontokostas - Andrea Di Menna - Jim O'Regan 
2nd DBpedia Meeting Leipzig 03.09.2014
~23M pages like this 
A lot of pages like this
Many pages like this
Not very similar to pages like this
DBpedia Extraction Framework
✔ “Wiki agnostic”
✔ Pluggable extractors
✔ Out-of-the-box support for common metadata
✗ Tuned for extraction in the main namespace (not File:)
✗ Many other challenges left
Challenges 
✔ File metadata 
✔ KML files 
✔ Image Galleries 
✔ Image Annotations 
✔ Mappings Wiki 
✔ Bootstrap community mappings 
✔ Template Statistics 
✔ Licensing 
✔ Technical details I'll not go into
Out-of-the-box support 
● Categories (skos) 
● External links 
● Geo-coordinates 
● Raw infobox properties 
● Labels 
● PageIds / Revisions 
● Links (internal / external) 
● Mappings Wiki (with some tweaking / more on that later)
File metadata 
● New Extractor
● New file class hierarchy
– dbo:File, dbo:Image, dbo:StillImage, dbo:MovingImage and dbo:Sound
Sample Output:
:Aeropetes.JPG a dbo:StillImage, dbo:Image, dbo:Document, dbo:File, dbo:Work ;
    dcterms:type dbo:StillImage ;
    dbo:fileExtension "jpg" ;
    dcterms:format "image/jpeg" ;
    dbo:fileURL commons-path:Aeropetes.JPG ;
    foaf:depiction commons-path:Aeropetes.JPG ;
    dbo:thumbnail commons-path:Aeropetes.JPG?width=300 .
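The sample output above suggests that part of the file metadata can be derived from the file name alone. A minimal Python sketch of that idea follows; it is not the extractor from the DBpedia Extraction Framework. The `FILE_TYPES` table and the `file_metadata` function are hypothetical names, and only the "jpg" row is backed by the slide.

```python
import os

# Hypothetical lookup table: file extension -> (MIME type, ontology class).
# Only the "jpg" row is confirmed by the sample output; the rest are assumptions.
FILE_TYPES = {
    "jpg": ("image/jpeg", "dbo:StillImage"),
    "png": ("image/png", "dbo:StillImage"),
    "webm": ("video/webm", "dbo:MovingImage"),
    "ogg": ("audio/ogg", "dbo:Sound"),
}


def file_metadata(file_name):
    """Return (subject, predicate, object) triples for one Commons file name."""
    # "Aeropetes.JPG" -> "jpg": the extension is lower-cased, as in the slide.
    extension = os.path.splitext(file_name)[1].lstrip(".").lower()
    subject = ":" + file_name
    triples = [(subject, "dbo:fileExtension", '"%s"' % extension)]
    if extension in FILE_TYPES:
        mime_type, ontology_class = FILE_TYPES[extension]
        triples.append((subject, "dcterms:type", ontology_class))
        triples.append((subject, "dcterms:format", '"%s"' % mime_type))
    return triples


for triple in file_metadata("Aeropetes.JPG"):
    print(" ".join(triple), ".")
```

Unknown extensions only yield the dbo:fileExtension triple, so the sketch never guesses a type it has no entry for.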
Image Galleries 
● Attach each gallery item to the page resource
:Colorado dbo:hasGalleryItem :Colorado.JPG,
    :Denver_Colorado_Art.jpg,
    :ColoradoCenter1.jpg .
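One way to picture the gallery step is as a small parser over MediaWiki `<gallery>` blocks, where each line names a file and an optional caption after a pipe. The Python sketch below works under that assumption; `gallery_items` and `gallery_triples` are hypothetical helpers, not the framework's extractor.

```python
import re

# Matches the body of every <gallery ...> ... </gallery> block.
GALLERY_RE = re.compile(r"<gallery[^>]*>(.*?)</gallery>", re.DOTALL | re.IGNORECASE)


def gallery_items(wikitext):
    """Return the file names listed in all gallery blocks of a page."""
    items = []
    for body in GALLERY_RE.findall(wikitext):
        for line in body.splitlines():
            # Everything after the first pipe is the caption; ignore it.
            target = line.split("|", 1)[0].strip()
            if not target:
                continue
            # Drop an optional "File:" or "Image:" namespace prefix.
            if ":" in target:
                prefix, rest = target.split(":", 1)
                if prefix.strip().lower() in ("file", "image"):
                    target = rest.strip()
            # Page titles use underscores in place of spaces.
            items.append(target.replace(" ", "_"))
    return items


def gallery_triples(page, wikitext):
    """Attach each gallery item to the page resource."""
    return [(":" + page, "dbo:hasGalleryItem", ":" + item)
            for item in gallery_items(wikitext)]


sample = """<gallery>
File:Colorado.JPG|A caption
Denver Colorado Art.jpg
</gallery>"""
for triple in gallery_triples("Colorado", sample):
    print(" ".join(triple), ".")
```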
Image Annotations 
● Annotation Gadget
● Boxes with optional description
Image Annotations 
● W3C Media Fragments recommendation
● Embed the box in the URI 
– ?width=15130&height=1886#xywh=pixel:10431,324,1670,1208> . 
● Add descriptions in the new resource 
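Embedding the box in the URI can be sketched with the `xywh=pixel:x,y,w,h` spatial fragment defined by the Media Fragments recommendation. The Python below is an illustration only; the function names and the example.org URL are assumptions, and the parser handles integer coordinates only.

```python
from urllib.parse import urlsplit


def annotation_uri(image_uri, x, y, width, height):
    """Append a spatial media fragment describing the annotation box."""
    return f"{image_uri}#xywh=pixel:{x},{y},{width},{height}"


def parse_xywh(uri):
    """Return (unit, x, y, width, height), or None if there is no xywh fragment."""
    fragment = urlsplit(uri).fragment
    if not fragment.startswith("xywh="):
        return None
    value = fragment[len("xywh="):]
    # The unit prefix ("pixel:" or "percent:") is optional; pixel is the default.
    unit, _, coords = value.rpartition(":")
    x, y, width, height = (int(part) for part in coords.split(","))
    return (unit or "pixel", x, y, width, height)


# Hypothetical image URL; the box values are the ones shown on the slide.
uri = annotation_uri("http://example.org/Map.jpg?width=15130&height=1886",
                     10431, 324, 1670, 1208)
print(uri)
print(parse_xywh(uri))
```

Because the box lives in the fragment, each annotation gets its own URI that can carry the description as a separate resource.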
Mappings Wiki
Template Statistics 
Licensing 
● Automatically identified and imported ~360 license templates
● Use the mappings wiki 
● Needed some hacking to make it work 
– e.g. {{Self|GFDL|cc-by-sa-3.0,2.5,2.0,1.0}} 
:Acraea_circeis.JPG dbo:license <http://creativecommons.org/publicdomain/mark/1.0/> .
:Antepipona_deflenda_-_2012-10-17.webm dbo:license <http://creativecommons.org/licenses/by-sa/3.0/> .
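The {{Self|...}} example hints at why some hacking was needed: one template call can stand for several licenses at once. The Python sketch below shows one possible expansion. It assumes that the comma-separated versions share the prefix of the first one, and it builds URIs on the pattern of the by-sa/3.0 URI shown above. Both helpers are hypothetical and are not taken from the mappings wiki.

```python
import re


def expand_self_template(wikitext):
    """Expand {{Self|...}} into a flat list of license identifiers."""
    match = re.fullmatch(r"\{\{\s*Self\s*\|(.*)\}\}", wikitext.strip(),
                         re.IGNORECASE | re.DOTALL)
    if match is None:
        return []
    licenses = []
    for argument in match.group(1).split("|"):
        argument = argument.strip()
        # Skip empty and named parameters (e.g. author=...).
        if not argument or "=" in argument:
            continue
        parts = argument.split(",")
        first = parts[0].strip()
        licenses.append(first)
        # "cc-by-sa-3.0,2.5" -> reuse the prefix "cc-by-sa-" for later versions.
        prefix = first[: first.rfind("-") + 1]
        for version in parts[1:]:
            licenses.append(prefix + version.strip())
    return licenses


def cc_license_uri(identifier):
    """Map e.g. "cc-by-sa-3.0" to a Creative Commons URI, or None."""
    match = re.fullmatch(r"cc-(by(?:-[a-z]+)*)-(\d+\.\d+)", identifier.lower())
    if match is None:
        return None
    return f"http://creativecommons.org/licenses/{match.group(1)}/{match.group(2)}/"


for name in expand_self_template("{{Self|GFDL|cc-by-sa-3.0,2.5,2.0,1.0}}"):
    print(name, cc_license_uri(name))
```

Identifiers that are not Creative Commons licenses, such as GFDL, map to None here and would need their own entries.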
KML Annotations attached to media 
Attach raw KML data to resource with custom extractor 
Sample Output: 
:Yellowstone_1871b.jpg dbo:hasKMLData """
<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://earth.google.com/kml/2.2">
<GroundOverlay>
<name>Yorktown, Indiana (1878)</name>
<description>An 1878 map of Yorktown in Tippecanoe County, Indiana. Source: Kingman Brothers&apos; Combination Atlas Map of Tippecanoe County, Indiana, 1878.</description>
<color>99ffffff</color><Icon><href>BIG_LINK_HERE</href>
<viewBoundScale>0.75</viewBoundScale></Icon>
<LatLonBox>
<north>40.26126145890567</north><south>40.25777915632657</south>
<east>-86.77033439383223</east><west>-86.77398493316619</west>
<rotation>-1.123009884936565</rotation></LatLonBox>
</GroundOverlay></kml>"""^^rdf:XMLLiteral .
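Since the KML is attached as a raw literal, a consumer has to parse it to get at the coordinates. The Python sketch below reads the LatLonBox of a ground overlay with the standard library. `lat_lon_box` is a hypothetical helper, and the sample is a shortened copy of the output above.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sample output above.
KML_NS = {"kml": "http://earth.google.com/kml/2.2"}

# Shortened copy of the sample literal (name, description and icon omitted).
SAMPLE_KML = """<kml xmlns="http://earth.google.com/kml/2.2">
<GroundOverlay>
<LatLonBox>
<north>40.26126145890567</north><south>40.25777915632657</south>
<east>-86.77033439383223</east><west>-86.77398493316619</west>
<rotation>-1.123009884936565</rotation></LatLonBox>
</GroundOverlay></kml>"""


def lat_lon_box(kml_text):
    """Return the bounding box of the first LatLonBox, or None if absent."""
    root = ET.fromstring(kml_text)
    box = root.find(".//kml:LatLonBox", KML_NS)
    if box is None:
        return None
    return {
        name: float(box.findtext(f"kml:{name}", namespaces=KML_NS))
        for name in ("north", "south", "east", "west")
    }


print(lat_lon_box(SAMPLE_KML))
```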
Remaining TODOs
● Nested templates are commonly used and cannot be handled by the mappings wiki at the moment
– e.g. media descriptions (although mapped) are missing
{{Information |Description= {{en|Logo of the [[w:en:DBpedia|DBpedia project]]}} {{fr|Logo du projet [[w:fr:DBpedia|DBpedia]]}}
● Annotation descriptions need some tweaking
– Need to render wikitext
● Publish it through a SPARQL endpoint
● Provide Linked Data 
– http://commons.dbpedia.org
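To illustrate the nested-template item in the list above: the Description value is itself a sequence of language templates, so reading it needs brace matching rather than a flat key/value lookup. The Python sketch below is one possible approach and not a planned implementation; both function names are hypothetical, and the returned text is still unrendered wikitext.

```python
def split_templates(text):
    """Return the bodies of the top-level {{...}} templates found in text."""
    bodies = []
    depth = 0
    start = 0
    i = 0
    while i < len(text) - 1:
        pair = text[i:i + 2]
        if pair == "{{":
            if depth == 0:
                start = i + 2  # body begins right after the opening braces
            depth += 1
            i += 2
        elif pair == "}}" and depth > 0:
            depth -= 1
            if depth == 0:
                bodies.append(text[start:i])
            i += 2
        else:
            i += 1
    return bodies


def descriptions_by_language(description_value):
    """Map language codes to raw wikitext for values like {{en|...}} {{fr|...}}."""
    result = {}
    for body in split_templates(description_value):
        # Split on the first pipe only; later pipes belong to wiki links.
        language, separator, content = body.partition("|")
        if separator:
            result[language.strip()] = content.strip()
    return result


value = ("{{en|Logo of the [[w:en:DBpedia|DBpedia project]]}} "
         "{{fr|Logo du projet [[w:fr:DBpedia|DBpedia]]}}")
print(descriptions_by_language(value))
```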
Thank You! 
Special thanks to: 
● Alexandru Todor (importing the License templates) 
● Google Summer of Code for sponsoring this project (Gaurav Vaidya)
Questions? 
Dataset: http://nl.dbpedia.org/downloads/commonswiki 
Dataset samples: https://github.com/gaurav/commons-extraction
