SlideShare una empresa de Scribd logo
1 de 37
Descargar para leer sin conexión
Query modeling and information retrieval within
              the Web of Data

                  Cristian LAI
                      clai@crs4.it




                       CRS4




                    september 6, 2012        1 / 37
Outline


G   Motivation

G   UnStructured Data

G   Structured Data

G   Query building

G   Applications

G   Conclusion




                        september 6, 2012   2 / 37
Context
Semantic Web




http://www.w3.org/2006/Talks/1023-sb-W3CTechSemWeb/

                                                      september 6, 2012   3 / 37
Motivation
Search on the Web




http://www.slideshare.net/novaspivack/web-evolution-nova-spivack-twine

                                                                    september 6, 2012   4 / 37
Outline


G   Motivation

G   UnStructured Data

G   Structured Data

G   Query building

G   Applications

G   Conclusion




                        september 6, 2012   5 / 37
Wikipedia


 G   Started in 2001.
 G   Is a multilingual, web-based, free-content encyclopedia project based on
     an openly editable model.
 G   Is the 5th site on the web and serves 454 million unique visitors monthly as
     of March 2011.
 G   Has fewer than 100 employees.
 G   Wikipedia holds an annual fundraiser instead of accepting advertising. You
     may have seen "A personal appeal from Wikipedia founder Jimmy Wales" if
     you’ve used the online encyclopedia during the last weeks of 2011. Google
     co-founder Sergey Brin and his wife, Anne Wojcicki, has given a 500,000
     dollars grant to help Wikipedia fund its 28.3 million dollars annual budget.




                                   september 6, 2012                        6 / 37
Wikipedia



 G   Pros:
       H   Is a highly-efficient not-for-profit organization.
       H   Is the finest example of truly collaborative created content: >19M articles;
           >270 languages, >82k active contributors.
       H   Covers many topics and domains, articles are a result of a community
           consensus.
 G   Cons:
       H   Contains many inconsistencies.
              G   Disclaimer: Wikipedia cannot guarantee the validity of the information found here.
       H   Is not very well integrated with other data sources.
       H   Queries and search are not facilitated due to the lacks of structured
           representation.




                                            september 6, 2012                                 7 / 37
Issues




 G   UnStructured data, keywords based search.
 G   Simple questions are hard to answer.
       H   People who were born in Rome before 1900.
       H   Italian musicians with English and French descriptions.
       H   The official websites of companies with more than 500 employees.
 G   The information required to answer these is contained in Wikipedia.
 G   Transforming Wikipedia into a knowledge base.
       H   To reveal the structure and semantics of Wikipedia content
       H   The DBpedia project.




                                       september 6, 2012                     8 / 37
Structure in Wikipedia

  G   Wikipedia articles consist mostly of free text, but also contain different
      types of structured information, such as infobox templates,categorisation
      information, images, geo-coordinates, and links to external Web pages.
  G   Title
  G   Abstract
  G   Infobox Template
  G   Geo-coordinates
  G   Caegories
  G   Images
  G   Links
        H     other language version
        H     other Wikipedia pages
        H     redirects
        H     disambiguation


                                       september 6, 2012                     9 / 37
Structured Information in Wikipedia




                          september 6, 2012   10 / 37
Structured Information in Wikipedia




                          september 6, 2012   11 / 37
Structured Information in Wikipedia




                          september 6, 2012   12 / 37
Outline


G   Motivation

G   UnStructured Data

G   Structured Data

G   Query building

G   Applications

G   Conclusion




                        september 6, 2012   13 / 37
RDF representation
Knowledge Base



dbp:Cagliari rdf:type dbp:City
dbp:Cagliari dbp:Title "Cagliari"
dbp:Cagliari dbp:Country dbp:Italy
dbp:Cagliari dbp:postalCode 09100
dbp:Cagliari geo:lat "39.246387"xsd:float
dbp:Cagliari geo:long "9.057500"xsd:float
dbp:Cagliari rdf:type yago:MediterraneanPortCitiesAndTownsInItaly
...
  G   An environment for collecting and structuring data.
  G   Well defined structure of classification.




                                    september 6, 2012               14 / 37
RDF


 G   Triples: (subject, predicate, object)
 G   Subject and object
       H   are both URIs that each identify a resource, or a URI and a string literal
           respectively.
       H

 G   Predicate
       H   specifies how the subject and object are related, and is also represented by a
           URI.
 G   For example:
       H   A knows B
       H   C isAuthorOf D
       H   Two resources linked in this fashion can be drawn from different data sets on
           the Web, allowing data in one data source to be linked to that in another,
           thereby creating a Web of Data.



                                        september 6, 2012                               15 / 37
DBpedia


 G   Started in 2007.
 G   Is the result of a community effort to extract structured information from
     Wikipedia.
 G   Makes Wikipedia data available as RDF.
 G   Results: The DBpedia Data Set
       H   describes 3.64 million "things" with over half a billion "facts" (July 2011), 364k
           persons, 462k places, 99k music albums, 54k films, 148k organisations;
       H   extraction in 97 different languages;
       H   672M RDF triples
 G   It is maintained by: Universität Leipzig, Freie Universität Berlin, OpenLink
     Software, Inc.
 G   See http://wiki.dbpedia.org/Team



                                         september 6, 2012                             16 / 37
Nucleus of the Web of Data



 G   Within the W3C Linking Open Data (LOD) community effort.
 G   Tim Berners-Lee’s Linked Data principles.
       H   URI
       H   HTTP
       H   RDF, SPARQL
       H   Interlinking among data providers
 G   An increasing number of data providers have started to publish and
     interlink data on the Web.
 G   Several billion RDF triples and covers domains such as geographic
     information, people, companies, online communities, films, music, books
     and scientific publications.




                                       september 6, 2012                  17 / 37
LOD Datasets




               september 6, 2012   18 / 37
LOD Datasets




               september 6, 2012   19 / 37
Outline


G   Motivation

G   UnStructured Data

G   Structured Data

G   Query building

G   Applications

G   Conclusion




                        september 6, 2012   20 / 37
SPARQL Query Language


 G   RDF is a directed, labeled graph data format for representing information
     (also in the Web).
 G   SPARQL is a language for querying RDF graphs by specifying templates
     against which to compare graph components. Data which matches or
     satisfies a template is returned from the query.
 G   A triple template contains variables that represent triplet components (e.g.,
     ?s, ?p, or ?o within a triplet).
 G   Example:
       H   ?person ex:age "20"xsd:integer .
       H   Identifies a list of triplet subjects that have an ex:age property of "20".
           Analogous to asking "Who has age 20?".
       H   The SPARQL query engine will return a list of the subject component of triples
           that satisfy each query through value substitution.



                                       september 6, 2012                          21 / 37
SPARQL Queries

                        SELECT variables_list
                            FROM < RDF_source_URL >
                        WHERE {
                            { triple_pattern_1 .
                              . . .
                              triple_pattern_n . }.
                        }

       SELECT ?person
           FROM < http://ex.com >                       ?person
       WHERE {                                          ------------------
           ?person ex:age "20"xsd:integer .              _p1
       }                                                 _p2
                                                         . . .


                                    september 6, 2012                        22 / 37
The DBpedia SPARQL endpoint




 G   All data sets are available for queries via the DBpedia SPARQL endpoint
     (http://dbpedia.org/sparql).
 G   Querying the data set:
       H   ...
       H   Abstracts of movies starring Tom Cruise, released before 1999.
       H   The official websites of companies with more than 50000 employees.
       H   Cities with more than 2 million habitants.
       H   ...




                                     september 6, 2012                         23 / 37
Abstracts of movies starring Tom Cruise, released before
1999

                                              SPARQL
             SELECT ?subject ?label ?released ?abstract WHERE {
                ?subject rdf:type <http://dbpedia.org/ontology/Film>.
                ?subject dbpedia2:starring <http://dbpedia.org/resource/Tom_Cruise>.
                ?subject rdfs:comment ?abstract.
                ?subject rdfs:label ?label.
                  FILTER(lang(?abstract) = "en" && lang(?label) = "en").
                ?subject <http://dbpedia.org/ontology/releaseDate> ?released.
                  FILTER(xsd:date(?released) < "2000-01-01"^^xsd:date).
             } ORDER BY ?released




                                        september 6, 2012                              24 / 37
Outline


G   Motivation

G   UnStructured Data

G   Structured Data

G   Query building

G   Applications

G   Conclusion




                        september 6, 2012   25 / 37
Linked Data Search Engines and Indexes




 G   A number of search engines have been developed that crawl Linked Data
     from the Web by following RDF links, and provide query capabilities over
     aggregated data.
     Tom Heath and Christian Bizer (2011) Linked Data: Evolving the Web into a Global Data Space (1st edition). Synthesis Lectures on the Semantic Web:

     Theory and Technology, 1:1, 1-136. Morgan & Claypool.


 G   Google, Bing and Yahoo! agree to create and support a common
     vocabulary for structured data markup on web pages.
 G   Facebook has started to support RDF and Linked Data URIs and now
     provides access to parts of its user data via a Linked Data API.




                                                                september 6, 2012                                                          26 / 37
Google rich snippets




                       september 6, 2012   27 / 37
Twitter, #annotations
Twitter API based client




                           september 6, 2012   28 / 37
Twitter, #annotations
Lookup annotations




                        september 6, 2012   29 / 37
Twitter, #annotations
Resource #dbpedia:Cagliari




                             september 6, 2012   30 / 37
Twitter, #annotations
Resource #dbpedia:Cagliari




                             september 6, 2012   31 / 37
Question answering
Risorsa Cagliari




                     september 6, 2012   32 / 37
Question answering
Template




                     september 6, 2012   33 / 37
Question answering
RDF/XML




                     september 6, 2012   34 / 37
Outline


G   Motivation

G   UnStructured Data

G   Structured Data

G   Query building

G   Applications

G   Conclusion




                        september 6, 2012   35 / 37
Conclusion




 G   Data on the Web is a major challenge; technologies are needed to use
     them, to interact with them, to integrate them.
 G   Semantic Web technologies (RDF, SPARQL, etc.) can play a major role in
     publishing and using Data on the Web.
 G   Users can largely benefit from the wide world of structured content.
 G   Content providers joining the Linking Open Data project are contributing
     to create more meaningful navigation paths not only within websites but
     across the whole web.




                                   september 6, 2012                       36 / 37
Q&A




september 6, 2012   37 / 37

Más contenido relacionado

La actualidad más candente

Getty Vocabulary Program LOD: Ontologies and Semantic Representation
Getty Vocabulary Program LOD: Ontologies and Semantic RepresentationGetty Vocabulary Program LOD: Ontologies and Semantic Representation
Getty Vocabulary Program LOD: Ontologies and Semantic RepresentationVladimir Alexiev, PhD, PMP
 
Distributed Query Processing for Federated RDF Data Management
Distributed Query Processing for Federated RDF Data ManagementDistributed Query Processing for Federated RDF Data Management
Distributed Query Processing for Federated RDF Data ManagementOlafGoerlitz
 
Getty Vocabulary Program LOD: Ontologies and Semantic Representation
Getty Vocabulary Program LOD: Ontologies and Semantic RepresentationGetty Vocabulary Program LOD: Ontologies and Semantic Representation
Getty Vocabulary Program LOD: Ontologies and Semantic RepresentationVladimir Alexiev, PhD, PMP
 
Knowledge Graph Introduction
Knowledge Graph IntroductionKnowledge Graph Introduction
Knowledge Graph IntroductionSören Auer
 
Querying the Web of Data
Querying the Web of DataQuerying the Web of Data
Querying the Web of DataRinke Hoekstra
 
Hacktoberfest 2020 - Intro to Knowledge Graphs
Hacktoberfest 2020 - Intro to Knowledge GraphsHacktoberfest 2020 - Intro to Knowledge Graphs
Hacktoberfest 2020 - Intro to Knowledge GraphsArangoDB Database
 
Modelling context and statement-level metadata in knowledge graphs
Modelling context and statement-level metadata in knowledge graphsModelling context and statement-level metadata in knowledge graphs
Modelling context and statement-level metadata in knowledge graphsFabrizio Orlandi
 
Two graph data models : RDF and Property Graphs
Two graph data models : RDF and Property GraphsTwo graph data models : RDF and Property Graphs
Two graph data models : RDF and Property Graphsandyseaborne
 
EDF2012 Mariana Damova - Factforge
EDF2012   Mariana Damova - FactforgeEDF2012   Mariana Damova - Factforge
EDF2012 Mariana Damova - FactforgeEuropean Data Forum
 
Inference on the Semantic Web
Inference on the Semantic WebInference on the Semantic Web
Inference on the Semantic WebMyungjin Lee
 
Dublin Core In Practice
Dublin Core In PracticeDublin Core In Practice
Dublin Core In PracticeMarcia Zeng
 
The Dublin Core 1:1 Principle in the Age of Linked Data
The Dublin Core 1:1 Principle in the Age of Linked DataThe Dublin Core 1:1 Principle in the Age of Linked Data
The Dublin Core 1:1 Principle in the Age of Linked DataRichard Urban
 

La actualidad más candente (20)

Getty Vocabulary Program LOD: Ontologies and Semantic Representation
Getty Vocabulary Program LOD: Ontologies and Semantic RepresentationGetty Vocabulary Program LOD: Ontologies and Semantic Representation
Getty Vocabulary Program LOD: Ontologies and Semantic Representation
 
Distributed Query Processing for Federated RDF Data Management
Distributed Query Processing for Federated RDF Data ManagementDistributed Query Processing for Federated RDF Data Management
Distributed Query Processing for Federated RDF Data Management
 
Getty Vocabulary Program LOD: Ontologies and Semantic Representation
Getty Vocabulary Program LOD: Ontologies and Semantic RepresentationGetty Vocabulary Program LOD: Ontologies and Semantic Representation
Getty Vocabulary Program LOD: Ontologies and Semantic Representation
 
Knowledge Graph Introduction
Knowledge Graph IntroductionKnowledge Graph Introduction
Knowledge Graph Introduction
 
Querying the Web of Data
Querying the Web of DataQuerying the Web of Data
Querying the Web of Data
 
Dublin Core Intro
Dublin Core IntroDublin Core Intro
Dublin Core Intro
 
Linked Open Data stuff
Linked Open Data stuffLinked Open Data stuff
Linked Open Data stuff
 
Jesús Barrasa
Jesús BarrasaJesús Barrasa
Jesús Barrasa
 
2011 07 keynote-ktw
2011 07 keynote-ktw2011 07 keynote-ktw
2011 07 keynote-ktw
 
Hacktoberfest 2020 - Intro to Knowledge Graphs
Hacktoberfest 2020 - Intro to Knowledge GraphsHacktoberfest 2020 - Intro to Knowledge Graphs
Hacktoberfest 2020 - Intro to Knowledge Graphs
 
Modelling context and statement-level metadata in knowledge graphs
Modelling context and statement-level metadata in knowledge graphsModelling context and statement-level metadata in knowledge graphs
Modelling context and statement-level metadata in knowledge graphs
 
Efficient RDF Interchange (ERI) Format for RDF Data Streams
Efficient RDF Interchange (ERI) Format for RDF Data StreamsEfficient RDF Interchange (ERI) Format for RDF Data Streams
Efficient RDF Interchange (ERI) Format for RDF Data Streams
 
Two graph data models : RDF and Property Graphs
Two graph data models : RDF and Property GraphsTwo graph data models : RDF and Property Graphs
Two graph data models : RDF and Property Graphs
 
EDF2012 Mariana Damova - Factforge
EDF2012   Mariana Damova - FactforgeEDF2012   Mariana Damova - Factforge
EDF2012 Mariana Damova - Factforge
 
Inference on the Semantic Web
Inference on the Semantic WebInference on the Semantic Web
Inference on the Semantic Web
 
Scaling the (evolving) web data –at low cost-
Scaling the (evolving) web data –at low cost-Scaling the (evolving) web data –at low cost-
Scaling the (evolving) web data –at low cost-
 
Lod2 review meeting
Lod2 review meetingLod2 review meeting
Lod2 review meeting
 
Dublin Core In Practice
Dublin Core In PracticeDublin Core In Practice
Dublin Core In Practice
 
The Dublin Core 1:1 Principle in the Age of Linked Data
The Dublin Core 1:1 Principle in the Age of Linked DataThe Dublin Core 1:1 Principle in the Age of Linked Data
The Dublin Core 1:1 Principle in the Age of Linked Data
 
Dublin core Presentation
Dublin core PresentationDublin core Presentation
Dublin core Presentation
 

Similar a Query Modeling and Information Retrieval within the Web of Data

cristian_lai_webofdata
cristian_lai_webofdatacristian_lai_webofdata
cristian_lai_webofdataCristian Lai
 
Informal presentation about RES
Informal presentation about RESInformal presentation about RES
Informal presentation about RESChristophe Guéret
 
SPARQL1.1 Tutorial, given in UChile by Axel Polleres (DERI)
SPARQL1.1 Tutorial, given in UChile by Axel Polleres (DERI)SPARQL1.1 Tutorial, given in UChile by Axel Polleres (DERI)
SPARQL1.1 Tutorial, given in UChile by Axel Polleres (DERI)net2-project
 
How google is using linked data today and vision for tomorrow
How google is using linked data today and vision for tomorrowHow google is using linked data today and vision for tomorrow
How google is using linked data today and vision for tomorrowVasu Jain
 
Learning Commonalities in RDF
Learning Commonalities in RDFLearning Commonalities in RDF
Learning Commonalities in RDFSara EL HASSAD
 
Graph databases & data integration v2
Graph databases & data integration v2Graph databases & data integration v2
Graph databases & data integration v2Dimitris Kontokostas
 
Data integration with a façade. The case of knowledge graph construction.
Data integration with a façade. The case of knowledge graph construction.Data integration with a façade. The case of knowledge graph construction.
Data integration with a façade. The case of knowledge graph construction.Enrico Daga
 
LinkML Intro July 2022.pptx PLEASE VIEW THIS ON ZENODO
LinkML Intro July 2022.pptx PLEASE VIEW THIS ON ZENODOLinkML Intro July 2022.pptx PLEASE VIEW THIS ON ZENODO
LinkML Intro July 2022.pptx PLEASE VIEW THIS ON ZENODOChris Mungall
 
Triplificating and linking XBRL financial data
Triplificating and linking XBRL financial dataTriplificating and linking XBRL financial data
Triplificating and linking XBRL financial dataRoberto García
 
Using Hyperlinks to Enrich Message Board Content with Linked Data
Using Hyperlinks to Enrich Message Board Content with Linked DataUsing Hyperlinks to Enrich Message Board Content with Linked Data
Using Hyperlinks to Enrich Message Board Content with Linked DataSheila Kinsella
 
Transforming Your Data with GraphDB: GraphDB Fundamentals, Jan 2018
Transforming Your Data with GraphDB: GraphDB Fundamentals, Jan 2018Transforming Your Data with GraphDB: GraphDB Fundamentals, Jan 2018
Transforming Your Data with GraphDB: GraphDB Fundamentals, Jan 2018Ontotext
 
Hala skafkeynote@conferencedata2021
Hala skafkeynote@conferencedata2021Hala skafkeynote@conferencedata2021
Hala skafkeynote@conferencedata2021hala Skaf
 
Semantic Markup
Semantic Markup Semantic Markup
Semantic Markup R A Akerkar
 
Towards Virtual Knowledge Graphs over Web APIs
Towards Virtual Knowledge Graphs over Web APIsTowards Virtual Knowledge Graphs over Web APIs
Towards Virtual Knowledge Graphs over Web APIsSpeck&Tech
 
Sharing irish place names as linked open data - Rebecca Grant
Sharing irish place names as linked open data - Rebecca GrantSharing irish place names as linked open data - Rebecca Grant
Sharing irish place names as linked open data - Rebecca Grantdri_ireland
 
Linking Open Data with Drupal
Linking Open Data with DrupalLinking Open Data with Drupal
Linking Open Data with Drupalemmanuel_jamin
 

Similar a Query Modeling and Information Retrieval within the Web of Data (20)

cristian_lai_webofdata
cristian_lai_webofdatacristian_lai_webofdata
cristian_lai_webofdata
 
Informal presentation about RES
Informal presentation about RESInformal presentation about RES
Informal presentation about RES
 
SPARQL1.1 Tutorial, given in UChile by Axel Polleres (DERI)
SPARQL1.1 Tutorial, given in UChile by Axel Polleres (DERI)SPARQL1.1 Tutorial, given in UChile by Axel Polleres (DERI)
SPARQL1.1 Tutorial, given in UChile by Axel Polleres (DERI)
 
KEDL DBpedia 2019
KEDL DBpedia  2019KEDL DBpedia  2019
KEDL DBpedia 2019
 
How google is using linked data today and vision for tomorrow
How google is using linked data today and vision for tomorrowHow google is using linked data today and vision for tomorrow
How google is using linked data today and vision for tomorrow
 
Learning Commonalities in RDF
Learning Commonalities in RDFLearning Commonalities in RDF
Learning Commonalities in RDF
 
Graph databases & data integration v2
Graph databases & data integration v2Graph databases & data integration v2
Graph databases & data integration v2
 
Data integration with a façade. The case of knowledge graph construction.
Data integration with a façade. The case of knowledge graph construction.Data integration with a façade. The case of knowledge graph construction.
Data integration with a façade. The case of knowledge graph construction.
 
LinkML Intro July 2022.pptx PLEASE VIEW THIS ON ZENODO
LinkML Intro July 2022.pptx PLEASE VIEW THIS ON ZENODOLinkML Intro July 2022.pptx PLEASE VIEW THIS ON ZENODO
LinkML Intro July 2022.pptx PLEASE VIEW THIS ON ZENODO
 
Triplificating and linking XBRL financial data
Triplificating and linking XBRL financial dataTriplificating and linking XBRL financial data
Triplificating and linking XBRL financial data
 
Using Hyperlinks to Enrich Message Board Content with Linked Data
Using Hyperlinks to Enrich Message Board Content with Linked DataUsing Hyperlinks to Enrich Message Board Content with Linked Data
Using Hyperlinks to Enrich Message Board Content with Linked Data
 
Transforming Your Data with GraphDB: GraphDB Fundamentals, Jan 2018
Transforming Your Data with GraphDB: GraphDB Fundamentals, Jan 2018Transforming Your Data with GraphDB: GraphDB Fundamentals, Jan 2018
Transforming Your Data with GraphDB: GraphDB Fundamentals, Jan 2018
 
Hala skafkeynote@conferencedata2021
Hala skafkeynote@conferencedata2021Hala skafkeynote@conferencedata2021
Hala skafkeynote@conferencedata2021
 
Semantic Markup
Semantic Markup Semantic Markup
Semantic Markup
 
HyperGraphQL
HyperGraphQLHyperGraphQL
HyperGraphQL
 
Where is the World is my Open Government Data?
Where is the World is my Open Government Data?Where is the World is my Open Government Data?
Where is the World is my Open Government Data?
 
Towards Virtual Knowledge Graphs over Web APIs
Towards Virtual Knowledge Graphs over Web APIsTowards Virtual Knowledge Graphs over Web APIs
Towards Virtual Knowledge Graphs over Web APIs
 
Linked Data
Linked DataLinked Data
Linked Data
 
Sharing irish place names as linked open data - Rebecca Grant
Sharing irish place names as linked open data - Rebecca GrantSharing irish place names as linked open data - Rebecca Grant
Sharing irish place names as linked open data - Rebecca Grant
 
Linking Open Data with Drupal
Linking Open Data with DrupalLinking Open Data with Drupal
Linking Open Data with Drupal
 

Más de CRS4 Research Center in Sardinia

Sequenziamento Esomico. Maria Valentini (CRS4), Cagliari, 18 Novembre 2015
Sequenziamento Esomico. Maria Valentini (CRS4), Cagliari, 18 Novembre 2015Sequenziamento Esomico. Maria Valentini (CRS4), Cagliari, 18 Novembre 2015
Sequenziamento Esomico. Maria Valentini (CRS4), Cagliari, 18 Novembre 2015CRS4 Research Center in Sardinia
 
Near Surface Geoscience Conference 2015, Turin - A Spatial Velocity Analysis ...
Near Surface Geoscience Conference 2015, Turin - A Spatial Velocity Analysis ...Near Surface Geoscience Conference 2015, Turin - A Spatial Velocity Analysis ...
Near Surface Geoscience Conference 2015, Turin - A Spatial Velocity Analysis ...CRS4 Research Center in Sardinia
 
GIS partecipativo. Laura Muscas e Valentina Spanu (CRS4), Cagliari, 21 Ottobr...
GIS partecipativo. Laura Muscas e Valentina Spanu (CRS4), Cagliari, 21 Ottobr...GIS partecipativo. Laura Muscas e Valentina Spanu (CRS4), Cagliari, 21 Ottobr...
GIS partecipativo. Laura Muscas e Valentina Spanu (CRS4), Cagliari, 21 Ottobr...CRS4 Research Center in Sardinia
 
Alfonso Damiano (Università di Cagliari) ICT per Smart Grid
Alfonso Damiano (Università di Cagliari) ICT per Smart Grid Alfonso Damiano (Università di Cagliari) ICT per Smart Grid
Alfonso Damiano (Università di Cagliari) ICT per Smart Grid CRS4 Research Center in Sardinia
 
Dinamica Molecolare e Modellistica dell'interazione di lipidi col recettore P...
Dinamica Molecolare e Modellistica dell'interazione di lipidi col recettore P...Dinamica Molecolare e Modellistica dell'interazione di lipidi col recettore P...
Dinamica Molecolare e Modellistica dell'interazione di lipidi col recettore P...CRS4 Research Center in Sardinia
 
Innovazione e infrastrutture cloud per lo sviluppo di applicativi web e mobil...
Innovazione e infrastrutture cloud per lo sviluppo di applicativi web e mobil...Innovazione e infrastrutture cloud per lo sviluppo di applicativi web e mobil...
Innovazione e infrastrutture cloud per lo sviluppo di applicativi web e mobil...CRS4 Research Center in Sardinia
 
ORDBMS e NoSQL nel trattamento dei dati geografici parte seconda. 30 Sett. 2015
ORDBMS e NoSQL nel trattamento dei dati geografici parte seconda. 30 Sett. 2015ORDBMS e NoSQL nel trattamento dei dati geografici parte seconda. 30 Sett. 2015
ORDBMS e NoSQL nel trattamento dei dati geografici parte seconda. 30 Sett. 2015CRS4 Research Center in Sardinia
 
Sistemi No-Sql e Object-Relational nella gestione dei dati geografici 30 Sett...
Sistemi No-Sql e Object-Relational nella gestione dei dati geografici 30 Sett...Sistemi No-Sql e Object-Relational nella gestione dei dati geografici 30 Sett...
Sistemi No-Sql e Object-Relational nella gestione dei dati geografici 30 Sett...CRS4 Research Center in Sardinia
 
Elementi di sismica a riflessione e Georadar (Gian Piero Deidda, UNICA)
Elementi di sismica a riflessione e Georadar (Gian Piero Deidda, UNICA)Elementi di sismica a riflessione e Georadar (Gian Piero Deidda, UNICA)
Elementi di sismica a riflessione e Georadar (Gian Piero Deidda, UNICA)CRS4 Research Center in Sardinia
 
Near Surface Geoscience Conference 2014, Athens - Real-­time or full­‐precisi...
Near Surface Geoscience Conference 2014, Athens - Real-­time or full­‐precisi...Near Surface Geoscience Conference 2014, Athens - Real-­time or full­‐precisi...
Near Surface Geoscience Conference 2014, Athens - Real-­time or full­‐precisi...CRS4 Research Center in Sardinia
 
Luigi Atzori Metabolomica: Introduzione e review di alcune applicazioni in am...
Luigi Atzori Metabolomica: Introduzione e review di alcune applicazioni in am...Luigi Atzori Metabolomica: Introduzione e review di alcune applicazioni in am...
Luigi Atzori Metabolomica: Introduzione e review di alcune applicazioni in am...CRS4 Research Center in Sardinia
 

Más de CRS4 Research Center in Sardinia (20)

The future is close
The future is closeThe future is close
The future is close
 
The future is close
The future is closeThe future is close
The future is close
 
Presentazione Linea B2 progetto Tutti a Iscol@ 2017
Presentazione Linea B2 progetto Tutti a Iscol@ 2017Presentazione Linea B2 progetto Tutti a Iscol@ 2017
Presentazione Linea B2 progetto Tutti a Iscol@ 2017
 
Iscola linea B 2016
Iscola linea B 2016Iscola linea B 2016
Iscola linea B 2016
 
Sequenziamento Esomico. Maria Valentini (CRS4), Cagliari, 18 Novembre 2015
Sequenziamento Esomico. Maria Valentini (CRS4), Cagliari, 18 Novembre 2015Sequenziamento Esomico. Maria Valentini (CRS4), Cagliari, 18 Novembre 2015
Sequenziamento Esomico. Maria Valentini (CRS4), Cagliari, 18 Novembre 2015
 
Near Surface Geoscience Conference 2015, Turin - A Spatial Velocity Analysis ...
Near Surface Geoscience Conference 2015, Turin - A Spatial Velocity Analysis ...Near Surface Geoscience Conference 2015, Turin - A Spatial Velocity Analysis ...
Near Surface Geoscience Conference 2015, Turin - A Spatial Velocity Analysis ...
 
GIS partecipativo. Laura Muscas e Valentina Spanu (CRS4), Cagliari, 21 Ottobr...
GIS partecipativo. Laura Muscas e Valentina Spanu (CRS4), Cagliari, 21 Ottobr...GIS partecipativo. Laura Muscas e Valentina Spanu (CRS4), Cagliari, 21 Ottobr...
GIS partecipativo. Laura Muscas e Valentina Spanu (CRS4), Cagliari, 21 Ottobr...
 
Alfonso Damiano (Università di Cagliari) ICT per Smart Grid
Alfonso Damiano (Università di Cagliari) ICT per Smart Grid Alfonso Damiano (Università di Cagliari) ICT per Smart Grid
Alfonso Damiano (Università di Cagliari) ICT per Smart Grid
 
Big Data Infrastructures - Hadoop ecosystem, M. E. Piras
Big Data Infrastructures - Hadoop ecosystem, M. E. PirasBig Data Infrastructures - Hadoop ecosystem, M. E. Piras
Big Data Infrastructures - Hadoop ecosystem, M. E. Piras
 
Big Data Analytics, Giovanni Delussu e Marco Enrico Piras
 Big Data Analytics, Giovanni Delussu e Marco Enrico Piras  Big Data Analytics, Giovanni Delussu e Marco Enrico Piras
Big Data Analytics, Giovanni Delussu e Marco Enrico Piras
 
Dinamica Molecolare e Modellistica dell'interazione di lipidi col recettore P...
Dinamica Molecolare e Modellistica dell'interazione di lipidi col recettore P...Dinamica Molecolare e Modellistica dell'interazione di lipidi col recettore P...
Dinamica Molecolare e Modellistica dell'interazione di lipidi col recettore P...
 
Innovazione e infrastrutture cloud per lo sviluppo di applicativi web e mobil...
Innovazione e infrastrutture cloud per lo sviluppo di applicativi web e mobil...Innovazione e infrastrutture cloud per lo sviluppo di applicativi web e mobil...
Innovazione e infrastrutture cloud per lo sviluppo di applicativi web e mobil...
 
ORDBMS e NoSQL nel trattamento dei dati geografici parte seconda. 30 Sett. 2015
ORDBMS e NoSQL nel trattamento dei dati geografici parte seconda. 30 Sett. 2015ORDBMS e NoSQL nel trattamento dei dati geografici parte seconda. 30 Sett. 2015
ORDBMS e NoSQL nel trattamento dei dati geografici parte seconda. 30 Sett. 2015
 
Sistemi No-Sql e Object-Relational nella gestione dei dati geografici 30 Sett...
Sistemi No-Sql e Object-Relational nella gestione dei dati geografici 30 Sett...Sistemi No-Sql e Object-Relational nella gestione dei dati geografici 30 Sett...
Sistemi No-Sql e Object-Relational nella gestione dei dati geografici 30 Sett...
 
Elementi di sismica a riflessione e Georadar (Gian Piero Deidda, UNICA)
Elementi di sismica a riflessione e Georadar (Gian Piero Deidda, UNICA)Elementi di sismica a riflessione e Georadar (Gian Piero Deidda, UNICA)
Elementi di sismica a riflessione e Georadar (Gian Piero Deidda, UNICA)
 
Near Surface Geoscience Conference 2014, Athens - Real-­time or full­‐precisi...
Near Surface Geoscience Conference 2014, Athens - Real-­time or full­‐precisi...Near Surface Geoscience Conference 2014, Athens - Real-­time or full­‐precisi...
Near Surface Geoscience Conference 2014, Athens - Real-­time or full­‐precisi...
 
SmartGeo/Eiagrid portal (Guido Satta, CRS4)
SmartGeo/Eiagrid portal (Guido Satta, CRS4)SmartGeo/Eiagrid portal (Guido Satta, CRS4)
SmartGeo/Eiagrid portal (Guido Satta, CRS4)
 
Luigi Atzori Metabolomica: Introduzione e review di alcune applicazioni in am...
Luigi Atzori Metabolomica: Introduzione e review di alcune applicazioni in am...Luigi Atzori Metabolomica: Introduzione e review di alcune applicazioni in am...
Luigi Atzori Metabolomica: Introduzione e review di alcune applicazioni in am...
 
Mobile Graphics (part2)
Mobile Graphics (part2)Mobile Graphics (part2)
Mobile Graphics (part2)
 
Mobile Graphics (part1)
Mobile Graphics (part1)Mobile Graphics (part1)
Mobile Graphics (part1)
 

Query Modeling and Information Retrieval within the Web of Data

  • 1. Query modeling and information retrieval within the Web of Data Cristian LAI clai@crs4.it CRS4 september 6, 2012 1 / 37
  • 2. Outline G Motivation G UnStructured Data G Structured Data G Query building G Applications G Conclusion september 6, 2012 2 / 37
  • 4. Motivation Search on the Web http://www.slideshare.net/novaspivack/web-evolution-nova-spivack-twine september 6, 2012 4 / 37
  • 5. Outline G Motivation G UnStructured Data G Structured Data G Query building G Applications G Conclusion september 6, 2012 5 / 37
  • 6. Wikipedia G Started in 2001. G Is a multilingual, web-based, free-content encyclopedia project based on an openly editable model. G Is the 5th site on the web and serves 454 million unique visitors monthly as of March 2011. G Has fewer than 100 employees. G Wikipedia holds an annual fundraiser instead of accepting advertising. You may have seen "A personal appeal from Wikipedia founder Jimmy Wales" if you’ve used the online encyclopedia during the last weeks of 2011. Google co-founder Sergey Brin and his wife, Anne Wojcicki, has given a 500,000 dollars grant to help Wikipedia fund its 28.3 million dollars annual budget. september 6, 2012 6 / 37
  • 7. Wikipedia G Pros: H Is a highly-efficient not-for-profit organization. H Is the finest example of truly collaborative created content: >19M articles; >270 languages, >82k active contributors. H Covers many topics and domains, articles are a result of a community consensus. G Cons: H Contains many inconsistencies. G Disclaimer: Wikipedia cannot guarantee the validity of the information found here. H Is not very well integrated with other data sources. H Queries and search are not facilitated due to the lacks of structured representation. september 6, 2012 7 / 37
  • 8. Issues G UnStructured data, keywords based search. G Simple questions are hard to answer. H People who were born in Rome before 1900. H Italian musicians with English and French descriptions. H The official websites of companies with more than 500 employees. G The information required to answer these is contained in Wikipedia. G Transforming Wikipedia into a knowledge base. H To reveal the structure and semantics of Wikipedia content H The DBpedia project. september 6, 2012 8 / 37
  • 9. Structure in Wikipedia G Wikipedia articles consist mostly of free text, but also contain different types of structured information, such as infobox templates,categorisation information, images, geo-coordinates, and links to external Web pages. G Title G Abstract G Infobox Template G Geo-coordinates G Caegories G Images G Links H other language version H other Wikipedia pages H redirects H disambiguation september 6, 2012 9 / 37
  • 10. Structured Information in Wikipedia september 6, 2012 10 / 37
  • 11. Structured Information in Wikipedia september 6, 2012 11 / 37
  • 12. Structured Information in Wikipedia september 6, 2012 12 / 37
  • 13. Outline G Motivation G UnStructured Data G Structured Data G Query building G Applications G Conclusion september 6, 2012 13 / 37
  • 14. RDF representation Knowledge Base dbp:Cagliari rdf:type dbp:City dbp:Cagliari dbp:Title "Cagliari" dbp:Cagliari dbp:Country dbp:Italy dbp:Cagliari dbp:postalCode 09100 dbp:Cagliari geo:lat "39.246387"xsd:float dbp:Cagliari geo:long "9.057500"xsd:float dbp:Cagliari rdf:type yago:MediterraneanPortCitiesAndTownsInItaly ... G An environment for collecting and structuring data. G Well defined structure of classification. september 6, 2012 14 / 37
  • 15. RDF G Triples: (subject, predicate, object) G Subject and object H are both URIs that each identify a resource, or a URI and a string literal respectively. H G Predicate H specifies how the subject and object are related, and is also represented by a URI. G For example: H A knows B H C isAuthorOf D H Two resources linked in this fashion can be drawn from different data sets on the Web, allowing data in one data source to be linked to that in another, thereby creating a Web of Data. september 6, 2012 15 / 37
  • 16. DBpedia G Started in 2007. G Is the result of a community effort to extract structured information from Wikipedia. G Makes Wikipedia data available as RDF. G Results: The DBpedia Data Set H describes 3.64 million "things" with over half a billion "facts" (July 2011), 364k persons, 462k places, 99k music albums, 54k films, 148k organisations; H extraction in 97 different languages; H 672M RDF triples G It is maintained by: Universität Leipzig, Freie Universität Berlin, OpenLink Software, Inc. G See http://wiki.dbpedia.org/Team september 6, 2012 16 / 37
  • 17. Nucleus of the Web of Data G Within the W3C Linking Open Data (LOD) community effort. G Tim Berners-Lee’s Linked Data principles. H URI H HTTP H RDF, SPARQL H Interlinking among data providers G An increasing number of data providers have started to publish and interlink data on the Web. G Several billion RDF triples and covers domains such as geographic information, people, companies, online communities, films, music, books and scientific publications. september 6, 2012 17 / 37
  • 18. LOD Datasets september 6, 2012 18 / 37
  • 19. LOD Datasets september 6, 2012 19 / 37
  • 20. Outline G Motivation G UnStructured Data G Structured Data G Query building G Applications G Conclusion september 6, 2012 20 / 37
  • 21. SPARQL Query Language G RDF is a directed, labeled graph data format for representing information (also in the Web). G SPARQL is a language for querying RDF graphs by specifying templates against which to compare graph components. Data which matches or satisfies a template is returned from the query. G A triple template contains variables that represent triplet components (e.g., ?s, ?p, or ?o within a triplet). G Example: H ?person ex:age "20"xsd:integer . H Identifies a list of triplet subjects that have an ex:age property of "20". Analogous to asking "Who has age 20?". H The SPARQL query engine will return a list of the subject component of triples that satisfy each query through value substitution. september 6, 2012 21 / 37
  • 22. SPARQL Queries SELECT variables_list FROM < RDF_source_URL > WHERE { { triple_pattern_1 . . . . triple_pattern_n . }. } SELECT ?person FROM < http://ex.com > ?person WHERE { ------------------ ?person ex:age "20"xsd:integer . _p1 } _p2 . . . september 6, 2012 22 / 37
  • 23. The DBpedia SPARQL endpoint G All data sets are available for queries via the DBpedia SPARQL endpoint (http://dbpedia.org/sparql). G Querying the data set: H ... H Abstracts of movies starring Tom Cruise, released before 1999. H The official websites of companies with more than 50000 employees. H Cities with more than 2 million habitants. H ... september 6, 2012 23 / 37
  • 24. Abstracts of movies starring Tom Cruise, released before 1999 SPARQL SELECT ?subject ?label ?released ?abstract WHERE { ?subject rdf:type <http://dbpedia.org/ontology/Film>. ?subject dbpedia2:starring <http://dbpedia.org/resource/Tom_Cruise>. ?subject rdfs:comment ?abstract. ?subject rdfs:label ?label. FILTER(lang(?abstract) = "en" && lang(?label) = "en"). ?subject <http://dbpedia.org/ontology/releaseDate> ?released. FILTER(xsd:date(?released) < "2000-01-01"^^xsd:date). } ORDER BY ?released september 6, 2012 24 / 37
  • 25. Outline G Motivation G UnStructured Data G Structured Data G Query building G Applications G Conclusion september 6, 2012 25 / 37
  • 26. Linked Data Search Engines and Indexes G A number of search engines have been developed that crawl Linked Data from the Web by following RDF links, and provide query capabilities over aggregated data. Tom Heath and Christian Bizer (2011) Linked Data: Evolving the Web into a Global Data Space (1st edition). Synthesis Lectures on the Semantic Web: Theory and Technology, 1:1, 1-136. Morgan & Claypool. G Google, Bing and Yahoo! agree to create and support a common vocabulary for structured data markup on web pages. G Facebook has started to support RDF and Linked Data URIs and now provides access to parts of its user data via a Linked Data API. september 6, 2012 26 / 37
  • 27. Google rich snippets september 6, 2012 27 / 37
  • 28. Twitter, #annotations Twitter API based client september 6, 2012 28 / 37
  • 29. Twitter, #annotations Lookup annotations september 6, 2012 29 / 37
  • 32. Question answering Risorsa Cagliari september 6, 2012 32 / 37
  • 33. Question answering Template september 6, 2012 33 / 37
  • 34. Question answering RDF/XML september 6, 2012 34 / 37
  • 35. Outline G Motivation G UnStructured Data G Structured Data G Query building G Applications G Conclusion september 6, 2012 35 / 37
  • 36. Conclusion G Data on the Web is a major challenge; technologies are needed to use them, to interact with them, to integrate them. G Semantic Web technologies (RDF, SPARQL, etc.) can play a major role in publishing and using Data on the Web. G Users can largely benefit from the wide world of structured content. G Content providers joining the Linking Open Data project are contributing to create more meaningful navigation paths not only within websites but across the whole web. september 6, 2012 36 / 37