Linked Data Management
 3rd GATE Training Course @ Montreal
              Module 15
        Marin Dimitrov (Ontotext)



             August 2010
3rd GATE Training Course and
       Developer Sprint @ Montreal, Aug 2010

• https://gate.ac.uk/sale/images/gate-art/fig-posters/fig3-poster.pdf




                                                        Aug 2010   #2
                RDF, SPARQL and Semantic Repositories
Module 15 programme
9.45-11.00
              •    Linked Data principles
              •    Vocabularies & datasets
11.00-11.15
              • Coffee break
11.15-12.30
              •    Open Government Data
              •    Tools
              •    Open issues & challenges
12.30-14.00   Lunch break
14.00-16.00
              •    Introduction to FactForge and LinkedLifeData
              •    The “Modigliani test” for the Semantic Web
16.00-16.30   Coffee

                               Linked Data Management     Aug 2010   #3
LINKED DATA PRINCIPLES




Linked Data

• “To make the Semantic Web a reality, it is necessary to have a
  large volume of data available on the Web in a standard,
  reachable and manageable format. In addition the
  relationships among data also need to be made available. This
  collection of interrelated data on the Web can also be referred
  to as Linked Data. Linked Data lies at the heart of the
  Semantic Web: large scale integration of, and reasoning on,
  data on the Web.” (W3C)
• Linked Data is a set of principles that allows publishing,
  querying and browsing of RDF data, distributed across
  different servers
   • similar to the way HTML is currently published & consumed


Linked Data design principles

1.       Unambiguous identifiers for objects (resources)
     –     Use URIs as names for things
2.       Use the structure of the web
     –     Use HTTP URIs so that people can look up the names
3.       Make it easy to discover information about an
         object (resource)
     –     When someone looks up a URI, provide useful
           information
4.       Link the object (resource) to related objects
     –     Include links to other URIs
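
From the consumer side, principles 2 and 3 amount to an HTTP lookup of the URI with a request for structured data. The following is a minimal sketch using only the Python standard library; the helper name `build_lookup_request` and the exact `Accept` header value are assumptions for illustration, and the request is only constructed here, not sent.

```python
from urllib.request import Request


def build_lookup_request(uri: str) -> Request:
    """Build an HTTP GET request that asks for an RDF description of `uri`."""
    # Principle 2: names should be HTTP URIs so that they can be looked up.
    if not uri.startswith(("http://", "https://")):
        raise ValueError("Linked Data names should be HTTP URIs")
    # Principle 3: ask the server for useful (structured) information.
    # The Accept value is an illustrative choice: prefer RDF/XML, accept anything.
    return Request(uri, headers={"Accept": "application/rdf+xml, */*;q=0.1"})


request = build_lookup_request("http://dbpedia.org/resource/Montreal")
print(request.get_method(), request.full_url, request.get_header("Accept"))
```

Passing the resulting object to `urllib.request.urlopen` would perform the actual lookup; the links found in the returned description (principle 4) can then be followed in the same way.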

Linked datasets

[Figure: RDF graph of instances, classes and properties.
 Instances: myData:Ivan, myData:Maria.
 Classes: ptop:Agent, ptop:Person, ptop:Woman.
 Properties: owl:relativeOf, ptop:parentOf, ptop:childOf.
 Edge labels: rdf:type, rdfs:subPropertyOf, rdfs:range, owl:inverseOf,
 owl:SymmetricProperty; some edges are marked as inferred.]

Linked Data evolution – Oct 2007




Linked Data evolution – Sep 2008




Linked Data evolution – Jul 2009




Linked Data evolution – Sep 2010




Linked Data evolution – Sep 2010

• 220 interlinked datasets
• 24 billion RDF triples
   –   Data.gov + data.gov wiki – 11.5 billion
   –   LinkedGeoData – 3 billion
   –   UniProt – 1.1 billion
   –   DBpedia – 1 billion
   –   US Census Data – 1 billion
   –   PubMed – 0.8 billion
   –   AudioScrobbler – 0.6 billion
   –   …
   –   Freebase – 0.1 billion
Linked Data example –
http://factforge.net/resource/dbpedia/Montreal




Linked Data example (2)

• The description for Montreal on FactForge
  aggregates data from
  –   DBpedia
  –   GeoNames
  –   Freebase
  –   NY Times




Linked Data example (3)




[Figure: screenshot annotated with the source datasets DBpedia, GeoNames and Freebase]




Why use Linked Data?

• Facilitate data integration
   – Use LOD as an “interlingua” for EDI
      • Additional public information can help alignment and linking

• Add value to proprietary data
   – Public data can allow enhanced content and more
     analytics on top of proprietary data
      • E.g. linking to spatial data from GeoNames, search for images
   – Better description and access to content
• Make enterprise data more open & accessible
   – Public identifiers and vocabularies can be used to access
     them

Success Stories

• BBC Music
  – Integrates information from MusicBrainz and Wikipedia for
    artist/band infopages
  – Information also available in RDF (in addition to web
    pages)
  – 3rd party applications built on top of the BBC data
  – BBC also contributes data back to MusicBrainz
• NY Times
  – Maps its thesaurus of 1 million entity descriptions (people,
    organisations, places, etc) to DBpedia and Freebase


VOCABULARIES & DATASETS




Vocabularies

• Existing vocabularies make publishing & integrating
  Linked Data easier
   – Friend-of-a-Friend (FOAF)
      • http://xmlns.com/foaf/0.1/
      • Vocabulary for describing people (names, contact info, …)
   – Dublin Core (DC)
      • http://dublincore.org/documents/dcmes-xml/
      • Vocabulary for general metadata attributes (author, topic, …)
   – Semantically-Interlinked Online Communities (SIOC)
      • http://sioc-project.org/
      • Social Web data



Vocabularies (2)

• Existing vocabularies (contd.)
   – SKOS
      • http://www.w3.org/2004/02/skos/
   – GoodRelations
      • Vocabulary for describing products and business entities
      • http://www.heppnetz.de/ontologies/goodrelations/v1
   – Music Ontology
      • http://musicontology.com/
   – Linked Open Description of Events (LODE)
      • http://linkedevents.org/ontology/
   – Creative Commons
      • http://creativecommons.org/ns

Vocabularies (3)




Datasets

• DBpedia
  – Linked Data version of Wikipedia
  – 3.5 million entities, incl. 410K places, 310K persons, 146K
    species, 140K organisations, 95K music albums, 50K films,
    33K buildings, 15K videogames, 5K diseases
  – Descriptions available in 90 languages
  – 1 billion triples, 10 million links to external RDF datasets
  – Ontology – 260 classes, 1200 properties, 1.5 million
    instances
     • http://www4.wiwiss.fu-berlin.de/dbpedia/dev/ontology.htm




Datasets (2)

• Freebase
  – Similar to DBpedia
  – Higher data quality but ten times less data
• GeoNames
  – Information about 6 million places
  – Ontology:
    http://www.geonames.org/ontology/ontology_v2.1.rdf
• MusicBrainz
  – 55K artists, 22K albums, 36 million triples



OPEN GOVERNMENT DATA




data.gov (USA)




data.gov.uk (UK)




data.gov.uk (2)

• “…we will aim for the majority of government-
  published information to be reusable, linked data by
  June 2011; and we will establish a common licence to
  reuse data which is interoperable with the
  internationally recognised Creative Commons
  model.” (UK Government, Dec 2009)




gov.opendata.at (Austria)




at.ckan.net (Austria)




openbelgium.be/data (Belgium)




digitaliser.dk (Denmark)




pub.stat.ee (Estonia)




opengov.fi (Finland)




data-publica.com (France)




data-gov.fr (France)




opendata-network.org (Germany)




offenedaten.de (Germany)




geodata.gov.gr (Greece)




hu.ckan.net (Hungary)




ie.ckan.net (Ireland)




datagov.it (Italy)




LinkedOpenData.it (Italy)




it.ckan.net (Italy)




data.norge.no (Norway)




datanest.fair-play.sk (Slovakia)




si.ckan.net (Slovenia)




opengov.es (Spain)




opendata.euskadi.net (Spain / Basque Country)




opengov.se (Sweden)




opendatani.info (UK / Northern Ireland)




Eurostat (EU)




data.australia.gov.au (Australia)




data.gov.au (Australia)




DataDotGC.ca (Canada)




databox.openlabs.go.jp (Japan)




opendata.go.ke (Kenya)




data.govt.nz (New Zealand)




opengovdata.ru (Russia)




Open Government Data statistics

• Data.gov
  – 2,400 datasets
  – … but only 400 datasets RDFized at present
  – 6.5 billion triples / 0.5 billion entities
• Data.gov.uk
  – 3,000 datasets
• Data Publica
  – 2,000 datasets
• Eurostat
  – 4,000 datasets

ThisWeKnow




ThisWeKnow (2)




SPARQL query




data.worldbank.org




TOOLS




Linked Data browsers – Marbles

• http://marbles.sourceforge.net
• XHTML views of RDF data (SPARQL endpoint),
  caching, predicate traversal




Linked Data browsers – RelFinder

• http://relfinder.dbpedia.org
• Explore & navigate relationships in an RDF graph




Linked Data browsers – gFacet

• http://gfacet.semanticweb.org/
• Graph based visualisation & faceted filtering of RDF
  data




Linked Data browsers – Forest

• Front end to FactForge and LinkedLifeData




Linked Data browsers – Information Workbench

• http://iwb.fluidops.com/main.jsp




Linked Data browsers – Information Workbench (2)




Linked Data browsers – OpenLink RDF Browser

• http://demo.openlinksw.com/DAV/JS/rdfbrowser/index.html
• Explore & navigate relationships in an RDF graph




DBpedia Mobile

• http://wiki.dbpedia.org/DBpediaMobile
• Based on user’s GPS position, renders a map with
  nearby places of interest (from DBpedia)




Pubby – A Linked Data Frontend for SPARQL Endpoints

• http://www4.wiwiss.fu-berlin.de/pubby/
• Linked Data interface to local/remote SPARQL
  endpoints
• URI rewriting of SPARQL resultsets
• Simple HTML interface




OPEN ISSUES AND CHALLENGES




Linked Data – open issues

• LOD is hard to comprehend
   – Schema diversity & proliferation
• Quality of data is poor
   – Many of the datasets are well positioned to serve as
     “master data” but their quality is very far from the
     enterprise standards
   – No consistency guarantees of any kind
• Issues with reliability of data end-points
   – High down-time is not unusual
   – There is no SLA provided


Linked Data – open issues (2)

• Querying of linked data is slow
   – Data is distributed on the web
      • Federated SPARQL queries are slow
   – Even single SPARQL endpoints can be slow
      • Most end-points are experimental/research projects with no
        resources for quality guarantees

• Licensing issues
   – The majority of datasets carry no explicit open license
   – Copyright-based licenses (CC) are difficult to apply to
     factual data



Linked Data – licensing issues




                                             (c) Leigh Dodds




Weaving The Pedantic Web

• Initiative of DERI / KIT
• http://pedantic-web.org
• Goals
   – Analyse most common errors in RDF publishing
   – Propose possible approaches to avoid (publisher side) or
     deal with (consumer side) such errors




Weaving The Pedantic Web (2)
Category          Problem
Incompleteness    Dereferencability issues
                  No structured data available
                  Misreported content types
                  RDF/XML Syntax Errors
Incoherence       Atypical use of collections, containers and reification
                  Use of undefined classes and properties
                  Misplaced classes/properties
                  Misuse of owl:DatatypeProperty (ObjectProperty)
                  Members of deprecated classes/properties
                  Malformed datatype literals
                  Literals incompatible with datatype range
Hijacking         Ontology hijacking
                  Bogus owl:InverseFunctionalProperty values
Inconsistencies   Ontology hijacking
                  Literals incompatible with datatype range
                  OWL inconsistencies
Weaving The Pedantic Web (3)

• Dereferencability issues
   – URI lookup returns an error (violates 3rd LOD principle)
   – Or results in a redirect (with the wrong code)
• No structured data available
   – RDF data should be returned
• Misreported content types
   – A consumer application needs the correct content type in
     order to decide if it can consume the content (should be
     application/rdf+xml)
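
A consumer-side check for misreported content types can be sketched as follows. This is a minimal illustration, not the Pedantic Web tooling: the function name `is_rdf_content_type` is hypothetical, and the set of accepted media types beyond `application/rdf+xml` is an assumption about which other RDF serialisations a consumer might accept.

```python
# Media types a consumer might treat as RDF. Only application/rdf+xml is named
# on the slide; the other entries are assumptions added for illustration.
RDF_MEDIA_TYPES = {
    "application/rdf+xml",
    "text/turtle",
    "text/n3",
    "application/n-triples",
}


def is_rdf_content_type(header_value: str) -> bool:
    """Return True if a Content-Type header value names an RDF media type."""
    # Drop parameters such as "; charset=utf-8" and normalise the case.
    media_type = header_value.split(";", 1)[0].strip().lower()
    return media_type in RDF_MEDIA_TYPES


print(is_rdf_content_type("application/rdf+xml; charset=utf-8"))
print(is_rdf_content_type("text/html"))
```

A server that returns RDF/XML labelled as `text/html` or `text/plain` would fail this check even though the payload itself may be valid RDF.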



Weaving The Pedantic Web (4)

• RDF/XML Syntax Errors
• Atypical use of collections, containers and reification
• Use of undefined classes and properties
   – although not prohibited, ad-hoc/undefined classes and
     properties lead to more complex data integration and less
     effective inferences
• Misplaced classes/properties
   – Sometimes, a URI defined as a class is used as a property
     or vice versa (such usage ruins the inference)
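
One simple way to spot misplaced classes/properties is to look for URIs that occur both as a predicate and as the object of `rdf:type`. The sketch below assumes triples are plain Python tuples of prefixed names; the `ex:` resources and the function name `misplaced_terms` are hypothetical, and a real check would also consult the vocabulary definitions.

```python
RDF_TYPE = "rdf:type"


def misplaced_terms(triples):
    """Return terms used both as a property and as a class in `triples`."""
    # Anything in predicate position (other than rdf:type itself) is used as a property.
    used_as_property = {p for _, p, _ in triples if p != RDF_TYPE}
    # Anything that is the object of rdf:type is used as a class.
    used_as_class = {o for _, p, o in triples if p == RDF_TYPE}
    return used_as_property & used_as_class


triples = [
    ("ex:ivan", "rdf:type", "foaf:Person"),
    ("ex:ivan", "foaf:knows", "ex:maria"),
    ("ex:ivan", "foaf:Person", "ex:maria"),  # class used as a property
]
print(misplaced_terms(triples))
```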



Weaving The Pedantic Web (5)

• Members of deprecated classes/properties
• Malformed datatype literals / Literals incompatible
  with datatype range
• Bogus owl:InverseFunctionalProperty values
   – When two resources have the same value for an inverse-
     functional property the reasoner will treat them as
     equivalent
• Ontology hijacking
   – Redefinition by 3rd parties of external classes/properties
     affects the reasoner results
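
The effect of bogus inverse-functional property values can be shown with a small sketch: every group of resources sharing a value for such a property would be treated as equivalent by a reasoner. The triples, the `ex:` resources and the function name `ifp_equivalences` are hypothetical; `foaf:homepage` is used because FOAF declares it inverse-functional.

```python
from collections import defaultdict


def ifp_equivalences(triples, ifp):
    """Group subjects that share a value for the inverse-functional property `ifp`."""
    subjects_by_value = defaultdict(set)
    for s, p, o in triples:
        if p == ifp:
            subjects_by_value[o].add(s)
    # Only groups with more than one member produce new equivalences.
    return [group for group in subjects_by_value.values() if len(group) > 1]


triples = [
    # Two unrelated people published with the same placeholder homepage.
    ("ex:alice", "foaf:homepage", "http://example.org/"),
    ("ex:bob", "foaf:homepage", "http://example.org/"),
    ("ex:carol", "foaf:homepage", "http://example.org/carol"),
]
groups = ifp_equivalences(triples, "foaf:homepage")
print(groups)
```

Here a single placeholder value is enough to merge `ex:alice` and `ex:bob`, which is why such values need to be filtered out before inference.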

INTRODUCTION TO FACTFORGE
AND LINKEDLIFEDATA



Reason-able Views to the Web of Data

• Reason-able views represent an approach for
  reasoning and management of linked data
  – Integrate selected datasets and ontologies in one dataset
     • Clean up, post-process and enrich the datasets if necessary
  – Load the compound dataset in a single RDF repository
  – Perform inference with respect to tractable OWL dialects
  – Define sample queries against the integrated dataset




Reason-able Views: Objectives

• Make reasoning and query evaluation feasible
• Guarantee a basic level of consistency
   – The sample queries provide “regression tests”
     w.r.t. the consistency of the data
• Guarantee availability
• Better usability for querying and data exploration
   – URI auto-complete and RDF search
   – Sample queries provide re-usable extraction patterns,
     which reduce the time for learning about new datasets
     and their inter-relations

Two Reason-able Views to the Web of Linked Data

• FactForge
   –   Integrates some of the most central LOD datasets
   –   General-purpose information (not specific to a domain)
   –   1.2B explicit plus 1B inferred statements (10B retrievable)
   –   The largest upper-level knowledge base
   –    http://www.FactForge.net/
• Linked Life Data
   – 25 of the most popular life-science datasets
   – 2.7B explicit and 1.4B inferred triples
   – http://www.LinkedLifeData.com


FactForge and LinkedLifeData data sources




FactForge

LinkedLifeData



FactForge: Fast Track to the Center of the Web of Data

• Datasets
   – DBpedia, Freebase, Geonames, UMBEL, MusicBrainz,
     Wordnet, CIA World Factbook, Lingvoj
• Ontologies
   – Dublin Core, SKOS, RSS, FOAF
• Inference
   – materialization with respect to OWL 2 RL
   – owl:sameAs optimization in OWLIM allows reduction of
     the indices without loss of semantics
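
The idea behind an owl:sameAs optimisation can be sketched as follows. This is an illustration of the general technique (one representative per equivalence class), not OWLIM's actual implementation; all function names and the `fb:`, `geonames:` and `ex:` identifiers are hypothetical, and equivalences between predicates are ignored.

```python
def canonicalise(same_as_pairs):
    """Map every node in the owl:sameAs pairs to one representative of its class."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in same_as_pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            # Pick the lexicographically smaller root so the result is deterministic.
            parent[max(ra, rb)] = min(ra, rb)
    return {node: find(node) for node in list(parent)}


def compress(triples, canon):
    """Store each statement once, using representatives for subject and object."""
    return {(canon.get(s, s), p, canon.get(o, o)) for s, p, o in triples}


def retrievable_count(stored, canon):
    """Count the statements obtained by expanding representatives again."""
    class_size = {}
    for representative in canon.values():
        class_size[representative] = class_size.get(representative, 0) + 1
    return sum(class_size.get(s, 1) * class_size.get(o, 1) for s, _, o in stored)


same_as = [("dbpedia:Montreal", "fb:montreal"), ("fb:montreal", "geonames:Montreal")]
triples = [
    ("dbpedia:Montreal", "ex:locatedIn", "dbpedia:Canada"),
    ("fb:montreal", "ex:locatedIn", "dbpedia:Canada"),
]
canon = canonicalise(same_as)
stored = compress(triples, canon)
print(len(stored), retrievable_count(stored, canon))
```

In this toy example one stored statement stands for three retrievable ones, which is the kind of index reduction the slide refers to.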



FactForge: Fast Track to the Center of the Web of Data (2)

• Free public service at http://www.FactForge.net
   –   Very fast incremental URI auto-completion
   –   Querying and exploration through Forest and Tabulator
   –   RDF Search: retrieve ranked list of URIs by keywords
   –   SPARQL end-point




FactForge – Loading and Inference Statistics

                          Explicit     Inferred     Total # of   Entities
                          indexed      indexed      stored       ('000 of      Inferred
                          triples      triples      triples      nodes in      closure
Dataset                   ('000)       ('000)       ('000)       the graph)    ratio
Schemata and ontologies          11            7           18            6       0.6
DBpedia (categories)          2,877       42,587       45,464        1,144      14.8
DBpedia (sameAs)              5,544          566        6,110        8,464       0.1
UMBEL                         5,162       42,212       47,374          500       8.2
Lingvoj                          20          863          883           18      43.8
CIA Factbook                     76            4           80           25       0.1
Wordnet                       2,281        9,296       11,577          830       4.1
Geonames                     91,908      125,025      216,933       33,382       1.4
DBpedia core                560,096      198,043      758,139      127,931       0.4
Freebase                    463,689       40,840      504,529       94,810       0.1
MusicBrainz                  45,536      421,093      466,630       15,595       9.2
Total                     1,177,961      881,224    2,058,185      283,253       0.7
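
The last column can be read as inferred triples divided by explicit triples. That reading is an assumption, but it reproduces the published values for the rows used in this short sketch; the function name `closure_ratio` is hypothetical, and rows with very small counts may differ because the table is rounded to thousands.

```python
def closure_ratio(explicit_thousands: int, inferred_thousands: int) -> float:
    """Inferred closure ratio: inferred triples per explicit triple, one decimal."""
    return round(inferred_thousands / explicit_thousands, 1)


# (explicit, inferred) counts in thousands, copied from the table above.
rows = {
    "DBpedia (categories)": (2877, 42587),
    "UMBEL": (5162, 42212),
    "MusicBrainz": (45536, 421093),
    "Total": (1177961, 881224),
}

for dataset, (explicit, inferred) in rows.items():
    print(dataset, closure_ratio(explicit, inferred))
```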

FactForge – post-processing

• Several kinds of post-processing were performed
   – Goal: to allow easier navigation and browsing
   – E.g. preferred labels, text snippets, RDF Rank for nodes
        • Results available through system predicates

• Final Statistics
   –   Number of entities (RDF graph nodes): 405M
   –   Number of inserted statements (NIS): 1.2B
   –   Number of stored statements (NSS): 2.2B
   –   Number of retrievable statements (NRS): 9.8B
        • 7.6B statements “compressed” through OWLIM’s owl:sameAs
          optimisation


Guess who is the most popular German entertainer?

(run the query at http://factforge.net/sparql)
SELECT * WHERE {
     ?Person dbp-ont:birthPlace ?BirthPlace ;
             rdf:type opencyc:Entertainer ;
             ff:hasRDFRank ?RR .
     ?BirthPlace geo-ont:parentFeature dbpedia:Germany .
} ORDER BY DESC(?RR) LIMIT 100


• Without FF, answering such queries in real time is impossible
    • Used data from: DBpedia, Geonames, UMBEL and MusicBrainz
• The most popular entertainer born in Germany is: F. Nietzsche
    • Asking factual questions to a global KB can bring unexpected and
      strange results
    • We ask who is the most popular person, who qualifies as an
      entertainer
    • It uses a simple notion of popularity – RDFRank
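
The query above could be sent to the endpoint programmatically. The sketch below only builds the request URL, following the SPARQL protocol convention of passing the query text in a `query` parameter; it assumes that the endpoint predefines the prefixes used in the query (rdf, dbp-ont, opencyc, ff, geo-ont, dbpedia), as the slide's prefix-free query suggests. The function name `build_query_url` is hypothetical.

```python
from urllib.parse import urlencode

# Endpoint taken from the slide. The query declares no prefixes, so this
# sketch assumes the endpoint predefines them.
ENDPOINT = "http://factforge.net/sparql"

QUERY = """SELECT * WHERE {
  ?Person dbp-ont:birthPlace ?BirthPlace ;
          rdf:type opencyc:Entertainer ;
          ff:hasRDFRank ?RR .
  ?BirthPlace geo-ont:parentFeature dbpedia:Germany .
} ORDER BY DESC(?RR) LIMIT 100"""


def build_query_url(endpoint: str, query: str) -> str:
    """Encode `query` as the `query` parameter of a SPARQL protocol GET request."""
    return endpoint + "?" + urlencode({"query": query})


url = build_query_url(ENDPOINT, QUERY)
print(url)
```

Fetching this URL with an HTTP client, together with an `Accept` header for the desired result format, would return the ranked list; the availability of the endpoint is not guaranteed.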

Linked Life Data

• Quick facts
   –   Integrates more than 25 popular data sources
   –   5 billion RDF statements, 0.5 billion entities
   –   Querying & exploration of integrated data
   –   Public SPARQL end point
   –   http://linkedlifedata.com/




FactForge and LinkedLifeData data sources




FactForge

LinkedLifeData



Linked Life Data – ETL process




Linked Life Data – ETL process (2)

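[Diagram: ETL flow, top to bottom]
• Data source identification
• Source formats and their transformers
   – Flat files: special tailored transformer
   – OBO files: OBO to SKOS converter
   – XML: custom XSLT
   – RDBMS: RDBMS to RDF formatter
   – RDF: no transformer shown
• RDF warehouse
• Reasoner, instance mappings, semantic annotations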

Linked Life Data – triple distribution


[Pie chart: distribution of triples by category.
 Legend: Genes & Proteins, Documents, Ontologies & Thesauri, DBpedia,
 Linked Open Drug Data, BioPAX, Inferred, Semantic Annotations.
 Slice labels: 50%, 27%, 11%, 7%, 4%, 1%, 0%, 0%.]

Linked Life Data – Complex Cross-domain Query

[Figure: graph of a cross-domain query.
 Nodes: Gene, Protein, Molecular Interaction, Molecular Technique,
 Physiological process, Disease, Drugs.
 Edge labels: filter human genes, express protein, curated interaction,
 analyzed by, participate in, cause, treated with, target.]
New Type of Possible Query #1
Select drugs related to asthma that are linked to a curated molecular
interaction in the literature where the protein is known to cause
inflammatory response

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX biopax2: <http://www.biopax.org/release/biopax-level2.owl#>
PREFIX uniprot: <http://purl.uniprot.org/core/>
PREFIX drugbank: <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/>

SELECT DISTINCT ?fullname ?drugname
WHERE {
  ?interaction rdf:type biopax2:physicalInteraction .
  ?interaction biopax2:PARTICIPANTS ?participant .
  ?participant biopax2:PHYSICAL-ENTITY ?physicalEntity .
  ?physicalEntity skos:exactMatch ?protein .
  ?protein uniprot:classifiedWith <http://purl.uniprot.org/go/0006954> .
  ?protein uniprot:recommendedName ?name .
  ?name uniprot:fullName ?fullname .
  ?target skos:exactMatch ?protein .
  ?drug drugbank:target ?target .
  ?drug drugbank:genericName ?drugname .
  ?drug drugbank:indication ?indication .
}

The red graph patterns indicate the usage of mapping rules.
New Type of Possible Query #2
Select all human genes located on the Y chromosome with known
molecular interactions, which are analysed with 'Transfection'

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX gene: <http://linkedlifedata.com/resource/entrezgene/>
PREFIX core: <http://purl.uniprot.org/core/>
PREFIX biopax2: <http://www.biopax.org/release/biopax-level2.owl#>
PREFIX lifeskim: <http://linkedlifedata.com/resource/lifeskim/>
PREFIX umls: <http://linkedlifedata.com/resource/umls/>
PREFIX pubmed: <http://linkedlifedata.com/resource/pubmed/>

SELECT DISTINCT ?genedescription ?prefLabel ?pmid
WHERE {
  ?interaction rdf:type biopax2:interaction .
  ?interaction biopax2:PARTICIPANTS ?p .
  ?p biopax2:PHYSICAL-ENTITY ?protein .
  ?protein skos:exactMatch ?uniprotaccession .
  ?uniprotaccession core:organism <http://purl.uniprot.org/taxonomy/9606> .
  ?geneid gene:uniprotAccession ?uniprotaccession .
  ?geneid gene:description ?genedescription .
  ?geneid gene:pubmed ?pmid .
  ?geneid gene:chromosome 'Y' .
  ?pmid lifeskim:mentions ?umlsid .
  ?umlsid skos:prefLabel 'Transfection' .
  ?umlsid skos:prefLabel ?prefLabel .
}

New Type of Possible Query #3

Select all human genes that participate in interactions, are drug targets,
and are analysed with 'Transfection'

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX gene: <http://linkedlifedata.com/resource/entrezgene/>
PREFIX core: <http://purl.uniprot.org/core/>
PREFIX biopax2: <http://www.biopax.org/release/biopax-level2.owl#>
PREFIX lifeskim: <http://linkedlifedata.com/resource/lifeskim/>
PREFIX umls: <http://linkedlifedata.com/resource/umls/>
PREFIX pubmed: <http://linkedlifedata.com/resource/pubmed/>
PREFIX drugbank: <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/>

SELECT DISTINCT ?genedescription ?prefLabel ?drugname ?pmid
WHERE {
  ?interaction rdf:type biopax2:interaction .
  ?interaction biopax2:PARTICIPANTS ?p .
  ?p biopax2:PHYSICAL-ENTITY ?protein .
  ?protein skos:exactMatch ?uniprotaccession .
  ?uniprotaccession core:organism <http://purl.uniprot.org/taxonomy/9606> .
  ?geneid gene:uniprotAccession ?uniprotaccession .
  ?geneid gene:description ?genedescription .
  ?geneid gene:pubmed ?pmid .
  ?pmid lifeskim:mentions ?umlsid .
  ?umlsid skos:prefLabel 'Transfection' .
  ?umlsid skos:prefLabel ?prefLabel .
  ?target skos:closeMatch ?geneid .
  ?drug drugbank:target ?target .
  ?drug rdfs:label ?drugname .
}
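
A query like the one above is normally sent to a SPARQL endpoint over HTTP. The following is a minimal sketch of how such a request could be built with the Python standard library, following the SPARQL protocol convention of passing the query text in a `query` parameter. The endpoint URL and the shortened query are assumptions made for illustration only; the request is built but not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: illustrative endpoint URL; check the service documentation
# for the actual address before sending anything.
ENDPOINT = "http://linkedlifedata.com/sparql"

# A shortened, hypothetical query used only to keep the sketch small.
QUERY = """
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?umlsid WHERE { ?umlsid skos:prefLabel 'Transfection' . } LIMIT 5
"""


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Build (but do not send) an HTTP GET request carrying a SPARQL query."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the standard JSON serialisation of SELECT results.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request(ENDPOINT, QUERY)
```

Passing `request` to `urllib.request.urlopen` would execute the query; the response body could then be parsed with the `json` module.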


THE “MODIGLIANI TEST” FOR THE
SEMANTIC WEB



The tipping point for the Semantic Web

• http://www.readwriteweb.com/archives/the_modigliani_test_semantic_web_tipping_point.php




The tipping point for the Semantic Web (2)

• Richard MacManus (ReadWriteWeb)
   – “…the tipping point for the Semantic Web may be when
     one can … deliver – using Linked Data – a comprehensive
     list of locations of original Modigliani art works …” (Apr,
     2010)
• FactForge was the first system to pass the Modigliani
  test
   – Using data from 3 different datasets
   – Neither DBpedia nor Freebase alone can pass the test




Passing the test with FactForge




Passing the test with FactForge (2)

PREFIX   rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX   fb: <http://rdf.freebase.com/ns/>
PREFIX   dbpedia: <http://dbpedia.org/resource/>
PREFIX   dbp-prop: <http://dbpedia.org/property/>
PREFIX   dbp-ont: <http://dbpedia.org/ontology/>
PREFIX   umbel-sc: <http://umbel.org/umbel/sc/>
PREFIX   ff: <http://factforge.net/>

SELECT DISTINCT ?painting_l ?owner_l ?city_fb_con ?city_db_loc ?city_db_cit
WHERE {
         ?painting fb:visual_art.artwork.artist dbpedia:Amedeo_Modigliani ;
                   fb:visual_art.artwork.owners
                             [fb:visual_art.artwork_owner_relationship.owner ?ow];
                   ff:preferredLabel ?painting_l .
         ?ow ff:preferredLabel ?owner_l .
         OPTIONAL {
           ?ow fb:location.location.containedby [ff:preferredLabel ?city_fb_con]} .
         OPTIONAL { ?ow dbp-ont:city [ff:preferredLabel ?city_db_cit] } .
         OPTIONAL { ?ow dbp-prop:location ?loc .
                ?loc rdf:type umbel-sc:City ;
                     ff:preferredLabel ?city_db_loc }
}
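
The three OPTIONAL blocks in the query above try three different ways of finding an owner's city, and an owner is still returned when none of them matches. In SPARQL, OPTIONAL behaves like a left join. The sketch below illustrates that behaviour in Python on made-up data; the owner names and cities are hypothetical and are not taken from FactForge.

```python
# Hypothetical toy data: three owners, a city known for only two of them.
owners = ["Museum A", "Gallery B", "Private Collector C"]
city_of = {"Museum A": "Paris", "Gallery B": "London"}


def left_join(rows, lookup):
    """Like OPTIONAL: keep every row, with None where no match exists."""
    return [(row, lookup.get(row)) for row in rows]


def inner_join(rows, lookup):
    """Like a required pattern: drop rows that have no match."""
    return [(row, lookup[row]) for row in rows if row in lookup]


optional_result = left_join(owners, city_of)
required_result = inner_join(owners, city_of)
```

With the city pattern treated as optional, all three owners appear in the result and the third has no city; with a required pattern, that owner would be missing from the answer.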




Summary of this module

• Linked Data is a set of principles that allows
  publishing, querying and browsing of RDF data,
  distributed across different servers
   – similar to the way HTML is currently published &
     consumed
• Linked Data principles
   – Unambiguous identifiers for objects (resources)
   – Use the structure of the web
   – Make it easy to discover information about an object
     (resource)
   – Link the object (resource) to related objects
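
One common way to make a resource easy to discover is HTTP content negotiation: the same URI is requested with different Accept headers, and the server chooses a human-readable or an RDF representation. The sketch below only builds the two requests with the Python standard library and does not send them. The helper name `describe` is hypothetical, and whether a given server honours these media types is an assumption.

```python
from urllib.request import Request

# The URI names the thing; the Accept header selects the representation.
RESOURCE_URI = "http://dbpedia.org/resource/Amedeo_Modigliani"


def describe(uri: str, media_type: str) -> Request:
    """Build (but do not send) a GET request asking for one representation."""
    return Request(uri, headers={"Accept": media_type})


# Same identifier, two different requested representations.
for_browser = describe(RESOURCE_URI, "text/html")
for_rdf_client = describe(RESOURCE_URI, "application/rdf+xml")
```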

Summary of this module (2)

• As of Sep 2010 the Linked Open Data cloud includes
  180+ interlinked datasets with 20+ billion triples
• Open Government Data is not yet completely RDF-ized
• Linked Data open issues
   – Data quality, end-point reliability & performance, licensing
• FactForge and LinkedLifeData integrate subsets of
  the LOD cloud in order to provide better data quality,
  query performance and additional inference


JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...
Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...
Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
 
Infrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platformsInfrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platforms
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
Kuma Meshes Part I - The basics - A tutorial
Kuma Meshes Part I - The basics - A tutorialKuma Meshes Part I - The basics - A tutorial
Kuma Meshes Part I - The basics - A tutorial
 

Linked Data Management

  • 1. Linked Data Management 3rd GATE Training Course @ Montreal Module 15 Marin Dimitrov (Ontotext) August 2010
  • 2. 3rd GATE Training Course and Developer Sprint @ Montreal, Aug 2010 • https://gate.ac.uk/sale/images/gate-art/fig-posters/fig3-poster.pdf
  • 3. Module 15 programme 9.45-11.00 • Linked Data principles • Vocabularies & datasets 11.00-11.15 • Coffee break 11.15-12.30 • Open Government Data • Tools • Open issues & challenges 12.30-14.00 Lunch break 14.00-16.00 • Introduction to FactForge and LinkedLifeData • The “Modigliani test” for the Semantic Web 16.00-16.30 Coffee Linked Data Management Aug 2010 #3
  • 4. LINKED DATA PRINCIPLES Linked Data Management Aug 2010 #4
  • 5. Linked Data • “To make the Semantic Web a reality, it is necessary to have a large volume of data available on the Web in a standard, reachable and manageable format. In addition the relationships among data also need to be made available. This collection of interrelated data on the Web can also be referred to as Linked Data. Linked Data lies at the heart of the Semantic Web: large scale integration of, and reasoning on, data on the Web.” (W3C) • Linked Data is a set of principles that allows publishing, querying and browsing of RDF data, distributed across different servers • similar to the way HTML is currently published & consumed Linked Data Management Aug 2010 #5
  • 6. Linked Data design principles 1. Unambiguous identifiers for objects (resources) – Use URIs as names for things 2. Use the structure of the web – Use HTTP URIs so that people can look up the names 3. Make it easy to discover information about an object (resource) – When someone looks up a URI, provide useful information 4. Link the object (resource) to related objects – Include links to other URIs
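Principles 2 and 3 can be illustrated with a minimal Python sketch of how a client might prepare a URI lookup that asks for RDF rather than HTML. The function name `build_lookup_request` and the example DBpedia URI are assumptions made for illustration; the request is only constructed, not sent.

```python
import urllib.request

RDF_XML = "application/rdf+xml"


def build_lookup_request(uri: str, media_type: str = RDF_XML) -> urllib.request.Request:
    """Prepare (but do not send) an HTTP GET asking for an RDF description of `uri`."""
    # Principle 2: names should be HTTP URIs so that they can be looked up.
    if not uri.startswith(("http://", "https://")):
        raise ValueError("Linked Data names should be HTTP URIs")
    # Principle 3: the Accept header tells the server which representation we want.
    return urllib.request.Request(uri, headers={"Accept": media_type})


# Hypothetical example URI, chosen only for illustration.
request = build_lookup_request("http://dbpedia.org/resource/Montreal")
print(request.get_method(), request.full_url, request.get_header("Accept"))
```

Passing the prepared request to `urllib.request.urlopen` would perform the actual lookup; what comes back depends entirely on the server.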
  • 7. Linked datasets [Diagram: an RDF graph with the nodes myData:Ivan, myData:Maria, ptop:Agent, ptop:Person and ptop:Woman, the properties ptop:parentOf, ptop:childOf and owl:relativeOf, and edges labelled rdf:type, rdfs:range, rdfs:subPropertyOf, owl:inverseOf and owl:SymmetricProperty, some of them marked as inferred]
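The kind of inference the diagram hints at can be sketched in a few lines of Python. The exact edges of the diagram are not recoverable from the transcript, so the schema below is an assumption: parentOf and childOf are taken to be inverses, parentOf is taken to be a subproperty of relativeOf, and relativeOf is taken to be symmetric. The function `infer` is a toy forward-chainer, not a real reasoner.

```python
Triple = tuple[str, str, str]


def infer(triples: set[Triple]) -> set[Triple]:
    """Forward-chain three simple rules until no new triple appears."""
    closure = set(triples)
    while True:
        inverse = {(s, o) for s, p, o in closure if p == "owl:inverseOf"}
        inverse |= {(b, a) for a, b in inverse}  # inverseOf works in both directions
        sub = {(s, o) for s, p, o in closure if p == "rdfs:subPropertyOf"}
        symmetric = {s for s, p, o in closure
                     if p == "rdf:type" and o == "owl:SymmetricProperty"}
        new = set()
        for s, p, o in closure:
            new |= {(o, q, s) for a, q in inverse if a == p}  # inverse property
            new |= {(s, q, o) for a, q in sub if a == p}      # super-property
            if p in symmetric:
                new.add((o, p, s))                            # symmetric property
        if new <= closure:
            return closure
        closure |= new


# Assumed schema and one explicit fact; property labels as printed in the transcript.
data = {
    ("ptop:parentOf", "owl:inverseOf", "ptop:childOf"),
    ("ptop:parentOf", "rdfs:subPropertyOf", "owl:relativeOf"),
    ("owl:relativeOf", "rdf:type", "owl:SymmetricProperty"),
    ("myData:Maria", "ptop:parentOf", "myData:Ivan"),
}
for triple in sorted(infer(data) - data):
    print(triple)
```

Under these assumptions a single explicit statement yields three inferred ones: Ivan is a child of Maria, and each is a relative of the other.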
  • 8. Linked Data evolution – Oct 2007 Linked Data Management Aug 2010 #8
  • 9. Linked Data evolution – Sep 2008 Linked Data Management Aug 2010 #9
  • 10. Linked Data evolution – Jul 2009 Linked Data Management Aug 2010 #10
  • 11. Linked Data evolution – Sep 2010 Linked Data Management Aug 2010 #11
  • 12. Linked Data evolution – Sep 2010 • 220 interlinked datasets • 24 billion RDF triples – Data.gov + data.gov wiki – 11.5 billion – LinkedGeoData – 3 billion – UniProt – 1.1 billion – DBpedia – 1 billion – US Census Data – 1 billion – PubMed – 0.8 billion – AudioScrobbler – 0.6 billion – … – Freebase – 0.1 billion
  • 13. Linked Data example – http://factforge.net/resource/dbpedia/Montreal Linked Data Management Aug 2010 #13
  • 14. Linked Data example (2) • The description for Montreal on FactForge aggregates data from – DBpedia – GeoNames – Freebase – NY Times Linked Data Management Aug 2010 #14
  • 15. Linked Data example (3) DBpedia GeoNames Freebase Linked Data Management Aug 2010 #15
  • 16. Why use Linked Data? • Facilitate data integration – Use LOD as an “interlingua” for EDI • Additional public information can help alignment and linking • Add value to proprietary data – Public data can allow enhanced content and more analytics on top of proprietary data • E.g. linking to spatial data from GeoNames, search for images – Better description and access to content • Make enterprise data more open & accessible – Public identifiers and vocabularies can be used to access them Linked Data Management Aug 2010 #16
  • 17. Success Stories • BBC Music – Integrates information from MusicBrainz and Wikipedia for artist/band infopages – Information also available in RDF (in addition to web pages) – 3rd party applications built on top of the BBC data – BBC also contributes data back to MusicBrainz • NY Times – Maps its thesaurus of 1 million entity descriptions (people, organisations, places, etc.) to DBpedia and Freebase
  • 18. VOCABULARIES & DATASETS Linked Data Management Aug 2010 #18
  • 19. Vocabularies • Existing vocabularies make publishing & integrating Linked Data easier – Friend-of-a-Friend (FOAF) • http://xmlns.com/foaf/0.1/ • Vocabulary for describing people (names, contact info, …) – Dublin Core (DC) • http://dublincore.org/documents/dcmes-xml/ • Vocabulary for general metadata attributes (author, topic, …) – Semantically-Interlinked Online Communities (SIOC) • http://sioc-project.org/ • Social Web data Linked Data Management Aug 2010 #19
  • 20. Vocabularies (2) • Existing vocabularies (contd.) – SKOS • http://www.w3.org/2004/02/skos/ – GoodRelations • Vocabulary for describing products and business entities • http://www.heppnetz.de/ontologies/goodrelations/v1 – Music Ontology • http://musicontology.com/ – Linked Open Description of Events (LODE) • http://linkedevents.org/ontology/ – Creative Commons • http://creativecommons.org/ns Linked Data Management Aug 2010 #20
  • 21. Vocabularies (3) Linked Data Management Aug 2010 #21
  • 22. Datasets • DBpedia – Linked Data version of Wikipedia – 3.5 million entities, incl. 410K places, 310K persons, 146K species, 140K organisations, 95K music albums, 50K films, 33K buildings, 15K videogames, 5K diseases – Descriptions available in 90 languages – 1 billion triples, 10 million links to external RDF datasets – Ontology – 260 classes, 1200 properties, 1.5 million instances • http://www4.wiwiss.fu-berlin.de/dbpedia/dev/ontology.htm Linked Data Management Aug 2010 #22
  • 23. Datasets (2) • Freebase – Similar to DBpedia – Higher data quality but ten times less data • GeoNames – Information about 6 million places – Ontology: http://www.geonames.org/ontology/ontology_v2.1.rdf • MusicBrainz – 55K artists, 22K albums, 36 million triples Linked Data Management Aug 2010 #23
  • 24. OPEN GOVERNMENT DATA Linked Data Management Aug 2010 #24
  • 25. data.gov (USA) Linked Data Management Aug 2010 #25
  • 26. data.gov.uk (UK) Linked Data Management Aug 2010 #26
  • 27. data.gov.uk (2) • “…we will aim for the majority of government- published information to be reusable, linked data by June 2011; and we will establish a common licence to reuse data which is interoperable with the internationally recognised Creative Commons model.” (UK Government, Dec 2009) Linked Data Management Aug 2010 #27
  • 28. gov.opendata.at (Austria) Linked Data Management Aug 2010 #28
  • 29. at.ckan.net (Austria) Linked Data Management Aug 2010 #29
  • 30. openbelgium.be/data (Belgium) Linked Data Management Aug 2010 #30
  • 31. digitaliser.dk (Denmark) Linked Data Management Aug 2010 #31
  • 32. pub.stat.ee (Estonia) Linked Data Management Aug 2010 #32
  • 33. opengov.fi (Finland) Linked Data Management Aug 2010 #33
  • 34. data-publica.com (France) Linked Data Management Aug 2010 #34
  • 35. data-gov.fr (France) Linked Data Management Aug 2010 #35
  • 36. opendata-network.org (Germany) Linked Data Management Aug 2010 #36
  • 37. offenedaten.de (Germany) Linked Data Management Aug 2010 #37
  • 38. geodata.gov.gr (Greece) Linked Data Management Aug 2010 #38
  • 39. hu.ckan.net (Hungary) Linked Data Management Aug 2010 #39
  • 40. ie.ckan.net (Ireland) Linked Data Management Aug 2010 #40
  • 41. datagov.it (Italy) Linked Data Management Aug 2010 #41
  • 42. LinkedOpenData.it (Italy) Linked Data Management Aug 2010 #42
  • 43. it.ckan.net (Italy) Linked Government Data @ Ontotext Linked Data Management Aug 2010 #43
  • 44. data.norge.no (Norway) Linked Data Management Aug 2010 #44
  • 45. datanest.fair-play.sk (Slovakia) Linked Data Management Aug 2010 #45
  • 46. si.ckan.net (Slovenia) Linked Data Management Aug 2010 #46
  • 47. opengov.es (Spain) Linked Government Data @ Ontotext Oct 2010 #47
  • 48. opendata.euskadi.net (Spain / Basque Country) Linked Data Management Aug 2010 #48
  • 49. opengov.se (Sweden) Linked Data Management Aug 2010 #49
  • 50. opendatani.info (UK / Northern Ireland) Linked Government Data @ Ontotext Oct 2010 #50
  • 51. Eurostat (EU) Linked Data Management Aug 2010 #51
  • 52. data.australia.gov.au (Australia) Linked Data Management Aug 2010 #52
  • 53. data.gov.au (Australia) Linked Data Management Aug 2010 #53
  • 54. DataDotGC.ca (Canada) Linked Data Management Aug 2010 #54
  • 55. databox.openlabs.go.jp (Japan) Linked Data Management Aug 2010 #55
  • 56. opendata.go.ke (Kenya) Linked Data Management Aug 2010 #56
  • 57. data.govt.nz (New Zealand) Linked Data Management Aug 2010 #57
  • 58. opengovdata.ru (Russia) Linked Data Management Aug 2010 #58
  • 59. Open Government Data statistics • Data.gov – 2,400 datasets – … but only 400 datasets RDFized at present – 6.5 billion triples / 0.5 billion entities • Data.gov.uk – 3,000 datasets • Data Publica – 2,000 datasets • Eurostat – 4,000 datasets Linked Data Management Aug 2010 #59
  • 61. ThisWeKnow (2) SPARQL query Linked Data Management Aug 2010 #61
  • 62. data.worldbank.org Linked Government Data @ Ontotext Oct 2010 #62
  • 63. TOOLS Linked Data Management Aug 2010 #63
  • 64. Linked Data browsers – Marbles • http://marbles.sourceforge.net • XHTML views of RDF data (SPARQL endpoint), caching, predicate traversal Linked Data Management Aug 2010 #64
  • 65. Linked Data browsers – RelFinder • http://relfinder.dbpedia.org • Explore & navigate relationships in an RDF graph
  • 66. Linked Data browsers – gFacet • http://gfacet.semanticweb.org/ • Graph based visualisation & faceted filtering of RDF data Linked Data Management Aug 2010 #66
  • 67. Linked Data browsers – Forest • Front end to FactForge and LinkedLifeData Linked Data Management Aug 2010 #67
  • 68. Linked Data browsers – Information Workbench • http://iwb.fluidops.com/main.jsp Linked Data Management Aug 2010 #68
  • 69. Linked Data browsers – Information Workbench (2) Linked Data Management Aug 2010 #69
  • 70. Linked Data browsers – OpenLink RDF Browser • http://demo.openlinksw.com/DAV/JS/rdfbrowser/index.html • Explore & navigate relationships in an RDF graph
  • 71. DBpedia Mobile • http://wiki.dbpedia.org/DBpediaMobile • Based on user’s GPS position, renders a map with nearby places of interest (from DBpedia) Linked Data Management Aug 2010 #71
  • 72. Pubby – A Linked Data Frontend for SPARQL Endpoints • http://www4.wiwiss.fu-berlin.de/pubby/ • Linked Data interface to local/remote SPARQL endpoints • URI rewriting of SPARQL resultsets • Simple HTML interface Linked Data Management Aug 2010 #72
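The URI rewriting step can be pictured with a small Python sketch. It is not Pubby's code or configuration: the function `rewrite_uri` and both base URIs are hypothetical, and the sketch only shows the general idea of mapping identifiers from the dataset's namespace onto the namespace of the server that publishes them.

```python
def rewrite_uri(uri: str, dataset_base: str, server_base: str) -> str:
    """Map a URI from the dataset namespace to the publishing server's namespace."""
    if uri.startswith(dataset_base):
        return server_base + uri[len(dataset_base):]
    return uri  # URIs from other namespaces are left untouched


# Hypothetical namespaces, for illustration only.
DATASET_BASE = "http://example.org/data/"
SERVER_BASE = "http://localhost:8080/resource/"

print(rewrite_uri("http://example.org/data/Montreal", DATASET_BASE, SERVER_BASE))
print(rewrite_uri("http://dbpedia.org/resource/Montreal", DATASET_BASE, SERVER_BASE))
```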
  • 73. OPEN ISSUES AND CHALLENGES Linked Data Management Aug 2010 #73
  • 74. Linked Data – open issues • LOD is hard to comprehend – Schema diversity & proliferation • Quality of data is poor – Many of the datasets are well positioned to serve as “master data” but their quality is far from enterprise standards – No consistency of any kind is guaranteed • Issues with reliability of data end-points – High down-time is not unusual – There is no SLA provided
  • 75. Linked Data – open issues (2) • Querying of linked data is slow – Data is distributed on the web • Federated SPARQL queries are slow – Even single SPARQL endpoints can be slow • Most end-points are experimental/research projects with no resources for quality guarantees • Licensing issues – majority of datasets carry no explicit open license – Copyright-based licenses (CC) are difficult to apply to factual data Linked Data Management Aug 2010 #75
  • 76. Linked Data – licensing issues (c) Leigh Dodds Linked Data Management Aug 2010 #76
  • 77. Weaving The Pedantic Web • Initiative of DERI / KIT • http://pedantic-web.org • Goals – Analyse most common errors in RDF publishing – Propose possible approaches to avoid (publisher side) or deal with (consumer side) such errors Linked Data Management Aug 2010 #77
  • 78. Weaving The Pedantic Web (2) [Table with the columns Category and Problem. Categories: Incompleteness, Incoherence, Hijacking, Inconsistencies. Problems, in the order listed: Dereferencability issues; No structured data available; Misreported content types; RDF/XML syntax errors; Atypical use of collections, containers and reification; Use of undefined classes and properties; Misplaced classes/properties; Misuse of owl:DatatypeProperty (ObjectProperty); Members of deprecated classes/properties; Malformed datatype literals; Literals incompatible with datatype range; Ontology hijacking; Bogus owl:InverseFunctionalProperty values; Ontology hijacking; Literals incompatible with datatype range; OWL inconsistencies. The transcript does not preserve the exact row grouping.]
  • 79. Weaving The Pedantic Web (3) • Dereferencability issues – URI lookup returns an error (violates 3rd LOD principle) – Or results in a redirect (with the wrong code) • No structured data available – RDF data should be returned • Misreported content types – A consumer application needs the correct content type in order to decide if it can consume the content (should be application/rdf+xml) Linked Data Management Aug 2010 #79
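A consumer-side check for these three problems might look like the Python sketch below. It is only an assumption about how such a check could be organised: the function `diagnose_lookup` is hypothetical, it takes an already-fetched status code, Content-Type header and body, and it recognises RDF/XML with a deliberately crude substring test.

```python
from typing import Optional


def diagnose_lookup(status: int, content_type: Optional[str], body: str) -> str:
    """Map one URI lookup result to a problem label from the slide, or 'ok'."""
    if status >= 400:
        return "dereferencability issue"
    # Drop parameters such as "; charset=utf-8" before comparing media types.
    media_type = (content_type or "").split(";")[0].strip().lower()
    looks_like_rdf_xml = "<rdf:RDF" in body  # crude heuristic, RDF/XML only
    if not looks_like_rdf_xml:
        return "no structured data available"
    if media_type != "application/rdf+xml":
        return "misreported content type"
    return "ok"


rdf_body = '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"/>'
print(diagnose_lookup(404, None, ""))
print(diagnose_lookup(200, "text/html", "<html></html>"))
print(diagnose_lookup(200, "text/plain", rdf_body))
print(diagnose_lookup(200, "application/rdf+xml; charset=utf-8", rdf_body))
```

Redirects with the wrong status code, also mentioned on the slide, are not covered by this sketch.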
  • 80. Weaving The Pedantic Web (4) • RDF/XML Syntax Errors • Atypical use of collections, containers and reification • Use of undefined classes and properties – although not prohibited, ad-hoc/undefined classes and properties lead to more complex data integration and less effective inferences • Misplaced classes/properties – Sometimes, a URI defined as a class is used as a property or vice versa (such usage ruins the inference) Linked Data Management Aug 2010 #80
  • 81. Weaving The Pedantic Web (5) • Members of deprecated classes/properties • Malformed datatype literals / Literals incompatible with datatype range • Bogus owl:InverseFunctionalProperty values – When two resources have the same value for an inverse-functional property the reasoner will treat them as equivalent • Ontology hijacking – Redefinition by 3rd parties of external classes/properties affects the reasoner results
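Why bogus inverse-functional values are harmful can be shown with a toy Python sketch. The resources and values are hypothetical, `foaf:mbox_sha1sum` is used only as a familiar example of an inverse-functional property, and `smush` is a simplified stand-in for what a reasoner does (it groups per property and value and does not chain groups transitively).

```python
from collections import defaultdict


def smush(ifp_statements: list[tuple[str, str, str]]) -> list[set[str]]:
    """Group resources that share a value for the same inverse-functional property."""
    by_key: dict[tuple[str, str], set[str]] = defaultdict(set)
    for resource, prop, value in ifp_statements:
        by_key[(prop, value)].add(resource)
    return [group for group in by_key.values() if len(group) > 1]


# Hypothetical data: two unrelated people published with the same placeholder value.
statements = [
    ("ex:alice", "foaf:mbox_sha1sum", ""),   # bogus: empty placeholder
    ("ex:bob", "foaf:mbox_sha1sum", ""),     # bogus: empty placeholder
    ("ex:carol", "foaf:mbox_sha1sum", "3f2a"),
]
print(smush(statements))
```

Here `ex:alice` and `ex:bob` end up in one group, so a reasoner would wrongly treat two distinct people as the same resource.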
  • 82. INTRODUCTION TO FACTFORGE AND LINKEDLIFEDATA Linked Data Management Aug 2010 #82
  • 83. Reason-able Views to the Web of Data • Reason-able views represent an approach for reasoning and management of linked data – Integrate selected datasets and ontologies in one dataset • Clean up, post-process and enrich the datasets if necessary – Load the compound dataset in a single RDF repository – Perform inference with respect to tractable OWL dialects – Define sample queries against the integrated dataset Linked Data Management Aug 2010 #83
  • 84. Reason-able Views: Objectives • Make reasoning and query evaluation feasible • Guarantee a basic level of consistency – The sample queries provide “regression tests” w.r.t. the consistency of the data • Guarantee availability • Better usability for querying and data exploration – URI auto-complete and RDF search – Sample queries provide re-usable extraction patterns, which reduce the time for learning about new datasets and their inter-relations
  • 85. Two Reason-able Views to the Web of Linked Data • FactForge – Integrates some of the most central LOD datasets – General-purpose information (not specific to a domain) – 1.2B explicit plus 1B inferred statements (10B retrievable) – The largest upper-level knowledge base – http://www.FactForge.net/ • Linked Life Data – 25 of the most popular life-science datasets – 2.7B explicit and 1.4B inferred triples – http://www.LinkedLifeData.com Linked Data Management Aug 2010 #85
  • 86. FactForge and LinkedLifeData data sources FactForge LinkedLifeData Linked Data Management Aug 2010 #86
  • 87. FactForge: Fast Track to the Center of the Web of Data • Datasets – DBpedia, Freebase, Geonames, UMBEL, MusicBrainz, Wordnet, CIA World Factbook, Lingvoj • Ontologies – Dublin Core, SKOS, RSS, FOAF • Inference – materialization with respect to OWL 2 RL – owl:sameAs optimization in OWLIM allows reduction of the indices without loss of semantics Linked Data Management Aug 2010 #87
  • 88. FactForge: Fast Track to the Center of the Web of Data (2) • Free public service at http://www.FactForge.net – Very fast incremental URI auto-completion – Querying and exploration through Forest and Tabulator – RDF Search: retrieve ranked list of URIs by keywords – SPARQL end-point Linked Data Management Aug 2010 #88
  • 89. FactForge – Loading and Inference Statistics
    (triple and entity counts in thousands; entities are nodes in the graph)
    Dataset                  Explicit indexed  Inferred indexed  Total stored  Entities  Inferred closure ratio
    Schemata and ontologies                11                 7            18         6                     0.6
    DBpedia (categories)                2,877            42,587        45,464     1,144                    14.8
    DBpedia (sameAs)                    5,544               566         6,110     8,464                     0.1
    UMBEL                               5,162            42,212        47,374       500                     8.2
    Lingvoj                                20               863           883        18                    43.8
    CIA Factbook                           76                 4            80        25                     0.1
    Wordnet                             2,281             9,296        11,577       830                     4.1
    Geonames                           91,908           125,025       216,933    33,382                     1.4
    DBpedia core                      560,096           198,043       758,139   127,931                     0.4
    Freebase                          463,689            40,840       504,529    94,810                     0.1
    MusicBrainz                        45,536           421,093       466,630    15,595                     9.2
    Total                           1,177,961           881,224     2,058,185   283,253                     0.7
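The "inferred closure ratio" column appears to be the number of inferred triples divided by the number of explicit triples; that reading is an assumption based on the figures, not something the slide states. A short Python check on three rows of the table:

```python
def closure_ratio(explicit_thousands: int, inferred_thousands: int) -> float:
    """Inferred triples per explicit triple, rounded to one decimal place."""
    return round(inferred_thousands / explicit_thousands, 1)


# (explicit, inferred) counts in thousands, copied from the table.
rows = {
    "DBpedia (categories)": (2_877, 42_587),
    "Geonames": (91_908, 125_025),
    "MusicBrainz": (45_536, 421_093),
}
for dataset, (explicit, inferred) in rows.items():
    print(dataset, closure_ratio(explicit, inferred))
```

For small datasets the ratio can differ slightly from the slide, because the counts shown there are already rounded to thousands.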
  • 90. Fact Forge – post-processing • Several kinds of post-processing were performed – Goal: to allow easier navigation and browsing – E.g. preferred labels, text snippets, RDF Rank for nodes • Results available through system predicates • Final Statistics – Number of entities (RDF graph nodes): 405M – Number of inserted statements (NIS): 1.2B – Number of stored statements (NSS): 2.2B – Number of retrievable statements (NRS): 9.8B • 7.6B statements “compressed” through OWLIM’s owl:sameAs optimisation Linked Data Management Aug 2010 #90
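The "compressed" figure is the difference between retrievable and stored statements: 9.8B − 2.2B = 7.6B. The Python sketch below illustrates the general idea behind such an optimisation, namely storing one statement per owl:sameAs equivalence class and expanding it at query time. It is a toy model with hypothetical identifiers, not a description of how OWLIM is implemented.

```python
from itertools import product


def expand(stored, classes):
    """Yield every statement retrievable from statements stored over class representatives."""
    for s, p, o in stored:
        # A node without a known equivalence class stands only for itself.
        for s2, o2 in product(sorted(classes.get(s, {s})), sorted(classes.get(o, {o}))):
            yield (s2, p, o2)


# Hypothetical identifiers: three URIs declared owl:sameAs, stored under one representative.
classes = {"ex:Montreal": {"ex:Montreal", "ex2:montreal", "ex3:place42"}}
stored = [("ex:Montreal", "ex:locatedIn", "ex:Canada")]

retrievable = list(expand(stored, classes))
print(len(stored), "stored ->", len(retrievable), "retrievable")
print(round(9.8 - 2.2, 1), "billion statements compressed on the slide")
```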
  • 91. Guess who is the most popular German entertainer? (run the query at http://factforge.net/sparql)
      SELECT * WHERE {
        ?Person dbp-ont:birthPlace ?BirthPlace ;
                rdf:type opencyc:Entertainer ;
                ff:hasRDFRank ?RR .
        ?BirthPlace geo-ont:parentFeature dbpedia:Germany .
      } ORDER BY DESC(?RR) LIMIT 100
    • Without FF, answering such queries in real time is impossible
    • Used data from: DBpedia, Geonames, UMBEL and MusicBrainz
    • The most popular entertainer born in Germany is: F. Nietzsche
    • Asking factual questions to a global KB can bring unexpected and strange results
    • We ask who is the most popular person, who qualifies as an entertainer
    • It uses a simple notion of popularity – RDFRank
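To run such a query from a program rather than from the web form, the SPARQL protocol lets a client pass the query text in a `query` parameter of an HTTP GET. The Python sketch below only builds that URL; the endpoint address comes from the slide, the helper name `sparql_get_url` is an assumption, and the query relies on namespace prefixes being predefined by the endpoint, as the slide's version does.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

ENDPOINT = "http://factforge.net/sparql"  # address given on the slide

QUERY = """SELECT * WHERE {
  ?Person dbp-ont:birthPlace ?BirthPlace ;
          rdf:type opencyc:Entertainer ;
          ff:hasRDFRank ?RR .
  ?BirthPlace geo-ont:parentFeature dbpedia:Germany .
} ORDER BY DESC(?RR) LIMIT 100"""


def sparql_get_url(endpoint: str, query: str) -> str:
    """Build the GET URL for a SPARQL query (the query goes in the 'query' parameter)."""
    return endpoint + "?" + urlencode({"query": query})


url = sparql_get_url(ENDPOINT, QUERY)
print(url)
```

Whether the endpoint is still reachable, and which result formats it offers, is not something this sketch can guarantee.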
  • 92. Linked Life Data • Quick facts – Integrates more than 25 popular data sources – 5 billion RDF statements, 0.5 billion entities – Querying & exploration of integrated data – Public SPARQL end point – http://linkedlifedata.com/ Linked Data Management Aug 2010 #92
  • 93. FactForge and LinkedLifeData data sources FactForge LinkedLifeData Linked Data Management Aug 2010 #93
  • 94. Linked Life Data – ETL process Linked Data Management Aug 2010 #94
  • 95. Linked Life Data – ETL process (2) [Diagram: ETL pipeline. Data source identification; source formats: flat files, OBO files, XML, RDBMS, RDF; converters: special tailored transformer, OBO to SKOS converter, custom XSLT, RDBMS to RDF formatter; results loaded into an RDF warehouse together with instance mappings, semantic annotations and a reasoner]
  • 96. Linked Life Data – triple distribution [Pie chart with the categories Genes & Proteins, Documents, Ontologies & Thesauri, DBpedia, Linked Open Drug Data, BioPAX, Inferred and Semantic Annotations; slice labels 50%, 27%, 11%, 7%, 4%, 1%, 0% and 0%. The transcript does not preserve which percentage belongs to which category.]
  • 97. Linked Life Data – Complex Cross-domain Query [Diagram: a query path across domains with the nodes Disease, Drugs, Protein, Gene, Molecular Interaction, Physiological process, Molecular process and Technique, the edge labels cause, treated with, target, express, participate in, analyzed by and curated interaction, and a filter on human genes]
  • 98. New Type of Possible Query #1
    Select drugs related to asthma that are linked to a curated molecular interaction in the literature where the protein is known to cause inflammatory response
      PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
      PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
      PREFIX biopax2: <http://www.biopax.org/release/biopax-level2.owl#>
      PREFIX uniprot: <http://purl.uniprot.org/core/>
      PREFIX drugbank: <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/>
      SELECT DISTINCT ?fullname ?drugname
      WHERE {
        ?interaction rdf:type biopax2:physicalInteraction .
        ?interaction biopax2:PARTICIPANTS ?participant .
        ?participant biopax2:PHYSICAL-ENTITY ?physicalEntity .
        ?physicalEntity skos:exactMatch ?protein .
        ?protein uniprot:classifiedWith <http://purl.uniprot.org/go/0006954> .
        ?protein uniprot:recommendedName ?name .
        ?name uniprot:fullName ?fullname .
        ?target skos:exactMatch ?protein .
        ?drug drugbank:target ?target .
        ?drug drugbank:genericName ?drugname .
        ?drug drugbank:indication ?indication .
      }
    The red graph patterns indicate the usage of mapping rules.
  • 99. New Type of Possible Query #2
    Select all human genes located on the Y chromosome with known molecular interactions that are analysed with 'Transfection'
      PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
      PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
      PREFIX gene: <http://linkedlifedata.com/resource/entrezgene/>
      PREFIX core: <http://purl.uniprot.org/core/>
      PREFIX biopax2: <http://www.biopax.org/release/biopax-level2.owl#>
      PREFIX lifeskim: <http://linkedlifedata.com/resource/lifeskim/>
      PREFIX umls: <http://linkedlifedata.com/resource/umls/>
      PREFIX pubmed: <http://linkedlifedata.com/resource/pubmed/>
      SELECT DISTINCT ?genedescription ?prefLabel ?pmid
      WHERE {
        ?interaction rdf:type biopax2:interaction .
        ?interaction biopax2:PARTICIPANTS ?p .
        ?p biopax2:PHYSICAL-ENTITY ?protein .
        ?protein skos:exactMatch ?uniprotaccession .
        ?uniprotaccession core:organism <http://purl.uniprot.org/taxonomy/9606> .
        ?geneid gene:uniprotAccession ?uniprotaccession .
        ?geneid gene:description ?genedescription .
        ?geneid gene:pubmed ?pmid .
        ?geneid gene:chromosome 'Y' .
        ?pmid lifeskim:mentions ?umlsid .
        ?umlsid skos:prefLabel 'Transfection' .
        ?umlsid skos:prefLabel ?prefLabel .
      }
  • 100. New Type of Possible Query #3
    Select all human genes that participate in interactions, are drug targets and are analysed with 'Transfection'
      PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
      PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
      PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
      PREFIX gene: <http://linkedlifedata.com/resource/entrezgene/>
      PREFIX core: <http://purl.uniprot.org/core/>
      PREFIX biopax2: <http://www.biopax.org/release/biopax-level2.owl#>
      PREFIX lifeskim: <http://linkedlifedata.com/resource/lifeskim/>
      PREFIX umls: <http://linkedlifedata.com/resource/umls/>
      PREFIX pubmed: <http://linkedlifedata.com/resource/pubmed/>
      PREFIX drugbank: <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/>
      SELECT DISTINCT ?genedescription ?prefLabel ?drugname ?pmid
      WHERE {
        ?interaction rdf:type biopax2:interaction .
        ?interaction biopax2:PARTICIPANTS ?p .
        ?p biopax2:PHYSICAL-ENTITY ?protein .
        ?protein skos:exactMatch ?uniprotaccession .
        ?uniprotaccession core:organism <http://purl.uniprot.org/taxonomy/9606> .
        ?geneid gene:uniprotAccession ?uniprotaccession .
        ?geneid gene:description ?genedescription .
        ?geneid gene:pubmed ?pmid .
        ?pmid lifeskim:mentions ?umlsid .
        ?umlsid skos:prefLabel 'Transfection' .
        ?umlsid skos:prefLabel ?prefLabel .
        ?target skos:closeMatch ?geneid .
        ?drug drugbank:target ?target .
        ?drug rdfs:label ?drugname .
      }
  • 101. THE “MODIGLIANI TEST” FOR THE SEMANTIC WEB Linked Data Management Aug 2010 #101
  • 102. The tipping point for the Semantic Web • http://www.readwriteweb.com/archives/the_modigliani_test_semantic_web_tipping_point.php
  • 103. The tipping point for the Semantic Web (2) • Richard MacManus (ReadWriteWeb) – “…the tipping point for the Semantic Web may be when one can … deliver – using Linked Data – a comprehensive list of locations of original Modigliani art works …” (Apr 2010) • FactForge was the first system to pass the Modigliani test – Using data from 3 different datasets – Neither DBpedia nor Freebase alone can pass the test
  • 104. Passing the test with FactForge Linked Data Management Aug 2010 #104
  • 105. Passing the test with FactForge (2)
      PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
      PREFIX fb: <http://rdf.freebase.com/ns/>
      PREFIX dbpedia: <http://dbpedia.org/resource/>
      PREFIX dbp-prop: <http://dbpedia.org/property/>
      PREFIX dbp-ont: <http://dbpedia.org/ontology/>
      PREFIX umbel-sc: <http://umbel.org/umbel/sc/>
      PREFIX ff: <http://factforge.net/>
      SELECT DISTINCT ?painting_l ?owner_l ?city_fb_con ?city_db_loc ?city_db_cit
      WHERE {
        ?painting fb:visual_art.artwork.artist dbpedia:Amedeo_Modigliani ;
                  fb:visual_art.artwork.owners [ fb:visual_art.artwork_owner_relationship.owner ?ow ] ;
                  ff:preferredLabel ?painting_l .
        ?ow ff:preferredLabel ?owner_l .
        OPTIONAL { ?ow fb:location.location.containedby [ ff:preferredLabel ?city_fb_con ] } .
        OPTIONAL { ?ow dbp-ont:city [ ff:preferredLabel ?city_db_cit ] } .
        OPTIONAL { ?ow dbp-prop:location ?loc .
                   ?loc rdf:type umbel-sc:City ;
                        ff:preferredLabel ?city_db_loc }
      }
  • 106. Summary of this module • Linked Data is a set of principles that allows publishing, querying and browsing of RDF data, distributed across different servers – similar to the way HTML is currently published & consumed • Linked Data principles – Unambiguous identifiers for objects (resources) – Use the structure of the web – Make it easy to discover information about an object (resource) – Link the object (resource) to related objects
  • 107. Summary of this module (2) • As of Sep 2010 the Linked Open Data cloud includes 180+ interlinked datasets with 20+ billion triples • Open Government Data is still not RDF-ized completely • Linked Data open issues – Data quality, end-point reliability & performance, licensing • FactForge and LinkedLifeData integrated subsets of the LOD cloud in order to provide better data quality, query performance and additional inference Linked Data Management Aug 2010 #107