SlideShare una empresa de Scribd logo
1 de 41
Descargar para leer sin conexión
Linked Data Basics



           Anja Jentzsch, Freie Universität Berlin


                       17 April 2012
Tutorial: Practical Cross-Dataset Queries on the Web of Data
                 WWW2012, Lyon, France



                                                               1
Architecture of the classic Web
Single global document space
                                               Web             Search
                                             Browsers          Engines


Small set of simple standards
1. HTML as document format
2. HTTP URLs as
                                      HTML              HTML             HTML
  •   globally unique IDs                     hyper-
                                               links
  •   retrieval mechanism
3. Hyperlinks to connect everything

                                       A                  B                C




                                                                                2
Web 2.0 APIs and Mashups
No single global data space


Shortcomings
1. APIs have proprietary interfaces                    Mashup

2. Mashups are based on a fixed set of data
   sources
3. No hyperlinks between data items within   Web   Web     Web   Web
                                             API   API     API   API
   different APIs



                                             A     B        C    D



                                                                       3
Web APIs slice the Web into Walled Gardens




Image: Bob Jagensdorf, http://flickr.com/photos/darwinbell/, CC-BY   4
Linked Data
Extend the Web with a single global data space
1. by using RDF to publish structured data on the Web
2. by setting links between data items within different data sources



        RDF            RDF            RDF             RDF              RDF


        RDF            RDF            RDF             RDF              RDF


                RDF           RDF             RDF             RDF
               Links         Links           Links           Links



          A            B              C                D               E


                                                                             5
Linked Data Principles
Set of best practices for publishing structured data on the Web in
accordance with the general architecture of the Web.

1.   Use URIs as names for things.
2.   Use HTTP URIs so that people can look up those names.
3.   When someone looks up a URI, provide useful RDF information.
4.   Include RDF statements that link to other URIs so that they can discover
     related things.
                           Tim Berners-Lee, http://www.w3.org/DesignIssues/LinkedData.html, 2006




                                                                                              6
The RDF Data Model

           rdf:type
pd:chris              foaf:Person
 foaf:name Chris Bizer
 foaf:based_near
                 dbpedia:Berlin




                                    7
Data Items are identified with HTTP URIs

             rdf:type
  pd:chris              foaf:Person
   foaf:name Chris Bizer
   foaf:based_near
                   dbpedia:Berlin
pd:chris = http://www.bizer.de#chris
dbpedia:Berlin = http://dbpedia.org/resource/Berlin



                                                      8
Resolving URIs over the Web

           rdf:type
pd:chris              foaf:Person
 foaf:name Chris Bizer                     3.450.889
 foaf:based_near               dp:population
                 dbpedia:Berlin
                              skos:subject
                                    dp:Cities_in_Germany



                                                           9
Dereferencing URIs over the Web

           rdf:type
pd:chris              foaf:Person
  foaf:name Chris Bizer                     3.450.889
  foaf:based_near                dp:population
                  dbpedia:Berlin
                               skos:subject
                    skos:subject
 dbpedia:Hamburg                    dp:Cities_in_Germany
                    skos:subject
dbpedia:Muenchen

                                                           10
RDF
• RDF is just a data model, it requires a serialization format
   • For transmission over the network
   • For storage as files
• Multiple serialization formats have been defined
   • RDF/XML
   • Turtle
   • N-Triples
   • RDFa
   • ...
• It’s all triples!
    •   Syntax doesn’t matter much and can be chosen case-by-case for
        pragmatic reasons
                                                                        11
Properties of the Web of Linked Data
•   Global, distributed data space build on a simple set of standards
    •   RDF, URIs, HTTP
•   Entities are connected by links
    •   creating a global data graph that spans data sources and
    •   enables the discovery of new data sources
•   Provides for data-coexistence
    •   Everyone can publish data to the Web of Linked Data
    •   Everyone can express their personal view on things
    •   Everybody can use the vocabularies/schema that they like


                                                                        12
W3C Linking Open Data Project
•   Grassroots community effort to
    •   publish existing open license datasets as Linked Data on the Web
    •   interlink things between different data sources




                                                                           13
LOD Data Sets on the Web: May 2007




•   12 data sets
•   Over 500 million RDF triples
•   Around 120,000 RDF links between data sources   14
LOD Data Sets on the Web: November 2007




•   28 data sets
                                      15
LOD Data Sets on the Web: September 2008




•   45 data sets
•   Over 2 billion RDF triples         16
LOD Data Sets on the Web: July 2009




•   95 data sets
•   Over 6.5 billion RDF triples              17
LOD Data Sets on the Web: September 2010




•   203 data sets
•   Over 24,7 billion RDF triples
•   Over 436 million RDF links between data sources   18
LOD Data Sets on the Web: September 2011




•   295 data sets
•   Over 31 billion RDF triples
•   Over 504 million RDF links between data sources   19
LOD Data Set statistics as of 09/2011




LOD Cloud Data Catalog on CKAN
•   http://www.ckan.net/group/lodcloud
More statistics
•   http://lod-cloud.net/state/
                                              20
Uptake in the Government Domain




• The EU is pushing Linked Data (LOD2, LATC, Eurostat)
• W3C Government Linked Data (GLD) Working Group
Uptake in the Libraries Community
•   Institutions publishing Linked Data
    •   Library of Congress (subject headings)
    •   German National Library (PND dataset and subject headings)
    •   Swedish National Library (Libris - catalog)
    •   Hungarian National Library (OPAC and Digital Library)
    •   British National Library
    •   Europeana project




                                                                     22
Uptake in the Libraries Community
•   W3C Library Linked Data Incubator Group (2010)
•   OKFN Working Group on Bibliographic Data (2010)


•   Goals:
    •   Integrate Library Catalogs on global scale
    •   Interconnect resources between repositories (by topic, by location, by
        historical period, by ...)




                                                                                 23
Uptake in the Media Industry
                •   Publish data as RDF or embed as
                    RDFa
                •   Goal: Drive traffic to websites via
                    search engines




                                                     24
schema.org




•   jointly proposed vocabularies for embedding data into HTML pages (Microdata)
•   available since June 2011                                                      25
Linked Data Applications

  Linked Data               Linked Data                 Search
   Browsers                  Mashups                    Engines




Thing           Thing           Thing           Thing             Thing


Thing           Thing           Thing           Thing             Thing

        typed           typed           typed            typed
        links           links           links            links



  A              B               C               D                 E

                                                                          26
27
28
29
Lower Data Integration Costs
The overall data integration effort is split between
the data publisher, the data consumer and third parties.
• Data Publisher
    • publishes data as RDF
    • sets identity links
    • reuses terms or publishes mappings
• Third Parties
    • set identity links pointing at your data
    • publish mappings to the Web
• Data Consumer
    • has to do the rest
    • using record linkage and schema matching techniques   30
Is your data 5 star?

★              Make your stuff available on the Web (whatever format) under
               an open license.

★★             Make it available as structured data (e.g., Excel instead of image
               scan of a table) so that it can be reused.

★★★            Use non-proprietary, open formats (e.g., CSV instead of Excel).

★★★★           Use URIs to identify things, so that people can point at your stuff
               and serve RDF from it.

★ ★ ★ ★ ★ Link your data to other data to provide context.

                       Tim Berners-Lee, http://www.w3.org/DesignIssues/LinkedData.html, 2010
                                                                                               31
How to publish Linked Data
Tasks:
1.   Make data available as RDF via HTTP
2.   Set RDF links pointing at other data sources
3.   Make your data self-descriptive
4.   Reuse common vocabularies




Tom Heath, Christian Bizer: Linked Data: Evolving the Web into a Global Data
Space
http://linkeddatabook.com/
                                                                               32
Make Data available as RDF via HTTP
•Ready to use tools (examples)
    • D2R Server
       • provides for mapping relational
         databases into RDF and for
         serving them as Linked Data
   • Pubby
       • Linked Data Frontend for
         SPARQL Endpoints
   • More tools
       • http://esw.w3.org/TaskForces/
         CommunityProjects/
         LinkingOpenData/PublishingTools
                                           33
Set RDF links to other data sources
• Examples of RDF links

     <http://dbpedia.org/resource/Berlin> owl:sameAs <http://
                     sws.geonames.org/2950159> .


  <http://richard.cyganiak.de/foaf.rdf#cygri> foaf:topic_interest
            <http://dbpedia.org/resource/Semantic_Web> .


 <http://example-bookshop.com/book006251587X> owl:sameAs <http://
      www4.wiwiss.fu-berlin.de/bookmashup/books/006251587X> .




                                                                    34
How to generate RDF links?
• Pattern-based approaches
   • Exploit naming conventions within URIs (for instance ISBNs, ISINs, …)
• Similarity-based approaches
   • Compare items within different data sources using various similarity metrics


• Ready to use tools (Examples)
   • Silk Link Discovery Framework
       • provides a declarative language for specifying link conditions
         which may combine different similarity metrics
   • More tools
       • http://esw.w3.org/TaskForces/CommunityProjects/LinkingOpenData/
         EquivalenceMining
                                                                                35
Make your Data Self-Descriptive
• Increase the usefulness of your data and ease data integration
• Aspects of self-descriptiveness
    •   Enable clients to retrieve the schema
    •   Reuse terms from common vocabularies
    •   Publish schema mappings for proprietary terms
    •   Provide provenance metadata
    •   Provide licensing metadata
    •   Provide data-set-level metadata using voiD
    •   Refer to additional access methods using voiD


                                                                   36
Enable Clients to retrieve the Schema
  Clients can resolve the URIs that identify vocabulary terms in
          order to get their RDFS or OWL definitions.


                                                Some data on the Web
             <http://richard.cyganiak.de/foaf.rdf#cygri>
                       foaf:name "Richard Cyganiak" ;
               rdf:type <http://xmlns.com/foaf/0.1/Person> .


             Resolve unknown term http://xmlns.com/foaf/0.1/Person

                                              RDFS or OWL definition
                 <http://xmlns.com/foaf/0.1/Person>
                              rdf:type owl:Class ;
                              rdfs:label "Person";
              rdfs:subClassOf <http://xmlns.com/foaf/0.1/Agent> ;
             rdfs:subClassOf <http://xmlns.com/wordnet/1.6/Agent> .

                                                                       37
Reuse Terms from Common Vocabularies
• Common Vocabularies
   • Friend-of-a-Friend for describing people and their social network
   • SIOC for describing forums and blogs
   • SKOS for representing topic taxonomies
   • Organization Ontology for describing the structure of organizations
   • GoodRelations provides terms for describing products and business entities
   • Music Ontology for describing artists, albums, and performances
   • Review Vocabulary provides terms for representing reviews

• Common sources of identifiers (URIs) for real world objects
   • LinkedGeoData and Geonames locations
   • GeneID and UniProt life science identifiers                                   38
Linked Data Sets: Distribution of used
            vocabularies




                                         39
Conclusion
• Linked Data provides a standardized data access interface
• Linked Data allows for the development of a variety of tools to integrate,
  enhance and and view the data
• The Web of Data is growing rapidly

    •   There are active deployment communities in different domains
•   Web search is evolving into query answering
    •   Search engines will increasingly rely on structured data from the Web




                                                                                40
Thanks
                               Questions?

                              Email: anja@anjeve.de
                              Twitter: @anjeve


References
• Tom Heath, Christian Bizer: Linked Data: Evolving the Web into a Global Data Space
   http://linkeddatabook.com/
• Christian Bizer, Tom Heath, Tim Berners-Lee: Linked Data – The Story So Far
   http://tomheath.com/papers/bizer-heath-berners-lee-ijswis-linked-data.pdf
• Linking Open Data Project Wiki
   http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData


                                                                                41

Más contenido relacionado

La actualidad más candente

Linked Data Integration and semantic web
Linked Data Integration and semantic webLinked Data Integration and semantic web
Linked Data Integration and semantic web
Diego Pessoa
 
Web of Data Usage Mining
Web of Data Usage MiningWeb of Data Usage Mining
Web of Data Usage Mining
Markus Luczak-Rösch
 
Embedding Linked Data Invisibly into Web Pages: Strategies and Workflows for ...
Embedding Linked Data Invisibly into Web Pages: Strategies and Workflows for ...Embedding Linked Data Invisibly into Web Pages: Strategies and Workflows for ...
Embedding Linked Data Invisibly into Web Pages: Strategies and Workflows for ...
National Information Standards Organization (NISO)
 
Microtask Crowdsourcing Applications for Linked Data
Microtask Crowdsourcing Applications for Linked DataMicrotask Crowdsourcing Applications for Linked Data
Microtask Crowdsourcing Applications for Linked Data
EUCLID project
 

La actualidad más candente (20)

Build Narratives, Connect Artifacts: Linked Open Data for Cultural Heritage
Build Narratives, Connect Artifacts: Linked Open Data for Cultural HeritageBuild Narratives, Connect Artifacts: Linked Open Data for Cultural Heritage
Build Narratives, Connect Artifacts: Linked Open Data for Cultural Heritage
 
Linked Data in Libraries
Linked Data in LibrariesLinked Data in Libraries
Linked Data in Libraries
 
Linked Data and OCLC
Linked Data and OCLCLinked Data and OCLC
Linked Data and OCLC
 
Linked Data as an enabling framework for resource discovery across libraries,...
Linked Data as an enabling framework for resource discovery across libraries,...Linked Data as an enabling framework for resource discovery across libraries,...
Linked Data as an enabling framework for resource discovery across libraries,...
 
Scaling up Linked Data
Scaling up Linked DataScaling up Linked Data
Scaling up Linked Data
 
Linked Data Integration and semantic web
Linked Data Integration and semantic webLinked Data Integration and semantic web
Linked Data Integration and semantic web
 
Linked Data for Libraries: Experiments between Cornell, Harvard and Stanford
Linked Data for Libraries: Experiments between Cornell, Harvard and StanfordLinked Data for Libraries: Experiments between Cornell, Harvard and Stanford
Linked Data for Libraries: Experiments between Cornell, Harvard and Stanford
 
RDF and Open Linked Data, a first approach
RDF and Open Linked Data, a first approachRDF and Open Linked Data, a first approach
RDF and Open Linked Data, a first approach
 
WWW2014 Overview of W3C Linked Data Platform 20140410
WWW2014 Overview of W3C Linked Data Platform 20140410WWW2014 Overview of W3C Linked Data Platform 20140410
WWW2014 Overview of W3C Linked Data Platform 20140410
 
Linked Data Tutorial
Linked Data TutorialLinked Data Tutorial
Linked Data Tutorial
 
The state of the art in Linked Data
The state of the art in Linked DataThe state of the art in Linked Data
The state of the art in Linked Data
 
Signposting for Repositories
Signposting for RepositoriesSignposting for Repositories
Signposting for Repositories
 
Web of Data Usage Mining
Web of Data Usage MiningWeb of Data Usage Mining
Web of Data Usage Mining
 
Embedding Linked Data Invisibly into Web Pages: Strategies and Workflows for ...
Embedding Linked Data Invisibly into Web Pages: Strategies and Workflows for ...Embedding Linked Data Invisibly into Web Pages: Strategies and Workflows for ...
Embedding Linked Data Invisibly into Web Pages: Strategies and Workflows for ...
 
NISO/DCMI May 22 Webinar: Semantic Mashups Across Large, Heterogeneous Insti...
 NISO/DCMI May 22 Webinar: Semantic Mashups Across Large, Heterogeneous Insti... NISO/DCMI May 22 Webinar: Semantic Mashups Across Large, Heterogeneous Insti...
NISO/DCMI May 22 Webinar: Semantic Mashups Across Large, Heterogeneous Insti...
 
Linked Data Management
Linked Data ManagementLinked Data Management
Linked Data Management
 
Microtask Crowdsourcing Applications for Linked Data
Microtask Crowdsourcing Applications for Linked DataMicrotask Crowdsourcing Applications for Linked Data
Microtask Crowdsourcing Applications for Linked Data
 
Linked Open Data for Libraries
Linked Open Data for LibrariesLinked Open Data for Libraries
Linked Open Data for Libraries
 
Thompson 6-jun15-final
Thompson 6-jun15-finalThompson 6-jun15-final
Thompson 6-jun15-final
 
Interoperability for web based scholarship
Interoperability for web based scholarshipInteroperability for web based scholarship
Interoperability for web based scholarship
 

Similar a Linked Data Basics

Linked Data (1st Linked Data Meetup Malmö)
Linked Data (1st Linked Data Meetup Malmö)Linked Data (1st Linked Data Meetup Malmö)
Linked Data (1st Linked Data Meetup Malmö)
Anja Jentzsch
 
DBpedia Mappings Wiki, SMWCon Fall 2013, Berlin
DBpedia Mappings Wiki, SMWCon Fall 2013, BerlinDBpedia Mappings Wiki, SMWCon Fall 2013, Berlin
DBpedia Mappings Wiki, SMWCon Fall 2013, Berlin
Anja Jentzsch
 
Linked data demystified:Practical efforts to transform CONTENTDM metadata int...
Linked data demystified:Practical efforts to transform CONTENTDM metadata int...Linked data demystified:Practical efforts to transform CONTENTDM metadata int...
Linked data demystified:Practical efforts to transform CONTENTDM metadata int...
Cory Lampert
 
Link Sets And Why They Are Important (EDF2012)
Link Sets And Why They Are Important (EDF2012)Link Sets And Why They Are Important (EDF2012)
Link Sets And Why They Are Important (EDF2012)
Anja Jentzsch
 
Vila LOD-innovacion- bib-semweb-redux
Vila LOD-innovacion- bib-semweb-reduxVila LOD-innovacion- bib-semweb-redux
Vila LOD-innovacion- bib-semweb-redux
LIS EPI Meeting
 

Similar a Linked Data Basics (20)

Linked Data
Linked DataLinked Data
Linked Data
 
Linked Data (1st Linked Data Meetup Malmö)
Linked Data (1st Linked Data Meetup Malmö)Linked Data (1st Linked Data Meetup Malmö)
Linked Data (1st Linked Data Meetup Malmö)
 
DBpedia Mappings Wiki, SMWCon Fall 2013, Berlin
DBpedia Mappings Wiki, SMWCon Fall 2013, BerlinDBpedia Mappings Wiki, SMWCon Fall 2013, Berlin
DBpedia Mappings Wiki, SMWCon Fall 2013, Berlin
 
Semantic Web (IS 535 presentation) by ITRL students Deborah Ratliff and Maril...
Semantic Web (IS 535 presentation) by ITRL students Deborah Ratliff and Maril...Semantic Web (IS 535 presentation) by ITRL students Deborah Ratliff and Maril...
Semantic Web (IS 535 presentation) by ITRL students Deborah Ratliff and Maril...
 
Linked Data
Linked DataLinked Data
Linked Data
 
Linked Data for the Masses: The approach and the Software
Linked Data for the Masses: The approach and the SoftwareLinked Data for the Masses: The approach and the Software
Linked Data for the Masses: The approach and the Software
 
Finding Data Sets
Finding Data SetsFinding Data Sets
Finding Data Sets
 
The Web of data and web data commons
The Web of data and web data commonsThe Web of data and web data commons
The Web of data and web data commons
 
Linked data demystified:Practical efforts to transform CONTENTDM metadata int...
Linked data demystified:Practical efforts to transform CONTENTDM metadata int...Linked data demystified:Practical efforts to transform CONTENTDM metadata int...
Linked data demystified:Practical efforts to transform CONTENTDM metadata int...
 
What is New in W3C land?
What is New in W3C land?What is New in W3C land?
What is New in W3C land?
 
Introduction to linked data
Introduction to linked dataIntroduction to linked data
Introduction to linked data
 
Linked Data
Linked DataLinked Data
Linked Data
 
Linked data 20171106
Linked data 20171106Linked data 20171106
Linked data 20171106
 
Link Sets And Why They Are Important (EDF2012)
Link Sets And Why They Are Important (EDF2012)Link Sets And Why They Are Important (EDF2012)
Link Sets And Why They Are Important (EDF2012)
 
Using Linked Data Resources to generate web pages based on a BBC case study
Using Linked Data Resources to generate web pages based on a BBC case studyUsing Linked Data Resources to generate web pages based on a BBC case study
Using Linked Data Resources to generate web pages based on a BBC case study
 
Vila LOD-innovacion- bib-semweb-redux
Vila LOD-innovacion- bib-semweb-reduxVila LOD-innovacion- bib-semweb-redux
Vila LOD-innovacion- bib-semweb-redux
 
Illuminating DSpace's Linked Data Support
Illuminating DSpace's Linked Data SupportIlluminating DSpace's Linked Data Support
Illuminating DSpace's Linked Data Support
 
ISWC GoodRelations Tutorial Part 2
ISWC GoodRelations Tutorial Part 2ISWC GoodRelations Tutorial Part 2
ISWC GoodRelations Tutorial Part 2
 
GoodRelations Tutorial Part 2
GoodRelations Tutorial Part 2GoodRelations Tutorial Part 2
GoodRelations Tutorial Part 2
 
Introduction to APIs and Linked Data
Introduction to APIs and Linked DataIntroduction to APIs and Linked Data
Introduction to APIs and Linked Data
 

Último

Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
PECB
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
fonyou31
 

Último (20)

Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajan
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdf
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
General AI for Medical Educators April 2024
General AI for Medical Educators April 2024General AI for Medical Educators April 2024
General AI for Medical Educators April 2024
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptx
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 

Linked Data Basics

  • 1. Linked Data Basics Anja Jentzsch, Freie Universität Berlin 17 April 2012 Tutorial: Practical Cross-Dataset Queries on the Web of Data WWW2012, Lyon, France 1
  • 2. Architecture of the classic Web Single global document space Web Search Browsers Engines Small set of simple standards 1. HTML as document format 2. HTTP URLs as HTML HTML HTML • globally unique IDs hyper- links • retrieval mechanism 3. Hyperlinks to connect everything A B C 2
  • 3. Web 2.0 APIs and Mashups No single global data space Shortcomings 1. APIs have proprietary interfaces Mashup 2. Mashups are based on a fixed set of data sources 3. No hyperlinks between data items within Web Web Web Web API API API API different APIs A B C D 3
  • 4. Web APIs slice the Web into Walled Gardens Image: Bob Jagensdorf, http://flickr.com/photos/darwinbell/, CC-BY 4
  • 5. Linked Data Extend the Web with a single global data space 1. by using RDF to publish structured data on the Web 2. by setting links between data items within different data sources RDF RDF RDF RDF RDF RDF RDF RDF RDF RDF RDF RDF RDF RDF Links Links Links Links A B C D E 5
  • 6. Linked Data Principles Set of best practices for publishing structured data on the Web in accordance with the general architecture of the Web. 1. Use URIs as names for things. 2. Use HTTP URIs so that people can look up those names. 3. When someone looks up a URI, provide useful RDF information. 4. Include RDF statements that link to other URIs so that they can discover related things. Tim Berners-Lee, http://www.w3.org/DesignIssues/LinkedData.html, 2006 6
  • 7. The RDF Data Model rdf:type pd:chris foaf:Person foaf:name Chris Bizer foaf:based_near dbpedia:Berlin 7
  • 8. Data Items are identified with HTTP URIs rdf:type pd:chris foaf:Person foaf:name Chris Bizer foaf:based_near dbpedia:Berlin pd:chris = http://www.bizer.de#chris dbpedia:Berlin = http://dbpedia.org/resource/Berlin 8
  • 9. Resolving URIs over the Web rdf:type pd:chris foaf:Person foaf:name Chris Bizer 3.450.889 foaf:based_near dp:population dbpedia:Berlin skos:subject dp:Cities_in_Germany 9
  • 10. Dereferencing URIs over the Web rdf:type pd:chris foaf:Person foaf:name Chris Bizer 3.450.889 foaf:based_near dp:population dbpedia:Berlin skos:subject skos:subject dbpedia:Hamburg dp:Cities_in_Germany skos:subject dbpedia:Muenchen 10
  • 11. RDF • RDF is just a data model, it requires a serialization format • For transmission over the network • For storage as files • Multiple serialization formats have been defined • RDF/XML • Turtle • N-Triples • RDFa • ... • It’s all triples! • Syntax doesn’t matter much and can be chosen case-by-case for pragmatic reasons 11
  • 12. Properties of the Web of Linked Data • Global, distributed data space build on a simple set of standards • RDF, URIs, HTTP • Entities are connected by links • creating a global data graph that spans data sources and • enables the discovery of new data sources • Provides for data-coexistence • Everyone can publish data to the Web of Linked Data • Everyone can express their personal view on things • Everybody can use the vocabularies/schema that they like 12
  • 13. W3C Linking Open Data Project • Grassroots community effort to • publish existing open license datasets as Linked Data on the Web • interlink things between different data sources 13
  • 14. LOD Data Sets on the Web: May 2007 • 12 data sets • Over 500 million RDF triples • Around 120,000 RDF links between data sources 14
  • 15. LOD Data Sets on the Web: November 2007 • 28 data sets 15
  • 16. LOD Data Sets on the Web: September 2008 • 45 data sets • Over 2 billion RDF triples 16
  • 17. LOD Data Sets on the Web: July 2009 • 95 data sets • Over 6.5 billion RDF triples 17
  • 18. LOD Data Sets on the Web: September 2010 • 203 data sets • Over 24,7 billion RDF triples • Over 436 million RDF links between data sources 18
  • 19. LOD Data Sets on the Web: September 2011 • 295 data sets • Over 31 billion RDF triples • Over 504 million RDF links between data sources 19
  • 20. LOD Data Set statistics as of 09/2011 LOD Cloud Data Catalog on CKAN • http://www.ckan.net/group/lodcloud More statistics • http://lod-cloud.net/state/ 20
  • 21. Uptake in the Government Domain • The EU is pushing Linked Data (LOD2, LATC, Eurostat) • W3C Government Linked Data (GLD) Working Group
  • 22. Uptake in the Libraries Community • Institutions publishing Linked Data • Library of Congress (subject headings) • German National Library (PND dataset and subject headings) • Swedish National Library (Libris - catalog) • Hungarian National Library (OPAC and Digital Library) • British National Library • Europeana project 22
  • 23. Uptake in the Libraries Community • W3C Library Linked Data Incubator Group (2010) • OKFN Working Group on Bibliographic Data (2010) • Goals: • Integrate Library Catalogs on global scale • Interconnect resources between repositories (by topic, by location, by historical period, by ...) 23
  • 24. Uptake in the Media Industry • Publish data as RDF or embed as RDFa • Goal: Drive traffic to websites via search engines 24
  • 25. schema.org • jointly proposed vocabularies for embedding data into HTML pages (Microdata) • available since June 2011 25
  • 26. Linked Data Applications Linked Data Linked Data Search Browsers Mashups Engines Thing Thing Thing Thing Thing Thing Thing Thing Thing Thing typed typed typed typed links links links links A B C D E 26
  • 27. 27
  • 28. 28
  • 29. 29
  • 30. Lower Data Integration Costs The overall data integration effort is split between the data publisher, the data consumer and third parties. • Data Publisher • publishes data as RDF • sets identity links • reuses terms or publishes mappings • Third Parties • set identity links pointing at your data • publish mappings to the Web • Data Consumer • has to do the rest • using record linkage and schema matching techniques 30
  • 31. Is your data 5 star? ★ Make your stuff available on the Web (whatever format) under an open license. ★★ Make it available as structured data (e.g., Excel instead of image scan of a table) so that it can be reused. ★★★ Use non-proprietary, open formats (e.g., CSV instead of Excel). ★★★★ Use URIs to identify things, so that people can point at your stuff and serve RDF from it. ★ ★ ★ ★ ★ Link your data to other data to provide context. Tim Berners-Lee, http://www.w3.org/DesignIssues/LinkedData.html, 2010 31
  • 32. How to publish Linked Data Tasks: 1. Make data available as RDF via HTTP 2. Set RDF links pointing at other data sources 3. Make your data self-descriptive 4. Reuse common vocabularies Tom Heath, Christian Bizer: Linked Data: Evolving the Web into a Global Data Space http://linkeddatabook.com/ 32
  • 33. Make Data available as RDF via HTTP •Ready to use tools (examples) • D2R Server • provides for mapping relational databases into RDF and for serving them as Linked Data • Pubby • Linked Data Frontend for SPARQL Endpoints • More tools • http://esw.w3.org/TaskForces/ CommunityProjects/ LinkingOpenData/PublishingTools 33
  • 34. Set RDF links to other data sources • Examples of RDF links <http://dbpedia.org/resource/Berlin> owl:sameAs <http:// sws.geonames.org/2950159> . <http://richard.cyganiak.de/foaf.rdf#cygri> foaf:topic_interest <http://dbpedia.org/resource/Semantic_Web> . <http://example-bookshop.com/book006251587X> owl:sameAs <http:// www4.wiwiss.fu-berlin.de/bookmashup/books/006251587X> . 34
  • 35. How to generate RDF links? • Pattern-based approaches • Exploit naming conventions within URIs (for instance ISBNs, ISINs, …) • Similarity-based approaches • Compare items within different data sources using various similarity metrics • Ready to use tools (Examples) • Silk Link Discovery Framework • provides a declarative language for specifying link conditions which may combine different similarity metrics • More tools • http://esw.w3.org/TaskForces/CommunityProjects/LinkingOpenData/ EquivalenceMining 35
  • 36. Make your Data Self-Descriptive • Increase the usefulness of your data and ease data integration • Aspects of self-descriptiveness • Enable clients to retrieve the schema • Reuse terms from common vocabularies • Publish schema mappings for proprietary terms • Provide provenance metadata • Provide licensing metadata • Provide data-set-level metadata using voiD • Refer to additional access methods using voiD 36
  • 37. Enable Clients to retrieve the Schema Clients can resolve the URIs that identify vocabulary terms in order to get their RDFS or OWL definitions. Some data on the Web <http://richard.cyganiak.de/foaf.rdf#cygri> foaf:name "Richard Cyganiak" ; rdf:type <http://xmlns.com/foaf/0.1/Person> . Resolve unknown term http://xmlns.com/foaf/0.1/Person RDFS or OWL definition <http://xmlns.com/foaf/0.1/Person> rdf:type owl:Class ; rdfs:label "Person"; rdfs:subClassOf <http://xmlns.com/foaf/0.1/Agent> ; rdfs:subClassOf <http://xmlns.com/wordnet/1.6/Agent> . 37
  • 38. Reuse Terms from Common Vocabularies • Common Vocabularies • Friend-of-a-Friend for describing people and their social network • SIOC for describing forums and blogs • SKOS for representing topic taxonomies • Organization Ontology for describing the structure of organizations • GoodRelations provides terms for describing products and business entities • Music Ontology for describing artists, albums, and performances • Review Vocabulary provides terms for representing reviews • Common sources of identifiers (URIs) for real world objects • LinkedGeoData and Geonames locations • GeneID and UniProt life science identifiers 38
  • 39. Linked Data Sets: Distribution of used vocabularies 39
  • 40. Conclusion • Linked Data provides a standardized data access interface • Linked Data allows for the development of a variety of tools to integrate, enhance and and view the data • The Web of Data is growing rapidly • There are active deployment communities in different domains • Web search is evolving into query answering • Search engines will increasingly rely on structured data from the Web 40
  • 41. Thanks Questions? Email: anja@anjeve.de Twitter: @anjeve References • Tom Heath, Christian Bizer: Linked Data: Evolving the Web into a Global Data Space http://linkeddatabook.com/ • Christian Bizer, Tom Heath, Tim Berners-Lee: Linked Data – The Story So Far http://tomheath.com/papers/bizer-heath-berners-lee-ijswis-linked-data.pdf • Linking Open Data Project Wiki http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData 41