1. Linked Data & Semantic Web Technology
Linked Data Use Cases
Dr. Myungjin Lee
2. Linked Data & Semantic Web Technology
Agenda
• Introduction to Linked Data
• Linked Data for Cross-Domain
• Linked Geographic Data
• Linked Government Data
• Linked Media Data
• Linked Data for User Generated Content
• Linked Publication Data
• Linked Life Science Data
3. Linked Data & Semantic Web Technology
Introduction to Linked Data
4. Linked Data & Semantic Web Technology
What is Linked Data?
• a method of publishing structured data so that
data can be interlinked and become more useful
• based on standard Web technologies such as
HTTP, RDF and URIs.
• to share information in a way that can be read
automatically by computers.
5. Linked Data & Semantic Web Technology
Stack and Requirements for Linked Data
• XML: an elemental syntax for content structure within documents
• RDF: a simple language for expressing data models, which refer to objects ("resources") and their relationships
• RDF Schema: a vocabulary for describing properties and classes of RDF-based resources
• SPARQL: a protocol and query language for semantic web data sources
• URI: a string of characters used to identify a name or a resource
6. Linked Data & Semantic Web Technology
Four Principles of Linked Data
1. Use URIs to identify things.
2. Use HTTP URIs so that these things can be referred
to and looked up ("dereferenced") by people and
user agents.
3. Provide useful information about the thing when its
URI is dereferenced, using standard formats such as
RDF/XML.
4. Include links to other, related URIs in the exposed
data to improve discovery of other related
information on the Web.
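Principles 2 and 3 can be illustrated with a small sketch of what a user agent does when it dereferences a URI: it issues an HTTP GET and asks for an RDF serialization through the Accept header. The helper name `build_dereference_request` and the example URI are assumptions made for this illustration, and the request is only built, not sent.

```python
from urllib.request import Request

# Media types of two common RDF serializations (RDF/XML preferred here).
RDF_ACCEPT = "application/rdf+xml, text/turtle;q=0.9"


def build_dereference_request(uri: str) -> Request:
    """Build (but do not send) an HTTP GET that asks for RDF about `uri`.

    Hypothetical helper for illustration only.
    """
    # Principle 2: only HTTP URIs can be looked up by people and user agents.
    if not uri.startswith(("http://", "https://")):
        raise ValueError("not an HTTP URI: " + uri)
    # Principle 3: ask the server for a standard RDF format.
    return Request(uri, headers={"Accept": RDF_ACCEPT})


# Illustrative URI, assumed for this sketch.
request = build_dereference_request("http://dbpedia.org/resource/Seoul")
print(request.get_method(), request.full_url)
print("Accept:", request.get_header("Accept"))
```

Passing the request to `urllib.request.urlopen` would perform the actual lookup; whether a given server honours the Accept header depends on how it is configured.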
7. Linked Data & Semantic Web Technology
5 Star Linked Data
★ Available on the web (in whatever format) under an open licence, to be Open Data
★★ All the above, plus: available as machine-readable structured data (e.g. Excel instead of an image scan of a table)
★★★ All the above, plus: a non-proprietary format (e.g. CSV instead of Excel)
★★★★ All the above, plus: use open standards from W3C (RDF and SPARQL) to identify things, so that people can point at your stuff
★★★★★ All the above, plus: link your data to other people’s data to provide context
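Because each star level requires all the levels below it, the rating can be read as "count the criteria met, stopping at the first one that fails". The following is a minimal sketch of that reading; the `Dataset` class and its field names are invented for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical description of a published dataset (illustration only)."""
    open_licence: bool         # star 1: on the web under an open licence
    machine_readable: bool     # star 2: structured, machine-readable data
    non_proprietary: bool      # star 3: non-proprietary format such as CSV
    uses_w3c_standards: bool   # star 4: RDF and SPARQL used to identify things
    links_to_other_data: bool  # star 5: links out to other people's data


def star_rating(dataset: Dataset) -> int:
    """Return the number of stars; each level needs all the levels below it."""
    criteria = [
        dataset.open_licence,
        dataset.machine_readable,
        dataset.non_proprietary,
        dataset.uses_w3c_standards,
        dataset.links_to_other_data,
    ]
    stars = 0
    for met in criteria:
        if not met:
            break  # a missing lower level caps the rating
        stars += 1
    return stars


# An openly licensed CSV file meets the first three criteria only.
csv_release = Dataset(True, True, True, False, False)
print(star_rating(csv_release))
```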
8. Linked Data & Semantic Web Technology
The Linking Open Data cloud diagram
9. Linked Data & Semantic Web Technology
[Figure legend: Media, User Generated Content, Publications, Government, Geographic, Cross-Domain, Life Sciences]
Domain | Number of datasets | Triples | (Out-)Links
Media | 25 | 1,841,852,061 | 50,440,705
Geographic | 31 | 6,145,532,484 | 35,812,328
Government | 49 | 13,315,009,400 | 19,343,519
Publications | 87 | 2,950,720,693 | 139,925,218
Cross-domain | 41 | 4,184,635,715 | 63,183,065
Life Sciences | 41 | 3,036,336,004 | 191,844,090
User-generated Content | 20 | 134,127,413 | 3,449,143
Total | 295 | 31,634,213,770 | 503,998,829
10. Linked Data & Semantic Web Technology
Linked Data for
Cross-Domain
11. Linked Data & Semantic Web Technology
DBpedia
• a project aiming to extract structured content
from the information created as part of the
Wikipedia project
• as of September 2011, more than 3.64 million
things, more than 6.5 million interlinks, and over 1
billion pieces of information (RDF triples)
13. Linked Data & Semantic Web Technology
The DBpedia Information Extraction Framework
• Source
– an abstraction over a source of MediaWiki pages
• WikiParser
– a parser which transforms a MediaWiki page source into an Abstract
Syntax Tree (AST)
• Extractor
– a mapping from a page node to a graph of statements about it
• Destination
– an abstraction over a destination of RDF statements
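The four components above form a pipeline: pages flow from a Source through the WikiParser and an Extractor into a Destination. The sketch below mirrors that flow in Python under heavy simplification; it is not the DBpedia framework's actual API. All class and function names, the `example.org` URIs, and the regex-based "parser" (which stands in for a real AST builder) are assumptions made for this illustration.

```python
import re
from typing import Dict, Iterable, Iterator, List, Tuple

Triple = Tuple[str, str, str]


class ListSource:
    """Source: an abstraction over wiki pages (here, an in-memory list)."""

    def __init__(self, pages: List[Tuple[str, str]]):
        self.pages = pages

    def __iter__(self) -> Iterator[Tuple[str, str]]:
        return iter(self.pages)


def parse_page(title: str, wikitext: str) -> Dict:
    """WikiParser stand-in: keep only '| key = value' infobox lines.

    A real parser would build a full Abstract Syntax Tree.
    """
    fields = dict(
        re.findall(r"^\|\s*(\w+)\s*=\s*(.+?)\s*$", wikitext, flags=re.MULTILINE)
    )
    return {"title": title, "infobox": fields}


def extract_infobox(node: Dict) -> Iterator[Triple]:
    """Extractor: map a page node to statements about the page's subject."""
    subject = "http://example.org/resource/" + node["title"].replace(" ", "_")
    for key, value in node["infobox"].items():
        yield (subject, "http://example.org/property/" + key, value)


class ListDestination:
    """Destination: an abstraction over where statements are written."""

    def __init__(self) -> None:
        self.triples: List[Triple] = []

    def write(self, triples: Iterable[Triple]) -> None:
        self.triples.extend(triples)


def run(source: ListSource, destination: ListDestination) -> None:
    """Wire the stages together: Source -> WikiParser -> Extractor -> Destination."""
    for title, wikitext in source:
        destination.write(extract_infobox(parse_page(title, wikitext)))


page = ("Seoul", "{{Infobox settlement\n| name = Seoul\n| country = South Korea\n}}")
destination = ListDestination()
run(ListSource([page]), destination)
for triple in destination.triples:
    print(triple)
```

Keeping the stages behind small interfaces is what lets an extractor be swapped or added without touching how pages are read or where statements are stored.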
14. Linked Data & Semantic Web Technology
Freebase
• a large collaborative knowledge base consisting of
metadata composed mainly by its community
members
• as of May 2012, approximately 22 million topics
"Freebase is the bridge between the
bottom up vision of Web 2.0 collective
intelligence and the more structured
world of the semantic web."
16. Linked Data & Semantic Web Technology
OpenCyc
• Cyc
– an artificial intelligence project that attempts to assemble a
comprehensive ontology and knowledge base of everyday
common sense knowledge
• OpenCyc
– mainly taxonomic assertions, not the complex rules
available in Cyc
– 239,000 concepts, 2,093,000 facts, and 69,000
owl:sameAs links to external (non-Cyc) semantic data
– the RDF-compatible content extracted from OpenCyc using
the open source Texai
17. Linked Data & Semantic Web Technology
Linked Geographic Data
18. Linked Data & Semantic Web Technology
GeoNames
• a geographical database available and accessible
through various web services, under a Creative
Commons attribution license
• over 10,000,000 geographical names
corresponding to over 7,500,000 unique features
20. Linked Data & Semantic Web Technology
LinkedGeoData
• an effort to add a spatial dimension to the Web of
Data / Semantic Web
• uses the information collected by the OpenStreetMap
project and makes it available as RDF according to the
Linked Data principles
Dataset | #Triples
Ontology | 8K
RelevantNodes | 66 million
RelevantWays | 65 million
RelevantWayNodes | 74 million
RelevantNodePositions | 60 million
DBpedia Interlinks | 101K
GeoNames Interlinks | 487K
22. Linked Data & Semantic Web Technology
Other Linked Geographic Data
• Linked Sensor Data
– an RDF dataset containing expressive descriptions of ~20,000 weather
stations in the United States
• U.S. Census
– Basic geographic data for the U.S., the states, counties, cities, ZCTAs,
and congressional districts.
– 1,016,219 triples in N3 format
<http://www.rdfabout.com/rdf/usgov/geo/us/sc/counties/hampton_county>
    rdf:type usgovt:County ;
    usgovt:fipsCountyCode "049" ;
    usgovt:fipsStateCountyCode "45:049" ;
    dc:title "Hampton County" ;
    dcterms:isPartOf <http://www.rdfabout.com/rdf/usgov/geo/us/sc> ;
    geo:lat 32.796299 ;
    geo:long -81.131622 ;
    census:population 21386 ;
    census:households 8582 ;
    census:landArea "1449823309 m^2" ;
    census:waterArea "7369890 m^2" ;
    census:details
        <http://www.rdfabout.com/rdf/usgov/geo/us/sc/counties/hampton_county/censustables> .
<http://www.rdfabout.com/rdf/usgov/geo/us/sc>
    dcterms:hasPart
        <http://www.rdfabout.com/rdf/usgov/geo/us/sc/counties/hampton_county> .
23. Linked Data & Semantic Web Technology
Linked Government Data
24. Linked Data & Semantic Web Technology
Open Government Data
• By “open” we mean data that is
free for anyone to use, reuse,
and redistribute.
• By “government data” we
mean data and information
produced or commissioned
by government or
government controlled
entities.
[Figure: Venn diagram of Open, Gov, and Data, with Open Gov Data at the intersection]
25. Linked Data & Semantic Web Technology
United States
• Data.gov
– "The purpose of Data.gov is to increase public access to high value,
machine readable datasets generated by the Executive Branch of the
Federal Government."
– "a repository for all the information the government collects"
– over 250,000 datasets
• Data-gov Wiki
– a project investigating open government datasets using semantic web
technologies
– to translate datasets into RDF, to get them linked to the linked data
cloud, and to develop interesting applications on linked government
data
– Dataset Statistics
• 417 RDFized datasets and 6.46 billion RDF triples
• 35 Non-Data.gov Datasets and 0.9 billion more RDF triples
27. Linked Data & Semantic Web Technology
United Kingdom
• Data.gov.uk
– a UK Government project to make available non-personal
UK government data as open data
– over 9,000 datasets
– the use of Linked Data standards for flexible and easy reuse
– Dataset
• Environment, Finance, Legislation, Location, Reference, Statistics,
Transport, etc.
29. Linked Data & Semantic Web Technology
All around the world
Country | Official? | Rating | Datasets
Sweden | N | ★★ | few
New Zealand | Y | ★★ | many
Ireland | Y | ★★★ | few
Canada | Y | ★★★ | many
United States | Y | ★★★★ | many
Spain | N | ★★★★★ | few
United Kingdom | Y | ★★★★★ | many
Korea | ? | ? | ?
30. Linked Data & Semantic Web Technology
Korea
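• 공공데이터포털 (Public Data Portal)
– opens the wide range of public information held by the state to
citizens and helps them use it conveniently and easily
– 1,717 datasets and 242 Open APIs
– http://www.data.go.kr
• 공공DB 피디아 (Public DB Pedia)
– 24 datasets and 50,184 resources
– http://lod.data.go.kr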
31. Linked Data & Semantic Web Technology
<rdf:RDF
xmlns:ns1="http://lod.data.go.kr/sample/schema#"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:ns0="http://lod.data.go.kr/schema/dataset#" >
<rdf:Description rdf:about="http://lod.data.go.kr/sample/data/DS-0501">
<ns0:sampleResource rdf:resource="http://lod.data.go.kr/sample/data/DS-0501/SportFacility/SD10209PUEF"/>
</rdf:Description>
<rdf:Description rdf:about="http://lod.data.go.kr/sample/data/DS-0501/SportFacility/SD10209PUEF">
<ns0:prefLabel>자전거체험장</ns0:prefLabel>
<ns0:nodeLabel>자전거체험장</ns0:nodeLabel>
<ns1:phone>02-2204-7634</ns1:phone>
<ns1:name>자전거체험장</ns1:name>
<ns1:manageOrg>성동구도시관리공단</ns1:manageOrg>
<ns1:description>남녀노소 모두 편하게 이용할 수 있는 자전거체험장</ns1:description>
<ns1:address>서울특별시 성동구 마장동 802-2 마장2교 ~ 사근램프 사이</ns1:address>
<rdf:type rdf:resource="http://lod.data.go.kr/sample/schema#SportFacility"/>
</rdf:Description>
</rdf:RDF>
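To show how such an RDF/XML description maps onto subject, predicate, object triples, here is a minimal sketch using only Python's standard library. It handles just the plain `rdf:Description` pattern used in the sample above, not RDF/XML in general. The function name `triples_from_rdfxml` and the shortened `SAMPLE` document are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Shortened copy of the sample above (two properties kept).
SAMPLE = """<rdf:RDF
    xmlns:ns1="http://lod.data.go.kr/sample/schema#"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="http://lod.data.go.kr/sample/data/DS-0501/SportFacility/SD10209PUEF">
    <ns1:phone>02-2204-7634</ns1:phone>
    <rdf:type rdf:resource="http://lod.data.go.kr/sample/schema#SportFacility"/>
  </rdf:Description>
</rdf:RDF>"""


def triples_from_rdfxml(text):
    """Return (subject, predicate, object) tuples from simple rdf:Description blocks."""
    root = ET.fromstring(text)
    triples = []
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        subject = description.get(f"{{{RDF_NS}}}about")
        for prop in description:
            # ElementTree names tags "{namespace}local"; namespace + local
            # name gives the predicate URI.
            predicate = prop.tag[1:].replace("}", "", 1)
            # rdf:resource points at another resource; otherwise the element
            # text is a literal value.
            obj = prop.get(f"{{{RDF_NS}}}resource")
            if obj is None:
                obj = (prop.text or "").strip()
            triples.append((subject, predicate, obj))
    return triples


for triple in triples_from_rdfxml(SAMPLE):
    print(triple)
```

For real data a dedicated RDF library is the safer choice, since RDF/XML allows many equivalent spellings that this sketch ignores.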
32. Linked Data & Semantic Web Technology
Seoul, Korea
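• 서울 열린 데이터 광장 (Seoul Open Data Plaza)
– opens Seoul's public information to the private sector and communicates
around it, raising public benefit, work efficiency, and transparency, and
creating new services and public value through citizens' voluntary participation
– http://data.seoul.go.kr
• Seoul Open Data Plaza Linked Data Beta service
– about 13,600 items: administrative districts (by administrative dong),
cultural facilities, and cultural heritage
– http://lod.seoul.go.kr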
34. Linked Data & Semantic Web Technology
KDATA (Linked Data for Korea)
• open foundational data that implements Linked Data
using W3C Semantic Web standard technologies
• http://kdata.kr
• http://www.li-st.com
35. Linked Data & Semantic Web Technology
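Domain | Triples
Country codes | 3,899
Entertainment | 44,278
Administrative districts | 2,969
Elementary, middle, and high schools | 126,469
Offices of education | 1,130
Universities | 2,833
Social enterprises | 5,539
Seoul public restrooms | 47,340
Baseball players and teams | 228,872
Subway stations | 4,450
History | 5,392
Standard administrative data terms | 109,101
Hanok villages | 1,155
Public Wi-Fi installation information | 1,671
KDATA classification terms | 808
Traditional markets | 4,535
National parks | 10,605
Cultural heritage | 80,156
Public sports facilities | 49,799
Biological taxonomy | 3,256
Cultural facilities | 9,418
Park information and programs | 2,429
Price-stabilization model businesses | 16,212
Price-stabilization model business product lists | 14,300
Certified public-facility products | 6,931
Snow-removal box locations | 39,218
Wildlife information | 115,099
Wildlife sighting information | 139,608
Total | 1,077,472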
37. Linked Data & Semantic Web Technology
Linked
Media Data
38. Linked Data & Semantic Web Technology
MusicBrainz
• MusicBrainz
– a project that aims to create an open content music database
– information about 750,000 artists, 1 million releases, and
12 million recordings
• LinkedBrainz
– to help MusicBrainz publish its database as Linked Data
– mapped to concepts in the Music Ontology
39. Linked Data & Semantic Web Technology
Music Ontology
• main concepts and properties for describing music (i.e.
artists, albums, tracks, but also performances,
arrangements, etc.) on the Semantic Web
40. Linked Data & Semantic Web Technology
Linked Data on BBC
• Problems
– a lot of data (the BBC broadcasts between 1,000 and 1,500
programmes a day)
– hand-crafted, customized sites
– often not maintained
– often not persistent
• build upon Open Data Repositories
– such as MusicBrainz and Wikipedia
41. Linked Data & Semantic Web Technology
[Figure: page annotated with "Data from Wikipedia" and "Data from MusicBrainz"]
43. Linked Data & Semantic Web Technology
BBC Ontologies
• Programmes Ontology
– every programme brand, series and episode broadcast by the BBC
– the Programmes Ontology to expose data following the Linked Data
approach, enabling the interchange of programme information on the
Semantic Web
• Wildlife Ontology
– a simple vocabulary for describing biological species and related taxa
– terms for describing the names and ranking of taxa, as well as
providing support for describing their habitats, conservation status, and
behavioural characteristics, etc
• Curriculum Ontology
– a core data model for formally describing the national curricula across
the UK
44. Linked Data & Semantic Web Technology
LinkedMDB
• publishing the first
open semantic web
database for movies,
including a large
number of interlinks
to several datasets
45. Linked Data & Semantic Web Technology
Linked Data
for User Generated Content
46. Linked Data & Semantic Web Technology
flickr™ wrappr
• to extend DBpedia with RDF links to photos
posted on Flickr
• to generate a collection of Flickr photos for each of
the 1.95 million DBpedia concepts
48. Linked Data & Semantic Web Technology
Revyu.com
• a web site where you can review and rate things
49. Linked Data & Semantic Web Technology
Open Graph Protocol
• to integrate web pages into Facebook’s social graph,
based on RDFa
<html xmlns:og="http://opengraphprotocol.org/schema/"
xmlns:fb="http://www.facebook.com/2008/fbml">
<head>
<meta property="og:url" content="http://www.imdb.com/title/tt1285016/" />
<meta property='og:image' content='http://ia.media-imdb.com/…140_.jpg'>
<meta property='og:type' content='movie' />
<meta property='fb:app_id' content='115109575169727' />
<meta property='og:title' content='The Social Network (2010)' />
<meta property='og:site_name' content='IMDb' />
...
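A consumer of this markup only needs to read the `property` and `content` attributes of the `meta` elements. The sketch below does that with Python's standard `html.parser`; the class name `OpenGraphParser` and the shortened `SAMPLE` page are assumptions made for this illustration, and only `og:` properties are collected.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collect og:* properties from <meta property=... content=...> tags."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing <meta ... /> tags are routed here as well.
        if tag != "meta":
            return
        attributes = dict(attrs)
        prop = attributes.get("property")
        content = attributes.get("content")
        if prop and prop.startswith("og:") and content is not None:
            self.properties[prop] = content


# Shortened copy of the markup above.
SAMPLE = """<html><head>
<meta property="og:url" content="http://www.imdb.com/title/tt1285016/" />
<meta property='og:type' content='movie' />
<meta property='fb:app_id' content='115109575169727' />
<meta property='og:title' content='The Social Network (2010)' />
<meta property='og:site_name' content='IMDb' />
</head></html>"""

parser = OpenGraphParser()
parser.feed(SAMPLE)
for name, value in parser.properties.items():
    print(name, "=", value)
```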
50. Linked Data & Semantic Web Technology
Linked Life Science Data
51. Linked Data & Semantic Web Technology
BIO2RDF
• a biological
database using
Semantic Web
technologies to
provide interlinked
life science data
52. Linked Data & Semantic Web Technology
Linked Life Data
• a semantic data integration platform for the
biomedical domain
• Search and explore over RDF statements from
various sources including UniProt, PubMed,
EntrezGene and so forth
53. Linked Data & Semantic Web Technology
Select drugs related to asthma that are linked to a molecular interaction:
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX biopax2: <http://www.biopax.org/release/biopax-level2.owl#>
PREFIX uniprot: <http://purl.uniprot.org/core/>
PREFIX drugbank: <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/>
SELECT DISTINCT ?fullname ?drugname ?indication
WHERE {
  ?physicalEntity skos:semanticRelation ?protein .
  ?protein uniprot:recommendedName ?name .
  ?name uniprot:fullName ?fullname .
  ?target skos:exactMatch ?protein .
  ?drug drugbank:target ?target .
  ?drug drugbank:genericName ?drugname .
  ?drug drugbank:indication ?indication .
  FILTER(regex(?indication, "asthma", "i"))
}
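Under the SPARQL protocol, a query like this can be sent to an endpoint as the `query` parameter of an HTTP GET. The sketch below only builds and decodes such a URL, without sending it. The endpoint address is a placeholder rather than the Linked Life Data endpoint, the query is a shortened variant of the one above, and the helper name `sparql_get_url` is an assumption made for this illustration.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Placeholder endpoint, assumed for this sketch.
ENDPOINT = "http://example.org/sparql"

# Shortened variant of the query above (drug name and indication only).
QUERY = """PREFIX drugbank: <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/>
SELECT DISTINCT ?drugname ?indication
WHERE {
  ?drug drugbank:genericName ?drugname .
  ?drug drugbank:indication ?indication .
  FILTER(regex(?indication, "asthma", "i"))
}"""


def sparql_get_url(endpoint: str, query: str) -> str:
    """Return the GET URL that carries `query` in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


url = sparql_get_url(ENDPOINT, QUERY)
# Decoding the URL again recovers the original query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(url)
```

Result formats and size limits vary between endpoints, so a real client would also set an Accept header and handle errors.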